Hello!
I am using the Academic Research track to collect a large number of tweets.
My code looks like this:
import json
import yaml
from searchtweets import ResultStream, gen_request_parameters, load_credentials

# SEARCH_QUERY, FILENAME, PRINT_AFTER_X and the API credentials are defined elsewhere.
config = dict(
    search_tweets_v2=dict(
        endpoint="https://api.twitter.com/2/tweets/search/all",
        account_type='premium',
        consumer_key=API_KEY,
        consumer_secret=API_SECRET_KEY,
        bearer_token=BEARER_TOKEN
    )
)

# Write the credentials file, then load it back the way searchtweets expects.
with open('twitter_keys.yaml', 'w') as config_file:
    yaml.dump(config, config_file, default_flow_style=False)

search_args = load_credentials(filename="twitter_keys.yaml",
                               yaml_key="search_tweets_v2",
                               env_overwrite=False)

rule = gen_request_parameters(SEARCH_QUERY,
                              results_per_call=500,
                              start_time='2020-4-1',
                              end_time='2021-3-31',
                              tweet_fields="created_at,author_id,text,lang"
                              )

rs = ResultStream(request_parameters=rule,
                  max_results=500,
                  output_format="a",
                  **search_args)

# Append one JSON object per line and print progress every PRINT_AFTER_X tweets.
with open(FILENAME, 'a', encoding='utf-8') as f:
    n = 0
    for tweet in rs.stream():
        n += 1
        if n % PRINT_AFTER_X == 0:
            print('{0}: {1}'.format(str(n), tweet['created_at']))
        json.dump(tweet, f)
        f.write('\n')
print('done')
Each time I run the code, I only receive approx. 1000 tweets. I would like to iterate over the result stream so that I can collect all the tweets. Is that somehow possible?
Any help will be much appreciated! Thank you!
I would double check that you definitely have the v2 package, and not the v1.1 premium version:
pip uninstall searchtweets
pip install searchtweets-v2
account_type='premium',
I don’t think this is needed, and specifying:
consumer_key=API_KEY,
consumer_secret=API_SECRET_KEY,
is also not needed if specifying a bearer_token.
results_per_call= 500
This (correctly) sets the results per call in the API, while:
max_results= 500
I don’t think this is valid: there’s max_tweets and max_requests, but no max_results.
What SEARCH_QUERY are you trying? It’s also possible that there aren’t any results in that time range.
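As a rough illustration of the difference between a per-call page size and an overall cap, here is a toy sketch. The names fake_fetch_page and collect are made up for this example, and the integer offset stands in for the opaque next_token a real API returns. It is not how searchtweets is implemented, only the general pattern:

```python
def fake_fetch_page(page_size, next_token=None, total=1234):
    """Hypothetical stand-in for one API call: returns (items, next_token)."""
    start = next_token or 0
    end = min(start + page_size, total)
    items = list(range(start, end))
    # A real API returns an opaque token; an integer offset plays that role here.
    return items, (end if end < total else None)


def collect(fetch_page, page_size, max_items):
    """Request pages until the source is exhausted or max_items is reached."""
    results, token, calls = [], None, 0
    while len(results) < max_items:
        # Passing the token from the previous call is what moves to the next page.
        items, token = fetch_page(page_size, token)
        calls += 1
        results.extend(items)
        if token is None:  # no further pages
            break
    return results[:max_items], calls


tweets, calls = collect(fake_fetch_page, page_size=500, max_items=1000)
print(len(tweets), calls)  # 1000 2
```

With an overall cap of 1000 the loop stops after two calls even though the fake source holds 1234 items. Raising the cap lets it run until no further token comes back.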
In the past, I had success turning the stream into a list and iterating over that list, similar to what you had in your code sample.
tweets = rs.stream()
list_tweets = list(tweets)
I hope this does the trick!
Jessica
Thanks! I removed all the unnecessary specifications from my code and double checked my searchtweets package - it’s v2.
My search query looks like this:
SEARCH_QUERY="(#coronahysterie OR #coronaluege OR #wirwerdenalledasein) lang:de"
Thanks Jessica!
The loop works this way. Unfortunately, I now keep requesting the same page, so my results do not change. Do you know how to fix that? Thank you so much for your help!
I can recommend an alternative - twarc can also do this as a library like so:
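A minimal sketch of the library route, assuming twarc 2.x is installed and you have an Academic Research bearer token. The function name search_archive_to_file and the BEARER_TOKEN and out_path values are placeholders of mine, so check the twarc documentation before relying on the details:

```python
import datetime
import json


def search_archive_to_file(bearer_token, query, start_time, end_time, out_path):
    """Append every matching tweet to out_path, one JSON object per line."""
    # Imported here so this file can be loaded even where twarc is not installed.
    from twarc import Twarc2, expansions

    client = Twarc2(bearer_token=bearer_token)
    count = 0
    with open(out_path, "a", encoding="utf-8") as f:
        # search_all yields one response page at a time and follows
        # the pagination tokens itself.
        pages = client.search_all(query=query, start_time=start_time,
                                  end_time=end_time, max_results=100)
        for page in pages:
            # flatten() merges the expansions into each tweet object.
            for tweet in expansions.flatten(page):
                f.write(json.dumps(tweet) + "\n")
                count += 1
    return count


QUERY = "(#coronahysterie OR #coronaluege OR #wirwerdenalledasein) lang:de"
START = datetime.datetime(2020, 4, 1, tzinfo=datetime.timezone.utc)
END = datetime.datetime(2021, 3, 31, tzinfo=datetime.timezone.utc)

# Example call (needs network access and a valid token):
# n = search_archive_to_file(BEARER_TOKEN, QUERY, START, END, "tweets.jsonl")
```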
Or as a command line tool like:
Install and configure:
pip install twarc
twarc2 configure
Get all results:
twarc2 search --archive --start-time 2020-04-01 --end-time 2021-03-31 "(#coronahysterie OR #coronaluege OR #wirwerdenalledasein) lang:de" results.jsonl
Get 1 tweet result per line (equivalent to output_format="a"):
twarc2 flatten results.jsonl tweets.jsonl
Get a CSV file of results:
pip install twarc-csv
twarc2 csv tweets.jsonl tweets.csv
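To check how many tweets actually ended up in the flattened file, something like the following should work. It is a small sketch using only the standard library. summarize_jsonl is a name I made up, and it assumes each line is one tweet object that may carry a lang field:

```python
import json
from collections import Counter


def summarize_jsonl(path):
    """Return (number of tweets, Counter of their 'lang' values) for a JSONL file."""
    total, langs = 0, Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            tweet = json.loads(line)
            total += 1
            langs[tweet.get("lang", "unknown")] += 1
    return total, langs


# Example call on the file produced by "twarc2 flatten":
# total, langs = summarize_jsonl("tweets.jsonl")
# print(total, langs.most_common(3))
```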
Thank you so much! It’s working and pretty easy.