Hello!

I am using the Academic Research track to collect a large number of tweets.
My code looks like this:

config = dict(
    search_tweets_v2=dict(
        endpoint="https://api.twitter.com/2/tweets/search/all",
        account_type='premium',
        consumer_key=API_KEY,
        consumer_secret=API_SECRET_KEY,
        bearer_token=BEARER_TOKEN
    )
)

with open('twitter_keys.yaml', 'w') as config_file:
    yaml.dump(config, config_file, default_flow_style=False)



search_args = load_credentials(filename="twitter_keys.yaml",
                                    yaml_key="search_tweets_v2",
                                    env_overwrite=False)

rule = gen_request_parameters(SEARCH_QUERY,
                        results_per_call=500,
                        start_time='2020-04-01',
                        end_time='2021-03-31',
                        tweet_fields="created_at,author_id,text,lang"
                        )

rs = ResultStream(request_parameters=rule,
                  max_results= 500,
                  output_format="a",
                  **search_args)

with open(FILENAME, 'a', encoding='utf-8') as f:
    n = 0
    for tweet in rs.stream():
        n += 1
        if n % PRINT_AFTER_X == 0:
            print('{0}: {1}'.format(str(n), tweet['created_at']))
        json.dump(tweet, f)
        f.write('\n')
print('done')

Each time I run the code, I only receive approximately 1,000 tweets. I would like to iterate over the result stream so that I can collect all of the tweets. Is that somehow possible?

Any help will be much appreciated! Thank you!

I would double check that you definitely have the v2 package, and not the v1.1 premium version:

pip uninstall searchtweets
pip install searchtweets-v2

account_type='premium',

I don’t think this is needed, and specifying:

   consumer_key=API_KEY,
   consumer_secret=API_SECRET_KEY,

is also not needed if specifying a bearer_token.

results_per_call= 500

This (correctly) sets the results per call in the API, while:

max_results= 500

I don’t think this is valid - there’s max_tweets and max_requests, but no max_results.
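If that’s the issue, a corrected call might look like this (just a sketch, assuming searchtweets-v2’s ResultStream takes max_tweets and max_requests as described above; the numeric values are arbitrary placeholders):

```python
# Sketch: replace max_results with max_tweets (total tweets to collect)
# and optionally max_requests (total API calls). Values are placeholders.
rs = ResultStream(request_parameters=rule,   # from gen_request_parameters(...)
                  max_tweets=100000,         # arbitrary cap on total tweets
                  max_requests=200,          # arbitrary cap on API calls
                  output_format="a",
                  **search_args)
```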

What’s the SEARCH_QUERY query you’re trying? It’s also possible that there aren’t results in that time range.

In the past, I’ve had success turning the stream into a list and then iterating over the list, in a similar way to what you had in your code sample.

tweets = rs.stream()
list_tweets = list(tweets)
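One thing to watch out for: rs.stream() returns a generator, so it can only be consumed once - list() drains it, and a second pass over the same generator yields nothing. A minimal self-contained sketch (using a plain generator as a stand-in for the ResultStream):

```python
def fake_stream():
    # Stand-in for rs.stream(): yields three fake tweet dicts.
    for i in range(3):
        yield {"id": i}

gen = fake_stream()
first = list(gen)    # drains the generator: 3 items
second = list(gen)   # already exhausted: 0 items
print(len(first), len(second))  # prints: 3 0
```

So build the list once and reuse it, rather than calling list() (or looping) on the same stream twice.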

I hope this does the trick!

Jessica

Thanks! I removed all the unnecessary specifications from my code and double-checked my searchtweets package - it’s v2.
My search query looks like this:

SEARCH_QUERY="(#coronahysterie OR #coronaluege OR #wirwerdenalledasein) lang:de"

Thanks Jessica!
The loop is working this way. Unfortunately, now I am always requesting the same page, so my results do not change. Do you know how to fix that? Thank you so much for your help!

I can recommend an alternative - twarc can also do this as a library like so:
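Something like this (a sketch, assuming twarc 2.x’s Twarc2 client and its search_all method; the bearer token is a placeholder you’d replace with your own):

```python
import datetime

from twarc import Twarc2

client = Twarc2(bearer_token="YOUR_BEARER_TOKEN")  # placeholder token

query = "(#coronahysterie OR #coronaluege OR #wirwerdenalledasein) lang:de"

# search_all pages through the full-archive search endpoint,
# handling pagination and rate limiting for you.
for page in client.search_all(
    query,
    start_time=datetime.datetime(2020, 4, 1, tzinfo=datetime.timezone.utc),
    end_time=datetime.datetime(2021, 3, 31, tzinfo=datetime.timezone.utc),
):
    for tweet in page["data"]:
        print(tweet["id"])
```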

Or as a command line tool like:

Install and configure:

pip install twarc
twarc2 configure

Get all results:

twarc2 search --archive --start-time 2020-04-01 --end-time 2021-03-31 "(#coronahysterie OR #coronaluege OR #wirwerdenalledasein) lang:de" results.jsonl

Get 1 tweet result per line (equivalent to output_format="a"):

twarc2 flatten results.jsonl tweets.jsonl

Get a CSV file of results:

pip install twarc-csv

twarc2 csv tweets.jsonl tweets.csv

Thank you so much! It’s working and pretty easy 🙂
