How to respect rate limits using Twitter's official python library? Error code 429

#1

I am using Python and Twitter's official searchtweets library, currently against the Sandbox 30-Day search endpoint.

When collecting results for a search query over a specific date range, with ~7,500 anticipated results (estimated using the REST API in R), I got an error indicating I had exceeded the rate limits:

(Note: my dashboard, after this error, indicates I have 220 remaining requests for my monthly period)

retrying request; current status code: 429
retrying request; current status code: 429
retrying request; current status code: 429
HTTP Error code: 429: Exceeded rate limit
Rule payload: {'query': 'crispr', 'maxResults': 100, 'toDate': '201807080000', 'fromDate': '201807010000', 'next': '[removed for brevity on forum]'}
Traceback (most recent call last):
  File "twitter-mining.py", line 155, in <module>
    main()
  File "twitter-mining.py", line 127, in main
    result_stream_args=premium_search_args) #these 'args' are authentication/config from load_credentials()
  File "C:\Python\Python36-32\lib\site-packages\searchtweets\result_stream.py", line 301, in collect_results
    return list(rs.stream())
  File "C:\Python\Python36-32\lib\site-packages\searchtweets\result_stream.py", line 216, in stream
    self.execute_request()
  File "C:\Python\Python36-32\lib\site-packages\searchtweets\result_stream.py", line 253, in execute_request
    rule_payload=self.rule_payload)
  File "C:\Python\Python36-32\lib\site-packages\searchtweets\result_stream.py", line 101, in retried_func
    raise requests.exceptions.HTTPError
requests.exceptions.HTTPError

My understanding from the above is that I made too many requests too quickly and exceeded the rate limits. I've searched the forums, but I can't figure out how to 'slow down my code' when using the searchtweets library. Does the library not handle rate-limit pacing on its own?

I am collecting results in a for loop, using:

from searchtweets import gen_rule_payload, collect_results

#build search rule; gen_rule_payload() and collect_results() come from Twitter's searchtweets library
rule = gen_rule_payload(term, from_date=since, to_date=until,
                        results_per_call=results_percall)  #results_percall set to 100 to respect the Sandbox limit

#collect results; tweets will be a list of tweet JSON objects
tweets = collect_results(rule,
                         max_results=totalresults,  #set arbitrarily high to 900000
                         result_stream_args=premium_search_args)  #authentication/config from load_credentials()

I'm a newbie in programming/Python/APIs, and I understand I can slow down the iterations of my for loop, but it appears I exceeded the rate limits within a single call to collect_results().
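One idea I'm considering, based on the ResultStream class that the same library documents, is to stream tweets instead of buffering them all with collect_results(), and pause between pages so the underlying requests are spread out. In the sketch below, pause_seconds and max_tweets are placeholder values I picked myself; rule, results_percall, totalresults and premium_search_args are the same objects as above. Does something like this make sense?

#sketch of a paced, stream-based collection; pause_seconds and max_tweets are my own placeholders
from time import sleep
from searchtweets import ResultStream

rs = ResultStream(rule_payload=rule,
                  max_results=totalresults,
                  **premium_search_args)  #endpoint + credentials from load_credentials()

pause_seconds = 5     #wait between pages to spread the requests out
max_tweets = 7500     #stop near the expected result count instead of 900000

tweets = []
for i, tweet in enumerate(rs.stream(), start=1):
    tweets.append(tweet)
    if i >= max_tweets:
        break
    if i % results_percall == 0:   #roughly one HTTP request per results_per_call tweets
        sleep(pause_seconds)

My understanding is that stream() only fetches the next page when the loop asks for more tweets, so sleeping in the loop should also space out the HTTP requests, but please correct me if that is wrong.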

Help is much appreciated! Before I can upgrade to full Premium access and begin collecting data for our research, I need to figure out how to query the API responsibly (respect the rate limits) and how to save my data as I go: whatever was collected before the 429 error was, of course, not saved by the subsequent steps in my for loop (similar to this post).
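For the saving part, my current thought is to wrap the stream in a try/except and write each tweet to disk as it arrives, so a 429 doesn't lose what was already collected. Here, partial_results.jsonl is just a file name I made up, and rs is the ResultStream from the sketch above:

#sketch: persist tweets as they arrive so a 429 doesn't lose what was already collected
import json
import requests

try:
    with open('partial_results.jsonl', 'a', encoding='utf-8') as f:
        for tweet in rs.stream():
            f.write(json.dumps(dict(tweet)) + '\n')   #one tweet JSON object per line
except requests.exceptions.HTTPError:
    print('Hit the rate limit (429); everything written so far is safe on disk')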

Thanks!

