I am using Python and Twitter's official searchtweets library, currently on the Sandbox 30-Day search endpoint.
When collecting results for a search query over a specific date range, with ~7,500 anticipated results (estimated using the REST API in R), I got an error indicating I had exceeded the rate limits:
(Note: after this error, my dashboard indicates I have 220 requests remaining for my monthly period.)
retrying request; current status code: 429
retrying request; current status code: 429
retrying request; current status code: 429
HTTP Error code: 429: Exceeded rate limit
Rule payload: {'query': 'crispr', 'maxResults': 100, 'toDate': '201807080000', 'fromDate': '201807010000', 'next': '[removed for brevity on forum]'}
Traceback (most recent call last):
  File "twitter-mining.py", line 155, in <module>
    main()
  File "twitter-mining.py", line 127, in main
    result_stream_args=premium_search_args) #these 'args' are authentication/config from load_credentials()
  File "C:\Python\Python36-32\lib\site-packages\searchtweets\result_stream.py", line 301, in collect_results
    return list(rs.stream())
  File "C:\Python\Python36-32\lib\site-packages\searchtweets\result_stream.py", line 216, in stream
    self.execute_request()
  File "C:\Python\Python36-32\lib\site-packages\searchtweets\result_stream.py", line 253, in execute_request
    rule_payload=self.rule_payload)
  File "C:\Python\Python36-32\lib\site-packages\searchtweets\result_stream.py", line 101, in retried_func
    raise requests.exceptions.HTTPError
requests.exceptions.HTTPError
My understanding from the above is that I made too many requests too quickly and exceeded the rate limits. I've searched the forums, but I'm still unsure how to 'slow down my code' when using the searchtweets library. Doesn't the library handle rate-limit pacing itself? (I've sketched one idea below, after my current code.)
I am collecting results in a for loop, using:
# build the search rule; gen_rule_payload() and collect_results() are from Twitter's searchtweets library
from searchtweets import gen_rule_payload, collect_results

rule = gen_rule_payload(term, from_date=since, to_date=until,
                        results_per_call=results_percall)  # results_percall set to 100 to respect the Sandbox limit

# collect results; tweets will be a list of tweet JSON objects
tweets = collect_results(rule,
                         max_results=totalresults,  # set arbitrarily high, to 900000
                         result_stream_args=premium_search_args)  # auth/config from load_credentials()
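For context, here is the direction I've been considering: driving the pagination myself with ResultStream (the class that collect_results() wraps) and pausing between pages, instead of letting one call fire every request back to back. This is a rough, untested sketch; the 5-second pause is my own guess, not a documented Sandbox value:

# Rough sketch (untested): page manually with ResultStream so I can pause
# between the underlying API requests. The pause length is my own guess.
import time
from searchtweets import ResultStream, gen_rule_payload, load_credentials

premium_search_args = load_credentials()  # assumes credentials in the default ~/.twitter_keys.yaml

rule = gen_rule_payload("crispr",
                        from_date="201807010000",
                        to_date="201807080000",
                        results_per_call=100)  # Sandbox maximum per request

rs = ResultStream(rule_payload=rule,
                  max_results=100000,
                  **premium_search_args)

tweets = []
for i, tweet in enumerate(rs.stream()):
    tweets.append(tweet)
    # stream() fetches a fresh page roughly every results_per_call tweets,
    # so pausing here spaces out the actual requests
    if (i + 1) % 100 == 0:
        time.sleep(5)

Would something along these lines be the recommended way to pace requests, or is there a built-in option I'm missing?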
I'm a newbie in programming/Python/APIs, and I understand I can slow down the iterations of my for loop, but it appears I exceeded the rate limits within a single call to collect_results().
Help is much appreciated! Before I can upgrade to full Premium access and begin collecting data for our research, I need to figure out two things: how to query the API responsibly (i.e. within the rate limits), and how to save my data as it comes in (whatever was collected before the 429 error was, of course, not saved by the subsequent steps in my for loop; similar to this post).
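On the saving point, what I have in mind is appending each tweet to a file as it arrives, so a mid-collection 429 doesn't cost me everything already downloaded. Another rough, untested sketch; the file name and the bare-bones error handling are my own choices:

# Rough sketch (untested): append each tweet to a JSON-lines file as it
# arrives, so anything collected before an HTTPError (e.g. the 429) is
# already safe on disk.
import json
import requests
from searchtweets import ResultStream, gen_rule_payload, load_credentials

premium_search_args = load_credentials()
rule = gen_rule_payload("crispr",
                        from_date="201807010000",
                        to_date="201807080000",
                        results_per_call=100)
rs = ResultStream(rule_payload=rule, max_results=900000, **premium_search_args)

collected = 0
try:
    with open("crispr_tweets.jsonl", "a", encoding="utf-8") as f:
        for tweet in rs.stream():
            f.write(json.dumps(tweet) + "\n")  # tweets behave like dicts, so this serializes cleanly
            collected += 1
except requests.exceptions.HTTPError:
    print(f"Hit an HTTP error after saving {collected} tweets; partial data is on disk.")

Is that a sensible pattern, or is there a better-supported way to checkpoint results with this library?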
Thanks!