How to respect rate limits using Twitter's official Python library? Error code 429


#1

I am using Python and Twitter’s official searchtweets library, currently on the Sandbox 30-Day search endpoint.

When collecting results on a search query over a specific date range with ~7,500 anticipated results (estimated using the REST API in R), I got an error indicating I had exceeded the rate limits:

(Note: my dashboard, after this error, indicates I have 220 remaining requests for my monthly period)

retrying request; current status code: 429
retrying request; current status code: 429
retrying request; current status code: 429
HTTP Error code: 429: Exceeded rate limit
Rule payload: {'query': 'crispr', 'maxResults': 100, 'toDate': '201807080000', 'fromDate': '201807010000', 'next': '[removed for brevity on forum]'}
Traceback (most recent call last):
  File "twitter-mining.py", line 155, in <module>
    main()
  File "twitter-mining.py", line 127, in main
    result_stream_args=premium_search_args) #these 'args' are authentication/config from load_credentials()
  File "C:\Python\Python36-32\lib\site-packages\searchtweets\result_stream.py", line 301, in collect_results
    return list(rs.stream())
  File "C:\Python\Python36-32\lib\site-packages\searchtweets\result_stream.py", line 216, in stream
    self.execute_request()
  File "C:\Python\Python36-32\lib\site-packages\searchtweets\result_stream.py", line 253, in execute_request
    rule_payload=self.rule_payload)
  File "C:\Python\Python36-32\lib\site-packages\searchtweets\result_stream.py", line 101, in retried_func
    raise requests.exceptions.HTTPError
requests.exceptions.HTTPError

My understanding from the above is that I made too many requests too quickly and exceeded the rate limits. I’ve searched the forums, but how can I ‘slow down my code’ when using the searchtweets library? Does the library not handle rate-limit pacing itself?

I am collecting results in a for loop, using:

# build the search rule; gen_rule_payload() and collect_results() are from Twitter's library
rule = gen_rule_payload(term, from_date=since, to_date=until,
                        results_per_call=results_percall)  # results_percall set to 100 to respect the Sandbox limit

# collect results; tweets will be a list of tweet JSON objects
tweets = collect_results(rule,
                         max_results=totalresults,  # set arbitrarily high to 900000
                         result_stream_args=premium_search_args)  # authentication/config from load_credentials()

I’m a newbie in programming/Python/APIs, and I understand I can slow down the iterations of my for loop, but it appears I exceeded the rate limits within a single call to collect_results().

Help is much appreciated! Before I can upgrade to full Premium access and begin collecting data for our research, I need to figure out how to query the API responsibly (rate limits) and how to save my data (whatever was collected before the 429 error was, of course, not saved by the subsequent steps in my for loop, similar to this post). Roughly, I'm imagining something like the sketch below.
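To illustrate, here is the pattern I'm hoping for: stream results and keep whatever arrives, so a 429 mid-collection doesn't lose everything. This is only a sketch; the try/except placement and the output file name are my own guesses, not something from the library's docs.

import json

from requests.exceptions import HTTPError
from searchtweets import ResultStream

# same rule, totalresults, and premium_search_args as in the code above
rs = ResultStream(rule_payload=rule,
                  max_results=totalresults,
                  **premium_search_args)

tweets = []
try:
    for tweet in rs.stream():
        tweets.append(tweet)
except HTTPError:
    print("Hit the rate limit; keeping what was collected so far.")

# save whatever made it through before the error
with open("tweets_partial.json", "w") as f:
    json.dump(tweets, f)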

Thanks!


#4

Hi @page_archive! No, the library does not automatically handle rate-limiting issues, though it is something we could potentially implement, or would happily accept a pull request for.

I will discuss it with the team and add a note to the docs about it.

Also, so I can understand: are you using the collect_results function in a loop?


#5

Thank you for your reply @binary_aaron – yes, I was using collect_results in a loop.

Digging deeper, I see now that collect_results is a “utility function” for the ResultStream object – do I need to work more directly with ResultStream to manage my usage and stay within the rate limits?

Can you provide any guidance or examples on how to responsibly manage usage for queries that return thousands of results? I read through the searchtweets package here and the examples here. It seems like the use case I’m describing is very common, and it’d be useful to have an example for us less experienced users.

I see the “Working with the ResultStream” example, but what if we want a 10,000-tweet limit instead of the 500 in the example? Something like the sketch below?
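(Adapted from the README's example; this is my guess at the pattern, and as far as I can tell nothing in it slows the underlying requests down:)

from searchtweets import ResultStream, gen_rule_payload, load_credentials

premium_search_args = load_credentials()  # reads the credentials YAML, as in the docs
rule = gen_rule_payload("crispr", results_per_call=100)

# same pattern as the docs' 500-tweet example, just with a higher cap
rs = ResultStream(rule_payload=rule,
                  max_results=10000,
                  **premium_search_args)
tweets = list(rs.stream())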

Thanks!


#6

Digging deeper again, I ended up adding time.sleep(3) to searchtweets.ResultStream.execute_request() in my local copy of the library’s result_stream.py, and this has resolved the issue by waiting 3 seconds between requests.

I chose 3 seconds to be overly safe, given the Sandbox’s rate limit of 30 requests per minute.
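In case it helps anyone who would rather not edit the installed file: the same delay can be patched in at runtime by wrapping execute_request. This is just a sketch of the equivalent change; the wrapper name is mine.

import time

from searchtweets import ResultStream

_orig_execute_request = ResultStream.execute_request

def _paced_execute_request(self):
    time.sleep(3)  # conservative for the Sandbox's 30 requests/minute
    return _orig_execute_request(self)

# every ResultStream request now waits 3 seconds first
ResultStream.execute_request = _paced_execute_request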


#7

@page_archive - Thank you for the suggestions about handling rate limits. We had not considered including rate-limit handling in the package, particularly for the Sandbox environment, but will do so.
I’m glad that you found a solution, and I will make a note to update the documentation and discuss changes to the package around rate-limit handling.


#8

I had the same problem, but I solved it easily: just call a sleep after every x tweets. In my code it was like this:

import time
import searchtweets

rs = searchtweets.ResultStream(…)  # args elided in the original post

cont = 0
for tweet in rs.stream():
    cont += 1
    # ... do what you want with the tweet ...
    if cont % 3000 == 0:  # at 100 tweets per call, 3000 tweets is about 30 requests
        time.sleep(90)

I hope that helps you.