Twitter data cannot be crawled after a certain period of time


#1

I am developing a Twitter app that crawls the user IDs in a specified user’s friend list. It works well for some time, then it returns a 429 error (Rate limit exceeded). I know this is a common error caused by the 15-minute rate limit window, during which at most 15 requests from a single app are served; after 15 minutes, requests are served again. I have read the documentation for GET friends/ids. I have a long list of users whose friend lists need to be crawled. The app works well for about 100 users, but after that it returns the 429 error (Rate limit exceeded) and does not recover even after 15 minutes. I have also tried putting my program to sleep for 1 minute before it sends another request. Please suggest where I am going wrong.


#2

It sounds like you may be hitting the rate limit too often. I think there are special limits in place if you continue to make requests while you are rate limited, but that is only a guess. There is no way to circumvent the rate limits; just respect them. Use the rate limit headers to determine how many requests you have left. If you find that you have exhausted the limit, wait until the rate-limit-reset time and continue only after that.
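
As a rough sketch of the waiting logic described above: given the remaining-request count and the reset time from the headers, compute how long to sleep before the next call. The function name `seconds_to_wait` and the one-second `margin` are assumptions made for illustration, not part of any Twitter library.

```python
import time


def seconds_to_wait(remaining, reset_epoch, now=None, margin=1.0):
    """Return how many seconds to sleep before sending the next request.

    remaining   -- requests left in the current window (X-Rate-Limit-Remaining)
    reset_epoch -- Unix time at which the window resets (X-Rate-Limit-Reset)
    now         -- current Unix time; defaults to time.time()
    margin      -- extra slack in seconds (an assumption, to absorb clock skew)
    """
    if remaining > 0:
        # Requests are still available, so there is no need to wait.
        return 0.0
    if now is None:
        now = time.time()
    # Wait until the reset time, never a negative amount, plus the margin.
    return max(0.0, reset_epoch - now) + margin
```

In a crawl loop, the idea would be to call `time.sleep(seconds_to_wait(remaining, reset_epoch))` after each response, instead of sleeping for a fixed interval such as 1 minute.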


#3

Thanks for taking an interest in my question. I have written Python code to crawl the friend lists of a particular user, and I use the twython library to make all requests. Since I am new to making API calls, I don’t know how to read the rate limit headers. I have also searched the twython documentation but could not find anything helpful. Any help from your side would be much appreciated.


#4

The headers I am speaking of are just ordinary HTTP response headers. I am not sure if twython exposes them.

Here is an example of what the headers look like:

X-Rate-Limit-Limit: 180
X-Rate-Limit-Remaining: 179
X-Rate-Limit-Reset: 1437138863
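
If you can get hold of the response headers as a dictionary, a minimal sketch for turning them into numbers might look like the following. The names `RateLimit` and `parse_rate_limit` are made up for this example; the lookup is lowercased because HTTP header names are case-insensitive.

```python
from collections import namedtuple

# Hypothetical container for the three rate limit values.
RateLimit = namedtuple("RateLimit", ["limit", "remaining", "reset"])


def parse_rate_limit(headers):
    """Extract the rate limit values from a dict of HTTP response headers."""
    # Normalize header names, since servers and libraries differ in casing.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimit(
        limit=int(lowered["x-rate-limit-limit"]),          # requests allowed per window
        remaining=int(lowered["x-rate-limit-remaining"]),  # requests left in this window
        reset=int(lowered["x-rate-limit-reset"]),          # Unix time when the window resets
    )


example_headers = {
    "X-Rate-Limit-Limit": "180",
    "X-Rate-Limit-Remaining": "179",
    "X-Rate-Limit-Reset": "1437138863",
}
info = parse_rate_limit(example_headers)
```

Here `info.remaining` tells you how many calls are left, and `info.reset` is the Unix timestamp to wait for once it reaches zero.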

#5

I have tried this code, but it is not working: print(twitter.get_application_rate_limit_status(resources='friends'))
I found it at this link: https://media.readthedocs.org/pdf/twython/latest/twython.pdf but it was not much help.


#6

That’s not how you should do this. It’s correct that there is an endpoint for this, but you really should use the x-rate-limit headers. If the library doesn’t give you access to these headers, you should look for another one.
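
For what it is worth, Twython documents a `get_lastfunction_header` method that returns a header from the most recent API call, so the headers may be reachable without switching libraries. The sketch below assumes your installed version provides that method and that it is called only after a request has been made; the helper name `pause_if_rate_limited` and the one-second margin are invented for illustration.

```python
import time


def pause_if_rate_limited(client, now=None, sleep=time.sleep):
    """Sleep until the rate limit window resets if no requests are left.

    client -- an object with get_lastfunction_header(name), e.g. a Twython
              instance (assumption: check this against your Twython version)
    Returns the number of seconds slept (0.0 if no pause was needed).
    """
    remaining = client.get_lastfunction_header("x-rate-limit-remaining")
    reset = client.get_lastfunction_header("x-rate-limit-reset")
    if remaining is None or reset is None:
        # Headers were not present on the last response; nothing to go on.
        return 0.0
    if int(remaining) > 0:
        return 0.0
    if now is None:
        now = time.time()
    # Wait until the reset time, plus one second of slack for clock skew.
    delay = max(0.0, int(reset) - now) + 1.0
    sleep(delay)
    return delay
```

The intended use is to call `pause_if_rate_limited(twitter)` right after each `twitter.get_friends_ids(...)` request, so the crawler stops sending requests as soon as the window is exhausted rather than continuing into repeated 429 responses.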