Hi there,
I want to pull tweets from 2006 through the end of 2020, which means requesting a very large number of tweets. I’m using my own code and it works, but after a few hours it fails with an error saying I’ve hit the rate limit for the current time window. By then, about 500,000 tweets have been written to a JSON file (retweets included, since I was asked to collect them too; each request asks for 500 tweets and is repeated).
Collecting this many tweets will take days. I went through forum discussions related to this problem but couldn’t figure out how to solve it and keep the program running until all the tweets are pulled.
How did you solve a similar problem in your program and make sure it waits until it can send requests again?
I’d appreciate any help. Thank you,
Noam
Hi @_pinkrabbit
Based on your question, it sounds like you are using v2 of the Twitter API, on the academic research product track, with the full-archive search. Is that correct?
Also, can you share the code you are using? To get more than 500 Tweets using the full-archive search, you will have to take the next_token from each response and pass it as a parameter to your next API call, until you no longer get a next_token or you have reached the number of Tweets you want.
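Here is a minimal sketch of that loop in Python. It assumes a hypothetical helper, fetch_page, that you would write around your own HTTP call: it takes the current next_token (None on the first call) and returns the parsed JSON response, with tweets under "data" and the token under "meta". The names paginate and max_tweets are also just for illustration.

```python
from typing import Callable, Dict, Iterator, Optional


def paginate(
    fetch_page: Callable[[Optional[str]], Dict],
    max_tweets: Optional[int] = None,
) -> Iterator[Dict]:
    """Yield tweets page by page until there is no next_token left.

    fetch_page is a hypothetical callable (an assumption, not part of any
    library): it receives the next_token (None for the first request) and
    returns the decoded JSON body of one search response.
    """
    next_token = None
    collected = 0
    while True:
        page = fetch_page(next_token)
        # A page with no matches may have no "data" key at all.
        for tweet in page.get("data", []):
            yield tweet
            collected += 1
            if max_tweets is not None and collected >= max_tweets:
                return
        # The token for the following page lives under "meta".
        next_token = page.get("meta", {}).get("next_token")
        if next_token is None:
            return  # last page reached
```

Because it is a generator, you can write tweets to your JSON files as they arrive instead of holding everything in memory.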
Hi @suhemparack, thanks for replying,
Yes, I use the v2 API, on the academic research track, with the full-archive search.
This is my code (it’s quite long, so I uploaded it to GitHub; hopefully that’ll be more convenient).
I created a loop that’s supposed to end after writing 10 JSON files in this case. I expected each file to include data for 249,500 tweets (500 tweets from each of 499 requests). I use an inner loop for this, and it uses the next_token it gets from each call. (As a check, I lowered the number of calls from 999 to 499, but it didn’t matter; the script still failed.)
Also, I thought of using the sleep() method to force the script to wait for a while. (If I got this right, I’m allowed 60 requests per minute within each 15-minute window, so maybe I’m only allowed 900 requests per window?)
And just to make sure: the max_results field indicates how many tweets I can get in each request, not in the whole process?
I’m starting to think I calculated this wrong, so I could use the help.
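For the waiting part, here is a rough sketch of the sleep() idea in Python. Rather than guessing a fixed pause, it relies on the x-rate-limit-reset response header (an epoch timestamp, in seconds, for when the current window resets) that comes back with an HTTP 429 response. The names send, seconds_until_reset and call_with_wait are hypothetical, and send is assumed to return a (status, headers, body) tuple with lowercase header names; this is an illustration under those assumptions, not a tested client.

```python
import time
from typing import Any, Callable, Dict, Tuple


def seconds_until_reset(
    headers: Dict[str, str], now: float, default_wait: float = 60.0
) -> float:
    """How long to sleep before the rate-limit window resets."""
    reset = headers.get("x-rate-limit-reset")
    if reset is None:
        # Header missing: fall back to a fixed pause (an assumption).
        return default_wait
    # Add one second of slack so we don't wake up a moment too early.
    return max(0.0, float(reset) - now) + 1.0


def call_with_wait(
    send: Callable[[], Tuple[int, Dict[str, str], Any]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.time,
) -> Tuple[int, Any]:
    """Call send(); on HTTP 429, sleep until the window resets and retry."""
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        sleep(seconds_until_reset(headers, now()))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Wrapping each request this way means the script pauses only when the API says it has to, and carries on by itself once the window reopens.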