First of all, thank you for opening the new full-archive tweet counts endpoint. It seems that Twitter listens to its community (Getting query result numbers without downloading tweets), which is incredibly valuable.
I started collecting data from this endpoint, but I noticed that it struggles with the number of queries I'm making (around 50,000). I added a delay of 25 seconds between requests and that seems to work, but at that rate it will take many days (assuming everything goes well).
In addition, it looks like the coarsest aggregation available is per day, and I'm interested in counts for a specific period (one year).
Therefore, here are my questions:
- Is there something I am missing or not doing correctly? Is there any way to get this information quicker?
- Is there any way to get total counts for the specific period instead of getting them on a daily basis?
Thank you in advance,
You can get the total count for a year by specifying a start_time and end_time and making just 1 request.
I think the meta part of the response is what you need; that way any arbitrary time period will work. You can leave the granularity as day and make only 1 request, without pagination, to get the totals.
For example, in twarc I can run:
twarc2 counts --archive --start-time "2019-01-01" --end-time "2020-01-01" --granularity day --limit 1 "coronavirus" | jq ".meta.total_tweet_count"
gives 915
twarc2 counts --archive --start-time "2020-01-01" --end-time "2021-01-01" --granularity day --limit 1 "coronavirus" | jq ".meta.total_tweet_count"
gives 12781520
etc.
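If you would rather read the total from a script than from jq, here is a minimal Python sketch, not an official client. It assumes the twarc2 output was saved with one JSON response per line, that each response carries meta.total_tweet_count as in the jq commands above, and that the daily buckets sit under data with a tweet_count field (that last part is my assumption about the response shape). The function names are hypothetical and the sample numbers are made up. It adds up the totals of every page it is given, so it still works if a long period comes back as more than one page.

```python
import json


def parse_pages(text):
    """Parse line-delimited JSON: one API response (page) per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def total_tweet_count(pages):
    """Sum the tweet counts over all response pages.

    Uses meta.total_tweet_count when a page has it; otherwise falls back
    to adding up the per-bucket tweet_count values (assumed field name).
    """
    total = 0
    for page in pages:
        meta = page.get("meta", {})
        if "total_tweet_count" in meta:
            total += int(meta["total_tweet_count"])
        else:
            total += sum(int(b["tweet_count"]) for b in page.get("data", []))
    return total


if __name__ == "__main__":
    # Made-up sample standing in for a saved twarc2 counts output file.
    sample = "\n".join([
        json.dumps({"data": [{"tweet_count": 3}, {"tweet_count": 4}],
                    "meta": {"total_tweet_count": 7}}),
        json.dumps({"data": [{"tweet_count": 5}], "meta": {}}),
    ])
    print(total_tweet_count(parse_pages(sample)))
```

To use it on real output, read the saved file and pass its contents to parse_pages instead of the sample string.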