Hi All,
My team purchased around 5,000 requests for the full-archive premium API, so we were expecting about 2.5 million tweets (500 tweets x 5,000 requests/queries). But after scraping we only got around 1.5 million tweets. Is this normal?

Thank you

Yeah, it’s 500 tweets per call or 30 days’ worth of tweets, whichever comes first. So in practice it’s always less, and how much less depends heavily on the query. Sometimes it’s also a bug in the code that requests 100 tweets per call instead of 500; that came up in an R library, I think.
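
To rule out the 100-vs-500 problem, here is a minimal Python sketch of a paginated collection loop that sets the page size explicitly on every request. The field names (`query`, `fromDate`, `toDate`, `maxResults`, `next`, `results`) are what I remember from the premium search docs, so please check them against the current documentation. `fetch` is a hypothetical function you would write yourself to send the payload to your full-archive endpoint and return the parsed JSON.

```python
from typing import Callable, Dict, List, Optional


def build_payload(query: str, from_date: str, to_date: str,
                  next_token: Optional[str] = None,
                  max_results: int = 500) -> Dict[str, object]:
    """Build the body for one search request (dates as YYYYMMDDHHmm)."""
    payload: Dict[str, object] = {
        "query": query,
        "fromDate": from_date,
        "toDate": to_date,
        # Set explicitly: if this is left out, fewer tweets come back per call.
        "maxResults": max_results,
    }
    if next_token:
        payload["next"] = next_token
    return payload


def collect(fetch: Callable[[Dict[str, object]], Dict[str, object]],
            query: str, from_date: str, to_date: str,
            max_requests: int) -> List[dict]:
    """Page through results until there is no next token or the budget is spent."""
    tweets: List[dict] = []
    next_token = None
    for _ in range(max_requests):
        page = fetch(build_payload(query, from_date, to_date, next_token))
        tweets.extend(page.get("results", []))
        next_token = page.get("next")
        if not next_token:
            break  # the query is exhausted for this date range
    return tweets
```

Logging `len(page["results"])` for each call is a quick way to see whether pages are capped at 100 or are just naturally short because of the date window.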

Thank you for your reply.
Okay, so it seems this is normal. And if I have 2,240 queries and run them in two batches of 2,500 requests each, will I get the same/duplicate tweets?

Yes if you’re running multiple queries you could definitely get duplicates.
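
Duplicates are easy to drop afterwards, since every tweet carries a unique ID. A minimal sketch, assuming each tweet is a dict parsed from the API's JSON with an `id_str` (or `id`) field:

```python
def dedupe_by_id(tweets):
    """Keep the first occurrence of each tweet ID, preserving order."""
    seen = set()
    unique = []
    for tweet in tweets:
        # Prefer the string ID; fall back to the numeric one.
        tweet_id = tweet.get("id_str") or str(tweet.get("id"))
        if tweet_id in seen:
            continue
        seen.add(tweet_id)
        unique.append(tweet)
    return unique
```

This doesn't get back the requests spent on fetching the duplicates, though, so it's still worth avoiding overlapping queries in the first place.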

As a general rule, it’s always better to try to combine everything into one giant query: as big a query as the API will allow, using (... OR ...), to get the maximum number of tweets per call.
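
A rough sketch of how the combining could be done in Python. The 1,024-character limit is an assumption (check the query length limit for your plan), and the keywords in the usage example are made up for illustration.

```python
def render(prefix, terms):
    """Render '<prefix> (a OR b OR ...)', quoting multi-word terms."""
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    group = "(" + " OR ".join(quoted) + ")"
    return (prefix + " " + group).strip()


def combine_or(terms, prefix="", max_len=1024):
    """Greedily pack terms into as few OR-queries as fit under max_len."""
    queries = []
    current = []
    for term in terms:
        candidate = current + [term]
        if current and len(render(prefix, candidate)) > max_len:
            # Adding this term would overflow: close the current query.
            queries.append(render(prefix, current))
            current = [term]
        else:
            current = candidate
    if current:
        queries.append(render(prefix, current))
    return queries


# Hypothetical keywords, one region combined with several theme terms:
print(combine_or(["banjir", "macet", "pajak daerah"], prefix="jakarta"))
```

With one combined query per region you lose the "which theme did this come from" information, so you would need to assign themes afterwards by matching the keywords against the tweet text.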

Thank you,
Can I ask you 1 last question?

So my project is supposed to collect information on Indonesia’s policies across 7 different themes in 35 regions, by year, from 2013 to 2020. We built our queries from 35 region names, 8 themes, and 8 years, so in total we have 2,240 queries to use for scraping.

In total we got around 1.5 million tweets. When we break down the data, we have tweets in every month for each region and each year. Unfortunately, the themes are not evenly distributed. For example, for DKI Jakarta in January-May 2013 we only got Theme1, and in January-June 2013 we also only got Theme1, while for other regions the theme distribution is more even. Is this because that is how the tweets really are, or do we need to improve our scraping script? Also, are the queries sensitive to uppercase or lowercase letters? Could that affect the results?

Upper/lowercase doesn’t matter in the API. If you’re using geo queries, like place: or similar, this will drastically reduce your results, as very few tweets have geo information.

Unfortunately, without experimenting more with the queries, it’s hard to say why some get more results than others. It could be that it’s just the way the data is, or it could be a bug in the code. If you have a list of the queries, it should be possible to check.
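
One way to start checking is to tabulate what you already collected and list which themes are missing for each region and month. A minimal sketch, assuming you stored the region and theme of the originating query alongside each tweet's `created_at` string (the record layout here is hypothetical):

```python
from collections import Counter
from datetime import datetime

# Timestamp format used in the tweet JSON, e.g. "Tue Jan 15 08:00:00 +0000 2013"
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tally(records):
    """Count tweets per (region, theme, 'YYYY-MM')."""
    counts = Counter()
    for region, theme, created_at in records:
        month = datetime.strptime(created_at, CREATED_AT_FORMAT).strftime("%Y-%m")
        counts[(region, theme, month)] += 1
    return counts


def missing_themes(counts, themes):
    """For every (region, month) with data, list the themes with no tweets."""
    found = {}
    for region, theme, month in counts:
        found.setdefault((region, month), set()).add(theme)
    wanted = set(themes)
    return {key: sorted(wanted - seen)
            for key, seen in found.items() if wanted - seen}
```

If a theme is missing for a region across many months in a row, rerunning just that one query by hand and looking at the raw response should show whether it is the data or the script.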

We didn’t use geolocation because we assumed not every account/tweet has location turned on. So we just wrote the region name in our queries and combined it with other keywords.


And this is an example of one of our queries