Hi. I have a research project in which I gather data in real time using the Streaming API, for months at a time. I use a track filter, such as a few hashtags, to identify tweets about a subject. When the stream disconnects or my collector crashes, I reconnect and try to recover the data lost during the outage using GET search/tweets with the same filter I use for the stream. Note that I usually (99% of the time) do not hit the limits of the Streaming 1% endpoint, and this question is not about hitting that limit.
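For reference, the recovery step I describe looks roughly like the sketch below. It pages backwards through search results between the last tweet ID I stored before the outage and the first tweet ID I received after reconnecting, using the `since_id` and `max_id` parameters of GET search/tweets. The `fetch_page` callable is a hypothetical stand-in for one search request (it is not part of any library), and the names `backfill`, `since_id` and `until_id` are my own.

```python
from typing import Callable, Dict, List

# Hypothetical signature: fetch_page(since_id, max_id) performs one
# GET search/tweets request and returns the tweets with
# since_id < id <= max_id, newest first (possibly an empty list).
FetchPage = Callable[[int, int], List[Dict]]


def backfill(fetch_page: FetchPage, since_id: int, until_id: int) -> List[Dict]:
    """Collect every tweet the search returns with since_id < id <= until_id."""
    collected: List[Dict] = []
    max_id = until_id
    while max_id > since_id:
        page = fetch_page(since_id, max_id)
        if not page:
            # The search has nothing older to offer in this window.
            break
        collected.extend(page)
        # max_id is inclusive, so step just below the oldest tweet seen.
        max_id = min(tweet["id"] for tweet in page) - 1
    return collected
```

This only returns whatever the search index still holds, which is exactly why I cannot tell whether the gap was fully recovered.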
My problem is that I cannot be certain that I am recovering all the missing data, since the REST API documentation does not guarantee completeness. It also gives no measure of how many tweets I failed to retrieve, unlike the {limit:"45"} messages you sometimes get with the Streaming API.
The documentation says I can retrieve tweets that are 6-9 days old, but that means that if a tweet is 7 days old, there is a chance that I will not retrieve it. At the same time, I cannot be certain that a 6-day-old tweet will be retrieved, as the cutoff seems to be gradual rather than a hard limit. What is certain is that if a tweet is 10 days old, it is lost to me forever.
What is the maximum age, in days, at which I can be certain that a tweet can still be retrieved with a REST call?
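To make the uncertainty concrete, this is a minimal sketch of how I currently classify an outage before attempting recovery. The 6-day and 9-day thresholds are simply the figures from the documentation quoted above, not guarantees, and the function name and labels are my own.

```python
from datetime import datetime, timedelta, timezone

SAFE_DAYS = 6  # lower bound of the documented 6-9 day range
MAX_DAYS = 9   # upper bound of the documented 6-9 day range


def classify_gap(gap_start: datetime, now: datetime) -> str:
    """Label an outage by the age of its oldest missing tweet."""
    age = now - gap_start
    if age < timedelta(days=SAFE_DAYS):
        return "likely searchable"  # still not guaranteed by the docs
    if age <= timedelta(days=MAX_DAYS):
        return "uncertain"
    return "out of range"


now = datetime(2020, 1, 10, tzinfo=timezone.utc)
print(classify_gap(now - timedelta(days=7), now))
```

What I am missing is a documented value to use in place of `SAFE_DAYS`.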
Also, if you have a better method for recovering data lost during Streaming disconnections, please mention that instead. PLEASE DO NOT suggest paid services; I want to know the most efficient way to do this using the Public API!
If recovering the lost data is not possible, or retrieval of a tweet cannot be guaranteed based on its age, it would be enough for me to know exactly how much data I lost in a specific time frame. Is there any way to find out how many tweets match a certain query (such as a set of hashtags) over the last X days, without actually retrieving those tweets?
Many thanks in advance!
PS: I could not find any of this information in any online documentation or forum; this post is a last resort.