Data quality of REST API


#1

The documentation of the Twitter REST API (https://dev.twitter.com/rest/public/search) states: "the Search API is focused on relevance and not completeness. This means that some Tweets and users may be missing from search results. If you want to match for completeness you should consider using a Streaming API instead."
For a query covering the past 7 days, what percentage of the data would I get via the REST API compared to the Streaming API?
What is the process by which Twitter determines “relevance”?


#2

The streaming API is a 1% sample of all the data on the Twitter firehose at any one time. If you were to use the filter endpoint and track a term that accounted for under 1% of the Twitter firehose, then you’d receive all of the matching Tweets, depending on load.
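
As a rough illustration of the 1% threshold described above, here is a minimal sketch of the arithmetic. It assumes the cap is applied as a simple proportion of firehose volume and ignores load; the function name and the `cap` parameter are hypothetical and not part of any Twitter API.

```python
def delivered_fraction(term_share, cap=0.01):
    """Estimate the fraction of matching Tweets a filtered stream delivers.

    term_share: fraction of the whole firehose that matches the tracked
                term, between 0 and 1.
    cap: assumed ceiling on stream volume as a fraction of the firehose
         (1% per the explanation above; a simplification, not an API value).
    """
    if not 0 <= term_share <= 1:
        raise ValueError("term_share must be between 0 and 1")
    if term_share <= cap:
        # Under the cap: every matching Tweet fits in the stream.
        return 1.0
    # Over the cap: only cap / term_share of the matches can be delivered.
    return cap / term_share


print(delivered_fraction(0.005))  # term is 0.5% of the firehose
print(delivered_fraction(0.02))   # term is 2% of the firehose
```

Under these assumptions, a term making up 0.5% of the firehose is delivered in full, while a term making up 2% is delivered at roughly half its real volume.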

The search API has a limited index and may not include, for example, Tweets that are withheld in certain jurisdictions; Tweets from very new users; or Tweets, users, hashtags, or source apps that have been marked as spammy or abusive by our algorithms.

So, it is difficult to provide a specific percentage variance based on the question you’re asking, but that hopefully clarifies the difference.
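
Since no fixed percentage can be given, one option is to measure it for your own query. The sketch below assumes you have already collected Tweet IDs for the same query and the same time window from both APIs, and that the tracked term stays under the 1% threshold so the stream can serve as a baseline. The function name is hypothetical.

```python
def search_coverage(search_ids, stream_ids):
    """Return the share of streamed Tweet IDs that also appear in search results.

    Both arguments are iterables of Tweet IDs gathered for the same query
    over the same period. The stream is treated as the baseline, which is
    only reasonable if it was not capped during collection.
    """
    search_ids = set(search_ids)
    stream_ids = set(stream_ids)
    if not stream_ids:
        raise ValueError("stream_ids is empty, so there is no baseline")
    return len(search_ids & stream_ids) / len(stream_ids)


# Made-up IDs for illustration only: 2 of the 4 streamed Tweets were in search.
print(search_coverage([1, 2, 3], [2, 3, 4, 5]))  # 0.5
```

The result applies only to that query and period, because the search index filters differently depending on the users and content involved.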

The 30-day and full archive search options from Gnip are not subject to all of the same constraints, nor is PowerTrack. These APIs are commercial offerings and provide complete access to the data.


#3

Thank you very much, your answer helped a lot!