Standard Search API - Sampling Rate


#1

Hi,

I know that the standard search API provides sampled results. The sampling is done before the filtering.

I have two questions:

  • Am I right?
  • What is the sampling ratio? Even if it varies, do we at least know its order of magnitude? (I understand that Twitter doesn’t want to communicate exactly how the sampling is done.)

To give an example: using the standard search API, I found 1000 tweets on a certain subject during a certain period of time. I would like to estimate the total number of tweets on this subject during the same period (is it more like 10k, 100k, or something else?).

Thank you!


#2

That is not quite the right way to look at what is going on here. The results are not a sample drawn at any fixed ratio. They come from a limited and incomplete index, which we describe as “focused on relevance and not completeness”. Yes, the language in the documentation does use the word “sampling”, but it is intended to describe the limitation of the index, not a fixed rate.

For example, some brand new user accounts and certain low-quality results are unlikely to show up in the index. It’s not that “Twitter doesn’t want to communicate on how exactly…”; it’s just very difficult to provide you with a fixed multiplier that will get you to the answer you’re asking for. This API is around 10 years old and has never had a complete index, and you should not expect one to be retrofitted at this point.

The premium Search API is built on brand new technology and includes both a more comprehensive Tweet index and a dedicated counts endpoint, where you can query for the number of times a term matches within given windows over a period of time. If you need higher accuracy and reliability, I’d strongly recommend looking at the new APIs.
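
As a rough illustration of how a counts endpoint answers the original question directly, here is a minimal Python sketch that adds up per-window counts from one or more response pages. The payload shape (a `results` list whose items carry `timePeriod` and `count`) is an assumption for illustration, as are the `total_matches` helper and the sample numbers; check the premium Search documentation for the actual response format before relying on it.

```python
from typing import Any, Dict, List


def total_matches(pages: List[Dict[str, Any]]) -> int:
    """Sum the per-window counts across one or more response pages.

    Assumes each page looks like {"results": [{"timePeriod": ..., "count": ...}]}.
    That shape is an assumption here, not a confirmed contract.
    """
    return sum(
        int(bucket["count"])
        for page in pages
        for bucket in page.get("results", [])
    )


# Made-up pages standing in for real responses (values are illustrative only).
example_pages = [
    {"results": [
        {"timePeriod": "201901010000", "count": 4200},
        {"timePeriod": "201901020000", "count": 3900},
    ]},
    {"results": [
        {"timePeriod": "201901030000", "count": 5100},
    ]},
]

print(total_matches(example_pages))
```

The total comes from counts reported for each window, so no multiplier needs to be guessed from the 1000 Tweets returned by the standard search.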