POST statuses/filter - What downsampling is occurring?


I am currently using the POST statuses/filter API to collect tweets for academic research.

My question: what percentage of the total Twitter volume is tested against the user-specified filters? I am assuming that the streaming API does not test the entire Twitter volume against all of the user-specified filters.



You’ll get up to 1% of the total public firehose. If your filter terms match less than that, you’ll effectively get all of the Tweets currently using those terms; if they match more than that, you’ll get what we describe in the docs as a “representative sample”. I’m afraid we don’t publish the specific algorithm used to reduce the stream down to the 1% sample, though.
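To make the arithmetic of that cap concrete, here is a rough sketch in Python. It only models the volume limit described above, not the unpublished sampling algorithm. The function names and the example volumes are hypothetical and chosen purely for illustration.

```python
def delivered_volume(total_public: int, matching: int, cap_percent: int = 1) -> int:
    """Estimate how many matching Tweets a filtered stream could deliver.

    total_public: Tweets in the full public firehose over some window (assumed figure).
    matching:     Tweets in that window that match the filter terms (assumed figure).
    cap_percent:  the cap as a percentage of the firehose (1, per the answer above).
    """
    # Integer arithmetic avoids floating-point rounding when computing the cap.
    cap = total_public * cap_percent // 100
    # Below the cap everything is delivered; above it, delivery is limited to the cap.
    return min(matching, cap)


def delivered_fraction(total_public: int, matching: int, cap_percent: int = 1) -> float:
    """Fraction of the matching Tweets that would be delivered under the cap."""
    if matching == 0:
        return 1.0  # nothing matched, so nothing was dropped
    return delivered_volume(total_public, matching, cap_percent) / matching


# Hypothetical window with 1,000,000 public Tweets, so the cap is 10,000.
narrow = delivered_volume(1_000_000, 5_000)   # filter matches 0.5% of the firehose
broad = delivered_volume(1_000_000, 50_000)   # filter matches 5% of the firehose
```

With these made-up numbers, the narrow filter stays under the cap and every matching Tweet comes through, while the broad filter is cut down to the cap. Which Tweets survive that cut is decided by the sampling algorithm that is not published.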