Randomness of streaming sample




I’d like to know whether the specific tweets that come through the public streaming API are chosen before I make my query, or after. For example: if I choose to track the keyword “alpha”, will the streaming API provide ~1% of all tweets that contain the word “alpha”, or will it provide all tweets that contain the word “alpha” up to 1% of the firehose volume? I suspect it is the former, but I need to be sure.



As far as I know, if you’re filtering on keywords, you will get all tweets that contain “alpha”. If the volume of matching tweets exceeds 1% of the firehose, you will start getting track limit notices in the stream: https://dev.twitter.com/streaming/overview/messages-types#limit_notices

This might be of interest: http://www.public.asu.edu/~fmorstat/paperpdfs/SampsonHT2015.pdf
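
To check whether you are actually hitting the cap, you can watch for those limit notices in the stream. Here is a minimal sketch, assuming each message arrives as one JSON object per line and that a limit notice looks like `{"limit": {"track": N}}`, with N being the running total of undelivered tweets since the connection was opened. The function name `split_stream` and the sample messages are made up for illustration; they are not from any client library.

```python
import json


def split_stream(lines):
    """Separate tweets from limit notices in an iterable of raw JSON lines.

    Returns (tweets, undelivered), where undelivered is the largest
    cumulative "track" count seen in any limit notice.
    """
    tweets = []
    undelivered = 0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            # Blank lines are treated as keep-alives and skipped.
            continue
        msg = json.loads(raw)
        if "limit" in msg:
            # Assumption: the count is cumulative, so keep the largest value.
            undelivered = max(undelivered, msg["limit"].get("track", 0))
        else:
            tweets.append(msg)
    return tweets, undelivered


# Hypothetical sample messages, not real API output.
sample = [
    '{"id": 1, "text": "alpha one"}',
    '',
    '{"limit": {"track": 12}}',
    '{"id": 2, "text": "alpha two"}',
    '{"limit": {"track": 30}}',
]
tweets, undelivered = split_stream(sample)
print(len(tweets), undelivered)
```

If `undelivered` stays at 0 for the whole connection, you never exceeded the cap and received every tweet that matched your filter.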


Interesting, @IgorBrigadir. Thanks for that info. Wish the API devs would clarify this because the language in the docs you linked is still ambiguous.


@IgorBrigadir is correct.

You will receive every Tweet that matches your filter, up to 1% of total firehose volume. Unless you are filtering for something like Justin Bieber birthday Tweets, you will likely be consuming less than 1%.
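
To make the difference between the two readings concrete, here is a rough sketch of the behaviour described above: matching tweets are delivered in full until the cap is reached, and anything beyond it is dropped. The function `delivered_count` and all the volumes are hypothetical, and the real cap is presumably enforced as a rate over time rather than over a fixed batch, so treat this as arithmetic only.

```python
def delivered_count(matching, firehose_total, cap_percent=1):
    """Return (delivered, dropped) under a simple cap model.

    matching       -- tweets matching the filter in some time window
    firehose_total -- all tweets in the firehose in the same window
    cap_percent    -- cap as a whole-number percentage of the firehose
    """
    cap = firehose_total * cap_percent // 100
    delivered = min(matching, cap)
    return delivered, matching - delivered


# Hypothetical volumes: a quiet keyword stays under the cap...
print(delivered_count(2_000, 1_000_000))   # (2000, 0)
# ...while a very popular one is truncated at 1% of the firehose.
print(delivered_count(50_000, 1_000_000))  # (10000, 40000)
```

Under the other reading (a 1% sample of matching tweets), the first case would deliver only about 20 tweets, which is not what the replies above describe.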