Public Stream with complete coverage of data


Since Twitter’s Site Streams API is in closed beta and no longer accepts new users, it seems my only option is the Public Stream, limited to 5k followed users, which I am using with Phirehose. The main purpose is to get new tweets from specific Twitter users in real time.

Am I guaranteed to receive every tweet from the users I specify with the follow parameter? I read that if too much data comes in, coverage could drop as low as 1%. Is that different if I follow specific users instead of doing a search? On average I would have 400 tweets per minute coming in. Also, what happens if I have to keep track of more than 5k Twitter users (which has happened in the past)? Gnip isn’t really an option, as I offer my service for free to the Twitter community.


The streaming API provides access to up to 1% of the public firehose. If you are following a set of users whose Tweets amount to less than 1% of the firehose in total, you should be able to receive all of the Tweets (note that this is not “guaranteed”, as there is no specific service level defined on the public API). Using the follow parameter you’ll also receive replies to and retweets of those users from other users, so in theory this may push the activity above the 1% threshold (if the accounts are very popular, for example), in which case you would receive limit notices in the stream.
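As a rough illustration of what watching for limit notices could look like on the client side, here is a minimal Python sketch (not Phirehose code). It assumes each stream line is a JSON object, that a limit notice has the shape `{"limit": {"track": N}}`, and that a Tweet carries its author in `user.id_str`. The function names `classify_stream_line` and `is_authored_by_followed` are made up for this example.

```python
import json


def classify_stream_line(line):
    """Classify one raw stream line.

    Returns a (kind, payload) tuple where kind is 'keepalive', 'limit',
    'tweet', or 'other'. The message shapes are assumptions, see above.
    """
    line = line.strip()
    if not line:
        # Blank lines are sent periodically to keep the connection open.
        return ("keepalive", None)
    msg = json.loads(line)
    if "limit" in msg:
        # Assumed shape: {"limit": {"track": <undelivered count so far>}}
        return ("limit", msg["limit"].get("track", 0))
    if "text" in msg and "user" in msg:
        return ("tweet", msg)
    return ("other", msg)


def is_authored_by_followed(tweet, followed_ids):
    """True if the Tweet was written by one of the followed accounts,
    as opposed to a reply or retweet sent by somebody else."""
    return tweet["user"]["id_str"] in followed_ids
```

If a `limit` message ever shows up, some matching Tweets were not delivered and coverage is no longer complete. Dropping other people's replies and retweets with `is_authored_by_followed` only tidies up what you store; it presumably does not lower the volume that counts toward the threshold, since the filtering happens after delivery.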


Thank you @andypiper for your helpful answer! Finally I have an official answer :slight_smile:

However, one more important question:

Okay. For my use case I don’t need the “overhead” of replies to and retweets of those users, which will likely make the streaming solution unusable for me (apart from that 1% threshold it would otherwise be the better fit). Did I understand correctly that I am allowed to make as many REST API requests from my server as I want on behalf of the granted user tokens (for example, a peak of ~1000 req/min in total), as long as I make no more than 900 statuses/user_timeline requests per 15-minute window for each user? Of course, I would make each of these requests only when really necessary (after sending a few lookup requests to find out which requests need to be made).


That should work.
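For anyone planning the same approach, the per-user budget discussed above could be tracked with something like the following Python sketch. The 900 requests per 15 minutes figure is the one quoted in this thread. The class `PerTokenBudget` and the helper `min_tokens_needed` are hypothetical names, and the sliding window is an assumption chosen as a conservative stand-in for however the API actually counts its windows.

```python
import math
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 15 * 60       # 15-minute window, as quoted in the thread
MAX_REQUESTS_PER_WINDOW = 900  # statuses/user_timeline figure from the thread


class PerTokenBudget:
    """Sliding-window request counter kept separately for each user token."""

    def __init__(self, limit=MAX_REQUESTS_PER_WINDOW, window=WINDOW_SECONDS,
                 clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._sent = defaultdict(deque)  # token id -> timestamps of requests

    def try_acquire(self, token_id):
        """Record a request and return True if the token still has budget."""
        now = self.clock()
        sent = self._sent[token_id]
        # Forget requests that have aged out of the window.
        while sent and now - sent[0] >= self.window:
            sent.popleft()
        if len(sent) >= self.limit:
            return False
        sent.append(now)
        return True


def min_tokens_needed(requests_per_minute, limit=MAX_REQUESTS_PER_WINDOW,
                      window=WINDOW_SECONDS):
    """Fewest tokens that can carry a sustained rate if load is spread evenly."""
    per_window = requests_per_minute * window / 60
    return math.ceil(per_window / limit)
```

Under these assumptions, a sustained peak of ~1000 req/min works out to 15,000 requests per 15 minutes, so `min_tokens_needed(1000)` gives 17 tokens when the load is spread evenly. Calling `try_acquire(token_id)` before each request and skipping to another token on `False` keeps every token inside its own budget.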