Which Twitter Streaming API should I use: Public, User, or Site?


My use case is the following:

I have to ‘listen’ to the tweets and retweets of an unknown number of users (users can register for this feature whenever they want) whenever those tweets mention certain known users or use a predefined hashtag. When such a tweet is made, I have to record the action in my database. I only need to know that it happened; I don’t care about the content of the tweet.

One important point is that I shouldn’t lose any of these tweet events.

Reading through the Twitter API docs, I’ve found the Streaming API, which seems to be what I should use for my use case, but I get lost when I start reading about user limits, 1% of tweets, the firehose, etc.

In theory I think I should be using the Site Streaming API, but I’m not really sure, and it is also in limited beta. Is it difficult to get access to Site Streaming? What are its limitations? Would I be able to receive, for example, tweets for 200,000 users without losing any?

Is there any other approach I should look at?


One question to ask yourself: does all the data you want to track “belong” to users who would authorize your application for access? Or could it belong to any user on Twitter, with what gets tracked depending on your users’ interests?

The Site Streams beta is difficult to get into right now, as it’s constrained.

The public streaming API allows you to track tweets “around” specific users who haven’t necessarily given your app access to act on their behalf (this is the follow parameter). It also lets you track specific terms, which could be provided by your end users (this is the track parameter).
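As a rough illustration of how follow and track fit together, here is a minimal Python sketch that only builds the request parameters for a filtered public stream. It assumes both parameters are sent as comma-separated lists, that follow takes numeric user IDs, and that at least one of the two must be present. The function name, the user IDs, and the hashtag are all made up for the example, and the sketch does not open a connection.

```python
def build_filter_params(user_ids, terms):
    """Build the form parameters for a filtered public stream request.

    Hypothetical helper: only the parameter names `follow` and `track`
    come from the streaming API; everything else is illustrative.
    """
    params = {}
    if user_ids:
        # Assumption: follow expects numeric user IDs, joined by commas.
        params["follow"] = ",".join(str(uid) for uid in user_ids)
    if terms:
        # Assumption: track expects terms or hashtags, joined by commas.
        params["track"] = ",".join(terms)
    if not params:
        raise ValueError("at least one of follow or track is required")
    return params


# Placeholder IDs for registered users and a placeholder hashtag.
params = build_filter_params([1001, 1002], ["#myhashtag"])
print(params)
```

Because users can register at any time, the follow list would have to be rebuilt and the stream reconnected as the set of registered users changes.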

Public streams, though, are capped at 1%: if the tweets matching your filter at a given moment exceed 1% of the firehose’s total volume at that moment, you’ll only receive matching tweets up to that cap, and you won’t necessarily know which tweets you missed during that period.
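You can at least detect that the cap was hit. As far as I know, a filtered stream interleaves "limit" notices with the tweets, each carrying a running count of undelivered tweets since the connection was opened. The sketch below assumes newline-delimited JSON, blank keep-alive lines, and a notice shaped like {"limit": {"track": N}}; the function name and the sample lines are invented for illustration, so check the message format against the current docs before relying on it.

```python
import json


def summarize_stream(lines):
    """Count delivered tweets and report the latest undelivered total.

    Assumptions (not confirmed by this thread): each non-blank line is
    one JSON message, tweets carry an "id_str" field, and limit notices
    look like {"limit": {"track": N}} with N as a running total.
    """
    delivered = 0
    undelivered = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines are treated as keep-alives
        message = json.loads(line)
        if "limit" in message:
            # Keep the highest running total seen so far.
            undelivered = max(undelivered, message["limit"].get("track", 0))
        elif "id_str" in message:
            delivered += 1  # this is where the event would be recorded
    return delivered, undelivered


# Made-up sample of what a capped stream might deliver.
sample = [
    '{"id_str": "1", "text": "hello #myhashtag"}',
    "",
    '{"limit": {"track": 3}}',
    '{"id_str": "2", "text": "again #myhashtag"}',
    '{"limit": {"track": 7}}',
]
print(summarize_stream(sample))
```

This tells you how many matching tweets were dropped, not which ones, so it cannot satisfy a strict "never lose an event" requirement on its own.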

For the kind of volume you’re thinking of, you might find it wiser to go to a certified partner provider of streaming data like Topsy, Gnip, or Datasift.

