Advice on using the streaming API vs firehose in a business application


I would like some advice on my scenario. My questions may be quite specific and I would be grateful for some suggestions.

The scenario: we are experimenting with an idea of online policing for some government bodies, and this requires monitoring the Twitter stream in real time. However, the monitoring will be limited to certain keywords (hence ‘filters’ if the streaming API is used). It is very hard to estimate what percentage of the complete Twitter stream the filtered content represents. Therefore, we are not sure whether we should use the free streaming API, a paid service from Gnip, or even some third-party data providers (see below).

Potential solutions we have considered:

  • streaming API: this is free and in theory is capped at 1% of the entire Twitter stream. However, I understand that if a filter is used, we may well capture all matching conversations as long as they stay under the 1% threshold, as per this post. But we cannot estimate whether this is always the case. In addition, the client will want to ensure almost full coverage, and the amount of filtered data will fluctuate, particularly during some events, which makes the estimation even harder.

  • paid firehose: as far as I understand, this means that we need to pay to use the 100% coverage firehose. But again, due to the nature of filtering, we may not need 100% of the data. I notice that the Decahose offered by Gnip provides 10% of all data, which may be a good compromise. In any case, I am trying to find pricing plans for the firehose, but I cannot find good information. I found the following sources:

  1., whereby I must submit an application form to get in touch with representatives from Twitter. Should I do this?
  2., using Gnip’s PowerTrack 2.0 API or the Decahose API, but the website does not explain pricing.
  3. A post by ‘DiscoverText’ shares some information about the cost of using Gnip, and I notice a figure of ‘$5/10,000 tweets’. It also has 2 different subscription plans, and I cannot understand how they work, particularly the term they use, ‘storage units’. Also, I do not know what DiscoverText’s relation to Gnip and Twitter is.

So this is all the research I have done so far. I would appreciate some suggestions on this. Many thanks in advance!


Our position on the use of the Twitter APIs, Twitter data, and Twitter developer platform and tools for all purposes related to government surveillance and monitoring is clear.

Without entering into the differences between the various access methods, you should carefully consider the purpose of your interest in our API and data. It sounds as though your usage would not meet the requirements of our developer agreement and policy whether you do so through a commercial arrangement or otherwise. Twitter’s data partners and customers like the one you mention are also subject to exactly the same terms and may not sell or source data for these purposes.