Volume drop in streaming API


I’ve been collecting geotagged tweets through the streaming API for some time now. As of February, there has been a significant drop in the number of tweets received. This looks to be related to:

When I plot the number of hashtags tweeted per day as a function of time, this is what I get:

which shows a massive drop-off, both in mean and standard deviation. But is this behavior expected? Daily volume varies, so I would expect my 1% share to vary as well. At this scale it appears mostly flat, which makes me wonder if I’m being capped at a fixed total rather than a percentage. Could someone explain the massive drop, and also whether what I am seeing is still consistent with 1%? Is the “volume” that Twitter uses to determine 1% averaged over days, and thus relatively constant?
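For anyone who wants to check the same thing on their own collection, here is a rough sketch of how the per-day mean and standard deviation could be compared before and after a cutoff date. It assumes each stored tweet is a dict with the standard `created_at` string; the function names and the cutoff are hypothetical, and this is not necessarily how the plot above was produced.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev

# Timestamp layout used in tweet payloads, e.g. "Mon Jan 05 10:00:00 +0000 2015".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def daily_counts(tweets):
    """Count tweets per calendar day (UTC, as reported in created_at)."""
    counts = Counter()
    for tweet in tweets:
        day = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT).date()
        counts[day] += 1
    return counts


def summarize(counts, cutoff):
    """Compare daily volume before and after a cutoff date (hypothetical helper)."""
    def stats(values):
        if not values:
            return None
        m = mean(values)
        s = pstdev(values)
        # Coefficient of variation: a hard cap would push this toward zero,
        # while a percentage sample should keep some day-to-day spread.
        return {"days": len(values), "mean": m, "std": s, "cv": s / m}

    before = [n for day, n in counts.items() if day < cutoff]
    after = [n for day, n in counts.items() if day >= cutoff]
    return {"before": stats(before), "after": stats(after)}
```

If the coefficient of variation collapses after the cutoff while the mean also drops, that would fit a cap or a change in what is being matched better than a plain 1% sample of a varying total.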

Geotagged tweets from atream api

I wonder if this is related to this issue. Twitter has begun “emphasizing” place instead of coordinates.


I would suggest that this is exactly the reason: users who want to share their location on Tweets are now more likely to select a place from the UI than to share exact coordinates.
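One way to see this shift in your own data is to split the collected tweets by which geo field is populated. The sketch below assumes tweets are stored as dicts with the usual `coordinates` and `place` keys from the tweet payload; the helper names are made up for illustration.

```python
from collections import Counter


def geo_kind(tweet):
    """Classify a tweet by the geo information it carries."""
    if tweet.get("coordinates"):
        return "coordinates"   # exact point shared by the user
    if tweet.get("place"):
        return "place_only"    # only a named place was selected
    return "none"


def geo_breakdown(tweets):
    """Count how many tweets fall into each geo category."""
    return Counter(geo_kind(t) for t in tweets)
```

Running `geo_breakdown` separately on the tweets collected before and after the drop should show whether the `coordinates` share fell while `place_only` grew.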

Inconsistencies in number of tweets return from a stream
Two identical streams - one returns only 50% of tweets

Don’t you think it was a bit of a mistake to emphasize places? A user can claim to be at any place, so that isn’t very accurate…

The graph really makes it look like Twitter is filtering/cleaning the data. No ups and downs based on the day/night cycle… just all level. However, I don’t think that Twitter is filtering. I asked a friend who uses Gnip and they are seeing the same drop.

Kinda disappointing.


Thanks for the feedback. The decision to make this geo change was complex, and I realise that it may be disappointing to you. You will indeed see the same results in the Gnip API, as the data from the firehose is the same data (just on a larger scale) that you will see through the public API.

Current Streaming API sampling rate?