I’ve been collecting geotagged tweets through the streaming API for some time now. As of February, there has been a significant drop in the number of tweets received. This looks to be related to:

When I plot the number of hashtags tweeted per day as a function of time, this is what I get:


which shows a massive drop-off, both in mean and standard deviation. But is this behavior expected? Daily volume varies, so I would expect my 1% share to vary as well. At this scale it appears mostly flat, which makes me wonder if I’m being capped at a fixed total rather than a percentage. Could someone explain the massive drop, and also whether what I am seeing is still consistent with 1%? Is the “volume” that Twitter uses to determine 1% averaged over days, and thus relatively constant?
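One rough way to put a number on “mostly flat” is to compare the coefficient of variation (standard deviation divided by mean) of the daily counts before and after the drop. The sketch below is only an illustration: the daily counts are made-up placeholder values, not my real data, and `coefficient_of_variation` is a helper name I chose. A value close to zero after the drop would be a hint (not proof) of a fixed cap rather than a percentage of a varying volume.

```python
from statistics import mean, pstdev


def coefficient_of_variation(daily_counts):
    """Return std / mean for a list of daily counts (NaN if the mean is 0)."""
    m = mean(daily_counts)
    return pstdev(daily_counts) / m if m else float("nan")


# Placeholder numbers for illustration only; substitute real daily totals.
before_drop = [9800, 12100, 10400, 13900, 11200, 9100, 12700]
after_drop = [2010, 1990, 2005, 1998, 2002, 1995, 2000]

cv_before = coefficient_of_variation(before_drop)
cv_after = coefficient_of_variation(after_drop)

print(f"CV before drop: {cv_before:.3f}")
print(f"CV after drop:  {cv_after:.3f}")
```

If the relative spread after the drop is much smaller than before, the series is flatter than a simple percentage of a fluctuating total would suggest.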

I wonder if this is related to this issue. Twitter has begun “emphasizing” place instead of coordinates.

I would suggest that this is exactly the reason - users who want to share their location on Tweets are now more likely to select a place from the UI than to share coordinates.

Don’t you think it was a bit of a mistake to emphasize places? A user can claim to be at any place, so that isn’t very accurate…

The graph really makes it look like Twitter is filtering/cleaning the data. No ups and downs based on the day/night cycle… just all level. However, I don’t think that Twitter is filtering. I asked a friend who uses Gnip, and they are seeing the same drop.

Kinda disappointing.

Thanks for the feedback. The decision to make this geo change was complex, and I realise that it may be disappointing to you. You will indeed see the same results in the Gnip API, as the data from the firehose is the same as what you will see through the public API, just on a larger scale.

We want to focus the experience around the location that the user is geotagging vs. the user’s precise coordinates. Now that users can geotag many more granular locations

I gather tweets from all over the world and I get a very low percentage with a precise location. The volume of precisely geolocated tweets has been divided by 5 between August 2015 and now. The place tag doesn’t help at all, since most of the points are at the city level:

image

We are really missing a filter to get only tweets with a precise location. Sampling 1% of the tweets and then, in this case, filtering again at 13% is like sampling at 0.13%!
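Until such a filter exists, the filtering has to happen on the client side. Here is a minimal sketch, assuming tweets arrive as dicts in the v1.1 JSON shape, where an exact point shows up in the `coordinates` field as a GeoJSON `Point` and a place-only tweet leaves it null. The helper names `has_precise_location` and `effective_rate`, and the three sample payloads, are my own illustrations.

```python
def has_precise_location(tweet):
    """True if the tweet carries an exact point, not just a tagged place."""
    coords = tweet.get("coordinates")
    return (
        isinstance(coords, dict)
        and coords.get("type") == "Point"
        and len(coords.get("coordinates") or []) == 2
    )


def effective_rate(sample_rate, precise_share):
    """Combined rate when a sample is filtered a second time."""
    return sample_rate * precise_share


# Hypothetical payloads, trimmed to the fields used here.
tweets = [
    {"id": 1, "coordinates": {"type": "Point", "coordinates": [2.35, 48.85]}, "place": None},
    {"id": 2, "coordinates": None, "place": {"place_type": "city", "full_name": "Paris, France"}},
    {"id": 3, "coordinates": None, "place": None},
]

precise = [t for t in tweets if has_precise_location(t)]
print([t["id"] for t in precise])
print(f"{effective_rate(0.01, 0.13):.2%}")
```

The second print shows the arithmetic behind the 0.13% figure: 1% of the stream times a 13% share of precise tweets.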

Searching for Geo tweets will definitely be more effective than trying to extract them from the sample stream: Filtering Tweets by location | Docs | Twitter Developer Platform
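For reference, the location filter on the streaming endpoint takes a `locations` parameter: a comma-separated list of longitude,latitude pairs, with the southwest corner of each bounding box first. Below is a small sketch of building that string. The helper name `locations_param` is just an illustration, and the example box is a rough San Francisco area box. As far as I know, tweets that only carry a place can also match when the place’s bounding box overlaps the requested box, so a client-side check for exact coordinates may still be needed.

```python
def locations_param(*boxes):
    """Build a `locations` value from (sw_lon, sw_lat, ne_lon, ne_lat) tuples."""
    parts = []
    for sw_lon, sw_lat, ne_lon, ne_lat in boxes:
        if not (-180 <= sw_lon <= 180 and -180 <= ne_lon <= 180):
            raise ValueError("longitude must be within [-180, 180]")
        if not (-90 <= sw_lat <= 90 and -90 <= ne_lat <= 90):
            raise ValueError("latitude must be within [-90, 90]")
        if sw_lat > ne_lat:
            raise ValueError("southwest corner must be south of northeast corner")
        parts.extend([sw_lon, sw_lat, ne_lon, ne_lat])
    return ",".join(str(p) for p in parts)


# Approximate box around San Francisco, longitude first.
san_francisco = (-122.75, 36.8, -121.75, 37.8)
print(locations_param(san_francisco))
```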
