Streaming Sample() Has Virtually No Sub-City Places



Hi there-

I’ve been recording the sample() stream for the past week, gathering over 7 million English-language tweets. I was exploring how Twitter users label their tweets with place information, and I noticed some unexpected results. I was wondering if I could get confirmation of these findings, or a suggestion that my streaming program is somehow broken.

I took a subset of 1,000,000 tweets for analysis. 2.5% are tagged with a place, and 0.3% are tagged with a precise latitude and longitude. Of the 2.5% that have place information, 82% were city-level, and 99.2% were city-level, admin (state) level, or country-level. That leaves only about 0.8% of place-tagged tweets at any sub-city level. This really surprised me. I would have expected more people to check in to sub-city venues, neighborhoods, points of interest, etc.
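For anyone who wants to compare numbers, here is a minimal sketch of how a tally like this might be computed. It is not my exact program. It assumes the stream was saved as one JSON object per line, and that each tweet carries the standard `place` object (with a `place_type` field) and `coordinates` field. The function name `tally_place_types` and the "has a `text` key" check for skipping non-tweet messages are my own illustrative choices.

```python
import json
from collections import Counter


def tally_place_types(lines):
    """Count tweets, geotagged tweets, and place types.

    `lines` is any iterable of strings, each assumed to hold one JSON
    message from the stream. Returns (total, with_coords, place_types).
    """
    total = 0
    with_coords = 0
    place_types = Counter()

    for line in lines:
        line = line.strip()
        if not line:
            continue  # keep-alive newlines
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or corrupt record
        # Assumption: real tweets have a "text" key; delete notices
        # and other stream messages do not.
        if not isinstance(msg, dict) or "text" not in msg:
            continue

        total += 1
        if msg.get("coordinates"):
            with_coords += 1  # precise latitude/longitude present
        place = msg.get("place")
        if place:
            place_types[place.get("place_type", "unknown")] += 1

    return total, with_coords, place_types
```

Passing an open file handle as `lines` should work, and dividing each count in `place_types` by the number of place-tagged tweets gives percentages comparable to the ones above.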

I’m curious if others have seen similar patterns.