Proportion of Tweets Captured


To overcome the limitations of geocoded searches, I turned to the streaming API to run sentiment analysis on voters in LA County on election day this past week.

I’m curious if there is any way to answer this question: during the 14-hour window we were pulling tweets from the streaming API (June 7th, 8am-10pm PST), how many unique users tweeted in LA County, and how many tweets were there? I’d like to get a sense of what proportion of the total population of tweets my sample represents.


Unfortunately, it may be impossible to tell how many total tweets there were in LA County, because location isn’t available for all tweets.

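Only a small percentage of tweets are geocoded, and you’re restricting your stream to just LA County, so you’d expect to receive pretty much all geocoded tweets from that area. To check whether you missed any, look for track limit notices in the stream you collected: each notice reports how many matching tweets had been withheld since the connection was opened. If you saw none, you most likely captured every geocoded tweet in the area.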
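If you saved the raw stream, a rough tally along these lines may help. This is only a sketch under a few assumptions: the messages were stored one JSON object per line, everything came from a single connection (so the `track` count in limit notices only grows), and `summarize_stream` is a made-up helper name, not part of any Twitter library.

```python
import json


def summarize_stream(lines):
    """Tally tweets, unique users, and withheld tweets from raw stream lines.

    Assumes one JSON message per line and a single streaming connection.
    """
    user_ids = set()
    tweet_count = 0
    withheld = 0

    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive lines carry no data

        message = json.loads(line)

        if "limit" in message:
            # Limit notice: "track" is a running total of undelivered
            # tweets since the connection opened, so keep the largest.
            withheld = max(withheld, message["limit"].get("track", 0))
        elif "user" in message and "id_str" in message:
            # Ordinary tweet payload.
            tweet_count += 1
            user_ids.add(message["user"]["id_str"])
        # Other message types (deletes, warnings, ...) are ignored.

    matched = tweet_count + withheld
    return {
        "tweets": tweet_count,
        "unique_users": len(user_ids),
        "withheld": withheld,
        # Share of geocoded, filter-matching tweets that were delivered.
        "captured_share": tweet_count / matched if matched else None,
    }
```

You could call it with an open file, e.g. `summarize_stream(open("stream.jsonl"))`, where the file name is just a placeholder. Note that `captured_share` only describes geocoded tweets that matched your filter; it says nothing about the non-geocoded tweets sent from LA County, which is the part that can’t be measured. If you reconnected during the 14 hours, the running total restarts with each connection, so you would need to take the maximum per connection and add those up.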