I am using Tweepy to scrape the API for tweets mapped to particular cities that contain specific keywords. I am then conducting statistical analysis to see where and when keyword mentions in these cities spike. So it’s really important that my Python program is turning up the entire population of tweets I am looking for. I have run into a problem, however, that makes my analysis difficult to do.

My program has two queries. The first fetches daily counts of tweets from 2015 until now that are mapped to a bounding box enclosing a given city and contain my desired keyword. The second fetches daily counts of all tweets from 2015 until now that are mapped to the same bounding box. From there, I organize the data and run some functions to calculate keyword frequency over time in each city. Here’s the code for the queries:

import tweepy

# the full-archive counts endpoint uses app-only auth, so the bearer token is enough
client = tweepy.Client(bearer_token="REDACTED", wait_on_rate_limit=True)

# search query
query = 'KEYWORD bounding_box:[-87.296643 14.018294 -87.116257 14.164161]'

# start and end of the search window (ISO 8601, UTC)
start_time = '2015-01-01T00:00:00Z'
end_time = '2022-08-01T00:00:00Z'

# collect the daily count buckets; each item is a dict with
# 'start', 'end' and 'tweet_count', not an individual tweet
KeywordTG = []

for bucket in tweepy.Paginator(client.get_all_tweets_count, query=query,
                               start_time=start_time, end_time=end_time, granularity="day").flatten(limit=5000):
    KeywordTG.append(bucket)

**QUERY NO. 2**

# same client setup as in the first query (bearer token only)
client = tweepy.Client(bearer_token="REDACTED", wait_on_rate_limit=True)

# search query

query = 'bounding_box:[-87.296643 14.018294 -87.116257 14.164161]'

start_time = '2015-01-01T00:00:00Z'

end_time = '2022-08-01T00:00:00Z'

# collect the daily count buckets for every tweet mapped to the bounding box
AllTweets = []

for bucket in tweepy.Paginator(client.get_all_tweets_count, query=query,
                               start_time=start_time, end_time=end_time, granularity="day").flatten(limit=5000):
    AllTweets.append(bucket)

I am finding that for two of the cities I am studying – Tegucigalpa, Honduras and Guatemala City, Guatemala – the second query (which gets me total number of tweets) is turning up lower numbers of tweets over time. So while the first few months of 2015 turn up over 100,000 total tweets per month, by summer 2022 that number is down to a little over 1,000 per month. What’s perplexing is that this is not a problem for my query in San Salvador, El Salvador. I am using the same exact code for my query (although of course with a different bounding box), and the total number of tweets mapping to El Salvador does not decline over time.
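For anyone trying to reproduce the monthly totals described above, here is a minimal sketch of one way to roll the daily count buckets up by month and compute the keyword share. It assumes each bucket is a dict with `start` and `tweet_count` keys, as the counts endpoint returns them; the function names `monthly_totals` and `keyword_share` and the sample numbers are hypothetical, not taken from the original program.

```python
from collections import defaultdict


def monthly_totals(daily_buckets):
    """Sum daily count buckets into a {'YYYY-MM': total} dict."""
    totals = defaultdict(int)
    for bucket in daily_buckets:
        # 'start' is an ISO 8601 timestamp; its first 7 characters are 'YYYY-MM'
        month = bucket["start"][:7]
        totals[month] += bucket["tweet_count"]
    return dict(totals)


def keyword_share(keyword_buckets, all_buckets):
    """Return keyword tweets as a fraction of all geo-mapped tweets, per month."""
    keyword = monthly_totals(keyword_buckets)
    total = monthly_totals(all_buckets)
    # None marks months with no geo-mapped tweets at all (share is undefined)
    return {
        month: (keyword.get(month, 0) / count if count else None)
        for month, count in sorted(total.items())
    }


# Made-up sample data, shaped like the lists built by the two queries
sample_keyword = [
    {"start": "2015-01-01T00:00:00.000Z", "tweet_count": 5},
    {"start": "2015-01-02T00:00:00.000Z", "tweet_count": 3},
    {"start": "2015-02-01T00:00:00.000Z", "tweet_count": 2},
]
sample_all = [
    {"start": "2015-01-01T00:00:00.000Z", "tweet_count": 80},
    {"start": "2015-01-02T00:00:00.000Z", "tweet_count": 20},
    {"start": "2015-02-01T00:00:00.000Z", "tweet_count": 50},
]

print(monthly_totals(sample_all))
print(keyword_share(sample_keyword, sample_all))
```

Dividing by the monthly total only helps if the denominator is itself stable, so a falling total like the one seen for Tegucigalpa and Guatemala City will distort the share even when keyword usage has not changed.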

I know that the bounding_box operator matches tweets that a) have user-provided exact coordinates falling within the box or b) have a user-provided location whose coordinates fall within the box. Is the issue I’m running into a problem with the bounding_box operator? Is it reflective of the fact that Twitter phased out geotagging? Or might there be some local phenomenon in Guatemala City and Tegucigalpa, wherein Twitter users began to stop providing information about their location? I have also found that the number of tweets matching has:geo has declined over time in Guatemala City and Tegucigalpa, but not in El Salvador. Very weird.

Any help would be greatly appreciated, as my statistical analysis is much less valid if this problem remains.

For what it’s worth, accounting for this is hard, because so few tweets have geo tags in the first place and because the geo coordinate UX changed over time (location is off by default now, exact location was removed at some point, etc.).

It’s difficult to attribute exactly what may be the cause - but possible ones are: Twitter search indexes are flaky and failed to update, or a UI change meant fewer people posting, like it did at some point in April 2015 (see the igorbrigadir/twitter-history repository on GitHub, which tracks significant changes to the Twitter API or platform as a whole).

Also, most “exact location” geo tweets after that UI change come from Instagram, so changes on that platform could indirectly affect things, because a lot of people set up integrations that cross-posted their Instagram photos to Twitter.
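One rough way to probe the Instagram idea is to split the bounding-box counts into tweets that link to Instagram and tweets that do not, then compare the two series per city. The sketch below only builds the query strings. It assumes the `url:` operator and `-` negation are available at your access level and that `url:instagram` matches cross-posted links; the helper name `diagnostic_queries` is hypothetical.

```python
def diagnostic_queries(bbox):
    """Build count queries that split a bounding box by Instagram links.

    bbox is (west_long, south_lat, east_long, north_lat).
    """
    west, south, east, north = bbox
    base = f"bounding_box:[{west} {south} {east} {north}]"
    return {
        "all": base,
        # assumption: url:instagram matches tweets linking to Instagram
        "instagram_links": f"{base} url:instagram",
        "no_instagram_links": f"{base} -url:instagram",
    }


# Bounding box from the Tegucigalpa query in the original post
queries = diagnostic_queries((-87.296643, 14.018294, -87.116257, 14.164161))
for name, query in queries.items():
    print(name, "->", query)
```

Each string could then be passed as `query=` to the same counts call used in the original post. If the decline sits mostly in the Instagram-linked series, that would point at cross-posting changes, not at local user behaviour.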


Thank you for your response – that all makes sense. But I still find it weird that I don’t run into this problem with El Salvador. I’ll keep investigating.
