Hi,
I am using searchtweets’ full archive search endpoint with an academic research status. My code looks like this:
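from searchtweets import ResultStream, gen_request_parameters, load_credentials

def pull_searchtweets(SEARCH_QUERY, RESULTS_PER_CALL, FROM_DATE, TO_DATE, MAX_RESULTS):
    # Load the v2 credentials from the YAML file
    search_args = load_credentials(filename="./twitter_keys.yaml",
                                   yaml_key="search_tweets_v2",
                                   env_overwrite=False)
    # Build the request parameters for the query and date range
    rule = gen_request_parameters(SEARCH_QUERY,
                                  results_per_call=RESULTS_PER_CALL,
                                  start_time=FROM_DATE,
                                  end_time=TO_DATE
                                  )
    rs = ResultStream(request_parameters=rule,
                      max_results=MAX_RESULTS,
                      **search_args)
    tweet_list = []
    try:
        # Collect every result yielded by the stream
        for tweet in rs.stream():
            tweet_list.append(tweet)
    except Exception as e:
        # Report what went wrong instead of silently swallowing the error
        print(f'An error happened while retrieving data: {e}')
    return tweet_list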
The arguments passed on to this function look like this:
coords = geo_data_df['bounding box'][i] # coordinates of different districts in England
keywords = 'immigration OR immigrants'
SEARCH_QUERY = f'{keywords} -RT lang:en bounding_box:[{coords[0]} {coords[1]} {coords[2]} {coords[3]}] '
RESULTS_PER_CALL = 20
FROM_DATE = '2017-01-01'
TO_DATE = '2018-01-01'
The issue is that it returns the same set of tweets for all the different locations (defined by the bounding box in the query). I know for sure they are the same tweets (and not just retweets) because they have the same tweet IDs (returned by searchtweets.ResultStream.stream()). It’s as if the same tweet fits within all the different bounding boxes.
Any idea what is going on?
Thank you.
You’re getting the same tweets because, for every location, you’re getting any tweet from anywhere that contains “immigration”.
The query you end up with is:
'immigration OR immigrants -RT lang:en bounding_box:[0.0 0.0 0.0 0.0] '
(I just filled in 0s for the coords as an example.)
This is due to operator precedence (see “Building queries” in the Twitter Developer docs): a space is an implicit logical AND, and AND is processed before OR. So you end up searching for tweets that contain “immigration” from anywhere, OR tweets that contain “immigrants”, do not contain “RT”, are in English, and fall within the bounding box.
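As a rough illustration of that grouping, here is an analogy using Python's own boolean operators, where `and` also binds tighter than `or`. The three variables describe a made-up tweet and are purely hypothetical; this is not how Twitter evaluates queries internally, only a sketch of the logic.

```python
# Hypothetical tweet: mentions "immigration" but lies outside the bounding box.
has_immigration = True
has_immigrants = False
in_bounding_box = False

# Without parentheses, AND binds tighter than OR, so the location filter
# only applies to the "immigrants" branch.
matches_original = has_immigration or has_immigrants and in_bounding_box

# With parentheses, the location filter applies to both keywords.
matches_fixed = (has_immigration or has_immigrants) and in_bounding_box

print(matches_original, matches_fixed)
```

The ungrouped form matches the tweet even though it is outside the box, which is why every location returned the same results.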
The corrected query should be:
(immigration OR immigrants) -is:retweet bounding_box:[...]
So changing it to keywords = '(immigration OR immigrants)' and removing -RT should do it.
-is:retweet may not be needed because retweets do not have geo information so should not be matched.
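One way to avoid the precedence problem in the future is to build the query with a small helper that always wraps the keyword group in parentheses. This is only a sketch: `build_query` and its parameters are names made up for this example (they are not part of searchtweets), and the coordinates are the same placeholder zeros used above.

```python
def build_query(keywords, coords, exclude_retweets=True):
    """Build a query string with the OR-ed keywords grouped in parentheses.

    `keywords` is a list of terms to OR together; `coords` holds the four
    bounding box values in the order the query expects them.
    """
    # Parentheses make the OR group a single unit before the implicit ANDs.
    parts = ["(" + " OR ".join(keywords) + ")"]
    if exclude_retweets:
        parts.append("-is:retweet")
    parts.append("lang:en")
    parts.append("bounding_box:[" + " ".join(str(c) for c in coords) + "]")
    return " ".join(parts)


query = build_query(["immigration", "immigrants"], [0.0, 0.0, 0.0, 0.0])
print(query)
# (immigration OR immigrants) -is:retweet lang:en bounding_box:[0.0 0.0 0.0 0.0]
```

The resulting string can be passed as SEARCH_QUERY to the function in the original post.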
Hope that helps!
Ah, that’s true, I didn’t realize I needed the parentheses. I modified it and it worked! Thanks for helping.