I’ve been collecting tweets that contain specified keywords and/or hashtags, using Tweepy with the Twitter API. Until now I used the free Twitter developer account, which only lets me retrieve tweets from up to 7 days back. I now need to collect tweets from months ago, so I upgraded to the premium account with full-archive access, but my Python script still only works for the 7-day period and returns nothing for anything older. I’ve regenerated all of my keys (except the bearer token?) and still nothing. Here’s the code:
import re
import io
import csv
import tweepy
import itertools
import json
from tweepy import OAuthHandler
consumer_key = '[my consumer key is here]'
consumer_secret = '[secret key is here]'
access_token = '[access token is here]'
access_token_secret = '[secret token is here]'
# create OAuthHandler object
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
# set access token and secret
auth.set_access_token(access_token, access_token_secret)
# create tweepy API object to fetch tweets
api = tweepy.API(auth, wait_on_rate_limit=True)
def get_tweets_withHashTags(query, startdate, enddate, count=300):
    tweets_hlist = []  # tweets whose hashtags contain the query
    tweets_list = []   # every tweet returned for the query
    qt = str(query)
    for page in tweepy.Cursor(api.search, q=qt, since=startdate, until=enddate, count=count, tweet_mode='extended').pages(100):
        page_count = len(page)
        print("Count of tweets in each page for " + qt + " : " + str(page_count))
        for value in page:
            hashList = value._json["entities"]["hashtags"]
            flag = 0
            for tag in hashList:
                if qt.lower() in tag["text"].lower():
                    flag = 1
            if flag == 1:
                tweets_hlist.append(value._json)
            tweets_list.append(value._json)
    print("tweets_hash_" + query + ": " + str(len(tweets_hlist)))
    print("tweets_" + query + ": " + str(len(tweets_list)))
    with open("/Users/victor/Documents/tweetCollection/data/" + startdate + "/" + "query1_hash_" + str(startdate) + "_" + str(enddate) + "_" + query + '.json', 'w') as outfile:
        json.dump(tweets_hlist, outfile, indent=2)
    with open("/Users/victor/Documents/tweetCollection/data/" + startdate + "/" + "query1_Contains_" + str(startdate) + "_" + str(enddate) + "_" + query + '.json', 'w') as outfile:
        json.dump(tweets_list, outfile, indent=2)
    return len(tweets_list)
query = ["keyWord1","keyWord2","keyWord3","keyWordEtc."]
for value in query:
    get_tweets_withHashTags(value, "2020-04-12", "2020-04-13")
Again, this works perfectly fine for anything within the past 7 days. Changing the date to further back than that returns 0 results.
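Since an out-of-range date returns 0 results without raising an error, a small guard before each search call can make the cause visible. This is only a sketch: the helper name `within_standard_window` and the constant `STANDARD_WINDOW_DAYS` are hypothetical, and the 7-day figure is taken from the limit described above rather than read from the API.

```python
from datetime import date, datetime, timedelta

# Assumption: the standard search endpoint only reaches back about 7 days,
# as described in the question. This value is not read from the API.
STANDARD_WINDOW_DAYS = 7


def within_standard_window(startdate, today=None):
    """Return True if a 'YYYY-MM-DD' start date is inside the 7-day window."""
    today = today or date.today()
    start = datetime.strptime(startdate, "%Y-%m-%d").date()
    return today - start <= timedelta(days=STANDARD_WINDOW_DAYS)
```

Checking `within_standard_window(startdate)` before each search and printing a warning when it is False would separate "the window is too old for this endpoint" from "no tweets matched the query".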
Academic Access uses the v2 API, which Tweepy does not support; Tweepy is still using the v1.1 API underneath.
For v2, you can use the v2 branch of twitterdev/search-tweets-python on GitHub.
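To give a rough idea of what a v2 full-archive request looks like without Tweepy, here is a minimal sketch that only builds the request pieces and does not send anything. The helper names `to_rfc3339` and `build_search_request` are hypothetical. The endpoint URL and the `query`, `start_time`, `end_time` and `max_results` parameters are taken from Twitter's v2 full-archive search documentation as I understand it, so check them against the current reference. Note that v2 authenticates with the bearer token rather than the four OAuth keys.

```python
from datetime import datetime, timezone

# Assumed v2 full-archive search endpoint; verify against the current API reference.
FULL_ARCHIVE_URL = "https://api.twitter.com/2/tweets/search/all"


def to_rfc3339(day):
    """Convert a 'YYYY-MM-DD' string to a UTC timestamp such as 2020-04-12T00:00:00Z."""
    parsed = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


def build_search_request(query, startdate, enddate, bearer_token, max_results=100):
    """Build the URL, query parameters and headers for one search request."""
    params = {
        "query": query,
        "start_time": to_rfc3339(startdate),
        "end_time": to_rfc3339(enddate),
        "max_results": max_results,
    }
    headers = {"Authorization": "Bearer " + bearer_token}
    return FULL_ARCHIVE_URL, params, headers
```

The three returned values can be passed to an HTTP client, for example `requests.get(url, params=params, headers=headers)`. Pagination and rate limiting are left out here; search-tweets-python handles both.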