Sorry if this is a known issue; I might have missed it in the documentation.
I have used Premium full archive search (V2 - sandbox) to collate tweets for research
Briefly, I used python searchtweets with PyCharm to grab tweets from a specific user (account name / User_ID) and a date range (Feb 7 - Feb 16 2009) with keyword operators, and then printed the results to a CSV file.
But while examining the CSV file, I found that every single tweet shares the exact same tweet date/time, which is 11/04/2010 - 1:42. This is also true for the other User_IDs I have tested through all of the 2009 date ranges. If I manually search for these tweets through Twitter advanced search, they display dates and times which fall within the originally selected date range, e.g. 08/02/2009 8:35, 09/02/2009 6:25 etc.
I'm wondering if this is due to the original tweets being republished as a batch on 11/04/2010 - 1:42?
Edit: I have since worked out a solution.
The problem turned out to be the relationship between Twitter's snowflake approach to datetime and the implementation of the created_at code: it always read the datetime from the snowflake. This means it only returned the most recent created_at datetime (which I assume was the batch creation of these tweets). By changing which dictionary elements are accessed in search-tweets-python, I was able to obtain the original datetime as follows. This is rather counter-intuitive and is likely an issue for anyone working on tweet data from before 2010. I may post to the search-tweets-python GitHub later, but thought I would confirm the issue here first.
writer.writerow((tweet.user_id, datetime.datetime.strptime(tweet.get('created_at'), "%a %b %d %H:%M:%S %z %Y").astimezone(), tweet.all_text))
print(f"{tweet.all_text}.\nCreated at: {datetime.datetime.strptime(tweet.get('created_at'), '%a %b %d %H:%M:%S %z %Y').astimezone()}")
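For anyone who wants to try the parsing step on its own, here is a minimal sketch that does not need the searchtweets library. It assumes the raw payload carries created_at in the classic "Sun Feb 08 08:35:00 +0000 2009" layout; the sample dictionary and the helper name parse_created_at are made up for illustration.

```python
from datetime import datetime

# Layout of the raw created_at string in the classic tweet payload,
# e.g. "Sun Feb 08 08:35:00 +0000 2009".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(raw: str) -> datetime:
    """Parse the raw created_at string into a timezone-aware datetime."""
    return datetime.strptime(raw, TWITTER_TIME_FORMAT)


# Hypothetical payload standing in for one tweet returned by the search.
sample_tweet = {
    "id": 1189000000,
    "created_at": "Sun Feb 08 08:35:00 +0000 2009",
}

parsed = parse_created_at(sample_tweet["created_at"])
print(parsed.isoformat())  # 2009-02-08T08:35:00+00:00
```

Calling .astimezone() on the result, as in the lines above, converts it to the local timezone of the machine running the script, so leave it out if the CSV should stay in UTC.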
Another small issue is that almost all the web links in the tweets are broken TinyURLs. I expect this is either because the pages they pointed to no longer exist and/or because Twitter moved away from TinyURL encoding to t.co. I would just like some confirmation of this second problem to keep my research as transparent as possible (I don't actually need working links for my research).
Thanks all, have a great day.
Yeah, TinyURLs were a third-party service that was popular, as far as I remember.
I’ve a few other old major changes listed here in case it helps:
Snowflake IDs came in at some time around 2010-06-01 and t.co started wrapping links around 2010-06-08. (Links to blog / changelog in repo)
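To illustrate why older tweets can all end up with the same timestamp, here is a rough sketch of decoding a time from a Snowflake ID. It assumes the commonly documented layout, where the bits above the low 22 hold milliseconds since a custom epoch of 1288834974657 (2010-11-04 01:42:54 UTC). The function name and the sample ID are hypothetical, and this is not taken from the library's source.

```python
from datetime import datetime, timedelta, timezone

# Commonly documented Snowflake epoch, in milliseconds since the Unix epoch.
# Treat this constant as an assumption if you are verifying independently.
TWITTER_EPOCH_MS = 1288834974657


def snowflake_to_datetime(tweet_id: int) -> datetime:
    """Decode the timestamp that a Snowflake-style ID would carry."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(
        milliseconds=millis
    )


# Hypothetical pre-Snowflake ID of roughly the size used in early 2009.
# It is a plain sequential number, so the shifted value is tiny and the
# result lands a fraction of a second after the epoch itself.
old_id = 1189000000
decoded = snowflake_to_datetime(old_id)
print(decoded.strftime("%Y-%m-%d %H:%M"))  # 2010-11-04 01:42
```

Any small sequential ID decodes to almost exactly the epoch, which looks consistent with every 2009 tweet showing a 2010 date at 1:42 in the original post. For tweets that old, the raw created_at string is the value to trust.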
Thanks Igor. As always, a wickedly quick response. I'll have a browse, cheers.