I am writing my thesis on the public debate during the COVID-19 pandemic in Denmark.
Based on a live-scraped dataset collected during COVID-19, I have identified a sample of Danish tweets. To comply with GDPR, I need to know which tweets have been deleted by their users so that I can remove them from my dataset. However, I have had a hard time identifying which tweets were deleted specifically by the user. Do you have a solution to this issue?
Kind regards
Ida
I need to know which tweets are deleted by the users, in order to remove these from my dataset.
You can use the GET statuses/lookup endpoint (see the Twitter Developer Platform docs) to check 100 tweet IDs at a time. Deleted tweets and tweets from suspended accounts will not be returned. If it’s a very large dataset, it may take a while.
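As a rough sketch of the batching idea above: split the IDs into groups of 100, look each group up, and treat every ID that does not come back as unavailable. The actual API call is left out on purpose; `lookup` is a hypothetical callable that you would supply (for example, a thin wrapper around statuses/lookup), and `find_unavailable` and `batches` are illustrative names, not part of any library. Note that an ID missing from the response may belong to a deleted tweet or to a suspended account, so this does not by itself tell you that the user deleted it.

```python
from typing import Callable, Iterable, Iterator, List, Set

BATCH_SIZE = 100  # statuses/lookup checks up to 100 tweet IDs per request


def batches(ids: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def find_unavailable(
    ids: Iterable[str],
    lookup: Callable[[List[str]], Iterable[str]],
) -> Set[str]:
    """Return the IDs that `lookup` did not return.

    `lookup` is an assumption: a function you provide that takes up to
    100 tweet IDs and returns the IDs of the tweets that still exist.
    """
    unique_ids = list(dict.fromkeys(ids))  # drop duplicates, keep order
    missing: Set[str] = set()
    for batch in batches(unique_ids):
        returned = set(lookup(batch))
        missing.update(i for i in batch if i not in returned)
    return missing
```

The returned set can then be used to drop the corresponding rows from the dataset.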
Twarc has a good dehydrate function to extract IDs and a hydrate function to look them up again (GitHub: DocNow/twarc, a command line tool and Python library for archiving Twitter JSON). This is a good way to remove deleted tweets.
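Once the IDs have been hydrated again, the original dataset can be filtered down to the tweets that are still available. The sketch below assumes both the original data and the hydrated output are line-oriented JSON with one tweet per line, and that each tweet carries an `id_str` field; the function names are illustrative only.

```python
import json
from typing import Iterable, Iterator, Set


def surviving_ids(hydrated_lines: Iterable[str]) -> Set[str]:
    """Collect the IDs of tweets that were returned when hydrating."""
    return {
        json.loads(line)["id_str"]  # assumes an `id_str` field per tweet
        for line in hydrated_lines
        if line.strip()
    }


def keep_available(
    original_lines: Iterable[str], available: Set[str]
) -> Iterator[str]:
    """Yield only the original tweets whose ID is still available."""
    for line in original_lines:
        if line.strip() and json.loads(line)["id_str"] in available:
            yield line
```

With files, both arguments can simply be open file handles, and the yielded lines can be written to a new, cleaned file.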
There’s also a batch compliance endpoint, but I don’t think it’s fully available yet (see the Introduction page in the Twitter Developer Platform docs).
Thank you very much for your useful reply.
I have tried to use the batch Tweet compliance lookup, but it seems I lack ‘authentication’. I am not sure whether I got something wrong or whether it is because the endpoint is not available yet.
Do you have any experience with the batch compliance endpoint?
Yeah, unfortunately I don’t think it’s generally available yet. It was in trial for a while.
I think you’re stuck with using statuses/lookup to check for deletions. The twarc command line tool I linked is the best way to use that.