Sharing Tweet-Ids in repositories


#1

Hi everyone,

I hope this is the right place for my question.
I work for a data repository. In the past, data depositors have asked us to archive and publish Twitter-related data, especially Tweet-Ids. I cannot find explicit or comprehensive policies on whether, and how, Twitter allows Tweet-Ids to be shared by repositories. In particular, I wonder whether
a) Twitter has a limit on how many Tweet-Ids may be shared. Please note that I am talking about Ids, not tweets. Thus, the Twitter API is always needed to access the tweets, and the API rate limits automatically apply.
b) Twitter has policies that allow or disallow extracting information from tweets, such as hashtags, and sharing it via data repositories directly (not via Tweet-Id).

Thank you very much for your help in advance!
Thomas Ebel


#2

Yep! Sharing Tweet IDs (and/or user IDs) is the preferred way to share Twitter datasets: Twitter and Open Data in academia

I’m not aware of any restrictions on how many tweet IDs you can share, but the rate limits for downloading tweets in bulk may put a practical limit on that. As an example: with https://dev.twitter.com/rest/reference/get/statuses/lookup you can download 1,728,000 tweets in 24 hours. If you also use application authentication, you can get up to 2,304,000 in 24 hours.
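
To show where those two figures come from, here is a rough sketch of the arithmetic. The per-request and per-window values are assumptions on my part (100 IDs per statuses/lookup call, 180 calls per 15-minute window with user authentication, 60 with application authentication); they reproduce the numbers above, but check the current API documentation before relying on them. The function names are just illustrative.

```python
# Assumed limits for statuses/lookup (not guaranteed; verify against the API docs).
IDS_PER_REQUEST = 100                # assumed maximum IDs per lookup call
USER_AUTH_REQUESTS_PER_WINDOW = 180  # assumed user-auth calls per window
APP_AUTH_REQUESTS_PER_WINDOW = 60    # assumed app-auth calls per window
WINDOW_MINUTES = 15                  # assumed rate-limit window length


def tweets_per_day(requests_per_window, ids_per_request=IDS_PER_REQUEST,
                   window_minutes=WINDOW_MINUTES):
    """Upper bound on tweets that can be looked up in 24 hours."""
    windows_per_day = (24 * 60) // window_minutes
    return requests_per_window * ids_per_request * windows_per_day


def days_to_download(n_ids, per_day):
    """Whole days needed to look up n_ids at per_day tweets per day (rounded up)."""
    return -(-n_ids // per_day)  # integer ceiling division


user_only = tweets_per_day(USER_AUTH_REQUESTS_PER_WINDOW)
combined = user_only + tweets_per_day(APP_AUTH_REQUESTS_PER_WINDOW)

print(user_only)                               # user authentication only
print(combined)                                # user plus application authentication
print(days_to_download(10_000_000, combined))  # days for a 10-million-ID dataset
```

So under these assumptions a dataset of ten million IDs would take about five days to download with both kinds of authentication.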

How many days someone is willing to wait for a dataset to download depends on how interesting the data is…

Not sure if anything’s changed recently, but it should be fine to share derivative data. An example might be hashtag counts over time, or this giant language model, because that doesn’t have the original tweet text. But even then, it’s nice to have tweet ids to go along with the derived data.


#3

Thank you very much, IgorBrigadir, for your helpful reply.

Following up on your example to get more clarity: in my opinion, hashtag counts would not be (truly) derivative work, at least in the case of files whose content is “hashtag count”. After all, it is content from tweets which could/would not get updated in case a user deletes their tweet. Is sharing such files permitted?


#4

If a user deletes a Tweet it must not be persisted elsewhere. This is the core reason why we request that you share IDs rather than body text, so that future API calls will correctly reflect that deleted Tweets are no longer in existence.
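
As a rough illustration of why shared IDs respect deletions, here is a minimal sketch of looking up a list of IDs in batches. The `lookup` argument is a hypothetical stand-in for whatever API client you use; it is assumed to take a list of IDs and return a dict containing only the Tweets that still exist. The batch size of 100 is also an assumption, and none of the names below come from an official library.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def batches(ids: Iterable[int], size: int = 100) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most `size` IDs (size is an assumed limit)."""
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def rehydrate(ids: Iterable[int],
              lookup: Callable[[List[int]], Dict[int, str]]
              ) -> Tuple[Dict[int, str], List[int]]:
    """Return (tweets still available, IDs that no longer resolve).

    `lookup` is a hypothetical callable standing in for a real API client.
    """
    available: Dict[int, str] = {}
    missing: List[int] = []
    for batch in batches(ids):
        found = lookup(batch)
        available.update(found)
        # IDs absent from the response are deleted or otherwise unavailable.
        missing.extend(i for i in batch if i not in found)
    return available, missing


# Usage with a fake lookup in which Tweet 2 has been deleted.
def fake_lookup(batch: List[int]) -> Dict[int, str]:
    store = {1: "first tweet", 3: "third tweet"}
    return {i: store[i] for i in batch if i in store}


available, missing = rehydrate([1, 2, 3], fake_lookup)
print(available)
print(missing)
```

Anyone who downloads the dataset later only receives Tweets that still exist at that time, which a file containing the body text cannot guarantee.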


#5

Thank you, Andy!