Looking for help from anyone with experience releasing Twitter datasets in an academic context, and hopefully guidance from Twitter staff.
In the UK, as in other countries, there is a growing requirement to make research data openly accessible to other academics (and non-academics). Some of the funding that I hold includes this stipulation.
Around the UK General Election debates this year, we collected some 38,000 tweets posted to official hashtags. We used 2% of these tweets—just shy of 800—for a manual qualitative analysis. We’d like to release those 800 tweets as an open dataset alongside the relevant publication.
I’ve been in discussion with my university about the possible legal issues around this, and we’ve reached the following conclusions:
- We think that releasing a CSV file of 800 tweets as a download is fine in principle (based on I.6.b.i of the Developer Policy: https://dev.twitter.com/overview/terms/agreement-and-policy)
- We think the copyright aspect is fine. Although tweets may be copyrightable under EU law, the users are licensing them to Twitter, who are sublicensing them to us.
The point where we’re getting stuck is licensing. Normally, you would release your data under some form of Creative Commons license. However, the Twitter Developer terms specifically forbid sublicensing the content (I.B in the Developer Agreement: https://dev.twitter.com/overview/terms/agreement-and-policy).
Does anyone have any idea how we might release the tweets without falling foul of this?
(NB: We’re aware we can bypass all these issues by just releasing Tweet IDs. However, it would be time-consuming for other researchers to re-collect the tweets from those IDs, and it’s not really in the spirit of sharing data.)
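
For reference, the ID-only fallback would amount to something like the rough sketch below: read the CSV and keep only the ID column. The column name `tweet_id` and the sample rows are made up for illustration; our real file may be laid out differently. IDs are kept as strings, since they are large enough that spreadsheet tools can round them if they are treated as numbers.

```python
import csv
import io


def extract_tweet_ids(csv_text, id_column="tweet_id"):
    """Return the tweet IDs found in CSV text as strings, skipping blank cells.

    `id_column` is a hypothetical column name; adjust it to match the real file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    ids = []
    for row in reader:
        # Missing or short rows yield None, so fall back to an empty string.
        value = (row.get(id_column) or "").strip()
        if value:
            ids.append(value)
    return ids


# Made-up sample data standing in for the real 800-tweet CSV.
sample = (
    "tweet_id,text\n"
    "592000000000000001,First example tweet\n"
    "592000000000000002,Second example tweet\n"
)

ids = extract_tweet_ids(sample)
print("\n".join(ids))
```

Anyone using such a list would still have to look up each tweet again themselves, which is the overhead we’d like to spare them.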