Hi Brian,
A team at our college is running an online contest in which we plan to provide the participating groups with the following:
- A list of tweet IDs (not the tweet text).
- For every tweet ID, the corresponding hand-curated topic class that the tweet text belongs to (say ‘Politics’ or ‘Sports’).
- A script (using the standard Twitter API) to download the tweet texts from a list of published tweet IDs.
The participants need to train a machine learning algorithm to predict the topic class a particular tweet belongs to. For this they need a dataset of about 10,000 tweet texts. The script we provide will help them download this dataset.
Will this violate the Twitter Terms of Service?
I understand that Niek, above, used a similar method that let his users recreate the tweets from a list of tweet IDs. We are following a similar approach. Our dataset will be available only to the participants of the contest and will not be publicly available.
We assure you that we will publish only the list of tweet IDs and not the text. The script we provide will download the text on an as-needed basis.
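To make the idea concrete, here is a rough sketch of what such a script might look like. It is only an illustration under stated assumptions, not the final script: `fetch_texts` is a hypothetical placeholder for whatever standard Twitter API call does the actual lookup, and the default batch size of 100 is an assumption that should be checked against the API's current limits. The sketch only shows how the published tweet IDs and topic labels would be joined with the downloaded text on the participant's machine.

```python
from typing import Callable, Dict, Iterator, List, Tuple


def chunked(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` tweet IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_dataset(
    labels: Dict[str, str],
    fetch_texts: Callable[[List[str]], Dict[str, str]],
    batch_size: int = 100,  # assumed batch size; check the API limits
) -> List[Tuple[str, str, str]]:
    """Join published (tweet ID, topic) pairs with downloaded tweet text.

    `labels` maps tweet ID -> hand-curated topic class.
    `fetch_texts` is a hypothetical callable that takes a batch of tweet IDs
    and returns a mapping of tweet ID -> text for the tweets it could retrieve.
    """
    rows: List[Tuple[str, str, str]] = []
    for batch in chunked(list(labels), batch_size):
        texts = fetch_texts(batch)
        for tweet_id in batch:
            text = texts.get(tweet_id)
            if text is None:
                # Tweet could not be retrieved (e.g. deleted); skip it.
                continue
            rows.append((tweet_id, text, labels[tweet_id]))
    return rows
```

Only the tweet IDs and topic classes would be distributed by us; the text is fetched by each participant through the callable they plug in.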
Thanks a lot,
Anirban