Search API Policies for Research


#1

Hello All,

I have two questions about using Twitter’s Search/Stream API. I would like to collect some data for text mining applications:

  1. Can I use Mechanical Turk to hand-label messages? It isn’t clear to me whether this falls under the policy against sharing collected messages, or whether the text may be separated from the original ID.

  2. Can hand-labeled Tweets that are used as a training dataset be shared? If the dataset is just a table of messages and labels, with no other information that can be linked back to the original user, does this violate the policy?

Thank you for any help.

Regards,
David


#2

I haven’t received a direct response to this forum post (obviously) or to my email inquiries. At least, Twitter’s response wasn’t clear to me: “Using Mechanical Turk to distribute the text part of the Tweet to be hand labelled should be a problem.” I couldn’t tell whether that was a typo and should have said “shouldn’t be a problem.”

Anyway, what I believe would work without violating the terms of service is to host the HITs myself. For example, take a tweet ID, display the tweet on a web page (conforming to Twitter’s display policies), and let the Turk worker rate or classify it. There seem to be some libraries set up for the hosting part (not necessarily the Twitter part): http://gureckislab.org/mtworkshop/#2, https://github.com/NYUCCL/psiTurk
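
As a rough illustration of the self-hosted idea, here is a minimal Python sketch that turns a tweet ID into an HTML page with an embedded tweet and a labelling form. Everything in it is an assumption for illustration: the label set, the `/submit` endpoint, and the function names are hypothetical, and whether an embed like this satisfies Twitter's display requirements is not confirmed. It only builds the page; serving it and wiring it to Mechanical Turk would be up to whatever hosting library is used.

```python
import html

# Hypothetical label set for illustration; the post does not name one.
LABELS = ("positive", "negative", "neutral")


def tweet_url(tweet_id: str) -> str:
    """Build a permalink from a numeric tweet ID (no username needed)."""
    if not (tweet_id.isascii() and tweet_id.isdigit()):
        raise ValueError(f"tweet ID must be numeric, got {tweet_id!r}")
    return f"https://twitter.com/i/web/status/{tweet_id}"


def render_hit_page(tweet_id: str, labels=LABELS, submit_url: str = "/submit") -> str:
    """Return an HTML page that embeds the tweet and asks for one label."""
    url = html.escape(tweet_url(tweet_id), quote=True)
    action = html.escape(submit_url, quote=True)  # hypothetical endpoint
    options = []
    for label in labels:
        safe = html.escape(label, quote=True)
        options.append(
            f'<label><input type="radio" name="label" value="{safe}" required> {safe}</label>'
        )
    radios = "\n      ".join(options)
    # Only the tweet ID is stored in the form; the tweet text itself is
    # expected to be rendered by Twitter's widgets.js from the blockquote link.
    return f"""<!DOCTYPE html>
<html>
  <body>
    <blockquote class="twitter-tweet"><a href="{url}">View tweet</a></blockquote>
    <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
    <form method="post" action="{action}">
      <input type="hidden" name="tweet_id" value="{tweet_id}">
      {radios}
      <button type="submit">Submit</button>
    </form>
  </body>
</html>"""
```

Calling `render_hit_page("1234567890")` (a made-up ID) returns the page as a string. The point of the design is that the worker's answer comes back keyed by tweet ID, so the collected tweet text is never handed to a third party directly.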

I thought I would post a follow-up in case anyone else is interested. I DO NOT have this confirmed by Twitter, but it seems like it would stay within the bounds of their policies.