Get more than 1% of Tweets from streaming API?


I am a PhD researcher and would like to collect a large number of tweets that fit certain criteria (e.g. 2 million tweets related to ‘Pluto’, or some other sufficiently large quantity). This is NOT for commercial purposes.

I tried both Twitter’s REST API and Streaming API, but because of either rate limiting or random sampling (the Streaming API returns only 1% of tweets), the results are not ideal. From my investigation, there seem to be a couple of options:

  • Twitter’s Firehose
  • GNIP

Does anyone out there have experience with large-scale tweet mining and could give me some pointers on the best (and most effective) way to go about this? Neither GNIP nor the Firehose seems to be free, and as a student I obviously have very little wiggle room on budget. :-/

Thanks a lot!


@yilinghwong, Twitter/Gnip does not provide free data to individuals, even for research purposes.

Twitter/Gnip does provide data to select universities in the US and around the world, though. I would check with your university’s engineering and science departments to see whether your university has a relationship with Twitter, or with another university that does. You may also want to consider contacting Gnip through a university representative to see whether there is potential for a research partnership.