I am a PhD researcher and would like to collect a large number of tweets that fit certain criteria (e.g. 2 million tweets related to ‘Pluto’, or a large enough quantity). This is NOT for commercial purposes.
I tried both Twitter’s REST API and Streaming API, but because of either rate limiting or random sampling (the Streaming API only returns 1% of tweets), the results are not ideal. From my investigation, it seems there are a couple of options:
- Twitter’s Firehose
- GNIP
Does anyone out there have experience with large-scale tweet mining, and could you give me some pointers on the best (and most effective) way to go about this? Neither GNIP nor Firehose seems to be free, and as a student I obviously have very little wiggle room on budget. :-/
Thanks a lot!