Search API Rate Limits for Project


#1

I’m working on a project for an NLP course I’m taking and I’d like to be able to get larger amounts of data to make the application more robust. I’ve run into two issues:

  1. Multiple searches return the same list of tweets, so running several searches for 100 tweets each doesn’t actually give me more than 100 distinct tweets.
  2. I’ll most likely hit the rate limit pretty quickly if I’m running searches on different topics.

Are there any recommendations for either issue? Would it make sense to use the Streaming API if I’ll be switching hashtags a lot? I was also thinking that it may make more sense to just find archived tweets and parse those instead of using the API.

#2

Sounds like you need to paginate through searches: https://dev.twitter.com/rest/public/timelines
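
A minimal sketch of the max_id paging that page describes, in Python purely for illustration. `fetch_page` is a hypothetical stand-in for whatever call your library makes to the search endpoint; it is assumed to return a list of tweet dicts, newest first, each with an integer `id`. The fake backend exists only so the sketch runs offline.

```python
def collect_tweets(fetch_page, query, pages=5, count=100):
    """Page backwards through search results using max_id.

    fetch_page is a hypothetical callable (query, count, max_id) -> list of
    tweet dicts, newest first, each with an integer "id".
    """
    seen = {}          # keyed by tweet id, so repeats are dropped
    max_id = None      # no upper bound on the first request
    for _ in range(pages):
        batch = fetch_page(query, count, max_id)
        if not batch:
            break      # nothing older is available
        for tweet in batch:
            seen[tweet["id"]] = tweet
        # Next request: only tweets strictly older than the oldest one so far.
        max_id = min(t["id"] for t in batch) - 1
    return list(seen.values())


# Fake backend standing in for the API (an assumption, not real data).
def fake_fetch(query, count, max_id):
    ids = [i for i in range(250, 0, -1) if max_id is None or i <= max_id]
    return [{"id": i, "text": f"{query} #{i}"} for i in ids[:count]]


tweets = collect_tweets(fake_fetch, "#nlp", pages=5)
print(len(tweets))
```

Each request counts against the rate limit, so `pages` is effectively your budget per topic.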

Rate limits are pretty permissive for the Search API, and you can make a lot more calls with app-only auth: https://dev.twitter.com/oauth/application-only (the implementation depends on which language and library you’re using).
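
As a rough idea of what app-only auth involves, here is a sketch of the token request that page describes, again in Python for illustration. It assumes the `oauth2/token` endpoint and the `client_credentials` grant from the linked docs; the key and secret are placeholders, and the request is only built here, not sent.

```python
import base64
import urllib.parse
import urllib.request

TOKEN_URL = "https://api.twitter.com/oauth2/token"  # endpoint as given in the linked docs


def build_token_request(consumer_key, consumer_secret):
    """Build (but do not send) the request for an app-only bearer token."""
    # URL-encode both values, join with a colon, then base64-encode the pair.
    key = urllib.parse.quote(consumer_key, safe="")
    secret = urllib.parse.quote(consumer_secret, safe="")
    credentials = base64.b64encode(f"{key}:{secret}".encode("ascii")).decode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=b"grant_type=client_credentials",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
        },
        method="POST",
    )


request = build_token_request("my-key", "my-secret")  # placeholder credentials
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) should return JSON containing an `access_token`, which then goes into an `Authorization: Bearer <token>` header on search calls. A library that supports app-only auth does all of this for you.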

It won’t make sense to use the Streaming API if you plan on changing tracked keywords frequently; the Streaming API is much better for long-running collections.

There are lots of tweet collections out there that you can get (they usually come as lists of tweet IDs that you need to download yourself), but whether they help depends on what kind of task you’re using the tweets for. Remember that the Search API is optimised for relevance, doesn’t return all tweets, and is limited to roughly the last 7 days, which could be an issue depending on what you’re doing.
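
If you go the tweet-ID route, the IDs have to be turned back into tweets. A rough sketch, assuming the statuses/lookup endpoint, which accepts up to 100 comma-separated IDs per request; the `batches` helper and the sample IDs are made up for illustration.

```python
def batches(ids, size=100):
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


tweet_ids = [str(n) for n in range(1, 251)]  # made-up ids for the example
# One parameter dict per lookup request, ids joined with commas.
params = [{"id": ",".join(chunk)} for chunk in batches(tweet_ids)]
print(len(params))
```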


#3

Thanks for the response!
So I think pagination will really help with getting new tweets I haven’t used before. I’ll have to play around with it and see if I can get it working well.
I’m currently using PHP with the twitter-api-php wrapper (https://github.com/J7mbo/twitter-api-php). Would it be difficult to get app-only auth working? I read through the link, but I’m not too familiar with how I would go about setting it up.

What I’m working on is nothing terribly complicated: I’m gathering tweets surrounding a given topic and building a language model out of them, then using that language model to generate a sentence about that topic. So I don’t necessarily need all of the possible tweets, but the more tweets I can use the better. That’s why I’d rather use the API instead of a pre-made dataset, since I may not have any data for a given topic and I’d like to be able to pull from Twitter for that.
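
To make the idea concrete, here is a toy sketch of that pipeline, assuming a simple bigram model over whitespace-split tokens. The actual project is in PHP and may use a different model; the sample tweets and function names here are invented for illustration.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # sentence boundary markers


def train_bigrams(tweets):
    """Map each token to the list of tokens observed right after it."""
    model = defaultdict(list)
    for text in tweets:
        tokens = [START] + text.split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            model[prev].append(nxt)
    return model


def generate(model, rng, max_words=20):
    """Walk the bigram table from START until END or max_words.

    Assumes the model was trained on at least one tweet.
    """
    word, out = START, []
    while len(out) < max_words:
        word = rng.choice(model[word])  # frequent successors are picked more often
        if word == END:
            break
        out.append(word)
    return " ".join(out)


sample = ["nlp is fun", "nlp is hard", "tweets are short"]  # invented data
model = train_bigrams(sample)
print(generate(model, random.Random(0)))
```

With so little data the output just replays the input, which is why more tweets per topic matter.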


#4

I’m not great with PHP, and I can’t tell whether that library supports app-only auth. This one is regularly maintained, though, and does support it: https://github.com/jublonet/codebird-php