I’m working on a kind of Twitter wall. Users can log in with Twitter and create their own wall, which displays tweets for certain terms/hashtags.
I’m still looking for the best strategy to get the data out of the Twitter APIs.
Here are some of my thoughts:
Strategy 1: Streaming API
- Open a single stream (POST statuses/filter) for all walls
- Each hashtag is added to the track parameter
- When new tweets arrive, they will be processed and sent to the corresponding wall
- (“one account, one application, one open connection” cf. https://dev.twitter.com/discussions/14935)
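To make the routing step concrete, here is a minimal sketch of how tweets from the single stream could be mapped back to walls. The `WallRouter` class, its method names, and the wall ids are all hypothetical, and the plain substring match is a simplifying assumption; the matching rules of the real track parameter are more involved.

```python
from collections import defaultdict


class WallRouter:
    """Hypothetical helper: maps tracked terms to the walls that use them."""

    def __init__(self):
        # term (lowercased) -> set of wall ids subscribed to that term
        self._walls_by_term = defaultdict(set)

    def subscribe(self, wall_id, term):
        """Register a wall for a term; terms are compared case-insensitively."""
        self._walls_by_term[term.lower()].add(wall_id)

    def track_terms(self):
        """Unique terms across all walls, i.e. what would go into `track`."""
        return sorted(self._walls_by_term)

    def route(self, tweet_text):
        """Return the ids of all walls whose terms occur in the tweet text.

        Assumption: simple substring matching, not Twitter's own matching.
        """
        text = tweet_text.lower()
        matched = set()
        for term, walls in self._walls_by_term.items():
            if term in text:
                matched |= walls
        return matched
```

Because terms are stored once no matter how many walls use them, only the number of unique terms counts against the keyword limit.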
Problems with the Streaming API
- The Streaming API is limited to 400 keywords to track. What should I do if there are more than 400 keywords to track?
- The Streaming API is limited to 1% of the tweets of the firehose. “It’s very difficult to get above 1% of the firehose, but if you’re tracking a term like “apple” it’d be pretty easy to exceed the 1%.” (cf. https://dev.twitter.com/discussions/6349) How can I handle such popular terms? Blacklist them?
Strategy 2: REST Search API
- Store user access tokens
- Poll the Search API (GET search/tweets) on behalf of the user, respecting the rate limit of 180 queries per 15-minute window
- (cf. https://dev.twitter.com/discussions/11141)
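As a rough illustration of what that rate limit means for polling, the sketch below computes how often each term on a wall could be refreshed with a single user token. The function name and the one-query-per-term assumption are mine; only the 180 queries per 15 minutes figure comes from the limit mentioned above.

```python
WINDOW_SECONDS = 15 * 60    # length of one rate-limit window
QUERIES_PER_WINDOW = 180    # search queries allowed per user token per window


def seconds_between_polls(term_count,
                          window_seconds=WINDOW_SECONDS,
                          queries_per_window=QUERIES_PER_WINDOW):
    """Shortest refresh interval per term that stays within the rate limit.

    Assumption: each term costs one query per refresh and all queries
    for a wall are made with the same user token.
    """
    if term_count <= 0:
        raise ValueError("term_count must be positive")
    # Minimum spacing between any two queries made with one token.
    min_gap = window_seconds / queries_per_window
    # Cycling through all terms takes term_count queries.
    return min_gap * term_count
```

Under these assumptions a wall with one term could be refreshed every 5 seconds, and a wall with three terms every 15 seconds per term.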
Problems with the REST Search API
- Could get very expensive to poll the API for a lot of users.
Do you have any suggestions or recommendations on which strategy would fit best? Are there already solutions for these problems?