Recommended architecture for a cloud-based Twitter analysis service catering to multiple users



I was hoping to get some guidance on the architecture for a cloud-based Twitter analysis service. Suppose I want to build a platform where users can see analytics for keywords/hashtags they choose. Users can track keywords for days or months and analyse their own results. To do this, I was thinking of using the streaming API to get real-time feeds for these keywords. Now suppose I have 100 users following new keywords on a daily basis. I cannot use a single stream, for the following reasons:

1). One user's keywords hitting the rate limit would affect every other user on the same stream
2). The stream would have to be restarted several times a day, whenever a new keyword is added

Another option is to create a NEW stream for each user, but then:

1). I have to create a new Twitter app, with its own API key and access token, for each stream
2). The number of streams grows with every new user, and I believe my IP would get blacklisted

Can someone advise on which architecture I should ideally follow? I know getting data from Gnip/DataSift may be an option, but I don't have that sort of budget.

Looking forward to your insights.



I think the only way this would work is to use the public streaming API up to its limits, and then potentially to migrate to something like Gnip. You’re right that you shouldn’t have more than one stream per app, and frequent reconnections to the streaming endpoint will lead to errors.
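To make the single-stream approach concrete, here is a minimal sketch, assuming all users' keywords are pooled into one deduplicated track list and each incoming tweet is then routed back to the users whose keywords it contains. The class and method names are hypothetical, no Twitter client library is involved, and the matching is a plain case-insensitive substring test, which is cruder than the streaming API's own track matching.

```python
from collections import defaultdict


class KeywordRouter:
    """Pools keywords from many users into one shared track list (hypothetical helper)."""

    def __init__(self):
        # keyword (lower-cased) -> set of user ids following it
        self._followers = defaultdict(set)

    def follow(self, user_id, keyword):
        self._followers[keyword.lower()].add(user_id)

    def unfollow(self, user_id, keyword):
        key = keyword.lower()
        self._followers[key].discard(user_id)
        if not self._followers[key]:
            # Nobody follows this keyword any more, so drop it from the track list.
            del self._followers[key]

    def track_list(self):
        """Deduplicated keywords to pass to the one shared stream."""
        return sorted(self._followers)

    def route(self, tweet_text):
        """Map each interested user to the keywords this tweet matched."""
        text = tweet_text.lower()
        matched = defaultdict(set)
        for keyword, users in self._followers.items():
            if keyword in text:
                for user in users:
                    matched[user].add(keyword)
        return dict(matched)
```

Because two users following the same keyword share one slot in the track list, the shared stream's keyword budget is spent on distinct keywords rather than on users.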


Thanks Andy. Is there a limit on the number of apps I can run per IP? For instance, will it be an issue (technically, and also as per Twitter’s TOS) if I create multiple apps and run 1 stream per app? Or maybe 2-3 streams/app?

I couldn't find a quantitative answer to this query anywhere.

Also, when you say use the public streaming API to its limit, do you mean just tracking a maximum of 400 keywords in total? Or is there some sort of workaround?

Alternatively, is what is mentioned in this post possible:



There is a limit, yes - the user streams documentation states:

Also note that the number of simultaneous connections per IP address is still limited, regardless of application.

At least one other developer in this forum is hitting a limit around 10 connections - I cannot speak to whether that number is an absolute limit or not.

Yes, that’s what I was referring to.
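As a rough illustration of living within that limit, the sketch below caps the pooled keyword set at the 400 figure quoted in this thread and batches keyword changes so the stream is reconnected at most once per interval. The class name, the 15-minute default interval, and the injectable clock are all assumptions made for illustration; the code only decides when a reconnect is due and does not talk to Twitter.

```python
import time

MAX_TRACK_KEYWORDS = 400  # figure quoted in this thread for the public stream


class TrackListPlanner:
    """Batches keyword changes so the stream restarts rarely (hypothetical helper)."""

    def __init__(self, min_restart_interval=900.0, clock=time.monotonic):
        self._active = set()    # keywords the running stream currently tracks
        self._desired = set()   # keywords users currently want tracked
        self._interval = min_restart_interval  # assumed value, tune as needed
        self._clock = clock
        self._last_restart = None

    def request(self, keyword):
        """Queue a keyword; return False if the cap is already reached."""
        key = keyword.lower()
        if key not in self._desired and len(self._desired) >= MAX_TRACK_KEYWORDS:
            return False
        self._desired.add(key)
        return True

    def due_restart(self):
        """Return the new track list if a reconnect is warranted, else None."""
        if self._desired == self._active:
            return None  # nothing changed since the last reconnect
        now = self._clock()
        if self._last_restart is not None and now - self._last_restart < self._interval:
            return None  # too soon; keep collecting changes
        self._active = set(self._desired)
        self._last_restart = now
        return sorted(self._active)
```

A scheduler would call `due_restart()` periodically and reconnect the stream only when it returns a list, so several keywords added in the same window cost a single restart.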

Site streams would be an alternative way of achieving what you are trying to do, but the programme is at maximum capacity and we are not currently planning to add new applicants.