How do I request an increase in the max keywords for the Streaming API?


#1

We are building a web application to monitor new tweets for a pre-defined array of keywords. Currently, in testing, the array contains 2,777 keywords to be tracked using the Streaming API. The rate-limit information we’ve read is geared mostly towards the Search API, although the documentation does state that, for the Streaming API, the limit is determined by an unspecified percentage of the maximum volume of tweets (if we understood it correctly).

Eventually, we will need to be able to track approximately 14,000 keywords over our single connection (with an incredibly low volume of new tweets/keyword due to the nature of the application). How can we request an increase in the max number of keywords that we can have in our Streaming API request?

When we attempted to track our sample subset of 2,777 keywords, we received the following error message:

“Phirehose: HTTP failure 4 of 20 connecting to stream: HTTP ERROR 413: Request Entity Too Large (Parameter track has too many items specified). Sleeping for 80 seconds.”
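To illustrate the size of the gap behind that 413 response, here is a minimal Python sketch that counts how many request-sized batches a keyword list would need under a given per-connection cap. The cap of 400 is only an assumed placeholder (the real limit depends on the access level granted), and `chunk_keywords` and `build_track_param` are hypothetical helper names, not part of Phirehose or any Twitter library.

```python
from typing import Iterable, List

# ASSUMPTION: placeholder cap on items in the `track` parameter.
# The real value depends on the access level of the account.
ASSUMED_MAX_TRACK_TERMS = 400


def chunk_keywords(keywords: Iterable[str],
                   max_terms: int = ASSUMED_MAX_TRACK_TERMS) -> List[List[str]]:
    """Normalize, de-duplicate, and split keywords into batches of at most max_terms."""
    if max_terms < 1:
        raise ValueError("max_terms must be at least 1")
    # Strip whitespace, lowercase, drop empties, and keep first-seen order.
    cleaned = (k.strip().lower() for k in keywords)
    unique = list(dict.fromkeys(k for k in cleaned if k))
    return [unique[i:i + max_terms] for i in range(0, len(unique), max_terms)]


def build_track_param(batch: List[str]) -> str:
    """Join one batch into a comma-separated value for the `track` parameter."""
    return ",".join(batch)


if __name__ == "__main__":
    sample = [f"keyword{i}" for i in range(2777)]  # stand-in for the real list
    batches = chunk_keywords(sample)
    print(f"{len(batches)} batches needed at a cap of {ASSUMED_MAX_TRACK_TERMS}")
```

Under that assumed cap, a list of 2,777 distinct keywords would need 7 batches, which is why splitting the list does not fit a single-connection design and an increase is being requested instead.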

As an example, the vast majority of the keywords only get new tweets during business hours (EST), Monday through Friday. In our sample data set of 300 keywords, the maximum volume during those hours was approximately 1-2 new tweets per minute. A large portion of our keywords get no new tweets in a given minute, but we still need to keep them in the array in order to avoid constantly disconnecting from and reconnecting to the API. Furthermore, we need to be able to graph the lack of new tweets for those keywords, so it is imperative that they always be in the single request.
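As a rough illustration of why the quiet keywords still have to be tracked, the sketch below builds per-minute counts in which minutes with no tweets appear as explicit zeros, so a gap can be graphed. It is only a sketch under stated assumptions: the `(keyword, timestamp)` event shape and the `per_minute_counts` helper are hypothetical and do not reflect our actual schema.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


def per_minute_counts(keywords: Iterable[str],
                      events: Iterable[Tuple[str, datetime]],
                      start: datetime,
                      minutes: int) -> Dict[str, List[int]]:
    """Return, for each keyword, one count per minute, with zeros for silent minutes.

    ASSUMPTION: each event is a (keyword, timestamp) pair already matched
    server-side; this is a hypothetical shape used only for illustration.
    """
    counts: Counter = Counter()
    for keyword, ts in events:
        # Truncate the timestamp to the start of its minute.
        counts[(keyword, ts.replace(second=0, microsecond=0))] += 1

    start = start.replace(second=0, microsecond=0)
    buckets = [start + timedelta(minutes=i) for i in range(minutes)]
    # Counter returns 0 for missing keys, which gives the zero-filled series.
    return {kw: [counts[(kw, b)] for b in buckets] for kw in keywords}


if __name__ == "__main__":
    start = datetime(2013, 1, 7, 9, 0)
    events = [
        ("alpha", datetime(2013, 1, 7, 9, 0, 15)),
        ("alpha", datetime(2013, 1, 7, 9, 0, 40)),
        ("alpha", datetime(2013, 1, 7, 9, 2, 5)),
    ]
    print(per_minute_counts(["alpha", "beta"], events, start, 3))
```

A keyword that is never tracked cannot be distinguished from one that is tracked but silent, which is the reason every keyword has to stay in the request.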

Our application is already decoupled, implements caching, and follows the documented recommendations and best practices, so that we neither interfere with the service nor run into rate-limit problems. However, we need to be able to index the totality of the keywords in order to do proper message volume calculations and properly interpret the respective influencers, all of which we do server-side after aggregation.

Due to confidentiality, we cannot publicly disclose the nature of the application (yet), but we would love to talk with a Twitter dev/engineer to discuss the specifics if that would help further clarify the situation and need for an increase in the allowed request parameters.

Thank you for your time; we look forward to your response.


#2

For sanity's sake, a relatively fair comparison is how Topsy is/was doing their charting of a keyword. However, whereas with Topsy the user can define the keyword to be graphed, we will only be graphing data from the pre-defined subset, and the data will be graphed based upon our DB result set.

Hopefully, that clarifies the use-case scenario.