Twitter Streaming API filterStream {streamR}, track predicate not working


#1

I am calling the Twitter Streaming API to return tweets matching one or more filter predicates
(using the filterStream method from the R package streamR).

The method has ‘track’ as one of the parameters to listen for specified keywords.

However, the track parameter seems to have no effect: none of the captured tweets contain the specified keyword.

filterStream(file.name = "twitter/data/mau-hotels.json",
             track = "mauritius",
             locations = c(-5, 36, 22, 70),
             timeout = 10, oauth = my_oauth, language = "en")

The returned JSON response doesn't contain any tweet with “mauritius”. Why does this happen?

Thanks in advance.


#2

What this combination of parameters will give you is an OR situation: either Tweets containing the word “mauritius”, OR Tweets from inside the geo box you’ve specified. This is documented in the Streaming API page on our dev site. Note that you’re filtering against 1% of the Twitter firehose, so it is entirely possible that nobody (or very few people) Tweets about Mauritius in the time frame that you’re filtering, while many more people are Tweeting in the bounding box.
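
To make the OR behaviour concrete, here is a small base R sketch that mimics how the two predicates combine. The helper name `matches_or` and the toy tweets are made up for illustration. This is not how streamR or the API is implemented, only a way to see why tweets without the keyword still come through.

```r
# Hypothetical helper: TRUE when a tweet mentions the keyword OR lies in the box.
# box = c(west_lon, south_lat, east_lon, north_lat), the same order that the
# `locations` argument uses (southwest corner first).
matches_or <- function(text, lon, lat, keyword, box) {
  has_keyword <- grepl(keyword, text, ignore.case = TRUE)
  in_box <- !is.na(lon) & !is.na(lat) &
    lon >= box[1] & lon <= box[3] &
    lat >= box[2] & lat <= box[4]
  has_keyword | in_box
}

# Toy data, not real tweets
tweets <- data.frame(
  text = c("Holiday in Mauritius", "Lunch in Berlin", "Rainy day"),
  lon  = c(57.5, 13.4, -74.0),
  lat  = c(-20.2, 52.5, 40.7),
  stringsAsFactors = FALSE
)

box <- c(-5, 36, 22, 70)
tweets$matched <- matches_or(tweets$text, tweets$lon, tweets$lat, "mauritius", box)
```

The second toy tweet matches purely because of its coordinates, which is the situation described in the question: the box covers a busy area, so location-only matches can easily crowd out keyword matches.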


#3

Andy, are the limitations of the filter stream described anywhere in the docs? I had thought that the filter was applied to the entire firehose, but that only up to 1% was returned. However, your answer suggests that the filter is applied to 1% of the Twitter firehose. I want to make sure I understand correctly.

Thanks!
–Justin


#4

If that is the impression from my previous answer (and to be honest, even as I typed it out, I feared I was not being clear enough) then I apologise; your latter understanding is correct. On the streaming API you get to filter the entire firehose, and you receive matching Tweets up to 1% of the total volume, based on your chosen filter criteria. If at some point in time the Tweets matching your filter terms (track, follow or geo) exceed 1% of the total volume, then you may not receive everything. The example I’ve used in past years is #worldcup potentially exceeding that threshold during spikes in that event.

As we’ve signalled in the roadmap, we will look to improve the fidelity and flexibility of these endpoints in the future.


#5

Thanks @andypiper, understood, and that makes sense.

Is it then somehow possible, using public APIs, to get an AND between track and locations?
Basically, I would like to get tweets from a given location box that also match a particular track keyword.


#6

You could do that using the standard search API, but it would only cover the past 7 days and would not be a realtime collection mechanism.
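
If realtime collection matters, another option is to stream on `locations` alone and apply the keyword test on the client afterwards. Below is a minimal sketch, assuming the captured tweets have already been parsed into a data frame with a `text` column (streamR's `parseTweets()` returns one). `keep_keyword` is a hypothetical helper, `geo.json` is a placeholder file name, and the data is a toy stand-in rather than real output.

```r
# Hypothetical helper: keep only rows whose text mentions the keyword.
# The keyword is treated as a regular expression, which is fine for plain words.
keep_keyword <- function(tweets_df, keyword) {
  hits <- grepl(keyword, tweets_df$text, ignore.case = TRUE)
  tweets_df[hits, , drop = FALSE]
}

# With streamR the capture step would look roughly like this (not run here):
#   filterStream(file.name = "geo.json", locations = c(-5, 36, 22, 70),
#                timeout = 10, oauth = my_oauth)
#   geo_tweets <- parseTweets("geo.json")

# Toy stand-in for tweets already captured from the bounding box
geo_tweets <- data.frame(
  text = c("Dreaming of Mauritius", "Traffic in Paris", "MAURITIUS flights on sale"),
  stringsAsFactors = FALSE
)

both <- keep_keyword(geo_tweets, "mauritius")
```

The location-only stream is still subject to the 1% cap mentioned earlier, so in a busy bounding box some keyword matches may never reach the client.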