Missing retweets in statuses/filter?


#1

Hi,

I currently monitor mentions of politicians’ @usernames and certain keywords using statuses/filter. I recently noticed an issue with my data from this past month (possibly earlier): retweets are not showing up in the stream. The volume of data on the affected days was not very high, so I am not sure why the retweets are not being returned. It seems unlikely that I would get zero retweets for certain tweets, especially on low-volume days.

For example, since May 22nd 2015 I have received only 0 - 6 retweets/day for @NajibRazak. But looking at his timeline, there are thousands of retweets that have not shown up. For today’s (June 5th) tweets alone, I received only 5 retweets in the stream, but there should be thousands.

Is this normal? Would using the ‘follow’ parameter or Search API help to get the missing retweets?


#2

Thanks for the question. What endpoint and filter/parameter are you using currently?

And has the overall volume of activity dropped across the board – i.e., have you possibly added other terms, which increases the breadth of results but reduces the depth on individual terms? (If that makes sense)


#3

Thank you. I am using the public statuses/filter stream. I did get the impression that adding popular keywords reduces the depth, as you mentioned, which is why I limit what I track.

I have yet to figure out why I have been getting so few retweets, as the daily tweet volume has been low for most of that period.

I think it depends on how the 1% limit is being implemented on the public stream. If it prioritises tweets over retweets that might explain the drop in retweet volume.

On a daily basis, the streaming API appears to be providing anywhere between 23K - 108K tweets per day from April 1st - May 21st, depending on what is happening on that day. From May 22nd - June 2nd that dropped to 19K - 35K tweets per day. We had 61K tweets on June 3rd so I don’t think this is a capping issue.

For April - May 21st the proportion of retweets is about the same for most days, between 30% - 50% with a few exceptions, so I think tweets are being randomly distributed. From May 22nd - June 2nd it has changed to 40% - 47%. Because the retweet proportion depends on the topic, I do think it’s hard to compare these stats.
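For anyone wanting to reproduce this kind of daily breakdown, here is a minimal sketch. It assumes the stream output has already been decoded into dictionaries in the v1.1 tweet format, where a native retweet carries a `retweeted_status` field. The function name `daily_retweet_share` is only illustrative.

```python
from collections import Counter
from datetime import datetime

# v1.1 payloads use this timestamp layout, e.g. "Fri Jun 05 08:15:00 +0000 2015".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def daily_retweet_share(tweets):
    """Return {date: (total, retweets, share)} for an iterable of decoded tweets."""
    totals, retweets = Counter(), Counter()
    for tweet in tweets:
        day = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT).date()
        totals[day] += 1
        # Native retweets embed the original tweet under "retweeted_status".
        if "retweeted_status" in tweet:
            retweets[day] += 1
    return {
        day: (totals[day], retweets[day], retweets[day] / totals[day])
        for day in sorted(totals)
    }
```

Manual "RT @user ..." style retweets do not carry `retweeted_status`, so this counts native retweets only.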

Another possibility is that the retweets are being made by protected accounts. Those retweets would not show up in the stream, correct?


#4

Would using the follow parameter make a difference in getting retweets, e.g. following @NajibRazak’s user ID combined with monitoring ‘@NajibRazak’ in the track parameter?
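To make the question concrete, this is roughly what the combined request parameters would look like. It is only a sketch: `build_filter_params` is a hypothetical helper, the user ID is a placeholder rather than the real ID for @NajibRazak, and it only builds the form parameters for statuses/filter without opening a connection. Both `track` and `follow` take comma-separated lists.

```python
def build_filter_params(track_terms, follow_ids):
    """Build the form parameters for a POST to statuses/filter."""
    params = {}
    if track_terms:
        # Track terms are sent as one comma-separated string.
        params["track"] = ",".join(track_terms)
    if follow_ids:
        # follow expects numeric user IDs, not screen names.
        params["follow"] = ",".join(str(user_id) for user_id in follow_ids)
    return params


# Hypothetical placeholder; look up the real numeric user ID before use.
EXAMPLE_USER_ID = 123456789
params = build_filter_params(["@NajibRazak"], [EXAMPLE_USER_ID])
```

Whether adding `follow` actually changes how many retweets are delivered is the open question here; the sketch only shows how the two parameters would be sent together.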


#5

Protected account activity would definitely not show up in the stream. Does your use case rely heavily on accounts that have changed state in this way?


#6

Hi,

Thank you for your reply. The reason I brought up protected accounts is that we have a problem in our politics with spamming services being used to increase retweets for politicians. For a moment I was concerned that the spammers had started using protected accounts.

Fortunately, over the weekend I managed to use the Search API to get roughly 90% of the missing retweets for recent days, so the protected account issue is not important right now.

Which brings me back to my earlier concern about why I have been receiving 0 retweets for certain tweets, especially since May 22nd. The volume of tweets on those days was relatively low, so this was unexpected.
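The backfill step can be sketched as a merge on tweet ID. This assumes both the stream capture and the Search API results have been decoded into dictionaries that carry the numeric `id` field; `merge_backfill` is an illustrative name, not part of any library.

```python
def merge_backfill(stream_tweets, search_tweets):
    """Return the stream tweets plus any search results the stream missed."""
    seen = {tweet["id"] for tweet in stream_tweets}
    merged = list(stream_tweets)
    for tweet in search_tweets:
        # Only add tweets whose ID has not been seen yet, so nothing is duplicated.
        if tweet["id"] not in seen:
            seen.add(tweet["id"])
            merged.append(tweet)
    return merged
```

The number of tweets added, `len(merged) - len(stream_tweets)`, gives a rough measure of how much the stream missed.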

Just a few more questions to help me understand:

  1. Are the tweets delivered by the public Streaming API randomly selected, or is there a limit/bias for RTs? E.g. RT volume may be capped to 50% of all tweets-per-second.

  2. Is the 1% volume limit enforced on a per-minute or per-hour basis, or some other basis?

  3. Would using the follow parameter instead of the track parameter for users that I track encourage the API to deliver more retweets of those users’ tweets? Should I use both?
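One way to check whether rate limiting is involved at all is to look for limit notices in the raw stream. As far as I understand the streaming documentation, a filtered stream that matches more tweets than it is allowed to deliver sends messages of the form `{"limit": {"track": N}}`, where N is the number of undelivered tweets since the connection was opened. The sketch below assumes the raw stream lines have been saved as text; `undelivered_count` is an illustrative name.

```python
import json


def undelivered_count(stream_lines):
    """Return the largest undelivered-tweet count seen in limit notices, or 0."""
    highest = 0
    for line in stream_lines:
        line = line.strip()
        if not line:
            # Blank lines are keep-alives and carry no message.
            continue
        message = json.loads(line)
        limit = message.get("limit")
        if isinstance(limit, dict):
            # The count is cumulative per connection, so keep the maximum.
            highest = max(highest, limit.get("track", 0))
    return highest
```

A result of 0 for a given connection would suggest that the missing retweets were not dropped by the volume limit, which points back to the other explanations discussed above.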

At the moment I am more concerned with finding a workaround, and using the Search API seems adequate for our research work.

However, if we were running a live website with streaming data, the Search API would be impractical for filling in any missing data. So if there is anything I can do to maximise the volume of data received from the Streaming API, I will make the changes.

Thank you for your time.