Hi,
I currently monitor mentions of politicians’ @usernames and certain keywords using statuses/filter. Over the past month (possibly longer) I have noticed that retweets are not showing up in the stream. The volume of data on the affected days was not very high, so I am not sure why the retweets are not being returned. It seems unlikely that certain tweets would get zero retweets, especially on low-volume days.
For example, since May 22nd 2015 I have received only 0 - 6 retweets/day for @NajibRazak, but his timeline shows thousands of retweets that never appeared in the stream. For today’s (June 5th) tweets alone I received only 5 retweets in the stream, when there should be thousands.
Is this normal? Would using the ‘follow’ parameter or Search API help to get the missing retweets?
rchoi
#2
Thanks for the question. What endpoint and filter/parameter are you using currently?
And has the overall volume of activity dropped across the board? I.e., have you possibly added other terms, which increases the breadth of results but reduces depth on individual terms? (If that makes sense)
Thank you. I am using the public statuses/filter stream. I did get the impression that adding popular keywords reduces the depth, as you mentioned, which is why I limit what I track.
I have yet to figure out why I have been getting so few retweets, as the daily tweet volume has been low for most of that period.
I think it depends on how the 1% limit is being implemented on the public stream. If it prioritises tweets over retweets that might explain the drop in retweet volume.
From April 1st to May 21st the streaming API delivered between 23K and 108K tweets per day, depending on what was happening that day. From May 22nd to June 2nd that dropped to 19K - 35K tweets per day. We had 61K tweets on June 3rd, so I don’t think this is a capping issue.
From April 1st to May 21st the proportion of retweets was about the same on most days, between 30% and 50% with a few exceptions, so I think tweets are being sampled at random. From May 22nd to June 2nd it changed to 40% - 47%. Because the retweet proportion depends on the topic, I do think it’s hard to compare these stats.
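For reference, the daily retweet proportions above can be computed along these lines. This is only a minimal sketch: it assumes each stored tweet is a dict decoded from the stream's JSON, that retweets carry a `retweeted_status` field, and that `created_at` uses the usual "Fri Jun 05 10:00:00 +0000 2015" layout. The function name `daily_retweet_share` is mine, not part of any library.

```python
from collections import defaultdict
from datetime import datetime

# Assumed layout of the created_at field, e.g. "Fri Jun 05 10:00:00 +0000 2015"
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def daily_retweet_share(tweets):
    """Return {date: fraction of that day's tweets that are retweets}."""
    counts = defaultdict(lambda: [0, 0])  # day -> [retweets, total]
    for tweet in tweets:
        day = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT).date()
        counts[day][1] += 1
        # Assumption: a retweet is any tweet with a retweeted_status object.
        if "retweeted_status" in tweet:
            counts[day][0] += 1
    return {day: rts / total for day, (rts, total) in sorted(counts.items())}
```

Days are bucketed by the UTC offset in `created_at`, so the figures will shift slightly if days are counted in local time instead.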
Another possibility is that the retweets are being made by protected accounts. Those retweets would not show up in the stream, correct?
Would using the follow parameter make a difference in getting retweets, e.g. following @NajibRazak’s user ID combined with tracking ‘@NajibRazak’ in the track parameter?
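To make the question concrete, here is a rough sketch of the request parameters I have in mind. As far as I can tell from the documentation, statuses/filter takes `follow` as a comma-separated list of user IDs and `track` as a comma-separated list of phrases, and the two are combined with OR. The helper name `build_filter_params` and the user ID below are placeholders, not real values.

```python
def build_filter_params(user_ids, keywords):
    """Build the form parameters for a statuses/filter request."""
    params = {}
    if user_ids:
        # follow expects numeric user IDs, not @usernames
        params["follow"] = ",".join(str(uid) for uid in user_ids)
    if keywords:
        params["track"] = ",".join(keywords)
    return params


# 123456789 is a placeholder, not @NajibRazak's actual user ID.
params = build_filter_params([123456789], ["@NajibRazak"])
```

If the two parameters really are OR-ed, a retweet matched by both should still arrive only once, but I would deduplicate by tweet ID to be safe.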
rchoi
#5
Protected account activity would definitely not show up in the stream. Does your use case rely heavily on accounts that have changed state in this way?
Hi,
Thank you for your reply. The reason I brought up protected accounts was because we have a problem in our politics with spamming services being used to increase retweets for politicians. For a moment I was concerned that the spammers started using protected accounts.
Fortunately over the weekend I managed to use the Search API to get roughly 90% of the missing retweets for recent days, so the protected account issue is not important right now.
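In case it is useful to anyone else, the backfill is merged into the streamed data roughly as follows. This is a sketch under the assumption that both sources return tweet dicts with an `id_str` field; `merge_by_id` is my own helper name.

```python
def merge_by_id(streamed, backfilled):
    """Combine streamed and Search API tweets, keeping one copy per tweet ID."""
    merged = {tweet["id_str"]: tweet for tweet in backfilled}
    # Where both sources have the same tweet, keep the streamed copy.
    merged.update({tweet["id_str"]: tweet for tweet in streamed})
    # Sort numerically by ID so the result is in a stable order.
    return sorted(merged.values(), key=lambda tweet: int(tweet["id_str"]))
```

Preferring the streamed copy is an arbitrary choice; the Search API copy may have fresher retweet counts, so swapping the order of the two steps is equally defensible.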
That brings me back to my earlier concern: why I have been receiving 0 retweets for certain tweets, especially since May 22nd. The volume of tweets on those days was relatively low, so this was unexpected.
Just a few more questions to help me understand:
- Are the tweets delivered by the public Streaming API randomly selected, or is there a limit/bias for RTs? E.g. RT volume may be capped to 50% of all tweets-per-second.
- Is the 1% volume limit enforced on a per-minute or per-hour basis, or some other basis?
- Would using the follow parameter instead of the track parameter for users that I track encourage the API to deliver more retweets of those users’ tweets? Should I use both?
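On the volume limit, one check I can run on my side is to look for limit notices in the raw stream. My understanding is that when a filtered stream matches more tweets than it is allowed to deliver, it emits messages of the form `{"limit": {"track": N}}`, where N is the running total of undelivered tweets since the connection opened. A minimal sketch, assuming the raw stream lines are saved as newline-delimited JSON (`undelivered_count` is a name I made up):

```python
import json


def undelivered_count(raw_lines):
    """Return the highest undelivered-tweet total reported by limit notices."""
    latest = 0
    for line in raw_lines:
        line = line.strip()
        if not line:  # blank keep-alive lines carry no message
            continue
        message = json.loads(line)
        if "limit" in message:
            # Assumption: track holds a cumulative count for this connection.
            latest = max(latest, message["limit"].get("track", 0))
    return latest
```

A result of 0 for a given connection would suggest the missing retweets were not dropped by rate limiting, which is what I suspect given the low volumes.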
At the moment I am more concerned about a work-around, and using the Search API seems adequate for our research work.
However, if we were running a live website with streaming data, the Search API would be impractical for filling in any missing data. So if there is anything I can do to maximise the volume of data received from the Streaming API, I will make the changes.
Thank you for your time.