Getting duplicate tweets from Streaming API

ruby

#1

Hi all,

I have been using the Twitter Streaming API successfully for well over a year now.
I am using Ruby with the tweetstream and twitter gems. It works well. I did not change my code or update the gems.

For the past few hours I have been seeing a lot of duplicate tweets coming in. Not all of them, but approximately 80% of the incoming tweets arrive a second time. Only a second time, never more than twice. The second copy arrives just milliseconds after the first, and both copies have the same ID.

Here is an excerpt from my application log:

12:38:17 twitter.1 | DEBUG [30.03.2016 12:38:17.683]: TWITTER 715125993103572992 from Super_HSV: "+news+ #hsv Jung wäre gerne Pol…"
12:38:17 twitter.1 | DEBUG [30.03.2016 12:38:17.730]: TWITTER 715125993103572992 from Super_HSV: "+news+ #hsv Jung wäre gerne Pol…"
12:38:17 twitter.1 | DEBUG [30.03.2016 12:38:17.992]: TWITTER 715125994097623041 from Super_HSV: "+news+ #hsv Abstiegsangst behin…"
12:38:18 twitter.1 | DEBUG [30.03.2016 12:38:18.031]: TWITTER 715125994097623041 from Super_HSV: "+news+ #hsv Abstiegsangst behin…"

Does anybody know what’s going on here?

Thanks and best regards
Christian
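
Since both copies carry the same ID, one possible client-side stopgap is to drop any tweet whose ID was seen recently. The following is only a minimal sketch under that assumption: the class name `RecentIdFilter`, the method `first_time?` and the capacity of 10,000 are made up for illustration and are not part of tweetstream or any other gem.

```ruby
require 'set'

# Hypothetical helper: remembers the most recent tweet IDs so that a
# repeated delivery of the same ID can be detected and skipped.
class RecentIdFilter
  def initialize(capacity = 10_000)
    @capacity = capacity  # how many IDs to remember (assumed value)
    @seen = Set.new       # fast membership test
    @order = []           # remembered IDs in arrival order, oldest first
  end

  # Returns true the first time an ID is offered and false for a repeat
  # that is still inside the remembered window.
  def first_time?(id)
    return false if @seen.include?(id)

    @seen << id
    @order << id
    # Forget the oldest ID once the window is full, to bound memory use.
    @seen.delete(@order.shift) if @order.size > @capacity
    true
  end
end
```

In the block that receives each tweet from the streaming client, something like `next unless filter.first_time?(id)` would then skip the second copy, where `id` stands for however your client exposes the tweet ID. Because the duplicates arrive within milliseconds of the original, a fairly small window should be enough.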


#2

That’s weird. Just to be clear, which endpoint are you connecting to?


#3

That’s not easy to say, as I am using the Ruby tweetstream gem, which depends on the em-twitter gem.

I could find

  :host               => 'stream.twitter.com'

So this could be the default host/endpoint tweetstream is using…


#4

Hi Team,

I’m facing exactly the same problem. It started 15 hours ago and is still happening. I tried regenerating the consumerKey and the consumerSecret, but that didn’t help.

Please advise.

Thanks


#5

Good to hear that I am not alone with this.
Around 15 hours ago could be the time it started for me too.

Duplicates are still coming in right now.
I could provide some debug info if needed.

Christian


#6

Just to confirm, we have been experiencing the same problem since yesterday, 7 PM GMT+2 (we’re located in France):

~/twitter-1.14.3$ ./dup_checker.py
1 /s average: 1 dups: 0
84 /s average: 42 dups: 40
84 /s average: 56 dups: 40
79 /s average: 62 dups: 36
101 /s average: 69 dups: 49
92 /s average: 73 dups: 48

But sometimes, if I disconnect and reconnect, the problem disappears completely. Very weird.


#7

Thanks, it is useful to know that this seems to have started at around the same time in each report. I’ll ask the team responsible for the Streaming endpoints to take a look.


#8

Here is the script I wrote to confirm the issue was not on our side:


#9

@andypiper Thank you for acknowledging the issue. Please keep us posted about the status.


#10

I’m getting duplicates too, glad to not be alone with this.


#11

No news from the Twitter API team?

It seems the duplicates have decreased, but now we are seeing high latency on the stream:

~/twitter-1.14.3$ ./dup_checker.py
8 /s average: 8 dups: 0 latency: 221.86 sec (max) 0.87 sec (min)
30 /s average: 19 dups: 0 latency: 145.58 sec (max) 1.17 sec (min)
27 /s average: 21 dups: 0 latency: 94.26 sec (max) 2.22 sec (min)
35 /s average: 25 dups: 0 latency: 288.35 sec (max) 3.29 sec (min)
40 /s average: 28 dups: 0 latency: 287.53 sec (max) 3.94 sec (min)
74 /s average: 35 dups: 0 latency: 269.24 sec (max) 4.31 sec (min)
114 /s average: 46 dups: 0 latency: 358.39 sec (max) 4.49 sec (min)
131 /s average: 57 dups: 0 latency: 433.94 sec (max) 3.94 sec (min)
122 /s average: 64 dups: 0 latency: 434.66 sec (max) 3.87 sec (min)
126 /s average: 70 dups: 0 latency: 341.11 sec (max) 3.8 sec (min)
131 /s average: 76 dups: 0 latency: 359.15 sec (max) 3.84 sec (min)
132 /s average: 80 dups: 0 latency: 288.77 sec (max) 3.98 sec (min)
131 /s average: 84 dups: 0 latency: 359.11 sec (max) 3.8 sec (min)
125 /s average: 87 dups: 0 latency: 341.64 sec (max) 3.46 sec (min)
131 /s average: 90 dups: 0 latency: 359.39 sec (max) 3.29 sec (min)
133 /s average: 93 dups: 0 latency: 436.23 sec (max) 3.33 sec (min)
127 /s average: 95 dups: 0 latency: 359.47 sec (max) 2.57 sec (min)
126 /s average: 96 dups: 0 latency: 281.18 sec (max) 2.3 sec (min)
132 /s average: 98 dups: 2 latency: 436.54 sec (max) 1.92 sec (min)
131 /s average: 100 dups: 0 latency: 435.47 sec (max) 1.84 sec (min)
133 /s average: 101 dups: 0 latency: 436.78 sec (max) 1.78 sec (min)
125 /s average: 102 dups: 0 latency: 436.74 sec (max) 1.75 sec (min)
137 /s average: 104 dups: 1 latency: 360.16 sec (max) 1.77 sec (min)
134 /s average: 105 dups: 1 latency: 436.6 sec (max) 1.81 sec (min)
59 /s average: 103 dups: 1 latency: 270.53 sec (max) 1.91 sec (min)
115 /s average: 104 dups: 2 latency: 340.95 sec (max) 2.47 sec (min)
136 /s average: 105 dups: 1 latency: 436.71 sec (max) 1.81 sec (min)
131 /s average: 106 dups: 0 latency: 436.53 sec (max) 1.74 sec (min)
133 /s average: 107 dups: 3 latency: 436.67 sec (max) 1.77 sec (min)
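
For anyone who wants to check latency on their own stream: one way to estimate it (not necessarily how the script above does it) is to use the creation time encoded in the tweet ID itself. IDs in the current format carry a millisecond timestamp in the bits above the lowest 22, counted from an epoch of 1288834974657 ms. This does not hold for very old IDs from before that format was introduced. The method names below are made up for this sketch, and the result depends on the local clock being accurate.

```ruby
# Offset between the Unix epoch and the epoch used inside tweet IDs,
# in milliseconds.
TWITTER_EPOCH_MS = 1_288_834_974_657

# Creation time of a tweet in milliseconds since the Unix epoch,
# taken from the bits above the lowest 22 bits of the ID.
def tweet_id_to_ms(id)
  (id >> 22) + TWITTER_EPOCH_MS
end

# Seconds between the tweet's creation and the moment it was received.
# received_at defaults to the local clock, so any clock skew on the
# receiving machine shows up directly in the result.
def latency_seconds(id, received_at = Time.now)
  received_ms = (received_at.to_f * 1000).round
  (received_ms - tweet_id_to_ms(id)) / 1000.0
end
```

For the first ID in the log excerpt at the top of this thread, `tweet_id_to_ms(715125993103572992)` gives 1459334297811, i.e. 10:38:17.811 UTC on 30 March 2016, which lines up with the 12:38:17 local timestamps in that log.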


#12

@julienvigi I can confirm that.
An hour ago the duplicates started to decrease. I see very few now coming in…


#13

@chriso0710 Well once again, it really seems to depend on how lucky you are when you connect to the streaming servers :wink:

120 /s average: 114 dups: 67 latency: 623.11 sec (max) 0.8 sec (min)
124 /s average: 114 dups: 70 latency: 784.07 sec (max) 0.82 sec (min)
112 /s average: 114 dups: 58 latency: 640.65 sec (max) 0.82 sec (min)
116 /s average: 114 dups: 62 latency: 623.79 sec (max) 0.83 sec (min)
102 /s average: 114 dups: 37 latency: 784.72 sec (max) 0.81 sec (min)
126 /s average: 114 dups: 64 latency: 640.93 sec (max) 0.82 sec (min)
123 /s average: 114 dups: 63 latency: 526.85 sec (max) 0.83 sec (min)
138 /s average: 114 dups: 62 latency: 641.32 sec (max) 0.81 sec (min)
117 /s average: 114 dups: 65 latency: 785.47 sec (max) 0.81 sec (min)
109 /s average: 114 dups: 52 latency: 641.67 sec (max) 0.82 sec (min)

Ctrl+C + relaunch:

17 /s average: 17 dups: 0 latency: 475.47 sec (max) 1.31 sec (min)
35 /s average: 26 dups: 0 latency: 625.62 sec (max) 1.41 sec (min)
33 /s average: 28 dups: 0 latency: 643.69 sec (max) 1.82 sec (min)
40 /s average: 31 dups: 0 latency: 488.3 sec (max) 2.88 sec (min)
55 /s average: 36 dups: 0 latency: 788.04 sec (max) 3.42 sec (min)
73 /s average: 42 dups: 0 latency: 645.38 sec (max) 3.87 sec (min)
125 /s average: 54 dups: 0 latency: 789.49 sec (max) 3.74 sec (min)
150 /s average: 66 dups: 0 latency: 788.8 sec (max) 3.41 sec (min)
129 /s average: 73 dups: 0 latency: 519.08 sec (max) 3.19 sec (min)
148 /s average: 80 dups: 0 latency: 788.82 sec (max) 2.83 sec (min)


#14

This is being looked into.


#15

We believe we’ve identified this issue and have it resolved. Apologies for the short period of duplicate deliveries, and thanks for raising!


#16

@andypiper Thank you, that’s great news and good support from you guys!
Looks good here. I will continue to monitor the incoming tweets for the next hours.


#17

Thanks for the update!
I can confirm everything is looking great on our monitors: no duplicates and a max latency of 2 sec. Looks perfect! Could we know what happened? Thanks


#18

As I understand it there was some latency issue between shards, but it’s difficult to go into detail. Gnip customers were unaffected - this just affected the public streaming API. Apologies once more for the inconvenience over this short period.


#19