The issue doesn’t move with the IP. I just started the same stream, A2 (same filter conditions + same access credentials), on a second machine (Machine 2). A1 is missing every second tweet that I see in A2. Thus, the only difference I can currently make out is the IP; it’s really a very simple script that collects all tweets within a given bounding box. I also stopped and restarted A1; no change. [Edit: As already mentioned, the same stream on Machine 1, just with other credentials, also doesn’t show this 50% issue. So I think I can rule out any problems with the machine and connectivity.]
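For anyone who wants to check the "every second tweet" observation on their own data, here is a minimal sketch of how two captures can be compared by tweet ID. It assumes each stream was dumped as one JSON message per line; the function names `load_ids` and `missing_fraction` are made up for illustration and are not part of my crawler.

```python
import json


def load_ids(lines):
    """Return the set of tweet IDs found in an iterable of JSON lines."""
    ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # the streaming API sends blank keep-alive lines
        message = json.loads(line)
        if "id_str" in message:  # skip non-tweet messages such as limit notices
            ids.add(message["id_str"])
    return ids


def missing_fraction(reference_ids, other_ids):
    """Fraction of the reference tweets that are absent from the other stream."""
    if not reference_ids:
        return 0.0
    return len(reference_ids - other_ids) / len(reference_ids)
```

Calling `missing_fraction(load_ids(a2_file), load_ids(a1_file))` on two captures that cover the same time window should give a value near 0.5 if A1 really drops every second tweet seen in A2. The window has to be identical for both files, otherwise tweets outside the overlap are counted as missing.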
Maybe it helps to elaborate a bit. A1 with the 50% output is my long-running crawler, continuously collecting tweets for about 250 days by now. So maybe Twitter decided at some point to limit that stream. The following graph shows the number of daily tweets for the first 200 days, starting from October 31st, 2014:
As one can see, after 10 or 11 days there was a significant drop of about 50% in the average number of tweets. Then there are some spikes that are hard to explain, particularly the one on the weekend of March 7th/8th. Looking at the data showed that the number of tweeting users was much higher on that weekend; it was not the usual number of users simply sending more tweets.
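To separate "more users" from "the same users tweeting more", the daily tweet count can be put next to the daily number of distinct users. The following is only a sketch of that idea, not the script I actually used; it assumes tweets are already parsed into dicts with the usual `created_at` and `user.id_str` fields, and `daily_stats` is a made-up name.

```python
from collections import defaultdict
from datetime import datetime

# Timestamp layout of the "created_at" field,
# e.g. "Sat Mar 07 12:00:00 +0000 2015"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def daily_stats(tweets):
    """Map each date to (tweet count, distinct users, tweets per user)."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for tweet in tweets:
        day = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT).date()
        counts[day] += 1
        users[day].add(tweet["user"]["id_str"])
    return {
        day: (counts[day], len(users[day]), counts[day] / len(users[day]))
        for day in sorted(counts)
    }
```

If a spike comes from additional users, the second value rises while tweets per user stays roughly flat; if the same users are just more active, the third value rises instead.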
A second clear drop happened on April 27th/28th. This has been observed by several people [1,2] and coincides with changes Twitter made to how tweets are fetched based on a bounding box, although this change does not explain why the number of tweets per day would drop so significantly.
[1] 80% reduction in tweets with coordinate data
[2] Volume drop in streaming API