I’ve done a bit more digging on this, and it looks like the problems occur when connecting to the Twitter streaming API from AWS EU infrastructure, and seemingly nowhere else.
I wanted to eliminate as many confounding factors as possible, so I started experimenting with connecting using plain curl. I’m seeing the same issues when connecting to the API with curl alone, without passing any OAuth credentials at all, which should rule out any problems caused by the account being used for authentication. These tests were also run from an entirely fresh instance brought up in the AWS EU (Ireland) region, so any IP restrictions shouldn’t be affecting them either.
When connecting directly with curl, the response we expect is to be told that we’re unauthorized, i.e. a response containing:
Problem accessing ‘/1.1/statuses/filter.json’. Reason:
We knocked up a quick script (copied below) to measure how often this dropped-connection behaviour occurs, and to compare it against making the same connections from servers located elsewhere. From a box in the AWS EU region we saw 52 empty replies and 48 successful connections (i.e. the expected Unauthorized response) out of 100 attempts. Running the same script from a box in the AWS US East Coast region, and directly from a laptop on a pretty awful ADSL line here in London, gave 100/100 successful connections in both cases.
I’m also seeing the same pattern in this curl testing: requests with a very small payload (i.e. just a handful of track terms) seem to connect successfully, while the failures appear once there are more terms. I’m not yet sure whether the frequency of empty replies increases with the number of terms, or whether there is a cutoff above which connections fail.
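```bash
: "${TERMS:?set TERMS to the comma-separated track terms to test}"
CURL_CMD=(curl --silent --show-error --max-time 120 --connect-timeout 60 --request POST https://stream.twitter.com/1.1/statuses/filter.json --data "track=$TERMS")
FAILURE_COUNT=0; CONNECT_COUNT=0
for i in $(seq 1 100); do
  echo "Connecting attempt $i"
  response=$("${CURL_CMD[@]}" 2>&1)  # include stderr so curl's "Empty reply from server" is captured
  if [[ "$response" == *Unauthorized* ]]; then
    CONNECT_COUNT=$((CONNECT_COUNT + 1))
  elif [[ "$response" == *Empty* ]]; then
    FAILURE_COUNT=$((FAILURE_COUNT + 1))
  else
    echo "Unrecognised response"
  fi
  echo "Failures: $FAILURE_COUNT. Connections: $CONNECT_COUNT"
done
echo "FINAL: Failures: $FAILURE_COUNT. Connections: $CONNECT_COUNT"
```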
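To tell apart the two possibilities above (a failure rate that rises with the number of terms versus a hard cutoff), one option would be to sweep the term count and record the outcome at each step. The following is a rough sketch that I haven’t run: the placeholder terms `term1..termN`, the sample size of 20 attempts per step, and the function names `make_terms`, `classify` and `sweep` are all hypothetical choices of mine. Rather than string-matching the output, it classifies each attempt by curl’s exit status (52 is curl’s “Empty reply from server”) and the HTTP status code (401 being the expected Unauthorized response).

```bash
#!/usr/bin/env bash
# Sketch: does the empty-reply rate grow with the number of track terms,
# or is there a cutoff? Terms are hypothetical placeholders (term1..termN);
# swap in real terms if the content of the payload turns out to matter.

URL="https://stream.twitter.com/1.1/statuses/filter.json"
ATTEMPTS_PER_SIZE=20   # hypothetical sample size per term count

# Build a comma-separated list of N placeholder terms: term1,term2,...,termN
make_terms() {
  local n=$1 i out=""
  for ((i = 1; i <= n; i++)); do
    out+="${out:+,}term$i"
  done
  printf '%s' "$out"
}

# Map curl's exit status and HTTP code to an outcome label.
# Exit 0 with 401: we reached the API and were (expectedly) refused.
# Exit 52: curl's "Empty reply from server".
classify() {
  local status=$1 http_code=$2
  if [[ $status -eq 0 && $http_code == 401 ]]; then
    echo connected
  elif [[ $status -eq 52 ]]; then
    echo empty
  else
    echo "other(exit=$status,http=$http_code)"
  fi
}

# For each term count given, make ATTEMPTS_PER_SIZE unauthenticated POSTs
# and print a tally of the outcomes.
sweep() {
  local n i status http_code outcome connected empty other
  for n in "$@"; do
    connected=0 empty=0 other=0
    for ((i = 1; i <= ATTEMPTS_PER_SIZE; i++)); do
      http_code=$(curl --silent --output /dev/null --write-out '%{http_code}' \
        --max-time 120 --connect-timeout 60 \
        --request POST "$URL" --data "track=$(make_terms "$n")")
      status=$?   # exit status of the curl inside the substitution
      outcome=$(classify "$status" "$http_code")
      case $outcome in
        connected) connected=$((connected + 1)) ;;
        empty) empty=$((empty + 1)) ;;
        *) other=$((other + 1)) ;;
      esac
    done
    echo "terms=$n connected=$connected empty=$empty other=$other"
  done
}
```

After sourcing the file, something like `sweep 1 5 10 25 50 100 200` would print one tally line per term count. A failure rate that climbs gradually would point one way, while a clean jump from zero empties to many between two adjacent sizes would suggest a size cutoff somewhere in the path.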