Intermittent problems connecting to streaming API - empty responses (not even HTTP headers)


#1

Hi there, we’ve been having trouble establishing connections to the Twitter streaming API (/1.1/statuses/filter.json) since approximately 17:06 GMT on 21/8/2013.

We’ve been running streaming successfully across a number of different projects for the last couple of years. Although some parts of our streamer stack have changed more recently than that, we haven’t changed anything at our end recently which should affect connecting to the API in the manner we’re seeing.

When connecting we’re now often getting null responses - no HTTP headers whatsoever. Our stack is Python 2.7 (Twython/Requests/urllib3/httplib). At the bottom of the chain, httplib raises a BadStatusLine exception when it opens a connection and receives an empty response. The comment alongside that exception in the library says ‘# Presumably, the server closed the connection before sending a valid response.’, so it seems that the connection is being closed by Twitter before it’s ever established.

This is only happening intermittently. Not all requests fail to initiate in this way, but lots do, probably around 50% of the time.

I’ve not managed to establish this for certain yet, but the problem seems to be happening more on nodes tracking higher numbers of terms. Those tracking a full 400 seem to experience it regularly. I’ve tried reducing the number of terms a node is tracking and the problem seems to be mitigated, although at the moment that’s just speculation.

This only seems to be happening to our streamer nodes hosted in the AWS EU region, not those hosted in the US. Traceroute on the boxes seems to indicate that requests from the nodes in Ireland get routed to Twitter edge locations in London, so perhaps there’s a problem there? All traceroute runs complete successfully, which seems to rule out what I’ve read about IP-based blacklisting, although I’m not 100% sure that on its own indicates there’s no problem in that area.

Any help in discovering whether we’re somehow being blacklisted in a way that affects us intermittently, or whether something has changed in the Twitter streaming API endpoints hosted in the UK, would be greatly appreciated.

Regards,
Alex Kelly


#2

I’ve done a bit more digging on this, and discovered that the problems occur when connecting to the Twitter streaming API from AWS EU infrastructure, and seemingly nowhere else.

I wanted to remove as many variables as possible, so started playing around with connecting using curl. I’m seeing the same issues connecting to the API simply using curl and not passing any OAuth credentials at all - so that should rule out any problems being caused by the account used for authentication. This was also done from an entirely fresh instance brought up in the AWS EU region in Ireland - so anything to do with IP restrictions shouldn’t be affecting these tests either.

When connecting directly with curl, the response we expect is to be told that we’re unauthorized - i.e. a response containing:

Problem accessing '/1.1/statuses/filter.json'. Reason:

    Unauthorized

However, much of the time I’m getting "curl: (52) Empty reply from server". Strangely, curl seems to think it has transferred the whole of its payload (3652 bytes in this case). It then sits there for around 20 seconds before dying with the empty reply message.

We knocked up a quick script (copied below) to test how often we see this dropped-connection behaviour and to compare it with making the same connections from servers located elsewhere. Connecting from a box in the AWS EU region we saw 52 empty replies and 48 successful connections in 100 attempts. Doing the same thing from a box in the AWS US East Coast facility, and also directly from a laptop over a pretty awful ADSL line here in London, the script connected successfully 100/100 times in both cases.

I’m also seeing similar behaviour in this curl testing, in that requests with a very small payload - i.e. just a handful of track terms - seem to connect successfully, while the failures occur once there are more terms. I’m unsure so far whether the frequency of the empty responses increases with term count, or whether there is some cutoff above which the failed connections occur.
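
Since the failures appear to correlate with payload size, a quick way to see how large the POST body is for a given term list is to count the bytes curl will send. This is just an illustrative one-liner, using the $TERMS variable as defined in the script below:

echo -n "track=$TERMS" | wc -c    # should match the Content-Length curl reports (3652 bytes in our case)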

— Connection testing curl script —

#!/bin/bash

SLEEP_TIME=30
TERMS='disillusionment,gram,masculines,flounders,skullduggery,dressage,workplace,snare,colloquia,Magsaysay,effeminacy,Episcopalian,gargles,Erika,intermediate,Tartar,bugging,nonresidents,hooter,Balaton,shirks,glaucoma,commissariat,Transcaucasia,ornithology,adulterer,hagglers,Zionisms,regulator,Hooke,inseminating,ahoy,diplomat,perishables,Lilliputian,schlock,toe,archipelago,thou,readjustments,profusions,ship,motherless,denouement,Silva,Tolyatti,declension,clergyman,zany,killjoy,wittier,dominance,fixation,prefabricates,Alvarado,campaigners,medal,reactivated,Ruth,Jeremiah,lurk,furnishings,mild,rebound,beltway,effigy,antiabortion,feels,sturdiness,stuffing,scrubbing,Bourbon,hemorrhoids,mop,Capablanca,mobiles,globules,centerfold,clef,Na,rostrum,Helene,deaconess,maelstroms,Puccini,vivacity,ion,councils,snoozing,Lela,floozy,Cotopaxi,Finnish,trolled,scalpel,Swanson,reforms,nuisance,biospheres,triennial,counts,Laundromat,Shanghai,ladybugs,disciplining,ratting,undemonstrative,deposing,merit,cooperating,vagrancy,ovum,edification,Emile,ye,Caribbeans,abreast,obsessions,drones,oxide,ornament,brightly,Passovers,hood,operand,yeast,wiring,Mahabharata,tiffs,defaulters,sparse,Moloch,vision,pyromaniac,improvident,attest,imagining,Atacama,paramedical,tribune,aquaplanes,conditioners,resurgences,hydrocarbon,hedonistic,shoelace,oversight,purge,Lithuania,provides,snipe,truant,quintets,litigation,prudential,hygienically,stimuli,single,oversells,armlet,polisher,jellyfish,stored,gill,carton,wilderness,scone,internists,multitudinous,Dawes,agronomy,voluntaries,limestone,slug,weapon,inform,pleat,boodle,twin,gazebos,repays,Shankara,washbowls,seismologists,grudge,wearier,basting,Buick,manageability,boatmen,castaway,Albuquerque,misapprehends,reserving,cannonade,rapine,castigation,Chattanooga,chained,Timur,whimsical,joyous,reassures,captions,Justinian,bloodstream,commissioner,rhombi,globally,immunity,Keven,Cinderella,swish,Scandinavians,weeknight,pricier,acacias,cochleae,fever,perfectionist,ascension,armrest,transshipped,hardiness,accusing,ramifying,blitz,Verde,uncle,ranting,lipstick,slur,illegitimate,tenderizing,Nikolayev,civilities,tared,caw,unhorsing,roadway,misrule,attitude,universality,simulcast,clubbing,Wac,cunninger,preexisting,afforests,awesome,substantiating,slacker,thumbtack,wheelwright,comparability,countenances,anemones,purism,bibles,Lowenbrau,threnodies,locutions,sweats,retold,devilry,rulers,conductors,Grey,Saskatchewan,albatross,alkaloids,species,drugstore,obliteration,grizzly,crazes,entertainers,maligns,championship,deconstructions,statesmanship,Austerlitz,pathogens,reverential,scampers,Friday,taffeta,recklessly,proclaimed,putative,fuddle,oxymoron,fueling,dither,undisciplined,corrosion,civilization,anticipations,engraving,carjacking,expostulated,rubier,sequel,inpatient,acrimony,ceases,heliport,Berliner,orangutangs,overhauls,pounded,hinds,Fronde,bedstead,demilitarizing,fretfulness,Yiddish,plutocracy,hobby,super,finitely,residual,Bastille,cactus,sax,tactlessly,dustier,Connolly,stargazers,nerdier,pups,heeling,goodliest,windpipe,Algiers,Poirot,pediment,Cindy,melodrama,colloquy,diarist,tequilas,Downy,naps,subservience,catalogue,filed,fatty,folklore,portioned,Hohokam,legging,Tallinn,deliriously,rightmost,motorbiked,miscarries,inner,outshined,metacarpi,agitating,interviewers,foes,Kalashnikov,sylph,bottles,Verizon,adverted,scrambled,woodwind,demoralized,beaux,insidiousness,enthronement,mismatched,improbability,vigilant,Antipas,Nankings,redraws,jauntiness,Len,synchronous,Emacs,moldering,Hovhaness,enclose,foreman,weirdness,dermatitis,admixtures,originality,Irtish,rosette,Mb,Barbra,stockrooms,confusingly,misrepresent'
CURL_CMD=(curl --max-time 120 --connect-timeout 60 --request POST https://stream.twitter.com/1.1/statuses/filter.json --data "track=$TERMS")
let CONNECT_COUNT=0
let FAILURE_COUNT=0

for i in $(seq 1 100)
do
echo "Connecting attempt $i"
response=$($CURL_CMD 2>&1)
if [[ “$response” == *Unauthorized* ]]
then
echo "Connected"
let CONNECT_COUNT++
elif [[ “$response” == *Empty* ]]
then
echo "Timeout"
let FAILURE_COUNT++
else
echo "Unrecognised response"
echo $response
fi
sleep $SLEEP_TIME
echo "Failures: $FAILURE_COUNT. Connections: $CONNECT_COUNT"
done

echo "FINAL: Failures: $FAILURE_COUNT. Connections: $CONNECT_COUNT"


#3

Some more info: we do actually seem to be connecting to stream.twitter.com properly before it drops the connection, so it’s not as if a routing issue is preventing us from contacting the streaming API at all. The curl output below shows a successful SSL handshake, followed by a delay, then a closed connection.

* About to connect() to stream.twitter.com port 443 (#0)
*   Trying 199.16.156.20... connected
* successfully set certificate verify locations:
*   CAfile: none
    CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSL connection using RC4-SHA
* Server certificate:
*   subject: C=US; ST=California; L=San Francisco; O=Twitter, Inc.; OU=Twitter Security; CN=stream.twitter.com
*   start date: 2013-06-28 00:00:00 GMT
*   expire date: 2013-12-31 23:59:59 GMT
*   subjectAltName: stream.twitter.com matched
*   issuer: C=US; O=VeriSign, Inc.; OU=VeriSign Trust Network; OU=Terms of use at https://www.verisign.com/rpa (c)10; CN=VeriSign Class 3 Secure Server CA - G3
*   SSL certificate verify ok.
> 'POST' /1.1/statuses/filter.json HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: stream.twitter.com
> Accept: */*
> Content-Length: 3652
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
* Done waiting for 100-continue
* SSLv3, TLS alert, Client hello (1):
* Empty reply from server
* Connection #0 to host stream.twitter.com left intact
curl: (52) Empty reply from server
* Closing connection #0
* SSLv3, TLS alert, Client hello (1):
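
For reference, the Expect: 100-continue exchange visible just before the drop can be suppressed by sending an empty Expect header. A variant of the test request along those lines (not something we’ve tested systematically; $TERMS is the same track list as in the script above) would be:

curl --verbose --max-time 120 --connect-timeout 60 \
     --header 'Expect:' \
     --request POST \
     --data "track=$TERMS" \
     https://stream.twitter.com/1.1/statuses/filter.json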

#4

Another update. We’ve now run our connection testing from a variety of places including AWS North Virginia, AWS Singapore, AWS Dublin, and non-AWS hosts in London and Dublin.

We can confirm that we’re only seeing the connection problems from the AWS Dublin region. We’re getting connected to stream.twitter.com before the connection gets dropped, so it seems likely that we’re ending up routed to different stream hosts from AWS EU, or that AWS is suddenly doing something very strange to HTTP connections originating in Dublin.

We’ve also tried routing our connections from our AWS EU hosts out through an SSH tunnel on a host in North Virginia, which works fine with no closed connections.
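
As a rough sketch of that kind of tunnel (the hostname is a placeholder rather than one of our actual instances, and this is just one way to set it up, not necessarily exactly what we ran; $TERMS as in the test script above):

# SOCKS tunnel through a US East host, so the TCP connection to Twitter originates in North Virginia
ssh -f -N -D 1080 ubuntu@us-east-proxy.example.com

# run the same curl test through the tunnel
curl --socks5-hostname localhost:1080 --max-time 120 --request POST --data "track=$TERMS" https://stream.twitter.com/1.1/statuses/filter.json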


#5

Hi Smesh:

We’re continuing to investigate this issue. Can you please provide the following items of information:

  • Username under which you’re trying to connect (i.e., the username the app is associated with)
  • The app name
  • The IP address(es) of your servers trying to connect to the stream

Thank you very much.


#6

Thanks for looking into it, Arthur.

As an example of an account that has had problems, I’ve done testing with the account ‘SmeshKelsta’ using the app ‘ZeeboxDevKelsta’. Note however that the problems we’re seeing seem to be account/app agnostic - i.e. we’re seeing the connection problems even with the script above, which uses no credentials whatsoever, just getting closed connections rather than 401s.

Example IPs of AWS EU boxes which experience connection problems: 54.216.62.72, 54.216.61.111. The same applies to any other box in the AWS EU region, including ones I brought up specifically for testing this problem; unfortunately I’ve terminated those, so I no longer have their IPs available.

Example IP of an AWS North Virginia box from which connections work OK: 54.224.238.240.

Thanks again,
Alex


#7

Hi Arthur.

The connection problem we were experiencing seems to have gone away as of around 04:07 GMT today. Do you have any idea what may have caused the problem, or what changed to make it go away? Did anything change at the Twitter end this morning?

Also, as a workaround we set up an instance in the AWS North Virginia region running HAProxy, through which we could pass the requests from our EU streamers in order to mitigate the problem. Thankfully that’s not necessary now.
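
For what it’s worth, one way to wire up that kind of proxying - assuming HAProxy running as a plain TCP passthrough on port 443 of the North Virginia instance, which may not be exactly how we configured it - is to point the hostname at the proxy’s IP, so TLS still terminates at Twitter and certificate verification works as normal. The IP below is a placeholder, and $TERMS is as in the earlier test script:

curl --resolve stream.twitter.com:443:203.0.113.10 --request POST --data "track=$TERMS" https://stream.twitter.com/1.1/statuses/filter.json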

Running through the proxy worked technically, but I just wanted to check where we’d stand with that vs the Twitter TOS if we needed to do that again in a hurry. This is multiple streamers for a variety of different apps we run for different clients, on different accounts, but they’d all be passing through a single proxy instance and so appear to be coming from a single IP. Would that risk IP blacklisting?

Thanks for your help,
Alex


#8

Hi,

We are also facing a similar problem. We were able to consume streaming data for around a month without the streams getting closed. Recently, for the past 2-3 weeks, the connection has been getting closed intermittently after running for anywhere from a few minutes to several hours, and we have to restart it on such occasions.

I am using Twitter ID: saptwit1

Is this something that Twitter can take care of?
Please let us know if you need any more details from our side.

Thanks,
Babu


#9

Babu:

Please see above for the kind of diagnostic information we require to effectively troubleshoot your issue. Also, please include:

  • the dates and times you’re seeing the problem
  • ping, traceroute, and MTR output while you’re experiencing the trouble (an example MTR invocation is shown below): https://en.wikipedia.org/wiki/MTR_(software)
  • any output or diagnostic information you are able to share
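
For example, an MTR report against the streaming host can be captured with something like the following (the cycle count is only a suggestion):

mtr --report --report-cycles 100 stream.twitter.com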

Thanks!