Get Statuses - Sample stops after 40 minutes


#1

Whenever I try downloading Twitter data using the Get Statuses/Sample API, the download stops after 40 minutes. It has nothing to do with hard disk space or memory. I have tried this at least 5-6 times and it has happened every time. Could anyone let me know if I’m doing something wrong or if this is an API limitation?


#2

In most cases this is because you’re falling behind in consuming the stream. Make sure that you’re not trying to convert the JSON you’re receiving into objects/data structures on the fly – you should do that in another process.
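
A rough sketch of what keeping the parsing off the read path could look like in Python. It uses a worker thread and a queue rather than a true separate process, and the names read_raw, parse_worker and run are made up for illustration. It also assumes the stream delivers one JSON message per line, with blank lines as keep-alives:

```python
import json
import queue
import threading

def read_raw(lines, q):
    """Reader: hand raw lines to the queue without parsing them."""
    for line in lines:
        if line.strip():      # assumption: blank lines are keep-alives
            q.put(line)
    q.put(None)               # sentinel: the stream has ended

def parse_worker(q, out):
    """Consumer: do the JSON parsing away from the read loop."""
    while True:
        line = q.get()
        if line is None:
            break
        try:
            out.append(json.loads(line))
        except json.JSONDecodeError:
            pass              # drop a truncated or malformed line

def run(lines):
    """Wire the reader and the parser together and return parsed messages."""
    q = queue.Queue()
    out = []
    worker = threading.Thread(target=parse_worker, args=(q, out))
    worker.start()
    read_raw(lines, q)
    worker.join()
    return out
```

Here lines can be any iterable of strings, for example a file object or a streaming HTTP response iterated line by line. The point is only that the loop reading from the socket never waits on the parser.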

And check out the stall_warnings=true parameter option, which will send your app a message when it’s detected that you’re falling behind in consuming the stream in real time.
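
If you are reading the stream from a script, picking those messages out might look something like this. It is a sketch that assumes a warning arrives as a JSON object with a top-level "warning" key. The function name stall_warning is made up, and the sample payload in the usage note is illustrative only:

```python
import json

def stall_warning(line):
    """Return the warning payload if this line is a stall warning, else None."""
    try:
        msg = json.loads(line)
    except ValueError:
        return None           # not JSON at all (e.g. a keep-alive line)
    # Assumption: warnings look like {"warning": {...}}
    if isinstance(msg, dict) and isinstance(msg.get("warning"), dict):
        return msg["warning"]
    return None
```

For example, stall_warning('{"warning": {"code": "FALLING_BEHIND", "percent_full": 60}}') returns the inner dictionary, while an ordinary tweet line returns None, so you can log the warning and carry on writing everything else to disk.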


#3

@episod: Thanks for replying.
I don’t convert anything - I just save everything to a JSON file. I use the curl command, which writes the stream straight to the file. So should I cache the stream instead of saving everything immediately?


#4

@episod: This is what I do:

  curl https://stream.twitter.com/1/statuses/sample.json -uUSERNAME:PASSWORD > FILENAME.json --verbose

It usually disconnects after around 45 minutes with this message:

  * Connection #0 to host stream.twitter.com left intact
  * Closing connection #0
  * SSLv3, TLS alert, Client hello (1):
  } [data not shown]

#5

This is what I do:

  curl https://stream.twitter.com/1/statuses/sample.json -uUSERNAME:PASSWORD > FILENAME.json --verbose

I get this message after about 40-45 minutes:

  * Connection #0 to host stream.twitter.com left intact
  * Closing connection #0
  * SSLv3, TLS alert, Client hello (1):
  } [data not shown]

Do you have any suggestions on what can be done?


#6

Start using the stall_warnings=true parameter to get an idea of whether you’re stalling out. You could also be stalling out due to network congestion. The Streaming API also gets restarted when upgrades happen, so any client has to expect disconnects. It’s not really possible to build a streaming client using the command line alone that will follow all the reconnection rules and suggested retry behavior.
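
To give an idea of the kind of retry loop a script can do and a bare curl call cannot, here is a minimal sketch of reconnecting with exponential backoff. The function names, the 5-second starting delay and the 320-second cap are placeholders, not the official reconnection rules. connect is a hypothetical callable that opens the stream and returns an iterable of raw lines:

```python
import time

def backoff_delays(initial=5.0, cap=320.0):
    """Yield delays that double each time, up to a cap (values are assumptions)."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, cap)

def consume_with_reconnect(connect, handle, max_retries=5, sleep=time.sleep):
    """Read lines from connect(), reconnecting with backoff when it drops.

    Gives up after max_retries consecutive attempts that deliver no data.
    """
    delays = backoff_delays()
    failures = 0
    while failures < max_retries:
        try:
            for line in connect():
                handle(line)
                # Data is flowing again, so start the backoff over.
                delays = backoff_delays()
                failures = 0
        except OSError:
            pass              # ConnectionError is a subclass of OSError
        # The stream ended or failed: wait, then try again.
        failures += 1
        sleep(next(delays))
```

handle could be as simple as a function that appends the line to a file. The sleep argument is only there so the waiting can be swapped out when trying the loop without a real connection.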


#7

OK, I get it. I wrote a Python script using tweepy with some exception handling to get the data, and it’s still running strong after 5 hours. Thanks!