Streaming API times out and I'm new to the API!


#1

Just looking for a simple set of tweets from one user from the past year without having to use the REST API. The Streaming API times out when I use this: https://stream.twitter.com/1.1/statuses/filter.json?follow=2233154425

The REST API is not an option. I tried it and spent hours parsing 800 rows (from 4 GET requests at the max of 200 tweets each). The REST API returns inconsistent fields, since the fields have apparently changed over the past 12 months. I can’t simply line up the columns in a table and get a good alignment of the fields. And the API returns 500+ stinking columns! That’s the abridged version of my pain in dealing with the REST API. I am a non-coder, so I can’t take any approach other than API to JSON to CSV (it’s an automated workflow).

I’d like the streaming API to work as advertised and let me do a simple visualization with the data! Can Twitter come to the rescue? thx in advance.


#2

An hour later, I tried again and got this message, so apparently I’m spamming the API? Really?

HTTP/1.1 420 Enhance Your Calm


#3

9 a.m. EST. Tried again. This is the error I’m getting:

HTTP/1.1 504 Gateway Time-out
ETag: "586e8258-158"
Date: Thu, 06 Jul 2017 13:08:11 GMT
Content-Length: 344
Connection: keep-alive
Content-Type: text/html
Server: Apigee Router

<!DOCTYPE html>
<html>
<head>
<title>Error</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>An error occurred.</h1>
<p>Sorry, the page you are looking for is currently unavailable.<br/>
Please try again later.</p>
</body>
</html>

#4

Are you trying to use the Apigee API Console, or to use the API directly? Unfortunately the API Console is unmaintained by Apigee at this time. For the streaming API you need a persistent connection, so a request/response tool like the one provided in the console is probably not ideal.


#5

Hey Andy - thanks for the info. I am using the Apigee API Console. Glad I wasn’t totally lost on this issue. Apigee was the only place I found to access the API. Would you mind pointing me to the primary approach for streaming APIs? thx.

Softball question: For a persistent connection, is it just a matter of leaving my laptop on and connected to the network?
thx.


#6

Hey Josh, appreciate that things can be confusing here, especially if you’re new to it / not necessarily a coder and just want to do some analysis.

To clarify a couple of points - the REST API will enable you to query back over a seven-day period (Search) or retrieve up to 3200 of a user's most recent Tweets (user timeline). Streaming is an “always on” delivery mechanism that delivers up to 1% of the Twitter firehose in realtime, i.e. you’ll get future Tweets, but you can’t look backwards. Using the filter option you’ve specified, you’d get all the Tweets from the user ID you are following as they are posted.
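
To illustrate the timeline option, here's a minimal sketch using the same Node.js twit module that appears in the streaming example further down (the keys are placeholders and the user ID is just an example):

 var Twit = require('twit')

 // keys and tokens come from an app created at apps.twitter.com
 var T = new Twit({
   consumer_key:        'xxxx',
   consumer_secret:     'xxxx',
   access_token:        'xxxx',
   access_token_secret: 'xxxx'
 })

 // fetch one page of up to 200 of the user's most recent Tweets; repeating
 // the call with max_id set to the oldest id seen walks further back,
 // up to the ~3200 Tweet limit
 T.get('statuses/user_timeline', { user_id: '786491', count: 200 }, function (err, data, response) {
   if (err) return console.error(err)
   console.log(data.length + ' Tweets; oldest id on this page: ' + data[data.length - 1].id_str)
 })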

Another thing I’m not sure about is your reference to 500+ columns - a Tweet object should have substantially fewer properties than that, and it hasn’t changed a lot over the past 12 months (the exception being the new extended-format Tweets, which enable the full 140 characters to be used for the text). That said, individual Tweets will carry different fields - if a Tweet has media attached, for instance, the extended_entities object will be included.
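
In code, that just means guarding for the optional object rather than assuming it's always there - for example (describeMedia is only an illustrative helper name):

 // a Tweet may or may not carry media; extended_entities is only present when it does
 function describeMedia(tweet) {
   if (tweet.extended_entities && tweet.extended_entities.media) {
     return tweet.extended_entities.media.length + ' media attachment(s)';
   }
   return 'no media attached';
 }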

On the question of how to connect to the streaming API, realistically you need to write a bit of code to sit and listen to it (and yes, that could be your laptop being open and listening / running the code). There are a large number of client libraries available to do this. You might find it straightforward to write something in Python, for example, depending on your experience (tweepy has a simple interface) - or, using Node.js and the twit module, something simple like this:

 var Twit = require('twit')

 // get keys and tokens from apps.twitter.com
 var T = new Twit({
   consumer_key:         'xxxx',
   consumer_secret:      'xxxx',
   access_token:         'xxxx',
   access_token_secret:  'xxxx',
   timeout_ms:           60 * 1000,  // optional HTTP request timeout applied to all requests
 })

 // define the ID of the user we are interested in
 var userID = '786491';

 // open a stream filtering on activity related to that user ID
 var stream = T.stream('statuses/filter', { follow: userID });

 stream.on('tweet', function (tweet) {
   // the filter also delivers replies to and retweets of the user,
   // so check the author ID inside each Tweet object
   if (tweet.user.id_str === userID) {
     console.log('this was sent by the user we want to track');
     // now do something else, like save it to a file
   } else {
     console.log(tweet.user.id_str + ' - ' + tweet.user.screen_name);
     // so we can ignore it
   }
 });
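
And if saving to a file is the goal, the "do something else" comment above could become a call to a small helper like this (a sketch using Node's built-in fs module; stream.json is just an example filename):

 var fs = require('fs');

 // append one Tweet as a single line of JSON, so the file can later be
 // converted to CSV with one Tweet per row
 function saveTweet(tweet) {
   fs.appendFileSync('stream.json', JSON.stringify(tweet) + '\n');
 }

Install the module with npm install twit, run the script with node, and leave it running for as long as you want to collect Tweets.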

#7

Hey Andy - thanks for the hand holding (I need it at this point). I didn’t know streaming was only one way (and not historical) but that makes sense.

The 500+ columns I referenced are what I get when I take the raw JSON output from the Apigee interface for the REST API (Search) and ingest it into the json-csv.com tool to get my table. That table is for my vis work. So without any tinkering, I get a bloated table with everything related to a tweet. Here’s an example table from the output: https://drive.google.com/open?id=1hhj0dCK6JPq5P5Wpk8KY1BbEIVZbTnQ9ae5w3Sft9uw

So basically, I tried to use the REST API with the max of 200 tweets per request. When I tried to collate the different sheets, they didn’t align. Some of the fields I was interested in were missing from the older tweets, for example:

retweeted_status__quoted_status__text
quoted_status__retweet_count

So, that’s my workflow in short. I’m sure there are plenty of pitfalls in doing it this automated way.

Question: Is the Apigee Console the only REST API tool, or do you recommend client libraries for REST as well?

thx for the leads on the Python library.

Josh

P.S. Here’s some of the fun viz work I’m doing with Twitter data!


#8

At this stage I’m starting to recommend Postman as a good console-style way to exercise the API. The configuration you’ll need is (still) to create an app via apps.twitter.com to get tokens for the OAuth header, which you can then plug into Postman. I’ve been meaning to put together a tutorial / video on this for some time, but confess I’ve not had a chance yet… An alternative command-line option is our Ruby-based twurl tool, and we do have some tutorials on that - they’re aimed at the Ads API, but equally relevant to the standard API.

The page I linked to for client libraries generally covers access to the API in both REST (request/response) and streaming (realtime) forms, so you can build what you want in the language of your preference.

The comment you make about additional fields does make sense - I see what you mean about translating variable-sized JSON objects into CSV / tables, which expect data to be in the same place in every object.
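
One way around that (a sketch in the same Node.js style as the earlier examples; the column list is only an illustration) is to fix the set of columns up front and emit a blank wherever a Tweet doesn't have a value:

 // fix the columns up front so every row lines up, whatever each Tweet contains
 var columns = ['id_str', 'created_at', 'text', 'retweet_count', 'favorite_count'];

 function csvEscape(value) {
   var s = (value === undefined || value === null) ? '' : String(value);
   // CSV convention: wrap in quotes and double any embedded quotes
   return '"' + s.replace(/"/g, '""') + '"';
 }

 // one Tweet object in, one comma-separated row out
 function tweetToCsvRow(tweet) {
   return columns.map(function (name) { return csvEscape(tweet[name]); }).join(',');
 }

Nested fields such as quoted_status would need the same treatment with a guarded lookup, but the principle is the same: the columns are chosen by you rather than by whatever happens to be present in each Tweet.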

Thanks for sharing the link - good luck with the project!


#9

So much good information! Thx again for the primer Andy! I’m motivated to cut my teeth on these tools and see if I can get a solution for the viz analytics. I appreciate your time and will let you know when I bump my head on another issue:)

Josh