What is the difference between the formats of downloaded tweets, and how do I decode/encode them in Python?

python
tweepy
api

#1

Hello,

I am using Python 2.7 and the Tweepy library.

Main topic: downloading tweets from the Streaming API using Python.

I am confused by the different formats in which tweets downloaded from the Streaming API appear: the same tweet looks different depending on how it is printed.

Note: I am only interested in Arabic tweets.

1st format is:

{"created_at":"Wed Feb 03 12:52:53 +0000 2016","id":694866144142848001,"id_str":"694866144142848001","text":"\u06 ………

Code used for the 1st format:

import tweepy
import json
from tweepy import Stream
from tweepy.streaming import StreamListener

consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

# Output file for the raw messages (the file name is only an example)
out = open('tweets.json', 'a')

class StdOutListener(StreamListener):
    def on_data(self, data):
        # data is the raw JSON string exactly as Twitter sent it
        print(data)
        out.write(data)

if __name__ == '__main__':
    # OAuth process, using the keys and tokens
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    listener = StdOutListener()
    stream = Stream(auth, listener)
    stream.filter(track=[u'الى', u'إلى', u'عشان', u'علشان', u'ماشى', u'ليه', u'ازاى'])

============================================

2nd format is:

{u'contributors': None, u'truncated': False, u'text': u'\u0627', u'is_quote_status': False,….

Code used for the 2nd format:

def on_data(self, data):
    print json.loads(data)

Note: I get an error when I try to write the result of json.loads(data) to a file.

=================================================

3rd format is:

{"contributors": null, "truncated": false, "text": "RT @a_meles: @EHSANFAKEEH\n\u0627 ", "is_quote_status": false, "in_reply_to_status_id": null, "id": 695174171903582208, "favorite_count": 0, "source": "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter f……

Code used for the 3rd format:

def on_data(self, data):
    x = json.loads(data)        
    print (json.dumps(x))

================================================

4th format is:

Status(contributors=None, truncated=False, text=u'@AlsaeedFajer \u0627\ ', is_quote_status=False, in_reply_to_status_id=None, id=694494200520413184L, favorite_count=0, _api=, author=User(follow_request_sent=None, profile_use_background_image=True, _json={u'follow_reques……….

Code used for the 4th format (on_status instead of on_data):

def on_status(self, status):
    print status

============================================

So, what is the usual way to extract the tweet text and write it to a file without problems?

Thanks for your efforts,


#2

Ideally, you want to work with & store JSON objects as they are sent to you from Twitter:

1st format: the on_data method of a stream listener receives every message as the raw JSON string; see https://github.com/tweepy/tweepy/blob/master/docs/streaming_how_to.rst and https://dev.twitter.com/streaming/overview/messages-types. Because that covers all message types, it may or may not be what you want to store in a file for later.

2nd format: print json.loads(data) prints the string representation of the Python object you create with .loads (usually a dict). That representation is not JSON.

3rd format: the result is almost the same as the 1st, but you are deserializing a JSON object from a string and then serializing it again straight away. The difference between the 1st and 3rd is that the order of fields such as "created_at" and "id" sometimes changes, depending on the library.

4th format: this is the string representation of a Status Python object. A tweepy Status object is not JSON, but it has a _json property that contains the parsed JSON response from Twitter (a dict).
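
To make the difference between the first three formats concrete, here is a minimal sketch. It assumes Python 3 (the thread uses 2.7, where the printed dict shows u'...' prefixes), and the payload is a shortened, made-up stand-in for a real tweet, not actual API output. It also shows why writing the result of json.loads straight to a file fails.

```python
import io
import json

# Made-up, shortened payload standing in for what on_data receives (1st format).
raw = '{"id": 1, "text": "\\u0627\\u0644\\u0649"}'

# 2nd format: json.loads turns the JSON string into a Python dict.
tweet = json.loads(raw)

# A dict is not text, so writing it to a file directly fails.
buf = io.StringIO()  # in-memory stand-in for a real file
try:
    buf.write(tweet)
    write_failed = False
except TypeError:
    write_failed = True

# 3rd format: json.dumps turns the dict back into a JSON string.
escaped = json.dumps(tweet)                       # non-ASCII stays as \uXXXX
readable = json.dumps(tweet, ensure_ascii=False)  # Arabic characters kept as-is

buf.write(readable + "\n")  # a string can be written without problems
print(escaped)
```

Both strings load back to the same dict; ensure_ascii=False only changes how the Arabic text appears in the file. If you use it with a real file, open the file with a UTF-8 encoding.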

It should work if you use the on_status() listener method and store status._json in a file for later, serialized with json.dumps (not the status object itself).
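
A rough sketch of that idea, assuming Python 3 and one JSON document per line in the output file. The helper name write_tweet, the file name in the comment, and the sample dict are made up for illustration; the listener part is shown only as a comment because it needs live credentials.

```python
import io
import json

def write_tweet(tweet_json, out):
    """Append one tweet dict to `out` as a single line of JSON."""
    out.write(json.dumps(tweet_json, ensure_ascii=False) + "\n")

# Inside a listener this would be called roughly like this (not run here):
#     def on_status(self, status):
#         write_tweet(status._json, self.out)

# Stand-in for status._json, with made-up values.
fake = {"id": 1, "text": "\u0627\u0644\u0649"}

# In-memory stand-in; in real use something like
# io.open("tweets.jsonl", "a", encoding="utf-8") (hypothetical file name).
buf = io.StringIO()
write_tweet(fake, buf)
write_tweet(fake, buf)
lines = buf.getvalue().splitlines()
```

Because json.dumps escapes newlines inside the tweet text, each tweet stays on exactly one line, which makes the file easy to read back later.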

Hope that helps!


#3

Thanks a lot for your reply, I really appreciate it.

Please correct me if I am wrong:
a JSON object is not a kind of dictionary, and to extract data from it I must convert it using .loads,
i.e. create another representation,
from {"created_at":"Wed Feb 03 [1st format], which I can't use,
to
{u'contributors': None, [2nd format], which I can use efficiently; it is a dictionary holding the same information in a different arrangement.


#4

The thing to remember is that JSON is just the format - different languages will have different kinds of objects that let you interact with the data.

But yes: in Python, if you've stored a bunch of tweets from the stream, the way to read them is with json.loads, e.g.:

...

import json

data = '{"created_at": "..", "text": "Bla #tweet"}'  # tweet JSON string, e.g. one line read from your file

status = json.loads(data)  # json.loads returns a dict, so use key access
print('Tweet text: ' + status['text'])
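
For a whole file of stored messages, a sketch along these lines might do. It assumes Python 3 and one JSON document per line; the function name iter_tweet_texts and the sample lines are made up. Since on_data receives every message type, it skips anything without a "text" field (delete notices, for example).

```python
import json

def iter_tweet_texts(lines):
    """Yield the text of every tweet found in an iterable of JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:            # ignore blank lines
            continue
        message = json.loads(line)
        if "text" in message:   # skip non-tweet messages such as delete notices
            yield message["text"]

# Made-up sample standing in for the lines of a stored file.
sample = [
    '{"id": 1, "text": "Bla #tweet"}',
    '',
    '{"delete": {"status": {"id": 2}}}',
    '{"id": 3, "text": "\\u0627\\u0644\\u0649"}',
]
texts = list(iter_tweet_texts(sample))
```

With a real file you would pass the open file object instead of the list, since iterating over a file yields its lines.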