Hi, how are you all doing?

Recently, I have been developing a Twitter collector using the Python library “TwitterAPI”, and I’m trying to retrieve a set of tweets with GET tweets/search/recent, downloading the media (if available) in the process.

Unfortunately, the response seems truncated: it does not contain the “includes” field, where the information about the media should be…

Here is the part of the code that makes the call:

EXPANSIONS = 'attachments.media_keys'
TWEET_FIELDS = 'attachments,author_id,created_at,entities,geo,id,in_reply_to_user_id,public_metrics,text,referenced_tweets'
MEDIA_FIELDS = 'url,type'

r = api.request('tweets/search/recent',
    {
        'query': keyword_variable,
        'tweet.fields': TWEET_FIELDS,
        'expansions': EXPANSIONS,
        'media.fields': MEDIA_FIELDS,
        'max_results': 100
    })
c = r.get_iterator()
while True:
    try:
        status = next(c)
        # ...

And here is one example of the response, represented by the “status” variable:

{
     "created_at": "2021-05-21T16:31:02.000Z",
     "entities": {
          "urls": [
               {
                    "start": 122,
                    "end": 145,
                    "url": "https://t.co/2BZKsiZ7T9",
                    "expanded_url": "http://jethrotulllyricbook.com",
                    "display_url": "jethrotulllyricbook.com"
               },
               {
                    "start": 146,
                    "end": 169,
                    "url": "https://t.co/HyKMsdcs0q",
                    "expanded_url": "https://twitter.com/jethrotull/status/1395779177085800449/video/1",
                    "display_url": "pic.twitter.com/HyKMsdcs0q"
               }
          ],
          "annotations": [
               {
                    "start": 0,
                    "end": 2,
                    "probability": 0.7675,
                    "type": "Person",
                    "normalized_text": "Ian"
               },
               {
                    "start": 32,
                    "end": 62,
                    "probability": 0.3375,
                    "type": "Product",
                    "normalized_text": "Jethro Tull Silent Singing book"
               }
          ]
     },
     "text": "Ian's signing pre-orders of the Jethro Tull Silent Singing book, available in classic, signature and ultimate editions at https://t.co/2BZKsiZ7T9 https://t.co/HyKMsdcs0q",
     "id": "1395779177085800449",
     "public_metrics": {
          "retweet_count": 28,
          "reply_count": 11,
          "like_count": 162,
          "quote_count": 3
     },
     "attachments": {
          "media_keys": [
               "7_1395779078230319108"
          ]
     },
     "author_id": "185347378"
}

If I call “r.json()['includes']”, it returns all of the includes, and I could link the original tweet to an include through its media_key. But I want to access the includes without needing such a workaround… Is that possible?

I would appreciate any help.
Thanks in advance!

Found it!
Must use hydrate. Thanks all!
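For anyone who finds this later: “hydrate” here means merging the top-level includes objects back into each tweet. If I recall the TwitterAPI docs correctly, recent versions of the library accept a hydrate option on request() that does this for you; the manual equivalent, which is what the r.json()['includes'] workaround above amounts to, can be sketched roughly like this (response_json stands in for r.json(), and the sample values are shortened from the example tweet above):

```python
def hydrate_media(response_json):
    """Attach full media objects to each tweet by matching its
    attachments.media_keys against the top-level includes.media list."""
    media_by_key = {
        m["media_key"]: m
        for m in response_json.get("includes", {}).get("media", [])
    }
    for tweet in response_json.get("data", []):
        keys = tweet.get("attachments", {}).get("media_keys", [])
        tweet["media"] = [media_by_key[k] for k in keys if k in media_by_key]
    return response_json

# Sample response with the same shape as above (values shortened)
resp = {
    "data": [{
        "id": "1395779177085800449",
        "text": "Ian's signing pre-orders...",
        "attachments": {"media_keys": ["7_1395779078230319108"]},
    }],
    "includes": {
        "media": [{"media_key": "7_1395779078230319108", "type": "video"}],
    },
}
hydrated = hydrate_media(resp)
print(hydrated["data"][0]["media"][0]["type"])  # video
```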


Would you please explain how to “use hydrate” in more detail?
I’ve just run into the same problem.
Thanks in advance!

This example is the one for the TwitterAPI library.

Twarc also does this, with its flatten command (see the twarc2 documentation).

Thanks for your very quick reply.
I’ll give them a try.
By the way, I am now using the ‘searchtweets’ library — do you have any clue how to solve this problem with it? (Its documentation says it ‘supports the configuration of v2 expansions and fields.’)
I am using Full-archive search with an Academic Research project.
I’ve added the ‘expansions’ parameter and the related fields to my request, but I failed to get the user.fields and place.fields data. I’ve tried printing parts of the results to check, but these fields are not even shown in the output.
Thanks!

Make sure that you are definitely using searchtweets-v2 from PyPI and not the older searchtweets package:

pip uninstall searchtweets
pip install searchtweets-v2

What’s the exact code / command you’re running? The expansions are found in the includes part of the response, not in the data part. But you can change the output by specifying --output-format "a", for example.
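To illustrate the shape: with an expansion like author_id, the expanded user objects arrive under includes.users while the tweets stay under data, and you join them back via the id. A rough sketch (field names follow the v2 response format; the sample values are made up):

```python
def attach_authors(response_json):
    """Join includes.users back onto each tweet via its author_id."""
    users_by_id = {
        u["id"]: u
        for u in response_json.get("includes", {}).get("users", [])
    }
    for tweet in response_json.get("data", []):
        tweet["author"] = users_by_id.get(tweet.get("author_id"))
    return response_json

# Made-up minimal response: tweets in data, expanded users in includes
resp = {
    "data": [{"id": "1", "text": "hello", "author_id": "42"}],
    "includes": {"users": [{"id": "42", "username": "example_user"}]},
}
print(attach_authors(resp)["data"][0]["author"]["username"])  # example_user
```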


Thank you Igor!
I’ve checked I installed the v2 version.
My code is shown below; here is the part that makes the call:

search_args = load_credentials("~/.twitter_keys.yaml",
                               yaml_key="search_tweets_v2",
                               env_overwrite=True)

query = '(#ClimateChange #GlobalWarming OR climate OR change) lang:en'
start_time = '2021-05-01T00:00:00Z'
end_time = '2021-05-01T00:59:59Z'
max_results = '10'
expansions = 'geo.place_id,author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id'
tweetFields = 'author_id,public_metrics,conversation_id,created_at,in_reply_to_user_id,geo,possibly_sensitive,referenced_tweets'
placeFields = 'contained_within,country,country_code,full_name,geo,id,name,place_type'
userFields = 'created_at,id,location,name,protected,public_metrics,url,username,verified,withheld'
query_params = {'query': query, 'start_time': start_time,
                'end_time': end_time, 'max_results': max_results,
                'expansions': expansions, 'tweet.fields': tweetFields,
                'user.fields': userFields, 'place.fields': placeFields}

rs = ResultStream(request_parameters=query_params,
                  max_tweets=30, **search_args)
print(rs)

tweets = rs.stream()
tweetList = list(tweets)
for tweet in tweetList[:10]:
    print(tweet)

And a part of printed results is like below:

Request payload: {'query': '(#ClimateChange #GlobalWarming OR climate OR change) lang:en', 'start_time': '2021-05-01T00:00:00Z', 'end_time': '2021-05-01T00:59:59Z', 'max_results': '10', 'expansions': 'geo.place_id,author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id', 'tweet.fields': 'author_id,public_metrics,conversation_id,created_at,in_reply_to_user_id,geo,possibly_sensitive,referenced_tweets', 'user.fields': 'created_at,id,location,name,protected,public_metrics,url,username,verified,withheld', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'next_token': 'b26v89c19zqg8o3fostu5ipf2sn3wopnehwpuzsl8h5rx'}
Rate limit hit... Will retry...
Will retry in 4 seconds...
{'possibly_sensitive': False, 'referenced_tweets': [{'type': 'retweeted', 'id': '1388123504151875584'}], 'author_id': '1375413086631337985', 'public_metrics': {'retweet_count': 2907, 'reply_count': 0, 'like_count': 0, 'quote_count': 0}, 'conversation_id': '1388297109376356353', 'created_at': '2021-05-01T00:59:58.000Z', 'text': 'RT @taekookfolder: Some things never change 😭 https://t.co/kY7DMWo3KW', 'id': '1388297109376356353'}
{'possibly_sensitive': False, 'referenced_tweets': [{'type': 'replied_to', 'id': '1387860571375218692'}], 'author_id': '1149876602085527553', 'public_metrics': {'retweet_count': 0, 'reply_count': 0, 'like_count': 0, 'quote_count': 0}, 'conversation_id': '1387860571375218692', 'in_reply_to_user_id': '1912936586', 'created_at': '2021-05-01T00:59:58.000Z', 'text': "@curi0usJack @sickcodes Doesn't The FSF have a better source repo? It would just be faster to cut their incompetence out than wait for them to change. Just remember to bring a towel.", 'id': '1388297109267419139'}
{'possibly_sensitive': False, 'referenced_tweets': [{'type': 'retweeted', 'id': '1388295860899958785'}], 'author_id': '1143667146188345344', 'public_metrics': {'retweet_count': 4159, 'reply_count': 0, 'like_count': 0, 'quote_count': 0}, 'conversation_id': '1388297108889944067', 'created_at': '2021-05-01T00:59:58.000Z', 'text': 'RT @POTUS: It’s been a big month, folks. We hit our goal of 200 million shots, announced the end of America’s longest war, hosted a global…', 'id': '1388297108889944067'}

I am sorry for my late reply.

Thanks, it appears you’re using an older version (it may be the latest on PyPI, but it’s still out of date).

Once you install the latest one:

pip install --upgrade https://github.com/twitterdev/search-tweets-python/archive/v2.zip

You can add output_format="a" to ResultStream to get individual tweets with the expansions included inline, or output_format="r" to get the full original response objects, where you’ll find the expansions in includes.


Thanks Igor!
I upgraded to searchtweets-v2 and it works!
But I still have problems converting the response to a CSV file.
When I use pandas, the user fields for each call show up on a separate line, not inline.

I’ve tried the pandas.json_normalize() you mentioned in another answer, and it still didn’t work.
Maybe I should try it with twarc2.
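For what it’s worth, pandas.json_normalize does flatten nested dicts such as public_metrics into dotted columns when you feed it a list of tweet dicts; a minimal sketch using the shape of one of the tweets printed above (values shortened):

```python
import io

import pandas as pd

# One tweet dict with the same shape as the printed results above
tweets = [{
    "id": "1388297109376356353",
    "author_id": "1375413086631337985",
    "public_metrics": {"retweet_count": 2907, "reply_count": 0,
                       "like_count": 0, "quote_count": 0},
    "text": "RT @taekookfolder: Some things never change",
}]

# Nested dicts become dotted columns (e.g. public_metrics.retweet_count);
# list-valued fields such as referenced_tweets would stay as Python lists
# in a single cell, which is where most CSV conversions get awkward.
df = pd.json_normalize(tweets)
print(sorted(df.columns))

# Write the flattened frame as CSV (to a buffer here; a filename works too)
buf = io.StringIO()
df.to_csv(buf, index=False)
```

List-valued fields are the part json_normalize won’t flatten on its own, which may be why the user fields still ended up on separate lines.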

These are generally parsing problems; I would suggest using twarc-csv for this.

pip install twarc-csv

and to convert:

twarc2 csv input.json output.csv