Hi everybody,
I am using the academic access and trying to understand different conversations on Twitter. Each time I try to download the dialogue I don’t get any result. I checked the rate limits and the usual problems. What I wonder most about is that there should not be any empty results as every query should at least return the tweet I extracted the conversation_id from… any ideas or suggestions?

Best, Fabienne

How exactly are you making the calls? I think for conversation id queries you need to explicitly specify a start_time
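A minimal sketch of what that could look like, assuming the v2 full-archive search endpoint (Academic Research access). The helper name `build_conversation_params` is hypothetical, and the start time shown is only an illustration; it should be set to a point before the conversation began:

```python
from typing import Dict, Optional

# Assumption: v2 full-archive search endpoint (Academic Research access).
# Recent search only covers roughly the last 7 days.
SEARCH_ALL_URL = "https://api.twitter.com/2/tweets/search/all"


def build_conversation_params(conversation_id: str,
                              start_time: Optional[str] = None) -> Dict[str, str]:
    """Build query parameters for one conversation (hypothetical helper)."""
    params = {
        "query": f"conversation_id:{conversation_id}",
        "tweet.fields": "author_id,conversation_id,created_at,referenced_tweets",
        "max_results": "100",
    }
    if start_time is not None:
        # ISO 8601 timestamp; illustrative value only.
        params["start_time"] = start_time
    return params


params = build_conversation_params("1454969101047246857", "2021-11-01T00:00:00Z")
```

The resulting dictionary can be passed as the query string of a GET request to `SEARCH_ALL_URL` together with the bearer token.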

Have you tried twarc? You can put a bunch of conversation IDs into a text file and it will process them all appropriately: twarc2 (en) - twarc

with open(output_file, "a", encoding="utf8") as f:
    for conversation_id in conversation_ids:
        search_parameters = {
            "query": f"conversation_id:{conversation_id}",  # up to 1024 characters
            "tweet.fields": "attachments,author_id,conversation_id,created_at,geo,in_reply_to_user_id,lang,public_metrics,referenced_tweets,source,withheld",
            # omitted fields: context_annotations,entities,non_public_metrics,possibly_sensitive,organic_metrics,promoted_metrics
            "user.fields": "id,name,username,created_at,description,location,public_metrics,verified",
            # omitted user fields: entities,pinned_tweet_id,profile_image_url,protected,url,withheld
            "media.fields": "public_metrics",
            "place.fields": "country",
            "max_results": "100",
        }
        # search_connect is my own helper that sends the request and writes the results to f
        search_connect(bearer_token, f, search_parameters)

Thanks for the answer. I will definitely try twarc as well and add a time limit. What really concerns me is that it works properly for 30% of the tweets, but I have no idea why the remaining 30% don't work.


Hi, I have now used twarc and twurl, but I still only get around 60-80% of the conversations. How long are conversations generally available? 7 days, 30 days? And is there any documentation that lists the reasons why a conversation may no longer be available?

How are you loading the IDs? If Excel or Javascript was involved at some point, they may be corrupted - or, you are using recent search and these results are older than 7 days - so you won’t get them.
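To illustrate the corruption mentioned above, here is a small sketch using one of the IDs from this thread. It assumes the tool in question treats the ID as a 64-bit float at some point, which is how JavaScript numbers behave; the JSON record is a made-up example:

```python
import json

tweet_id = 1454969101047246857  # one of the IDs from this thread

# 64-bit floats cannot represent every integer above 2**53 exactly,
# so a round trip through float silently changes the low digits.
as_float = float(tweet_id)
round_tripped = int(as_float)

print(tweet_id > 2**53)           # True: beyond the exact-integer range
print(round_tripped == tweet_id)  # False: the ID no longer matches

# Safer: keep IDs as strings from the moment they are read.
record = json.loads('{"id": "1454969101047246857"}')  # hypothetical record
safe_id = record["id"]
```

The v2 API already returns IDs as strings, so the usual fix is simply never to convert them to numbers.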

What's the exact twarc command you're running, and do you have a sample of IDs that don't work?

Okay, I am collecting the ids via:

with open(output_file, "a", encoding="utf8") as f:
    search_parameters = {
        "query": "#trump OR #notrump OR #biden OR #nobiden OR #stopbiden",  # up to 1024 characters
        "expansions": "attachments.media_keys,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id",
        "tweet.fields": "attachments,author_id,conversation_id,created_at,geo,in_reply_to_user_id,lang,public_metrics,referenced_tweets,source,withheld",
        # omitted fields: context_annotations,entities,non_public_metrics,possibly_sensitive,organic_metrics,promoted_metrics
        "user.fields": "id,name,username,created_at,description,location,public_metrics,verified",
        # omitted user fields: entities,pinned_tweet_id,profile_image_url,protected,url,withheld
        "media.fields": "public_metrics",
        "place.fields": "country",
        "max_results": "100",
        "start_time": "2021-11-01T01:30:00.000Z",
        "end_time": "2021-11-01T01:32:00.000Z",
    }
    # search_connect is my own helper that sends the request and writes the results to f
    search_connect(bearer_token, f, search_parameters)

and then I just extract the conversation IDs from the results.

Then I pass them to twarc at the prompt:

twarc2 conversation 1454969101047246857

A sample of IDs that do not work: "1454969101047246857", "1454969096995577861", "1454969038157881348"

Ah ok - the reason is that these are older than 7 days, and are also retweets - retweets are their own unique tweets, so you need to get the retweeted tweet ID to get the conversation.
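As a rough sketch of that lookup: in v2 results a retweet carries a `referenced_tweets` entry of type `retweeted` pointing at the original tweet, provided `referenced_tweets` was requested in `tweet.fields`. The helper name `original_tweet_id` and the sample tweet objects below are made up for illustration:

```python
from typing import Optional


def original_tweet_id(tweet: dict) -> Optional[str]:
    """Return the ID of the retweeted tweet, or None if this is not a retweet."""
    for ref in tweet.get("referenced_tweets", []):
        if ref.get("type") == "retweeted":
            return ref["id"]
    return None


# Hypothetical tweet objects shaped like v2 API results.
retweet = {"id": "111", "referenced_tweets": [{"type": "retweeted", "id": "222"}]}
reply = {"id": "333", "referenced_tweets": [{"type": "replied_to", "id": "444"}]}

# Use the original tweet's ID for retweets, the tweet's own ID otherwise.
ids_to_fetch = [original_tweet_id(t) or t["id"] for t in (retweet, reply)]
print(ids_to_fetch)  # ['222', '333']
```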

But you can’t retrieve anything older than 7 days with the recent search. You can only do it if you have academic access, in this case, try adding --archive to the twarc command.

Also, using twarc you can give it a file with tweets, so you don’t need to manually sort out conversations, eg:

twarc2 search --archive --start-time "2021-11-01T01:30:00" --end-time "2021-11-01T01:32:00" "#trump OR #notrump OR #biden OR #nobiden OR #stopbiden" results.jsonl

and then collect all conversations from results.jsonl with:

twarc2 conversations --archive results.jsonl conversations.jsonl

Thank you! I think the retweeted tweet ID in combination with --archive should solve my problem. One more question regarding the retweets: even though they are retweets, the API sends back a conversation_id. As far as I understand, this should be the ID of the original conversation they stem from, correct? So if I try these IDs with the archive search, shouldn't they work as well?

Hi, thank you for the answer above. I was able to get all the conversations I needed. But I have another question: is there any documentation that explains the structure and parameters of the retrieved conversation thread?

Sure, the tweet data elements are detailed in Twitter API v2 data dictionary | Docs | Twitter Developer Platform for all the twitter objects.

Also, you can try twarc-csv for turning the JSON into a CSV you can examine. Be aware that opening the file in Excel will not work: Excel fails to parse Twitter data correctly, especially IDs. You can use Google Sheets instead.
