I am using R to fetch all comments in the conversation for a specific Twitter status ID. The tweet has 1492 quote tweets, but only 856 comments are returned. I have academic API access, so I should be able to search the full archive. Any thoughts?

This is the tweet I am interested in (status ID: 789903797803970560): https://twitter.com/RussianEmbassy/status/789903797803970560. I am using the academictwitteR package in R; the code is below.

get_all_tweets(start_tweets = tweet_metrics$tweet_time,
               end_tweets = current_time,
               bearer_token = get_bearer(),
               n = tweet_metrics$quote_count,
               conversation_id = tweet_status_id)

The tweet_time, current_time, quote_count, and tweet_status_id values change depending on the tweet of interest. For this one, they are "2016-10-22T18:58:35Z", "2021-07-09T22:36:59Z", 1492, and "789903797803970560", respectively.

One thing I notice is that when I set n = 1492, I only get about 500 comments. When I set n to a much larger number, say 50000, I get 856.

How do I get the whole record? Thanks!

By "comments", do you mean quote retweets or replies?

As for the missing data, there can be several reasons why you do not get all the tweets. For example, quote tweets from private accounts add to the count but cannot be retrieved.

I think n in get_all_tweets is an upper limit, not a target.

I don’t know R well enough to dig into it, but I can get the correct data using twarc:

twarc2 search --archive --start-time "2016-10-22" --end-time "2021-07-09" "conversation_id:789903797803970560 OR (url:789903797803970560 is:quote)" 789903797803970560.jsonl

This will get you all replies and all quote retweets. conversation_id alone won't give you quote retweets, so you also have to include (url:789903797803970560 is:quote) in the query.
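To reuse this for other tweets, the same command can be wrapped in a small shell sketch. The variable names (tweet_id, query, outfile) are my own and are not twarc options. The sketch assumes twarc2 is installed and already configured with academic access, and it only prints the query if twarc2 is not found.

```shell
# Sketch only: tweet_id, query, and outfile are illustrative names.
tweet_id="789903797803970560"

# Replies share the original tweet's conversation_id. Quote tweets generally
# do not, so they are matched separately via the quoted tweet's ID in the URL.
query="conversation_id:${tweet_id} OR (url:${tweet_id} is:quote)"
outfile="${tweet_id}.jsonl"

# Only call the API if twarc2 is available on this machine.
if command -v twarc2 >/dev/null 2>&1; then
    twarc2 search --archive \
        --start-time "2016-10-22" --end-time "2021-07-09" \
        "$query" "$outfile"
else
    echo "twarc2 not found; query would be: $query"
fi
```

Changing tweet_id is enough to point the query at another tweet, though the start and end dates would need adjusting as well.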

You could probably modify the R command to use the same query too: conversation_id:789903797803970560 OR (url:789903797803970560 is:quote).
