I used twarc2 to get raw tweet data. The command I ran in the terminal is as follows:
twarc2 search "keyword" --archive --start-time "2011-12-31T15:00:00" --end-time "2012-01-31T15:00:00" jst201201.json
There are about 70 fields that you can get with twarc2. Some of them are easy to understand, e.g., “text” is the text of the tweet, “created_at” is the date and time when the tweet was created, and “lang” is the language of the tweet. However, I have trouble with some others. For example:
(1) “in_reply_to_user_id”: if it is not blank and contains the ID of a Twitter user, does that mean the tweet I am dealing with is a reply to another tweet (a tweet by the user whose ID is in the column)?
(2) “referenced_tweets.replied_to.id”, what is this?
(3) “referenced_tweets.retweeted.id”, what is this?
(4) what is the difference between “author_id” and “author.id”?
And these are just some of them. I wonder where I can find detailed explanations of the field names. For the full list of the field names, please see lines 13-85 of this link.
Thank you for your time and help.
All of these are derived directly from the Twitter API v2 data dictionary (in the Twitter Developer Platform docs), so the descriptions are there.
There are a few implementation quirks, however, due to how the data is processed after being flattened from the original nested structure: author_id is derived from the tweet fields, while author.id is derived from the author object in the user expansions. They are always exactly the same, but both are kept. It’s safe to remove one if you like.
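If you want to drop one of the two author columns, here is a minimal Python sketch of how that could be done. It assumes each flattened tweet has already been loaded as a dict keyed by column name (for example with csv.DictReader); the function name drop_duplicate_author is hypothetical and not part of twarc. It checks that the two values really match before removing author.id.

```python
def drop_duplicate_author(rows):
    """Remove the redundant "author.id" column from flattened tweet rows.

    `rows` is assumed to be an iterable of dicts keyed by column name.
    Raises ValueError if the two author columns ever disagree.
    """
    cleaned = []
    for row in rows:
        # Only compare when both columns are actually present in the row.
        if "author_id" in row and "author.id" in row:
            if row["author_id"] != row["author.id"]:
                raise ValueError(
                    f"author_id {row['author_id']!r} != author.id {row['author.id']!r}"
                )
        # Keep every column except the duplicate one.
        cleaned.append({k: v for k, v in row.items() if k != "author.id"})
    return cleaned
```

Keeping author_id rather than author.id is an arbitrary choice here; either one works, since the values are the same.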
Any field with a . (period) like that is a key from another object. Similarly, referenced_tweets.replied_to.id is the ID of the tweet that the current tweet is replying to, derived from the referenced tweets expansion, whereas in_reply_to_user_id is the ID of the user that the current tweet is replying to, derived from the tweet fields. And yes, you can use these referenced tweet values to figure out what kind of tweet it is: a quote, a reply, a reply that quoted another tweet, etc.
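As a rough illustration of using those values to work out the tweet type, here is a small Python sketch. It assumes each flattened tweet is a dict of column name to string, with blank meaning "not set". The column referenced_tweets.quoted.id is assumed to follow the same naming pattern as the two columns mentioned in the question, and the function name and labels are made up for this example.

```python
def classify_tweet(row):
    """Return a rough tweet type for one flattened tweet row (a dict)."""

    def has(column):
        # Treat missing, None, and whitespace-only values as blank.
        return bool((row.get(column) or "").strip())

    if has("referenced_tweets.retweeted.id"):
        return "retweet"

    # A reply references the tweet it answers and/or the user it answers.
    is_reply = has("referenced_tweets.replied_to.id") or has("in_reply_to_user_id")
    # Assumed column name for quoted tweets (same pattern as the others).
    is_quote = has("referenced_tweets.quoted.id")

    if is_reply and is_quote:
        return "reply_with_quote"
    if is_reply:
        return "reply"
    if is_quote:
        return "quote"
    return "original"


example = {
    "referenced_tweets.replied_to.id": "123",
    "in_reply_to_user_id": "456",
    "referenced_tweets.quoted.id": "",
}
print(classify_tweet(example))
```

Retweets are checked first so that a retweet is never also counted as a reply or a quote.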
The __twarc fields are just some useful metadata we decided to add; they come from the tool, not from Twitter.
Hope that helps!
That helps a lot!
Thank you for your detailed reply.