Hi, how can I gather retweets using a Jupyter notebook?
I have a CSV file of tweets and I want to gather the retweets of those tweets. The data set is huge, and running the command from the prompt would mean doing it one tweet at a time.

Is there a way to use a Jupyter notebook or a .py Python script instead of running twarc from cmd?
Generally, you can run command line commands in jupyter notebook cells if you add a ! to them, like
!twarc retweets ... > output.json
But be aware that you may not get the same input or features as a terminal. You can also start a new terminal in jupyter lab if you want.
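The ! shortcut only exists inside notebook cells. From a plain .py script, the standard-library subprocess module can do roughly the same thing. This is a minimal sketch, not a tested twarc workflow: run_to_file is a hypothetical helper name, and the demo uses the Python interpreter as a stand-in command so that it runs without twarc installed.

```python
import os
import subprocess
import sys
import tempfile


def run_to_file(command, output_path):
    """Run a command and write its standard output to output_path.

    Mirrors the shell form `command > output_path`. Raises
    subprocess.CalledProcessError if the command exits with an error.
    """
    with open(output_path, "w", encoding="utf-8") as out:
        subprocess.run(command, stdout=out, check=True)


# Stand-in command (the Python interpreter itself), so the sketch runs anywhere.
demo_path = os.path.join(tempfile.gettempdir(), "demo_output.txt")
run_to_file([sys.executable, "-c", "print('hello')"], demo_path)

# With twarc installed and configured, the equivalent of the notebook cell
# above would be something like (file names are placeholders):
# run_to_file(["twarc", "retweets", "ids.txt"], "retweets.jsonl")
```

Passing the command as a list avoids shell quoting problems, and check=True makes a failed command raise an error instead of silently leaving an empty file.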
Alternatively, you can use twarc as a library: see the twarc.Client docs for the v1.1 API and the twarc.Client2 docs for the v2 API.
Since you are using the v1.1 API call to get retweets remember that this is limited to the last 100 retweets only. There is no endpoint to get all of them.
In the v2 API you can get all users who retweeted a tweet, but not the retweets themselves (see Retweets in the Twitter API docs on the Twitter Developer Platform). This is not yet implemented in twarc, however.
To get the retweets themselves, you need to get all retweets of a specific user, using a search with a retweets_of:user query, and then filter them for the specific ones you’re interested in. Twarc2 will soon have an easier way to do this (see "Retweets", issue #403 in DocNow/twarc on GitHub), but for now you’ll have to use the search method instead.
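The filtering step could look something like the sketch below. It assumes each search result is a v2 tweet dictionary that includes the referenced_tweets field, where a retweet carries an entry of type "retweeted" pointing at the original tweet. The helper names and the sample data are made up for illustration.

```python
def retweeted_id(tweet):
    """Return the ID of the tweet this one retweets, or None if it is not a retweet."""
    for ref in tweet.get("referenced_tweets", []):
        if ref.get("type") == "retweeted":
            return str(ref["id"])
    return None


def keep_retweets_of(tweets, wanted_ids):
    """Keep only the tweets that are retweets of one of wanted_ids."""
    wanted = {str(i) for i in wanted_ids}  # compare IDs as strings
    return [t for t in tweets if retweeted_id(t) in wanted]


# Made-up sample results, shaped like v2 tweet objects.
sample = [
    {"id": "10", "text": "RT @user: hello",
     "referenced_tweets": [{"type": "retweeted", "id": "1"}]},
    {"id": "11", "text": "RT @user: other",
     "referenced_tweets": [{"type": "retweeted", "id": "2"}]},
    {"id": "12", "text": "a reply",
     "referenced_tweets": [{"type": "replied_to", "id": "1"}]},
    {"id": "13", "text": "an original tweet"},
]

matches = keep_retweets_of(sample, [1])
print([t["id"] for t in matches])
```

Only the tweet with ID "10" survives here: "11" retweets a different tweet, "12" is a reply, and "13" references nothing.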
Here is my code in the Jupyter notebook.
Is this right, and how do I save the result into a JSON file so I can read it? Note that i is the ID: I took the IDs from the CSV file and saved them into a list, but the IDs are strings. Should they be integers?
It is not possible to answer that from the small snippet of code you posted, but the twarc v1.1 library expects an iterator for that function, so you could pass your entire list of tweet IDs and it will get the latest 100 retweets for each; see the twarc.Client docs. Strings or integers should both work.
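For the step of getting the IDs out of the CSV into a list, here is a small sketch using only the standard library. The column name "id" and the sample file are assumptions, since the actual CSV layout was not shown. It keeps the IDs as strings, which avoids any precision loss if a tool ever converts such large numbers to floating point.

```python
import csv
import os
import tempfile


def read_tweet_ids(csv_path, column="id"):
    """Return the non-empty values of one CSV column as a list of strings."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [
            row[column].strip()
            for row in csv.DictReader(f)
            if row[column].strip()
        ]


# Tiny made-up CSV so the sketch runs on its own; point csv_path at the
# real file and set column to the real header name instead.
sample_path = os.path.join(tempfile.gettempdir(), "sample_tweets.csv")
with open(sample_path, "w", newline="", encoding="utf-8") as f:
    f.write("id,text\n1450000000000000001,first\n1450000000000000002,second\n")

ids = read_tweet_ids(sample_path)
print(ids)
```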
To save the results as JSON, iterate over the results of that function and save each one as you would any other dictionary.
If s is a list of tweet IDs, and you already have a twarc client as t:
...
    import json

    # Append each retweet as one JSON object per line (JSONL).
    with open("output.jsonl", "a") as f:
        for retweet in t.retweets(s):
            f.write(json.dumps(retweet) + "\n")
(Double check that though because I didn’t run that to test, in case I made a mistake)
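To read such a file back later, each line can be parsed on its own. A minimal sketch, where read_jsonl is a hypothetical helper and the demo records are made up (real retweets would have many more fields):

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Yield one dictionary per non-blank line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# Made-up records standing in for retweets, written one per line.
demo_path = os.path.join(tempfile.gettempdir(), "demo_retweets.jsonl")
with open(demo_path, "w", encoding="utf-8") as f:
    for record in [{"id_str": "10"}, {"id_str": "11"}]:
        f.write(json.dumps(record) + "\n")

retweets = list(read_jsonl(demo_path))
print(len(retweets), retweets[0]["id_str"])
```

Because the function is a generator, a very large file can be processed line by line without loading everything into memory; wrap it in list(...) only if the data fits.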
Alternatively, put all the tweet IDs you’re interested in into a text file, one ID per line, and run this (keep the leading ! in a Jupyter cell, drop it in a regular command line):
!twarc retweets ids.txt > retweets.jsonl
See the twarc documentation.
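Writing the list of IDs to such a file from Python could look like this sketch. write_ids is a hypothetical helper, the IDs are made up, and the file goes to a temporary directory here only so the example is harmless to run.

```python
import os
import tempfile


def write_ids(ids, path):
    """Write one tweet ID per line and return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for tweet_id in ids:
            f.write(f"{tweet_id}\n")
            count += 1
    return count


# Made-up IDs; strings and integers are written the same way.
ids = ["1450000000000000001", 1450000000000000002]
ids_path = os.path.join(tempfile.gettempdir(), "ids.txt")
written = write_ids(ids, ids_path)
print(written, "IDs written to", ids_path)
```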
Thanks a lot, you’ve really saved me time.
My last 3 questions are:
1- Can I save the IDs into a list and then into a text file, and use cmd instead of Jupyter?
2- How many retweets per 15 minutes can I get?
3- Can I use hydrate to get only the retweets, without request limits?
1. Use twarc dehydrate if you have full JSON objects, one per line, and want to extract just the IDs. See the twarc documentation.
2. twarc retweets calls GET statuses/retweets/:id (see the Twitter Developer Platform docs); whether the user auth or the app auth limit applies depends on your twarc configuration.
3. No, the hydrate command calls GET statuses/lookup (see the Twitter Developer Platform docs), so it can only get tweets for which you have exact IDs. Retweets are their own unique tweets.
The rate limits are in the docs, and user vs app auth depends on how you configured twarc with the twarc configure command (which won’t work in Jupyter because it lacks input for commands like that; you have to use a terminal, PowerShell, or whatever command line you have).
I have got the IDs using Python, one per line. I am using academic access and the limit is about 75 requests per 15 minutes, so this is the only way I can get the retweets, is that right?
Yes. Academic access is for search only, and only for the v2 API. You’re using the v1.1 API, so the rate limits are as described in the docs.
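For planning, a rough estimate of the waiting time can be worked out from that limit. This sketch takes the 75 requests per 15 minutes mentioned above as given and assumes one request per tweet ID; the function name is hypothetical, and actual run time will be longer because of network time.

```python
import math


def estimate_wait_minutes(n_ids, requests_per_window=75, window_minutes=15):
    """Lower bound on waiting time when each tweet ID costs one request.

    The first window's requests can be sent right away, so only the
    remaining windows add waiting time.
    """
    if n_ids <= 0:
        return 0
    windows = math.ceil(n_ids / requests_per_window)
    return (windows - 1) * window_minutes


print(estimate_wait_minutes(1000))
```

For 1000 tweet IDs this gives 14 windows and at least 195 minutes of waiting, so a huge data set can take many hours at this rate.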
Thanks so much, I’ve learned a lot from you. Wish you a happy weekend!