Hi everyone, I’m working on my thesis project and I need some help with twarc2.
I’m using twarc2 from the command line in the Anaconda Prompt, and I have Academic Research access. I would like to use the ‘counts’ command to count all the tweets posted by a list of user IDs. Does anyone know if this is possible? Thanks in advance.

Yes, but to do this you may have to prepare an appropriate input file.

To get a count of all tweets posted by a set of users, put the user IDs or usernames in a text file, one per line, and run:

twarc2 users input.txt users.jsonl

The total tweet count for each user will be in public_metrics.tweet_count. You can get a CSV using:

twarc2 csv --input-data-type users users.jsonl users.csv

after installing the twarc-csv plugin with pip install --upgrade twarc-csv
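If you’d rather read the numbers straight from users.jsonl without the CSV plugin, here is a minimal Python sketch (Python is already available in the Anaconda Prompt). It assumes each line of the file is either a raw API response with a "data" list of user objects or a single user object; the function names are hypothetical and not part of twarc.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_users(lines: Iterable[str]) -> Iterator[dict]:
    """Yield user objects from twarc2 JSONL lines.

    Assumption: a line is either an API response page holding a
    "data" list of users, or a single user object.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        obj = json.loads(line)
        if isinstance(obj.get("data"), list):
            yield from obj["data"]
        else:
            yield obj


def tweet_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Map each user ID to public_metrics.tweet_count (0 if missing)."""
    counts: Dict[str, int] = {}
    for user in iter_users(lines):
        counts[user["id"]] = user.get("public_metrics", {}).get("tweet_count", 0)
    return counts


def print_counts(path: str = "users.jsonl") -> None:
    """Print "id<TAB>tweet_count" for every user in the file."""
    with open(path, encoding="utf-8") as f:
        for user_id, n in tweet_counts(f).items():
            print(f"{user_id}\t{n}")
```

Calling print_counts("users.jsonl") prints one line per user. Keep in mind that tweet_count is the all-time total shown on the profile (it includes retweets), not a count for a date range.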

Alternatively, you can define an input.txt with one query per line, like:

from:igorbrigadir twitter
from:twitterdev twitter
from:twitterapi twitter

And use twarc2 searches, asking for counts only with --counts-only, like this:

twarc2 searches --archive --counts-only --granularity day --start-time "2022-01-01" --end-time "2022-02-01" input.txt counts.csv

This will give you daily counts for each query in input.txt and save them in counts.csv.
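With many users it is easier to generate input.txt than to type it. Below is a minimal Python sketch, assuming a hypothetical ids.txt with one user ID per line; the keywords string is whatever search terms you want appended to every query (twitter, as in the example above). Each query is written with no space after from: and no quotes around it.

```python
from typing import Iterable, List


def build_queries(user_ids: Iterable[str], keywords: str = "") -> List[str]:
    """Build one search query per user: "from:<id>" plus optional terms."""
    keywords = keywords.strip()
    queries = []
    for uid in user_ids:
        uid = uid.strip()
        if not uid:
            continue  # ignore blank lines in the ID file
        query = f"from:{uid}"  # no space between from: and the ID
        if keywords:
            query += f" {keywords}"
        queries.append(query)
    return queries


def write_queries(in_path: str, out_path: str, keywords: str = "") -> int:
    """Read user IDs (one per line) and write one query per line."""
    with open(in_path, encoding="utf-8") as f:
        queries = build_queries(f, keywords)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(queries) + "\n")
    return len(queries)
```

For example, write_queries("ids.txt", "input.txt", "twitter") produces a file you can pass straight to twarc2 searches.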


Thank you very much, Igor, your help has been really useful. I tried your suggestions and they work perfectly. I was wondering whether it would be possible to use ‘counts’ to count only those tweets from the users that match specific keywords in a query, so that I only count tweets dealing with a certain topic. I tried adding the keywords to the command line, but it seems that ‘counts’ does not support this option; do you agree? If so, is there any other way to count tweets from specific users that match specific keywords?

You have to use the twarc2 searches command, as in my earlier example.

In that example, the query from:igorbrigadir twitter gives you counts of any tweets I’ve posted that contain “Twitter”. Each line in the input file is a separate search query.

And then you can count up the totals from the csv yourself.
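A minimal Python sketch of that summing step. The column names query and tweet_count are assumptions about the layout of counts.csv, so check its header row and pass different names if yours differ.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable


def total_per_query(rows: Iterable[Dict[str, str]],
                    query_col: str = "query",
                    count_col: str = "tweet_count") -> Dict[str, int]:
    """Sum the per-period (e.g. daily) counts for each query.

    The default column names are assumptions about counts.csv.
    """
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row[query_col]] += int(row[count_col])
    return dict(totals)


def totals_from_csv(path: str, **kwargs) -> Dict[str, int]:
    """Read a counts CSV with a header row and total it per query."""
    with open(path, newline="", encoding="utf-8") as f:
        return total_per_query(csv.DictReader(f), **kwargs)
```

totals_from_csv("counts.csv") returns a dictionary from each query line to its total over the whole time range.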


Thanks a lot, and sorry I didn’t catch that from your example at first. I have put the keywords along with the user IDs in the input text file and now it works perfectly. Thanks again and have a nice day!


Sorry Igor, I tried to get the tweet counts using the command you suggested:

twarc2 searches --archive --counts-only --granularity day --start-time "2020-04-01" --end-time "2020-12-31" author_ids_rand.txt countstweets_noquery.csv

where author_ids_rand.txt has 1500 lines. Each line holds a query such as “from: 370639956 lang:it -is:retweet -is:reply -is:quote -is:verified”, so I am fixing the user and excluding retweets, replies, quotes and verified accounts. The results are poor: the daily count is always zero. Do you have any idea why this is happening? Thank you in advance.

It could be the case that there are simply no results.

Here it’s critical to be exact: if there is a space between from: and the user ID, it will not work. It will also not work if there are quotes (") surrounding the entire query, because quotes are for exact phrases. Also, -is:verified is redundant, or may break things if the user is verified, and this may be causing the 0 counts. I’d remove the verified users from the input list instead: -is:verified already excludes all the tweets from verified users, so for those users the query can only return 0.
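If the queries are already in a file, a small Python sketch like this can tidy each line according to the points above: it drops quotes wrapping the whole query (straight or curly), removes whitespace after from:, and removes -is:verified. The function names are hypothetical, and it only fixes these three issues; it does not validate query syntax in general.

```python
import re

# Quote characters that may wrap a whole query (straight and curly).
QUOTE_PAIRS = {('"', '"'), ("\u201c", "\u201d")}
QUOTE_CHARS = '"\u201c\u201d'


def clean_query(line: str) -> str:
    """Fix the three problems discussed above in one query line."""
    q = line.strip()
    inner = q[1:-1]
    # Strip wrapping quotes only if there are no other quotes inside,
    # so genuine phrase searches like "a b" from:1 "c d" are kept.
    if len(q) >= 2 and (q[0], q[-1]) in QUOTE_PAIRS and not any(
        c in inner for c in QUOTE_CHARS
    ):
        q = inner.strip()
    # "from: 123" -> "from:123"
    q = re.sub(r"\bfrom:\s+", "from:", q)
    # Drop -is:verified; remove verified users from the list instead.
    return " ".join(t for t in q.split() if t != "-is:verified")


def clean_file(in_path: str, out_path: str) -> None:
    """Write a cleaned copy of a query file, skipping blank lines."""
    with open(in_path, encoding="utf-8") as f:
        queries = [clean_query(line) for line in f if line.strip()]
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(queries) + "\n")
```

For example, clean_file("author_ids_rand.txt", "author_ids_clean.txt") writes a corrected copy (the output name is only an illustration) that you can pass to twarc2 searches in place of the original.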

I’d also relax the restrictions a bit. I’d argue that quote tweets are the same as normal tweets, so they should be included even if you’re excluding retweets. But this is just an opinion.

That’s all I can think of for now!