Hello, I am a graduate student studying digital media using the Academic Research API.
I was wondering whether I could search for a certain topic among the tweets of a specific set of users.
I already have a list of users, and I would like to know whether they mentioned the topic.

Thank you in advance!

Yes, you can combine operators in a query: you can use keywords, or context annotations for topics. See "Search Tweets - How to build a query | Docs | Twitter Developer Platform" for the operators, "Overview | Docs | Twitter Developer Platform" for context annotations, and "GitHub - twitterdev/twitter-context-annotations" (flat files containing the available context annotation entities).
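To make the idea of combining operators concrete, here is a minimal sketch that turns a list of usernames and a keyword into queries of the form (from:a OR from:b) keyword, splitting the list across several queries so each one stays under a length limit. The helper name build_queries is hypothetical, and the default limit of 1024 characters is my assumption for the academic track; please check the limit for your own access level.

```python
def build_queries(usernames, keyword, max_len=1024):
    """Group usernames into '(from:a OR from:b) keyword' queries under max_len chars."""
    queries, batch = [], []

    def render(names):
        # Join the from: operators with OR, then AND the keyword via a space.
        return "(" + " OR ".join(f"from:{n}" for n in names) + f") {keyword}"

    for name in usernames:
        # Start a new query when adding this user would exceed the limit.
        if batch and len(render(batch + [name])) > max_len:
            queries.append(render(batch))
            batch = []
        batch.append(name)
    if batch:
        queries.append(render(batch))
    return queries
```

Note that one query covering many users returns their tweets (or counts) combined, so per-user numbers still need one query per user. For an exact phrase, pass the keyword already wrapped in double quotes.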

Thank you Igor for your information!
I looked through the documents, but I could not understand how to use this function…

Specifically, I would like to examine how many times each politician’s friends mention “MARTIAL SURNAME”.
Accordingly, this is my data schema and what I am currently doing.

(1) Collect the friend list of each politician

from twarc.client import Twarc
from twarc_csv import CSVConverter
# Your keys here
t = Twarc(consumer_key="YOUR_CONSUMER_KEY", consumer_secret="YOUR_CONSUMER_SECRET")

# Get a bunch of user IDs you want to check:
list_of_names = test
user_objects = t.user_lookup(ids=list_of_names, id_type="user_id")

for user in user_objects:

    following = list(t.friend_ids(user['id'], max_pages=300))
    with open(f"{user['id']}_friends.txt", 'w') as outfile:
        outfile.write('\n'.join(following) + '\n')

(2) Collect the tweets of the friends

twarc2 timelines {user_id}_friends.txt {user_id}_friends_tweets.jsonl

(3) Search ‘martial surname’

import pandas as pd

friends_tweets = pd.read_csv(f'{user_id}_friends_tweets.csv')
# na=False treats rows with missing text as non-matches
surname_tweets = friends_tweets[friends_tweets['text'].str.contains('martial surname', na=False)]

(4) Count the number of tweets mentioning ‘martial surname’, and save it to the politicians table.

politicians_sample.loc[politicians_sample['id'] == politician_id, 'surname_friends_tweets_count'] = len(surname_tweets)

In this context, I came up with the idea of searching for ‘martial surnames’ first, instead of collecting all the tweets. How can I collect only the tweets of each user’s friends that mention ‘martial surnames’ using the function you mentioned?
I hope this is not confusing. Thank you in advance! :slight_smile:

If all you need are counts, it’s easier to do with twarc2 counts commands: twarc.Client2 - twarc

For example:

twarc2 counts --archive "from:user surname" --granularity day --text

For one user.

But if you have a big set of users, it’s best to use the Python API and aggregate the counts that way. The process is the same as for search (see "Examples of using twarc2 as a library - twarc"), but it uses the counts_all function, and the output will be counts, not tweets. If it helps, you can get a dataframe of counts directly with the twarc-csv dataframe converter (see the same examples page): it can process counts objects if you create it like converter = DataFrameConverter(input_data_type="counts"). You can also aggregate and count up by processing the counts results JSON directly.
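As a rough sketch of the "process the counts results JSON directly" route: the functions below sum the tweet_count values from every page that counts_all yields, one query per user. The function names are hypothetical, and the page shape (a "data" list of buckets, each with a "tweet_count") is what I understand the counts endpoint to return, so treat it as an assumption. The client is passed in so the code does not depend on how you authenticate.

```python
def count_tweets(client, user, keyword, **time_range):
    """Total tweets from `user` matching `keyword`, summed over all result pages."""
    query = f"from:{user} {keyword}"
    total = 0
    # counts_all yields one response page (a dict) at a time.
    for page in client.counts_all(query, granularity="day", **time_range):
        total += sum(bucket["tweet_count"] for bucket in page.get("data", []))
    return total


def count_for_users(client, users, keyword, **time_range):
    """Map each user to their total count, running one counts query per user."""
    return {user: count_tweets(client, user, keyword, **time_range) for user in users}
```

With twarc2 this would be called roughly as count_for_users(Twarc2(bearer_token="..."), ["politician"], "jones"), after from twarc.client2 import Twarc2. Extra keyword arguments such as start_time and end_time are forwarded to counts_all unchanged; as far as I know they should be datetime objects.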

Wow, that’s a useful function. Thank you very much for your information:)
I actually also need how many times each user in the lists tweeted about ‘surname’.
Is it possible?

Yes, this is defined by the query:

from:politician jones

This query in the counts command will count all the tweets sent by @politician that have jones in the text or in a URL.

Thank you very much Igor! It actually works!
How can I store the count output in CSV in the format I attached?
Also, is it possible to count all the tweets at once? (not showing results by day or hour)

To get counts as csv, specify

twarc2 counts --csv ...

See twarc2 counts --help for the full list of options

It’s not possible to get just a total in CSV format: the output always follows whatever you set as --granularity, so --granularity day gives counts per day and --granularity hour gives counts per hour. But if you specify --text instead of --csv, the total is printed in the console.
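If you do want a single total from the CSV output, one workaround is to sum the per-day (or per-hour) column yourself. A minimal sketch, assuming the count column is named day_count when --granularity day is used; that name is my assumption, so check the header row of your own file and pass the right count_column. The function name is hypothetical.

```python
import csv


def total_from_counts_csv(lines, count_column="day_count"):
    """Sum one column of a counts CSV; `lines` is any iterable of CSV text lines."""
    reader = csv.DictReader(lines)
    return sum(int(row[count_column]) for row in reader)
```

An open file works as the lines argument, for example: with open("counts.csv", newline="") as f: total = total_from_counts_csv(f).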

I see! Thank you Igor :slight_smile: Is it possible to store the id associated with each count in the CSV?
I want to iterate over the process, so it’s better to have ids to identify whose count it is.

I coded like this, but it might be too inefficient.

import subprocess

import pandas as pd

test = politicianIds[0:3]
for politician in test:
    # check=True raises an error if the twarc2 command fails
    subprocess.run(['twarc2', 'counts', '--archive', f'from:{politician} surname', '--start-time', '2022-08-01', '--end-time', '2022-08-02', '--granularity', 'day', '--csv', f'{politician}_surname_count.csv'], check=True)
    count_df = pd.read_csv(f'{politician}_surname_count.csv')
    count_df['id'] = politician
    # index=False avoids writing an extra unnamed index column
    count_df.to_csv(f'{politician}_surname_count.csv', index=False)

import glob

# fetch all the csv in the path
csv_files = glob.glob('test/*.csv')

data_list = []

for file in csv_files:
    data_list.append(pd.read_csv(file))

df = pd.concat(data_list, axis=0, sort=True)

df.to_csv("test/total1.csv",index=False)