Hello all,

I am a researcher who has gained approval for Academic Research Access to Twitter data. Now I am struggling with the process of retrieving tweets. I am looking to retrieve tweets with a specific hashtag within specific date ranges.

I have reviewed some of the documentation that Twitter provides; however, I do not have a coding background and much of the documentation remains unclear.

I appreciate any support you all are able to provide.

Kind regards,
Melissa

UPDATE:
I came across Tweet downloader. I believe that I can retrieve the data I need using this tool.

I am looking to retrieve Tweets that used the MeToo hashtag between October 15, 2017 and October 19, 2017. I want to retrieve Tweets in English that were posted from within the United States.

Does the following query look correct:
GET/2/tweets/search/all (from:Twitter API) has:MeToohashtags/language:English/country:United States

This is what we built twarc for; hopefully this gives you a good starting point:

The query is built from the operators described in "Search Tweets - How to build a query" (Twitter Developer Platform docs), and should look like this:

lang:en place_country:US #metoo

The start and end dates are then specified as separate parameters, not inside the query. In twarc this would be:

twarc2 search --start-time "2017-10-15" --end-time "2017-10-19" --archive "lang:en place_country:US #metoo" results.jsonl
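One detail worth knowing: the Twitter API treats the end time as exclusive and works in UTC, so --end-time "2017-10-19" should stop at midnight UTC at the start of October 19 (assuming twarc2 turns a bare date into midnight UTC). To include tweets posted on October 19 itself, you would pass "2017-10-20". Here is a minimal sketch of the same search wrapped in a reusable bash function, assuming a bash-like shell (for example Git Bash or WSL on Windows) and that twarc2 is installed and configured; search_archive is a hypothetical name, not part of twarc.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of twarc): full-archive search for one query.
# Assumption: --end-time is exclusive and in UTC, so "2017-10-20" is needed
# to include tweets posted on 2017-10-19.
search_archive() {
  local query="$1" start="$2" end="$3" outfile="$4"
  # Quoting "$query" keeps the whole query as ONE argument to twarc2.
  twarc2 search --start-time "$start" --end-time "$end" --archive "$query" "$outfile"
}

# Example (requires a configured twarc2):
# search_archive "lang:en place_country:US #metoo" 2017-10-15 2017-10-20 results.jsonl
```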

It’s worth checking counts too, to see a ballpark number of tweets per day for a query, using:

twarc2 counts --start-time "2017-10-15" --end-time "2017-10-19" --archive "lang:en place_country:US #metoo" --granularity day --text

Also remember that place_country significantly restricts results: only geotagged tweets are returned, and very few tweets are geotagged to begin with.

For example, that query matches 9,278 geotagged tweets, but without place_country in the query:

twarc2 counts --start-time "2017-10-15" --end-time "2017-10-19" --archive "lang:en #metoo" --granularity day --text

there are 675,759.

But you may not want every retweet (the original tweets will still carry their retweet counts), so:

twarc2 counts --start-time "2017-10-15" --end-time "2017-10-19" --archive "lang:en #metoo -is:retweet" --granularity day --text

gives you 215,222

which you can retrieve by changing counts to search and replacing --granularity day --text with the name of an output file, such as results.jsonl:

twarc2 search --start-time "2017-10-15" --end-time "2017-10-19" --archive "lang:en #metoo -is:retweet" results.jsonl
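If you want to compare several query variants like this in one go before committing to a full search, a small bash loop can run the counts for each. This is a minimal sketch, assuming a bash-like shell and a configured twarc2; compare_counts is a hypothetical helper, not a twarc command, and it only reuses the twarc2 counts options shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of twarc): print daily counts for several
# query variants over the same date range so the totals can be compared.
compare_counts() {
  local start="$1" end="$2"
  shift 2                      # remaining arguments are the queries
  local q
  for q in "$@"; do
    echo "== $q"
    twarc2 counts --start-time "$start" --end-time "$end" --archive "$q" \
      --granularity day --text
  done
}

# Example (requires a configured twarc2):
# compare_counts 2017-10-15 2017-10-19 \
#   "lang:en place_country:US #metoo" \
#   "lang:en #metoo" \
#   "lang:en #metoo -is:retweet"
```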

Thank you for your informative response, Igor!

I installed Python and then installed and configured twarc. However, I receive the following message when I try to search for tweets: “The filename, directory name, or volume label syntax is incorrect.” I am starting with the queries #metoo and lang:en place_country:US #metoo.

What’s the full command you’re running and in what terminal / OS?

This error is most commonly caused by file names containing unsupported characters (see the Stack Overflow question "java - The filename, directory name or volume label syntax incorrect").

But since the query contains a colon (:), I suspect it was not surrounded by quotes, or the quotes inside it were not escaped:

twarc2 search "#metoo lang:en place_country:US" test.jsonl
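To see why the quotes matter, here is a small bash demonstration using a hypothetical count_args function that just reports how many arguments it received. In bash (and PowerShell) a word starting with # begins a comment, so an unquoted hashtag silently drops the rest of the line. Windows cmd handles # differently, so treat this as an illustration for bash-like shells only.

```shell
#!/usr/bin/env bash
# Hypothetical demo function: print the number of arguments received.
count_args() { echo "$#"; }

# Unquoted: "#metoo" starts a comment, so nothing after it is passed.
count_args #metoo lang:en place_country:US test.jsonl   (prints 0)
# Quoted: the whole query is one argument, followed by the output file.
count_args "#metoo lang:en place_country:US" test.jsonl  # prints 2
```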

Thank you for your prompt response.

The full command is C:\Users\Melissa\Desktop\twarc> #metoo
The operating system is Windows 10 Home.

I believe I resolved my issue. Now when I run the command I receive 24 tweets total. Do you receive the same results?

I am using the same query; however, I am receiving a slightly different number of tweets than you reported.

For example, I got 9,257 tweets instead of 9,278 with geo info, and 674,528 instead of 675,759 without the place_country filter. What could be causing this discrepancy?

Do “--granularity day --text” queries include both tweets and retweets? Does 674,528 or 675,759 refer to the number of unique tweets?

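This is a normal discrepancy, given that my counts were run about two weeks ago. Counts for past tweets can drop over time as tweets are deleted or their accounts are suspended or made private.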

It depends entirely on the query. The --granularity day --text options only change how the twarc2 counts command reports its output; they do not change the query. --granularity day reports counts per day, and --text prints them to the terminal as readable text. So the 675,759 figure includes retweets, because that query did not exclude them with -is:retweet.

Igor, thank you again for your help. Would the query below result in the output of only original tweets?

twarc2 counts --start-time "2017-10-15" --end-time "2017-10-19" --archive "lang:en #metoo -is:retweet -is:reply -is:quote" --granularity day --text

Yes, adding -is:retweet -is:reply -is:quote to the query should do it. However, that twarc command only gives you counts. To turn it into a command that saves the tweets:

twarc2 search --start-time "2017-10-15" --end-time "2017-10-19" --archive "lang:en #metoo -is:retweet -is:reply -is:quote" results.jsonl
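If you plan to run this for several hashtags, the exclusion operators are simply appended to the query text, so a tiny bash helper can add them for you. This is a minimal sketch assuming a bash-like shell; original_only is a hypothetical helper, not a twarc feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of twarc): append the operators that exclude
# retweets, replies and quote tweets, leaving only original tweets.
original_only() {
  printf '%s -is:retweet -is:reply -is:quote\n' "$1"
}

# Example (requires a configured twarc2):
# twarc2 search --start-time "2017-10-15" --end-time "2017-10-19" --archive \
#   "$(original_only 'lang:en #metoo')" results.jsonl
```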

Hi Melissa,

I am in a similar position to you. I’m a grad student looking to access the Twitter API for my dissertation, but I have no coding experience and no one who can advise me on where to start. Would you mind sharing some of your experience and what you learned (e.g., what I should do or avoid doing to get started, and which programs I’ll need to install)? Or if you found an easy-to-understand guide for academic researchers on how to approach this, could you please share it? I have not applied for access yet, as I’m still not sure which CAQDAS program I’ll use or whether I’ll have to do coding by hand (depending on funds).

Thanks!
Cristina


Thank you, Igor. I greatly appreciate your help!

Hello Cristina,

I am more than happy to share my experiences! Like you, I have no coding experience and no one at my institution to guide me. I applied for Twitter’s Academic Research Access and found the process much easier than gaining IRB approval at my institution. I have been using twarc to gather data. As you can see, I am also not afraid of asking questions! This forum has been an invaluable resource; I would be completely lost without it. What kind of data are you interested in, or do you need, for your research project?

Kind regards,
Melissa
