I was trying to pull data using a twarc2 search query, but it showed me an error stating: “There were errors processing your request: Reference to invalid operator ‘place_country’. Operator is not available in current product or product packaging. Please refer to complete available operator list at https://developer.twitter.com/en/docs/tweets/rules-and-filtering/overview/operators-by-product. (at position 27)”.

Please note that I have Elevated access. However, the same place_country functionality works fine when I use a search query in rtweet.

Unfortunately this operator is only available with v2 Academic Access. I had a look, and I think rtweet calls a different API entirely: the v1.1 Premium Search API (Premium API Search API | Twitter API | Docs | Twitter Developer Platform). Academic Access uses the v2 API, which requires different libraries in R, such as https://github.com/cjbarrie/academictwitteR

For v2 API details, see Search Tweets - How to build a query | Docs | Twitter Developer Platform. (Any operators marked “Advanced” require using a bearer token from an app attached to an Academic Access Project, not an Elevated one.)
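For what it's worth, with an Academic Access bearer token the original place_country query should work unchanged in twarc2. A minimal sketch of composing such a query, where the keywords and country code are made-up placeholders rather than the original poster's actual query:

```python
# Compose a v2 search query using the "Advanced" place_country operator.
# "flood"/"floods" and "IN" are placeholder values for illustration only.
def build_query(keywords, country_code):
    """Join keywords and restrict matches to Tweets geo-tagged in one country."""
    return f"({' OR '.join(keywords)}) place_country:{country_code}"

query = build_query(["flood", "floods"], "IN")
print(query)  # (flood OR floods) place_country:IN
```

The resulting string would then be passed to the CLI, e.g. `twarc2 search --archive "(flood OR floods) place_country:IN" results.jsonl` (the `--archive` flag needs Academic Access).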

In twarc, the v1.1 equivalent is documented under twarc commands: twarc1 (en) - twarc


Thank you, Igor, for correcting me. I used twarc v1.1 and yes, it gives me results, but the CSV file it generates has a readability issue, something like the screenshot below. If this is so, how do I make it readable?

Thank You!
Vipin


The results are in JSON, so they have to be parsed with Twitter data specifically in mind to be readable as CSV like this. Unfortunately there’s no quick and easy way to do that, but something that comes close is to grab the IDs and then re-download those Tweets in v2 format, so that twarc2 csv can process them:

It’s easier to “dehydrate” v1.1 Tweets with twarc dehydrate and hydrate them again with v2 with twarc2 hydrate.

twarc dehydrate tweets.jsonl > tweets_ids.txt
twarc2 hydrate tweets_ids.txt tweets_v2.jsonl
twarc2 csv tweets_v2.jsonl tweets_v2.csv

Also, I highly recommend not opening Twitter data in Excel, as it has problems with 64-bit integers and corrupts Tweet IDs and user IDs. Google Sheets tends to work slightly better. If you must open the file in Excel, always import it manually and set the ID columns’ type to “Text”; otherwise all of your IDs will be garbled.
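The corruption happens because the IDs get parsed as floating-point numbers, which only keep about 15-16 significant digits, while Tweet IDs are up to 19 digits long. This is easy to demonstrate with stdlib Python (the ID below is a made-up example):

```python
import csv
import io

tweet_id = "1501234567890123456"  # a made-up 19-digit Tweet ID

# What Excel effectively does: parse the ID as a number (a 64-bit float),
# which cannot represent all 19 digits, so the value silently changes.
assert int(float(tweet_id)) != int(tweet_id)

# What a "Text" column does: keep the ID as a string, so nothing is lost.
rows = list(csv.DictReader(io.StringIO("id,text\n" + tweet_id + ",hello\n")))
assert rows[0]["id"] == tweet_id
```

The same idea applies in pandas, where you would pass `dtype=str` (or `dtype={"id": str}`) to `read_csv` to keep the ID columns as text.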


Thank you again, Igor, for the solution to the CSV readability issue. Indeed, CSV has problems representing Twitter data, especially tweet_id and user_id, and for those I used prepend_ids when saving them in R. Also, rich Tweet text such as emoticons shows up as alien symbols after opening the CSV, and can’t be operationalized further.

I will work upon the provided solution!

Thank you,
Vipin


Yeah, Tweets are provided as UTF-8 encoded JSON. One thing to be careful of, especially when using twarc commands on Windows: if you use > output redirection, you can end up with UTF-16 encoded text, because that’s the default Windows command prompt encoding. This ruins emoji and some languages in Tweet text. It’s possible to fix this later with ftfy (Home - ftfy: fixes text for you), but it’s better to avoid it by always explicitly setting everything to UTF-8 - the terminal, the files when opening and saving, etc.
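The effect is easy to reproduce in stdlib Python. Here a Tweet-like string with an emoji (made up for illustration) is encoded as UTF-16, as a misconfigured console redirect might do, and then read back assuming UTF-8:

```python
text = "flood warning 🌊"  # made-up Tweet text containing an emoji

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16")  # what a UTF-16 console redirect produces

# The UTF-16 bytes are not valid UTF-8 for this text, so decoding them as
# UTF-8 garbles it (errors="replace" mirrors what lenient viewers display).
garbled = utf16_bytes.decode("utf-8", errors="replace")
assert garbled != text

# Written and read back explicitly as UTF-8, the text survives intact.
assert utf8_bytes.decode("utf-8") == text
```

This is why explicitly setting UTF-8 everywhere is the safer option: once the bytes have been mis-encoded, recovery is at best approximate.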

I think R may have a similar issue, but I’m not sure - I know R is definitely capable of setting file encodings when reading and writing, but I’m not sure of the defaults.

twarc2 explicitly writes Tweet data as UTF-8 when you’re not using > (output redirection) in the command, so dehydrating / hydrating with twarc2 may fix the garbled text issues too.

So,

twarc2 hydrate tweets_ids.txt tweets_v2.jsonl

This will always write UTF-8 text, whereas the following may write it as something else if your terminal / command prompt is configured differently:

twarc2 hydrate tweets_ids.txt > tweets_v2.jsonl