I want to retrieve tweets that contain EUR/USD, but most of the tweets I get back are ones that contain EUR or @EUR…. I tried putting “EUR/USD” in the query, or even specifying “EUR/USD” explicitly, but nothing changed.
I know I can clean the data later, but I do not want to extract millions of tweets that I do not need, as I have a limited budget. Is there any way to specify an exact term in tweets, like EUR/USD?

Thanks

You can specify "EUR USD" (with the " quotes), which will match EUR/USD but also EUR\USD and EUR-USD, because special characters don’t get indexed at all. You will have to retrieve more and then do your own filtering afterwards.
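
A minimal sketch of that filtering step, assuming the tweets have already been retrieved and loaded as dicts with a `text` key. The function name `keep_exact_pair` and the sample tweets are made up for illustration:

```python
import re

# Matches EUR/USD literally (case-insensitive), but not EUR-USD, EUR\USD or "EUR USD".
PAIR_RE = re.compile(r"(?<!\w)EUR/USD(?!\w)", re.IGNORECASE)


def keep_exact_pair(tweets):
    """Return only the tweets whose text contains the exact term EUR/USD."""
    return [t for t in tweets if PAIR_RE.search(t.get("text", ""))]


# Hypothetical sample of what a quoted "EUR USD" query could return.
sample = [
    {"id": "1", "text": "EUR/USD breaks 1.10"},
    {"id": "2", "text": "Long EUR-USD today"},
    {"id": "3", "text": "eur/usd looking weak"},
    {"id": "4", "text": "EUR USD parity talk"},
]
filtered = keep_exact_pair(sample)  # keeps tweets 1 and 3
```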

What’s the full query you’re using, and what code library is this being called from? If you’re getting odd results, the reason is almost always the order of operations, or quotes, or something like that.

Thanks. It was a huge help. I am trying to understand how the query syntax works. Is there any documentation on that other than Search Tweets - How to build a query | Docs | Twitter Developer Platform?

Another question: I’m getting user.fields in addition to tweet.fields. I want to create a dataframe in Python where tweet and user_name are two columns, but I noticed that the length of “includes” (user.fields) in the JSON response is smaller than that of “data” (tweet fields).
Why does this happen?

thanks

No that’s the correct document for queries - what specific thing are you having trouble with?

As for includes, each unique object is only included once, and it is referenced multiple times by ID.
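
A rough sketch of that lookup by ID, assuming the request used the `author_id` expansion so that the authors appear under `includes.users`. The function name `tweets_with_usernames` and the sample response are hypothetical:

```python
def tweets_with_usernames(response):
    """Build one row per tweet, looking up each author in includes.users by ID."""
    users = response.get("includes", {}).get("users", [])
    users_by_id = {u["id"]: u for u in users}
    rows = []
    for tweet in response.get("data", []):
        # The same user object is reused for every tweet by that author.
        author = users_by_id.get(tweet.get("author_id"), {})
        rows.append({"tweet": tweet["text"], "user_name": author.get("username")})
    return rows


# Hypothetical response: three tweets, but only two unique authors in includes.
response = {
    "data": [
        {"id": "10", "author_id": "1", "text": "first"},
        {"id": "11", "author_id": "2", "text": "second"},
        {"id": "12", "author_id": "1", "text": "third"},
    ],
    "includes": {
        "users": [
            {"id": "1", "username": "alice"},
            {"id": "2", "username": "bob"},
        ]
    },
}
rows = tweets_with_usernames(response)
```

Passing `rows` to `pandas.DataFrame(rows)` should then give the two-column dataframe.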

To get a dataframe with all the tweet metadata included and matched up, I recommend using twarc to retrieve, and twarc-csv to convert:

(twarc2 commands in the terminal will work with the v2 API, and while it was designed to work with Academic Access, it will work with Essential and Elevated Access too; you just won’t be able to use the --archive command line switch)

thank you so much for the reply.

I want certain keywords to appear in the tweet body, say X OR Y OR Z, but not all of the tweets that contain Z are relevant. Currently I get all of the tweets that contain Z, save them as a pandas dataframe, and then filter them, keeping only those that contain one of the words, e.g. a, b, or c. I do not have this problem with the tweets that contain X or Y, so I was wondering how I can write an efficient query. Many tweets contain 2 or all 3 of the keywords X, Y, and Z, so I think we would waste resources if we used separate queries.
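
To make the idea concrete, here is a sketch of the single query I have in mind, assuming the v2 syntax lets me group with parentheses, treats a space as AND, and supports OR. The helper names `any_of` and `build_query` are just placeholders:

```python
def any_of(terms):
    """Join terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(always, conditional, qualifiers):
    """Match any term in `always`, or `conditional` only together with a qualifier."""
    # A space between operands means AND, so Z must co-occur with a, b or c.
    restricted = "(" + conditional + " " + any_of(qualifiers) + ")"
    return " OR ".join(list(always) + [restricted])


query = build_query(["X", "Y"], "Z", ["a", "b", "c"])
# query == "X OR Y OR (Z (a OR b OR c))"
```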

Isn’t id in user.fields identical to author_id in tweet.fields? I thought they should be identical, so I mapped them using it as the key. I did a quick check and noticed that in my dataset there are more distinct user ids than author ids. How is this possible?

Another question: does that mean I cannot use twarc2 with Premium access to get historical tweets? If not, how about tweepy?