Hi y’all,
I have academic API access and am building a stimulus set for an emoji research project I’m working on. I need to collect roughly 250 tweets that are at least 10 words long and include specified emoji. Bonus if I can also collect replies to those tweets to include in my study stimuli. The idea is to show these tweets to people and test their reading speed in various contexts; we’re interested in conversations between people and in how emoji influence emotional interpretation and reading speed.

A high-quality stimulus might be: “Morning Sunshine! I hope you have a lovely Thursday also 😀”

So, I’m using Postman to pull data and can do basic searches in the languages I need, including with specified emoji to include or exclude. Seems pretty simple. I’d like to limit my search to tweets of a particular minimum length and a minimum like_count (or perhaps reply count?) to help get more relevant results. I can’t figure out how to include these qualifiers in the query; I can only view the like_count in public_metrics once I run the search. I can filter and sort for the best examples after the query runs, but I’d like to improve the initial query as much as I can!

Any other advice welcomed. Thank you!


You can use twarc for this and specify emoji as you would ordinary keywords (just make sure your terminal is in UTF-8); see the twarc2 documentation.

Postman is not suitable for data collection; it’s really only for making individual calls.

Minimum length and like count cannot be specified at query time, but you can filter for these yourself afterwards. You may want to exclude retweets and replies with -is:retweet -is:reply, then fetch the conversations later with the twarc2 conversations command.
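To make that second filtering step concrete, here is a minimal sketch. It assumes twarc2’s usual JSONL layout, where each line is one API response page with the tweets under a "data" key; the function name and thresholds are just placeholders to adapt:

```python
import json

def filter_tweets(jsonl_path, min_words=10, min_likes=5):
    """Yield tweets from a twarc2 search JSONL file that meet
    minimum word-count and like-count thresholds.

    Assumes each line is one API response page with tweets under
    "data"; if your file is flattened to one tweet per line,
    adjust accordingly.
    """
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            page = json.loads(line)
            for tweet in page.get("data", []):
                words = len(tweet["text"].split())
                likes = tweet.get("public_metrics", {}).get("like_count", 0)
                if words >= min_words and likes >= min_likes:
                    yield tweet
```

From the surviving tweets you could then write the conversation IDs to a file and pass that to the twarc2 conversations command to pull the replies.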

Thanks for your reply! I was having trouble getting authentication to work from Python scripts, which is why I’d switched to Postman just to get something working. I’ll try troubleshooting it further based on your advice, though. I haven’t tried twarc yet, so maybe its authentication handling won’t be so problematic?

Too bad length/like count isn’t possible - super helpful to know it’s not so I can stop trying to dig around for info. I can definitely filter as a second step.

Progress! I think the earlier problem authenticating my tokens was with PowerShell. I don’t have a solution within PowerShell, but it does seem to work fine elsewhere.

Now, I’m trying to figure out how to limit the results of each search I need to conduct to, say, 500. I got ~20,000 on my first attempt, even with a lot of narrowing in the query and even after interrupting the search before it completed. When I looked at the twarc log, I saw a rate limit error, but I still got a successful output file. And it let me search again, so I’m not sure why I got the error.

It looks like I should be able to specify a limit with max_results, but I know I’m doing it wrong. Can I put the max_results directly in the search or do I have to specify a function or…?

! twarc2 max_results=100 search --archive "😊 AND search -terms" --start-time "2021-06-01" --end-time "2021-11-09" happy_results.jsonl

Thanks for any advice!

Edited to add: I think I’d also need to specify pages somehow because max_results may be per page?

The max_results param is per request; if you are getting 20,000 results, you might be calling that function multiple times or in a loop.

max_results: The maximum number of search results to be returned by a request. A number between 10 and the system limit (currently 100). By default, a request response will return 10 results.
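For intuition, a total of ~20,000 is exactly what a paginating client produces: each request returns up to max_results tweets plus a next token, and the client keeps requesting pages until the token runs out. A toy sketch of that loop (this is an illustration mimicking the v2 pagination contract, not twarc’s actual code):

```python
def paginate(fetch_page, max_results=100):
    """Collect tweets across pages.

    fetch_page(token, max_results) must return
    (list_of_tweets, next_token_or_None), mimicking the
    Twitter v2 search pagination contract.
    """
    tweets, token = [], None
    while True:
        page, token = fetch_page(token, max_results)
        tweets.extend(page)
        if token is None:  # no more pages to fetch
            return tweets
```

So max_results caps a single request, not the whole search; capping the total is a separate option in the client.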


I don’t see how I would have accidentally called it multiple times with that one line of code. Could a module be doing it? I had twarc, pandas, plotly, and json, all pretty straightforward, I’d think.
I was also running it via Jupyter Notebook.

The line I posted above is pretty much what I was using before, when I was getting the rate limit issues. The only difference is that I hadn’t yet tried to add max_results:

! twarc2 search --archive "😊 AND search -terms" --start-time "2021-06-01" --end-time "2021-11-09" happy_results.jsonl

Yes, to limit total results with twarc, set a --limit, like:

twarc2 search --archive --limit 100 "😊 -is:retweet thanks" --start-time "2021-06-01" --end-time "2021-11-09" happy_results.jsonl

Twarc will handle rate limits and pagination for you.

You can check all the options with

twarc2 search --help

btw, “AND” is not a valid operator; a space is an implicit AND, so you don’t need to specify it. The example above will find original tweets (not retweets, though it may still find quote tweets) that contain the 😊 emoji and “thanks” in the text.
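If you are assembling queries programmatically, the implicit AND just means joining the terms with spaces. A small illustration, reusing the terms from the example above:

```python
# Space-separated terms are implicitly ANDed in the v2 search syntax,
# so no "AND" keyword is needed when building the query string.
terms = ["😊", "-is:retweet", "thanks"]
query = " ".join(terms)
print(query)  # 😊 -is:retweet thanks
```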


I saw the --limit parameter, but I think I was specifying it in the wrong place when I tried using it. Also, thanks for catching and correcting the AND!

I’m still getting a rate limit error with these corrections: 2021-11-17 14:48:16,793 WARNING rate limit exceeded: sleeping 150.2067415714264 secs

Any thoughts about what could be going wrong?

EDITED: Spoke too soon! Maybe I just needed to wait a couple of minutes; I re-ran the same line after restarting and it seems to be working. Thanks for all your help!
