Hi, two questions:

  1. Is it possible to minimise the number of tweets you get from bots when using the Twitter API?
  2. If not, how do you best detect them in your data?

On (2), so far I have looked for users who post duplicated tweets, which has been successful in finding a couple of bots.

Thanks!

Specifically, I have collected data by filtering on campaign hashtags for the US presidential elections. I want to find tweets from the respective voter bases, but it seems that my sample mainly consists of bots.

Firstly, many tweets come from the same users. If I remove duplicate user ids, I am left with roughly 50% of the sample. Then, many of the remaining users receive a high score on Botometer (roughly 0.75 to 0.8).

What should I do?
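The two checks described above (users who post duplicated tweets, and how many tweets each user contributes) could be sketched roughly as follows. This is only a minimal illustration: it assumes each tweet is a dict with `author_id` and `text` keys, and the function names and the `min_copies` threshold are made up for the example.

```python
from collections import Counter, defaultdict


def normalize(text):
    # Lowercase and collapse whitespace so trivially altered copies still match.
    return " ".join(text.lower().split())


def flag_repeat_posters(tweets, min_copies=2):
    """Return author ids that posted the same normalized text at least min_copies times."""
    per_author = defaultdict(Counter)
    for tweet in tweets:
        per_author[tweet["author_id"]][normalize(tweet["text"])] += 1
    return {
        author
        for author, texts in per_author.items()
        if max(texts.values()) >= min_copies
    }


def tweets_per_author(tweets):
    """Count how many tweets each author contributes to the sample."""
    return Counter(tweet["author_id"] for tweet in tweets)


def one_tweet_per_author(tweets):
    """Keep only the first tweet seen for each author (assumed field: author_id)."""
    seen = set()
    kept = []
    for tweet in tweets:
        if tweet["author_id"] not in seen:
            seen.add(tweet["author_id"])
            kept.append(tweet)
    return kept
```

Repeated text is only a heuristic: people retweeting or copying a campaign slogan would be flagged too, so it may be worth inspecting flagged accounts by hand before dropping them.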

You can try some heuristics, like excluding tweets posted from certain apps (filtering on your end on the source field).

I wouldn’t rely blindly on Botometer scores; there are some methodological flaws in doing that. Here’s a good discussion and paper on this:

Hi Igor - thanks! Just wondering, do you know of any literature that could help with which apps to exclude?

This will depend on what your study is and why you’re filtering, but I would generally filter out anything that’s not from the official Twitter clients (with a few exceptions, such as popular social media management tools like mixpanel, etc.). There are bots that automate the mobile and web clients and so appear to be tweeting from the website, but those are few in my experience. Unfortunately, I don’t have a comprehensive list.

Thanks, Igor - much appreciated! Could you please point me to any website or guide that explains how to implement this filter in the query? Unfortunately, I have not found anything.

This is something you would have to define manually in your own code; unfortunately, there’s no search operator for it in v2 search: Search Tweets - How to build a query | Docs | Twitter Developer Platform
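As a rough sketch of what that client-side filter might look like: the code below assumes each tweet is a dict whose `source` key holds the client name as a plain string (for example, when `source` was requested as a tweet field). The allow-list entries are illustrative only, not a vetted list, and the function names are made up for the example.

```python
from collections import Counter

# Illustrative allow-list of official client labels (an assumption, not a
# comprehensive or vetted list); extend it with any tools you want to keep.
ALLOWED_SOURCES = {
    "Twitter for iPhone",
    "Twitter for Android",
    "Twitter Web App",
}


def source_counts(tweets):
    """Tally the source labels in the sample, to decide what to allow."""
    return Counter(tweet.get("source", "<missing>") for tweet in tweets)


def filter_by_source(tweets, allowed=ALLOWED_SOURCES):
    """Keep only tweets whose source label is in the allow-list."""
    return [tweet for tweet in tweets if tweet.get("source") in allowed]
```

Looking at `source_counts` first is probably the safer order: it shows which apps actually occur in the data before anything is excluded. Tweets without a `source` value are dropped by `filter_by_source` as written.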

Ah, okay. Will see if there exists some literature on this. Thanks again, Igor.

In case it helps, you can take a look at this paper:

“SOCIAL MEDIA, SENTIMENT AND PUBLIC OPINIONS: EVIDENCE FROM #BREXIT AND #USELECTION” Gorodnichenko, Pham and Talavera

They have a relatively easy way of doing it, and they also mention some literature that might come in handy.
