Language filter and Language fields/information on Historical Powertrack API

powertrack

#1

Hi all,

Hope you’re doing well.
I’m working with the Historical Powertrack 2.0 API.
I’m unsure about a few things regarding the language filter and the language information returned with the data.

  1. When I want to gather data by language, which operator should I use in my rules?

This API reference says it’s “lang”.
http://support.gnip.com/sources/twitter/powertrack_operators.html

However, here it says “lang”, “twitter_lang” and “bio_lang”.
http://support.gnip.com/sources/twitter/powertrack_operators.html

I’m getting errors when I try to use “twitter_lang” and “bio_lang”.
I want to be sure about the “lang” operator because the language codes differ: for example, Indonesian is “in” for one operator and “id” for the other. I don’t want to gather the wrong data set.

  2. If I gather the data set without applying the language filter, will the language information be included with the post data?

Since the data set is not large, I’m thinking it might be better to harvest it without the language filter and then keep only the languages I need. Is the language information included for each post when it is available?

Best regards


#2

Hi @je_nicolo - that is old documentation. Please use: https://developer.twitter.com/en/docs/tweets/rules-and-filtering/overview/premium-operators

For Indonesian, use “in”.

Language is determined per Tweet; when it cannot be determined, it is reported as “und” (undefined).
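
If you do harvest without the language filter, filtering afterwards could look roughly like the sketch below. It is only an illustration under some assumptions: that the job output is one JSON Tweet per line, and that the language code sits in a “lang” field (original format) or a “twitter_lang” field (Activity Streams format). Please check the field name against your own payloads. The function name and the sample Tweets are made up for the example.

```python
import json


def filter_by_language(lines, wanted):
    """Yield parsed Tweets whose language code is in `wanted`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)
        # Assumption: the code is in "lang" or "twitter_lang";
        # fall back to "und" when neither field is present.
        lang = tweet.get("lang") or tweet.get("twitter_lang") or "und"
        if lang in wanted:
            yield tweet


# Made-up sample records, one JSON object per line.
sample = [
    '{"id": 1, "text": "Selamat pagi", "lang": "in"}',
    '{"id": 2, "text": "Good morning", "lang": "en"}',
    '{"id": 3, "text": "!!!", "lang": "und"}',
]

# Keep only Tweets tagged as Indonesian ("in").
indonesian = list(filter_by_language(sample, {"in"}))
```

With the filter applied in the rule itself, the equivalent would be adding lang:in to the rule, so the client-side pass is only needed if you want the full data set first.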