Poor quality of the "lang" parameter for Finnish language



I have a problem with the “lang” parameter in the Twitter premium API.
My goal was to download a list of Finnish tweets, so I specified lang=fi.
However, roughly half of the tweets I got back were in other languages:

  • some in Hungarian
  • some in Estonian
  • some in English as well
  • some mixing Finnish with another language (which I can accept)

So, do you have any suggestions on how to improve the quality of the results? Is there some parameter I can add to the query, maybe?

Another question: is there a way to exclude very short tweets, for example tweets with only one or two words? I could not find a parameter for tweet length, but maybe I missed it.

Thank you.


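As far as I know, the lang parameter is as good as it gets. You could do your own language detection and filtering on the results, but there’s no way to improve it through the API itself. The same goes for tweet length: I’m not aware of a parameter for it, so that would also have to be filtered on your end.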
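To give a rough idea of what that client-side filtering could look like, here is a minimal sketch using only the Python standard library. The stopword list, the thresholds, and the function names (`tokenize`, `looks_finnish`) are all made up for illustration. It is a crude heuristic: Estonian shares several of these short words with Finnish, so a proper language-detection library would likely do better. It also drops tweets with fewer than `min_words` words, which covers the short-tweet question.

```python
import re

# Assumption: a tiny, hand-picked set of common Finnish function words.
# This is illustrative only, not a complete or tuned list.
FINNISH_STOPWORDS = {
    "ja", "on", "ei", "että", "se", "oli", "mutta", "kun", "niin", "myös",
    "ole", "tämä", "kuin", "vain", "jos", "minä", "sinä", "hän", "olen", "mitä",
}

# Matches runs of letters (including ä, ö, å), excluding digits and underscores.
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Lowercase words from a tweet, ignoring URLs, @mentions and #hashtags."""
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)
    return [w.lower() for w in WORD_RE.findall(text)]


def looks_finnish(text, min_words=3, min_ratio=0.2):
    """Heuristic: long enough, and enough of the words are Finnish stopwords."""
    words = tokenize(text)
    if len(words) < min_words:
        return False  # too short to keep (and too short to judge)
    hits = sum(1 for w in words if w in FINNISH_STOPWORDS)
    return hits / len(words) >= min_ratio


# Usage: keep only the tweet texts that pass the heuristic.
texts = [
    "Tämä on hyvä päivä ja minä olen iloinen",
    "This is a good day and I am happy",
    "Hyvä!",
]
kept = [t for t in texts if looks_finnish(t)]
```

The two thresholds are guesses; you would want to tune them against a sample of tweets you have labelled by hand.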

Adding geo or place filters might help too, for example restricting your searches to Finland.
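As a sketch of that idea, the snippet below builds a query string that combines a language filter with a country filter. It assumes the premium search operators `lang:` and `place_country:` are available on your tier, so please check the operator list for your plan. The helper name `build_query` and the example keywords are hypothetical. Keep in mind that only a small share of tweets carry place information, so this will cut the volume considerably.

```python
def build_query(keywords, lang="fi", country="FI"):
    """Build a search query string (hypothetical helper, not part of any SDK).

    Assumes the `lang:` and `place_country:` operators are supported
    by the search endpoint and tier you are using.
    """
    parts = []
    if keywords:
        # Match any of the keywords.
        parts.append("(" + " OR ".join(keywords) + ")")
    parts.append(f"lang:{lang}")
    parts.append(f"place_country:{country}")
    return " ".join(parts)


# Usage: the resulting string would go in the "query" field of the request body.
query = build_query(["sauna", "kesä"])
payload = {"query": query}
```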

Depends on what you need the tweets for, but one way would be to pick a whitelist of users who you know are going to tweet in Finnish and crawl all their tweets for more data. All this is done on your end though.

