Poor quality of the "lang" parameter for Finnish language



I have a problem with the “lang” parameter for the Twitter premium API.
My goal was to download a list of Finnish tweets, so I specified lang=fi.
However, roughly half of the tweets I got back were in other languages:

  • some in Hungarian
  • some in Estonian
  • some in English as well
  • some with mixed Finnish and something else (which I can accept)

So, do you have any suggestions on how to improve the quality of the results? Maybe some parameter I can add to the query?

Another question: is there a way to avoid very short tweets, for example tweets with only one or two words? I could not find a parameter for tweet length, but maybe I missed it.

Thank you.


As far as I know, the lang parameter is as good as it gets. You could run your own language detection and filtering on the results, but there’s no way to improve this through the API itself.
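
To make the "own filtering" idea concrete, here is a rough sketch in plain Python. It is only a heuristic under stated assumptions: the FINNISH_HINTS word list, the thresholds, and the function names are all made up for illustration, and a dedicated language-identification library would likely do better. Estonian shares words such as "ja" and "on" with Finnish, so expect some Estonian tweets to slip through. The same pass also drops very short tweets, since there is no length parameter to do that in the query.

```python
import re

# Assumption: a small hand-picked set of common Finnish function words.
# It is illustrative only and far from complete.
FINNISH_HINTS = {
    "ja", "on", "ei", "että", "se", "oli", "mutta", "kun", "niin", "tämä",
    "myös", "ovat", "olen", "mitä", "kuin", "joka", "vain", "nyt", "tai", "jos",
}

# Matches runs of Unicode letters (no digits, no underscores).
WORD_RE = re.compile(r"[^\W\d_]+")
# URLs, @mentions and #hashtags carry no language signal, so strip them.
NOISE_RE = re.compile(r"https?://\S+|[@#]\w+")


def tokens(text):
    """Lowercased word tokens with URLs, mentions and hashtags removed."""
    return [w.lower() for w in WORD_RE.findall(NOISE_RE.sub(" ", text))]


def long_enough(text, min_words=3):
    """True if the tweet has at least min_words real words."""
    return len(tokens(text)) >= min_words


def looks_finnish(text, min_ratio=0.2):
    """True if enough of the words are known Finnish function words."""
    words = tokens(text)
    if not words:
        return False
    hits = sum(1 for w in words if w in FINNISH_HINTS)
    return hits / len(words) >= min_ratio


def keep_tweet(text):
    """Combined filter: drop short tweets, then apply the language check."""
    return long_enough(text) and looks_finnish(text)
```

You would call keep_tweet() on the text of each tweet you download and keep only those where it returns True. Both thresholds are guesses and worth tuning on a sample you have labelled by hand.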

Maybe adding GEO / Place filters could work too? Restricting your searches to Finland.
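
A sketch of what such a query could look like, assuming the premium search operators lang: and place_country: are available on your plan. The build_query helper is a hypothetical name, not part of any library. Keep in mind that place filters only match tweets that carry place data, so this may cut the result set down a lot.

```python
def build_query(keywords, lang="fi", country="FI"):
    """Build a search query string restricted by language and country.

    Assumption: the lang: and place_country: operators are enabled
    for your API tier; check the operator list for your plan.
    """
    parts = []
    if keywords:
        # Group the keywords so OR does not swallow the filters.
        parts.append("(" + " OR ".join(keywords) + ")")
    parts.append(f"lang:{lang}")
    parts.append(f"place_country:{country}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(["sauna", "kahvi"]))
```

The resulting string would go into the query field of your search request.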

It depends on what you need the tweets for, but one option is to pick a whitelist of users who you know tweet in Finnish and crawl all of their tweets for more data. All of this happens on your end, though.
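
For the whitelist idea, one way to sketch it is to turn the user list into from: queries and split them so each query stays under a length limit. The whitelist_queries helper and the default max_len of 256 are assumptions for illustration; use whatever query length limit applies to your tier.

```python
def whitelist_queries(usernames, lang="fi", max_len=256):
    """Split a user whitelist into query strings no longer than max_len.

    Assumption: max_len is a placeholder for your tier's query limit.
    A single username that alone exceeds the limit is still emitted.
    """
    suffix = f" lang:{lang}"

    def render(names):
        return "(" + " OR ".join(f"from:{n}" for n in names) + ")" + suffix

    queries, current = [], []
    for name in usernames:
        candidate = current + [name]
        if current and len(render(candidate)) > max_len:
            # Adding this user would overflow, so close the current query.
            queries.append(render(current))
            current = [name]
        else:
            current = candidate
    if current:
        queries.append(render(current))
    return queries


if __name__ == "__main__":
    # Hypothetical account names, for illustration only.
    for q in whitelist_queries(["user_one", "user_two", "user_three"], max_len=40):
        print(q)
```

Each returned string can then be sent as a separate search request, and the results merged on your side.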


