Multilingual support in Twitter API


#1

Hi All,

I am an ETL (Extract, Transform and Load) developer working on a use case that extracts data from the Twitter API and loads it into a database. For this, I am using the Pentaho Data Integration ETL tool. By following the guide at https://anotherreeshu.wordpress.com/2015/09/06/stream-data-from-twitter-api-with-oauth-using-kettle/ I can successfully retrieve data from Twitter.

Now my problem is that some of the tweets are in non-English languages (e.g. Hindi, French, Spanish). The JSON file downloaded from Twitter contains this non-English data in the following format: "name":"\u0938\u0941\u0927\u0947\u0936 \u092c".
When it is loaded into MS SQL Server, it shows up as ‘???’.

Can someone please provide any insight on how to convert this data from the Twitter API into English?

Thanks,
Sanket Kelkar.


#2

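Those are Unicode characters written as JSON \uXXXX escapes (each escape is one UTF-16 code unit), so the text itself is not corrupted. Any JSON parser will decode them into the actual characters. SQL Server typically shows ‘???’ when Unicode text ends up in a non-Unicode column or variable (VARCHAR rather than NVARCHAR), so you’ll need to keep the data as Unicode all the way into the database.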
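
As a rough illustration of what those escapes hold and where the question marks can come from, here is a minimal Python sketch. It is only a demonstration, not the Pentaho setup from the question: the escaped fragment is wrapped in braces to make it valid JSON, and the cp1252 code page is an assumption standing in for whatever non-Unicode code page the target column might use.

```python
import json

# The escaped fragment quoted in the question, wrapped in braces so it is valid JSON.
# The raw string keeps the backslashes literal, exactly as they appear in the file.
raw = r'{"name": "\u0938\u0941\u0927\u0947\u0936 \u092c"}'

# json.loads turns each \uXXXX escape into the actual (Devanagari) character.
name = json.loads(raw)["name"]
print(len(name))

# Encoding to a single-byte code page without Devanagari (cp1252 is an
# assumed stand-in) replaces every unsupported character with "?".
lossy = name.encode("cp1252", errors="replace").decode("cp1252")
print(lossy)

# UTF-16LE, the encoding SQL Server uses for NVARCHAR data, round-trips the text.
assert name.encode("utf-16-le").decode("utf-16-le") == name
```

The second print shows the same kind of question marks the question describes, which suggests the characters are lost at the point where the text is forced into a non-Unicode encoding, not when Twitter returns it.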