When I fetch my post using https://api.twitter.com/1.1/statuses/user_timeline.json in the console at https://dev.twitter.com/console, the response looks like this:
“text”: “아무리 오픈 소스래지만…”
If English text is mixed in, it looks like:
“text”: “Hello, 아무리 오픈 소스래지만…”
However, when I use an open-source library like twitcurl, what it retrieves is:
“text”: “Hello, \uc544\ubb34\ub9ac \uc624\ud508 \uc18c\uc2a4\ub798\uc9c0\ub9cc…”
So only the Korean text is turned into \u-escaped text. (The code point of 아 is U+C544, and its actual UTF-8 byte representation is 0xEC 0x95 0x84.)
If it’s normal to have \u-escaped text, it’s strange that it doesn’t do the same for the “Hello,” part. (I know that UTF-8 shares its code values with ASCII in that range, but…)
When I click the “Snapshot” button, I can see the raw text there, and it doesn’t contain \u escapes; it contains the actual Korean text.
However, when I open the Web Console of the Firefox browser and inspect the JSON response, I can see the \u-escaped text.
So, should we, as application developers, convert those \u escapes before displaying? (I use C++. I noticed that some JSON libraries for PHP/Python decode the \u escapes correctly, so such Unicode text displays properly while the English portion is unchanged.)
(It’s still kind of weird that only some portion of the text is escaped.)
I’m trying to build a JSON library for C/C++ with consistent behavior, but I’m not sure whether it’s correct to turn Unicode support on or not.
If the text is meant to be displayed as “\u…”, I would have to turn off Unicode support, because “\u…” is an ASCII representation of the Unicode code point. On the other hand, if there should be a decode() function that replaces the \u escapes with the real Unicode text, I may need to turn on Unicode support.
To decide what I should do, I need answers to the questions above.
Could someone please respond?
Thank you in advance.