Emoji causing issues through statuses/user_timeline


#1

I’m having trouble with a specific tweet because of its emoji. The tweet that caused the issue for the app is https://twitter.com/restorationm/status/617365812769353729. Apparently it contains Russian and US flags, but browsers don’t render them, and the API seems to return corrupted text. That causes problems with the data stored in MySQL, because we can’t catch the emoji and convert it to HTML before saving (switching to utf8mb4 encoding is not an option). When I pasted the URL into Slack, Slack rendered the emoji as the corresponding flag images, which is how I discovered what they were.

Is this something on Twitter’s end, something with how we’re communicating through the API, or something else? The app is written in PHP and uses curl to communicate with Twitter.

I’ve tried a number of regexes to clean up the tweet text so it can’t damage the app, but none of them seem to catch the corrupted characters.
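One common workaround when a column uses MySQL’s 3-byte `utf8` charset is to target every character above U+FFFF rather than specific byte patterns, since those are exactly the characters UTF-8 encodes with four bytes and that the column can’t hold. A minimal sketch in Python (the app itself is PHP; the name `escape_astral` is hypothetical) that swaps each such character for an HTML numeric character reference before saving:

```python
import re

# Any code point above U+FFFF, i.e. anything UTF-8 encodes with 4 bytes
# and that MySQL's 3-byte "utf8" charset cannot store.
ASTRAL = re.compile("[\U00010000-\U0010FFFF]")


def escape_astral(text: str) -> str:
    """Replace each 4-byte character with an HTML numeric character reference."""
    return ASTRAL.sub(lambda m: "&#x{:X};".format(ord(m.group(0))), text)


if __name__ == "__main__":
    # Decode the raw bytes first: the regex matches code points, not bytes.
    raw = b"Flags: \xF3\xBE\x93\xAC \xF3\xBE\x93\xA6"
    print(escape_astral(raw.decode("utf-8")))  # Flags: &#xFE4EC; &#xFE4E6;
```

In PHP the same character class can be written for `preg_replace_callback` as `/[\x{10000}-\x{10FFFF}]/u`. The escaped text has to be treated as HTML on the way back out, and escaping alone won’t make a browser draw a glyph it has no font for.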

Thanks!


#2

Just a follow-up: I discovered that the UTF-8 byte sequences causing the issue are:
\xF3\xBE\x93\xAC => Russian flag
\xF3\xBE\x93\xA6 => US flag
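Those byte sequences are well-formed UTF-8 rather than corrupted data: each one is a single 4-byte character. A quick Python check (not from the original app) decodes them strictly and prints the code points:

```python
# The two byte sequences reported above.
SEQUENCES = {
    "Russian flag": b"\xF3\xBE\x93\xAC",
    "US flag": b"\xF3\xBE\x93\xA6",
}


def code_points(data: bytes) -> list[str]:
    """Decode UTF-8 bytes (raises on malformed input) and list each char as U+XXXX."""
    return ["U+{:04X}".format(ord(ch)) for ch in data.decode("utf-8")]


if __name__ == "__main__":
    for name, data in SEQUENCES.items():
        print(name, len(data), "bytes ->", code_points(data))
    # Russian flag 4 bytes -> ['U+FE4EC']
    # US flag 4 bytes -> ['U+FE4E6']
```

Both code points fall in Supplementary Private Use Area-A (U+F0000 to U+FFFFD), and because they need four bytes they are the kind of character a 3-byte `utf8` MySQL column rejects.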

We’ve accounted for them locally in the app.


#3

Looks like the Twitter website doesn’t correctly display those characters, either. I guess the way they were originally posted into the Tweet didn’t encode them correctly for redisplay.
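That is what you would expect from private-use characters: Unicode assigns them no meaning, so browsers generally have nothing to draw unless a font happens to map them. A sketch for flagging such characters in tweet text before storing it, using Python’s `unicodedata` (general category `Co` marks private-use code points; the function name is hypothetical):

```python
import unicodedata


def private_use_chars(text: str) -> list[str]:
    """Return the code points in text whose general category is Co (private use)."""
    return [
        "U+{:04X}".format(ord(ch))
        for ch in text
        if unicodedata.category(ch) == "Co"
    ]


if __name__ == "__main__":
    tweet = b"Flags: \xF3\xBE\x93\xAC \xF3\xBE\x93\xA6".decode("utf-8")
    print(private_use_chars(tweet))  # ['U+FE4EC', 'U+FE4E6']
```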


#4

It turns out they are old Google emoji code points, so maybe someone posted from an older Android device? I’m not sure, but I was able to generate a list of them, so hopefully we won’t have any other issues on our app’s end.
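Once such a list exists, one option (an assumption, not something described in this thread) is to translate the legacy code points to standard Unicode emoji instead of stripping them. Standard flag emoji are pairs of Regional Indicator Symbols spelling the two-letter ISO 3166-1 country code. This sketch covers only the two flags identified above; the table `GOOGLE_PUA_TO_UNICODE` is hypothetical, and a complete one would have to come from a published mapping such as the emoji4unicode project’s data.

```python
# Hypothetical table: only the two legacy code points identified in this thread.
GOOGLE_PUA_TO_UNICODE = {
    0xFE4EC: "RU",  # Russian flag
    0xFE4E6: "US",  # US flag
}

REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def flag(country_code: str) -> str:
    """Build a flag emoji from a two-letter ISO 3166-1 code, e.g. 'US'."""
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in country_code.upper()
    )


def translate_legacy_flags(text: str) -> str:
    """Replace known legacy Google PUA flags with standard flag sequences."""
    return "".join(
        flag(GOOGLE_PUA_TO_UNICODE[ord(ch)]) if ord(ch) in GOOGLE_PUA_TO_UNICODE else ch
        for ch in text
    )
```

The translated flags are still 4-byte characters, so they can’t go into a 3-byte `utf8` column as-is either.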