Parsing emojis

emoji

#1

When using the Twitter API you get emojis as Unicode characters, which you can then parse out and display as pictures. One example of such a parsing tool is Twemoji, the official emoji parsing tool from Twitter. The twemoji.js file contains a long regular expression that is used to parse out the emojis.

However, this file has not been updated in two years, and there are many emojis missing from Twemoji that are displayed just fine on twitter.com.

Does anybody know of any other well-maintained list of the emojis currently supported by twitter.com? A bonus would be if it comes with a regular expression I can import into my own code.


#2

Twemoji was last updated in May for Unicode 10.0. So far as I’m aware, that should cover all of the emoji currently supported by twitter.com. What is specifically missing?


#3

twemoji.js was last committed to 2 years ago, according to the Twemoji GitHub page. I have been using the regular expression from that file in my application, to great success, but there are a lot of emojis that are not caught correctly by that regex. There are plenty of examples in this tweet. Look at the tweet as returned by the API to see the Unicode character representations:

The mountain emoji is not caught by the regex.
The person emoji is caught, but on twitter.com it consists of two emojis (four Unicode characters). Twemoji only uses the characters of the first emoji, which leaves something else behind it.
The Austrian flag also consists of two emojis, and the regex catches these as two separate emojis (one saying “A”, another saying “T”).
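To see why a sequence like the flag gets split, it helps to list the code points of the raw string. The following is a minimal sketch, not part of Twemoji: `codePoints` is a hypothetical helper, and the person-plus-skin-tone string is only an assumed example of a two-part emoji, since the exact characters in the tweet are not shown here.

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(text) {
  // Spreading a string iterates by code point, not by UTF-16 code unit.
  return [...text].map(
    (ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

// The Austrian flag is two regional indicator symbols: "A" followed by "T".
const austrianFlag = '\u{1F1E6}\u{1F1F9}';
// Assumed example of a two-part emoji: person + medium skin tone modifier.
const personWithSkinTone = '\u{1F9D1}\u{1F3FD}';

console.log(codePoints(austrianFlag), austrianFlag.length);
// [ 'U+1F1E6', 'U+1F1F9' ] 4
console.log(codePoints(personWithSkinTone), personWithSkinTone.length);
// [ 'U+1F9D1', 'U+1F3FD' ] 4
```

Each of these strings is two code points but four UTF-16 code units, so a regex that only knows the first code point will match half of the emoji and leave the rest behind.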

I have been able to fix these by adding them to the regex, but it would be nice to have a more complete regex that already includes them, as well as others that have been added in the two years that twemoji.js has not been updated.
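As an alternative to listing every emoji by hand, a pattern can be built from Unicode property escapes. This is only a rough sketch under some assumptions: it needs a JavaScript engine that supports `\p{...}` with the `u` flag, the names `emojiPattern` and `findEmojis` are made up for illustration, and it is not the regex Twemoji or twitter.com uses. It matches by Unicode property, so it can catch characters Twemoji has no image for (for example some symbols like ©), and it does not handle keycap or tag sequences.

```javascript
// One emoji "element": a pictographic character, optionally followed by
// a skin tone modifier or the emoji variation selector (U+FE0F).
const element = String.raw`\p{Extended_Pictographic}(?:\p{Emoji_Modifier}|\uFE0F)?`;

// Either a flag (two regional indicators), or one or more elements
// joined by zero-width joiners (U+200D).
const emojiPattern = new RegExp(
  String.raw`\p{Regional_Indicator}{2}|${element}(?:\u200D${element})*`,
  'gu'
);

// Hypothetical helper: return every emoji sequence found in the text.
function findEmojis(text) {
  return text.match(emojiPattern) || [];
}

// Flag, mountain with variation selector, person with skin tone modifier.
const sample = 'AT \u{1F1E6}\u{1F1F9} \u26F0\uFE0F \u{1F9D1}\u{1F3FD}';
console.log(findEmojis(sample).length); // 3
```

Putting the flag alternative first and letting each element absorb its trailing modifier keeps multi-part emojis together as one match instead of splitting them into pieces.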


#4

Thanks for the feedback. I see you raised a GitHub issue as well, which is probably the most appropriate way to point out what is missing, since the code is not maintained by the teams that operate these forums. I’ll see if I can bring it to the correct folks.

I’m not aware of an alternative regex at the moment.


#5

I really hope you’ll be able to forward this to the correct folks. It’s quite tedious having to update a regex every time I discover some missing emojis. Let’s hope the GitHub issue is taken care of.

Thanks for the reply and for trying, though :slight_smile: