Entities Issues when tweet contains Emoticons



I found some inconsistencies in the Twitter text entities when a tweet contains emoticons.
Here is an example tweet.

The response from the API looks something like this:

{
  "created_at": "Fri Mar 06 05:29:05 +0000 2015",
  "id": 573716910807781400,
  "id_str": "573716910807781376",
  "text": "@tweetblast3 good morning ๐Ÿ‘ฎ๐Ÿค๐ŸŽ๐ŸŽ๐Ÿ‘ป#Holi2015 #HappyHoli",
  "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
  "truncated": false,
  "in_reply_to_status_id": null,
  "in_reply_to_status_id_str": null,
  "in_reply_to_user_id": 1647140947,
  "in_reply_to_user_id_str": "1647140947",
  "in_reply_to_screen_name": "tweetblast3",
  "geo": null,
  "coordinates": null,
  "place": null,
  "contributors": null,
  "retweet_count": 1,
  "favorite_count": 1,
  "entities": {
    "hashtags": [
      {
        "text": "Holi2015",
        "indices": [31, 40]
      },
      {
        "text": "HappyHoli",
        "indices": [41, 51]
      }
    ],
    "symbols": [],
    "user_mentions": [
      {
        "screen_name": "tweetblast3",
        "name": "3 Star",
        "id": 1647140947,
        "id_str": "1647140947",
        "indices": [0, 12]
      }
    ],
    "urls": []
  },
  "favorited": false,
  "retweeted": false,
  "lang": "en"
}

If you look at the hashtag entities, “Holi2015” is reported at indices 31 to 40. So the API counted each of those emoticons as a single character. In an encoded string, however, each of these emoticons is longer than one character: two code units in UTF-16, or four bytes in UTF-8.

What can be done to calculate those emoticons as one character only?
All I need is to adjust the offsets of those entities. In my case, the hashtag “Holi2015” appears at position 36.
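
To make the two numbers concrete, here is a minimal Python sketch that counts the text before the hashtag in three ways. The sample text is a stand-in, not the real tweet: it assumes five emoji above U+FFFF and uses U+1F47B for all of them.

```python
# Stand-in for the tweet text (assumption: five emoji above U+FFFF,
# all written as U+1F47B for illustration).
text = "@tweetblast3 good morning " + "\U0001F47B" * 5 + "#Holi2015 #HappyHoli"

# Everything that comes before the first hashtag.
prefix = text[: text.index("#Holi2015")]

code_points = len(prefix)                           # Python str length counts code points
utf16_units = len(prefix.encode("utf-16-le")) // 2  # offsets in Java / JavaScript strings
utf8_bytes = len(prefix.encode("utf-8"))            # raw byte offsets

print(code_points, utf16_units, utf8_bytes)  # 31 36 46
```

The first number matches the index in the API response, and the second matches the position 36 seen in a UTF-16 based string.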



Did you find a fix for this? I am currently facing the same issue.


You need to account for it in your code. An emoji like these is two UTF-16 code units in a string, but the Twitter indices count it as a single character (one code point).

I handle this at my end by counting the code points and then adjusting the display range myself.
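
A minimal sketch of that kind of adjustment in Python, not necessarily the exact logic described above. The function names `to_utf16_index` and `adjust_indices` are hypothetical, and the sample text assumes five emoji above U+FFFF (U+1F47B is used as a stand-in for the originals).

```python
from typing import List


def to_utf16_index(text: str, cp_index: int) -> int:
    """Convert a code point index into a UTF-16 code unit index."""
    # Code points above U+FFFF are stored as a surrogate pair (two units);
    # everything else takes a single unit.
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text[:cp_index])


def adjust_indices(text: str, indices: List[int]) -> List[int]:
    """Map an entity's [start, end] code point indices to UTF-16 offsets."""
    return [to_utf16_index(text, i) for i in indices]


# Stand-in tweet text (assumption: five emoji above U+FFFF).
text = "@tweetblast3 good morning " + "\U0001F47B" * 5 + "#Holi2015 #HappyHoli"

adjusted = adjust_indices(text, [31, 40])
print(adjusted)  # [36, 45]
```

In a language whose strings are indexed by code point (Python 3 is one), no adjustment is needed: `text[31:40]` is already `#Holi2015`.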


Hi @richardhyland, would you care to share the matching logic? Some emoji have a length of 1 as well, so how do you decide when to increase the offset?
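
One way to decide, sketched in Python under the assumption that the target string is UTF-16: a code point takes two code units only when it lies above U+FFFF, so that is the only case where the offset grows by an extra one. The helper name `utf16_units` is hypothetical.

```python
def utf16_units(ch: str) -> int:
    """Return how many UTF-16 code units a single code point occupies."""
    return 2 if ord(ch) > 0xFFFF else 1


samples = {
    "\u263A": "WHITE SMILING FACE, inside the Basic Multilingual Plane",
    "\U0001F47B": "GHOST, above U+FFFF",
}

for ch, note in samples.items():
    # Cross-check the rule against an actual UTF-16 encoding.
    assert utf16_units(ch) == len(ch.encode("utf-16-le")) // 2
    print(f"U+{ord(ch):04X} ({note}): {utf16_units(ch)} code unit(s)")
```

So an emoticon such as U+263A leaves the offset unchanged, while one such as U+1F47B shifts every later index by one.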


Thanks for the info, @richardhyland.
@Yogin16, the following link helped me solve the issue.