Entities Issues when tweet contains Emoticons


#1

Hi,

I found some inconsistencies in Twitter text entities when the tweet contains emoticons.
Here is the tweet example.

The response from API looks something like this:

{
  "created_at": "Fri Mar 06 05:29:05 +0000 2015",
  "id": 573716910807781376,
  "id_str": "573716910807781376",
  "text": "@tweetblast3 good morning ๐Ÿ‘ฎ๐Ÿค๐ŸŽ๐ŸŽ๐Ÿ‘ป#Holi2015 #HappyHoli",
  "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
  "truncated": false,
  "in_reply_to_status_id": null,
  "in_reply_to_status_id_str": null,
  "in_reply_to_user_id": 1647140947,
  "in_reply_to_user_id_str": "1647140947",
  "in_reply_to_screen_name": "tweetblast3",
  "user":  {
    "id": 2962644210,
    "id_str": "2962644210",
    "name": "Samantha Pane",
    "screen_name": "samanthapane6",
    "location": "",
    "profile_location": null,
    "description": "",
    "url": null,
    "entities":  {
      "description":  {
        "urls":  []
      }
    },
    "protected": false,
    "followers_count": 11,
    "friends_count": 64,
    "listed_count": 0,
    "created_at": "Wed Jan 07 08:48:51 +0000 2015",
    "favourites_count": 31,
    "utc_offset": null,
    "time_zone": null,
    "geo_enabled": false,
    "verified": false,
    "statuses_count": 266,
    "lang": "en",
    "contributors_enabled": false,
    "is_translator": false,
    "is_translation_enabled": false,
    "profile_background_color": "C0DEED",
    "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
    "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
    "profile_background_tile": false,
    "profile_image_url": "http://pbs.twimg.com/profile_images/581127119557632000/sdDQiR4j_normal.jpg",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/581127119557632000/sdDQiR4j_normal.jpg",
    "profile_link_color": "0084B4",
    "profile_sidebar_border_color": "C0DEED",
    "profile_sidebar_fill_color": "DDEEF6",
    "profile_text_color": "333333",
    "profile_use_background_image": true,
    "default_profile": true,
    "default_profile_image": false,
    "following": false,
    "follow_request_sent": false,
    "notifications": false
  },
  "geo": null,
  "coordinates": null,
  "place": null,
  "contributors": null,
  "retweet_count": 1,
  "favorite_count": 1,
  "entities":  {
    "hashtags":  [
       {
        "text": "Holi2015",
        "indices":  [
          31,
          40
        ]
      },
       {
        "text": "HappyHoli",
        "indices":  [
          41,
          51
        ]
      }
    ],
    "symbols":  [],
    "user_mentions":  [
       {
        "screen_name": "tweetblast3",
        "name": "3 Star",
        "id": 1647140947,
        "id_str": "1647140947",
        "indices":  [
          0,
          12
        ]
      }
    ],
    "urls":  []
  },
  "favorited": false,
  "retweeted": false,
  "lang": "en"
}

If you look at the hashtag entities, the response says “Holi2015” appears at indices 31 to 40. So it counted each of those emoticons as a single character. But in a UTF-16 string, each of those emoticons has a length of two (and in UTF-8 each one takes four bytes).

How can I count each of those emoticons as one character only?
All I need is to adjust the offsets of those entities. In my case, the hashtag “Holi2015” appears at position 36.
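
To make the mismatch concrete, here is a small sketch that measures the same hashtag offset in three units. The five emoji are stand-ins (an assumption, chosen only because each lies outside the Basic Multilingual Plane), and the variable names are made up for illustration.

```python
# Stand-in tweet text: the five emoji below are placeholders (assumption),
# not necessarily the ones in the original tweet.
prefix = "@tweetblast3 good morning "
emoji = "\U0001F46E\U0001F424\U0001F381\U0001F381\U0001F47B"
text = prefix + emoji + "#Holi2015 #HappyHoli"

hashtag = "#Holi2015"

# Offset counted in Unicode code points (Python's str is indexed this way).
codepoint_offset = text.index(hashtag)

# Offset counted in UTF-16 code units (how Java and JavaScript index strings).
utf16_offset = len(text[:codepoint_offset].encode("utf-16-le")) // 2

# Offset counted in UTF-8 bytes.
utf8_offset = len(text[:codepoint_offset].encode("utf-8"))

print(codepoint_offset, utf16_offset, utf8_offset)  # 31 36 46
```

The 26 characters before the emoji count the same in code points and UTF-16, so the difference of 5 comes entirely from the five emoji taking two UTF-16 code units each.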

Thanks


Regarding emoji in twitter
#2

Hi,
Did you find a fix for this? I am currently facing the same issue.


#3

You need to account for it in your code. An emoji like these is a single code point but takes two UTF-16 code units (a surrogate pair) in a string, whereas the Twitter indices count it as a single character.

I handle this on my end by counting the code points and then adjusting the display range myself.
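
A minimal sketch of that kind of adjustment, assuming the indices in the response count Unicode code points and the target string type is indexed in UTF-16 code units. The function names `codepoint_to_utf16_index` and `adjust_indices` are hypothetical, and the emoji in the sample text are stand-ins.

```python
def codepoint_to_utf16_index(text: str, index: int) -> int:
    """Convert a code point index into a UTF-16 code unit offset."""
    # Code points above U+FFFF need a surrogate pair, i.e. two code units.
    # Everything else (including emoji such as U+263A) needs just one.
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text[:index])


def adjust_indices(text: str, indices: list) -> list:
    """Map an entity's [start, end] pair from code points to UTF-16 units."""
    start, end = indices
    return [codepoint_to_utf16_index(text, start),
            codepoint_to_utf16_index(text, end)]


# Stand-in tweet text with five placeholder emoji (assumption).
text = ("@tweetblast3 good morning "
        "\U0001F46E\U0001F424\U0001F381\U0001F381\U0001F47B"
        "#Holi2015 #HappyHoli")

print(adjust_indices(text, [0, 12]))   # [0, 12]  mention, before any emoji
print(adjust_indices(text, [31, 40]))  # [36, 45] first hashtag
print(adjust_indices(text, [41, 51]))  # [46, 56] second hashtag
```

The offset only grows for characters whose code point is above U+FFFF, so text without such characters passes through unchanged. In a language whose strings are already indexed by code point, such as Python 3, the indices from the response can be used directly, for example `text[31:40]`.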


#4

Hi @richardhyland, would you mind sharing the matching logic? Some emojis can have a length of 1 as well, so how do you detect when to increase the offset?


#5

Thanks for the info, @richardhyland.
@Yogin16, the following link helped me solve the issue.