Hi,
I found some inconsistencies in the Twitter text entities when a tweet contains emoticons.
Here is the tweet example.
The response from the API looks something like this:
{
"created_at": "Fri Mar 06 05:29:05 +0000 2015",
"id": 573716910807781400,
"id_str": "573716910807781376",
"text": "@tweetblast3 good morning 👮🐤🎁🎐👻#Holi2015 #HappyHoli",
"source": "<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>",
"truncated": false,
"in_reply_to_status_id": null,
"in_reply_to_status_id_str": null,
"in_reply_to_user_id": 1647140947,
"in_reply_to_user_id_str": "1647140947",
"in_reply_to_screen_name": "tweetblast3",
"user": {
"id": 2962644210,
"id_str": "2962644210",
"name": "Samantha Pane",
"screen_name": "samanthapane6",
"location": "",
"profile_location": null,
"description": "",
"url": null,
"entities": {
"description": {
"urls": []
}
},
"protected": false,
"followers_count": 11,
"friends_count": 64,
"listed_count": 0,
"created_at": "Wed Jan 07 08:48:51 +0000 2015",
"favourites_count": 31,
"utc_offset": null,
"time_zone": null,
"geo_enabled": false,
"verified": false,
"statuses_count": 266,
"lang": "en",
"contributors_enabled": false,
"is_translator": false,
"is_translation_enabled": false,
"profile_background_color": "C0DEED",
"profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
"profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
"profile_background_tile": false,
"profile_image_url": "http://pbs.twimg.com/profile_images/581127119557632000/sdDQiR4j_normal.jpg",
"profile_image_url_https": "https://pbs.twimg.com/profile_images/581127119557632000/sdDQiR4j_normal.jpg",
"profile_link_color": "0084B4",
"profile_sidebar_border_color": "C0DEED",
"profile_sidebar_fill_color": "DDEEF6",
"profile_text_color": "333333",
"profile_use_background_image": true,
"default_profile": true,
"default_profile_image": false,
"following": false,
"follow_request_sent": false,
"notifications": false
},
"geo": null,
"coordinates": null,
"place": null,
"contributors": null,
"retweet_count": 1,
"favorite_count": 1,
"entities": {
"hashtags": [
{
"text": "Holi2015",
"indices": [
31,
40
]
},
{
"text": "HappyHoli",
"indices": [
41,
51
]
}
],
"symbols": [],
"user_mentions": [
{
"screen_name": "tweetblast3",
"name": "3 Star",
"id": 1647140947,
"id_str": "1647140947",
"indices": [
0,
12
]
}
],
"urls": []
},
"favorited": false,
"retweeted": false,
"lang": "en"
}
If you look at the hashtag entities, they say “Holi2015” appears at indices 31 to 40. So the API counted each of those emoticons as a single character. But in the case of UTF-8, each of those emoticons is longer than one character.
What can be done to count each of those emoticons as one character only?
All I need is to align the offsets of those entities. In my case, the hashtag “Holi2015” appears at the 36th position.
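To show the mismatch concretely, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the helper name `codepoint_to_utf16_offset` is hypothetical and not part of any Twitter library. It assumes the indices in the response count Unicode code points, while my offsets count UTF-16 code units (two units for each character above U+FFFF), which is what the numbers 31 and 36 suggest.

```python
# The tweet text from the response above, with the five emoticons written as escapes.
EMOTICONS = "\U0001F46E\U0001F424\U0001F381\U0001F390\U0001F47B"
text = "@tweetblast3 good morning " + EMOTICONS + "#Holi2015 #HappyHoli"


def codepoint_to_utf16_offset(s: str, index: int) -> int:
    """Hypothetical helper: map a code point index to a UTF-16 code unit offset."""
    # A character above U+FFFF is stored as a surrogate pair, i.e. two UTF-16 code units.
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in s[:index])


start, end = 31, 40  # hashtag indices taken from the response
print(text[start:end])                         # Python indexes strings by code point
print(codepoint_to_utf16_offset(text, start))  # offset when each emoticon counts as two
print(codepoint_to_utf16_offset(text, end))
```

Under these assumptions the slice gives back #Holi2015 and the converted start offset is 36, which matches the position I see on my side.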
Thanks