Can't work out entity indices in specific tweet with emojis


#1

In this tweet, the hashtag entities claim that the #AustrianGP hashtag starts at index 43. However, when I count the characters in the text myself, I find that the hashtag starts at index 44.

According to the documentation:

The first integer represents the location of the # character in the Tweet text string.

This means that the # character should be at index 43, not index 44.

I’ve also noticed that most of the time the indices returned from the REST API count an emoji as one character (even though it consists of at least two Unicode characters). If I count each emoji as one character, the indices are even further off and the hashtag starts at index 40.
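To show what I mean by the different ways of counting, here is a minimal Python sketch. The sample text is made up (it is not the actual tweet), and the helper names `index_by_code_points` and `index_by_utf16_units` are just illustrative. It only demonstrates that the same hashtag lands on different indices depending on whether you count Unicode code points or UTF-16 code units; I am not claiming either one is what the API uses for this tweet.

```python
def index_by_code_points(text: str, needle: str) -> int:
    """Index of needle when every Unicode code point counts as 1."""
    # Python's str is indexed by code point, so find() is enough.
    return text.find(needle)


def index_by_utf16_units(text: str, needle: str) -> int:
    """Index of needle when counting UTF-16 code units (as JavaScript strings do)."""
    pos = text.find(needle)
    if pos < 0:
        return -1
    # Each UTF-16 code unit is 2 bytes; code points above U+FFFF take two units.
    return len(text[:pos].encode("utf-16-le")) // 2


# Hypothetical sample: racing car + variation selector, then a flag built
# from two regional indicator symbols, then the hashtag.
sample = "Race day \U0001F3CE\uFE0F\U0001F1E6\U0001F1F9 #AustrianGP"

print(index_by_code_points(sample, "#AustrianGP"))
print(index_by_utf16_units(sample, "#AustrianGP"))
```

In this made-up text there are only two visible emojis, but they are made of four code points and seven UTF-16 code units, which is how the three ways of counting drift apart.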

So how can I make heads or tails out of the indices for that specific tweet?


#2

Bump