I have some PHP that outputs tweets from the 1.1 API's user_timeline, and I am using each entity's start and end indices with string replacement to add the links. However, when Unicode characters are present, the index values do not account for the full length of those characters in the string, so the string replacement cuts Unicode characters off incorrectly.
What would be the best way to ensure that my string replacement for entities does not interfere with Unicode characters that take up more than one “character” in the string?
Example: https://twitter.com/PresHernandez_/status/390187276195491840
My output right now (note the cut-off Unicode character before the t.co link):
Another sneak peek��<a href="http://t.co/ZqyFEgJcKI" target="_blank">http://t.co/ZqyFEgJcKI</a>c<a href="http://www.twitter.com/WEtv" target="_blank">@WEtv</a>etv
To generate the replacement, I’m using the following:
foreach ($tweet_entities as $entity_replace) {
    $tweet_replace = '<a href="'.$entity_replace['replace_url'].'" target="_blank">'.$entity_replace['replace_text'].'</a>';
    $tweet_replace_length = $entity_replace['end'] - $entity_replace['start'];
    $tweet_text = substr_replace($tweet_text, $tweet_replace, $entity_replace['start'], $tweet_replace_length);
}
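One direction that might work, as a rough sketch rather than a confirmed fix: substr_replace counts bytes, so if the entity indices count Unicode characters (code points), the offsets drift by a few bytes for every multibyte character before the entity. The sketch below cuts the string with mb_substr instead. It assumes the mbstring extension is enabled, that the tweet text is UTF-8, and that the indices are code point offsets. The function name replace_entities_mb is made up for illustration. The array keys are the same ones used in the loop above.

```php
<?php
// Hypothetical helper: swaps each entity span for a link, treating
// 'start' and 'end' as code point offsets (an assumption), not bytes.
function replace_entities_mb($text, array $entities)
{
    // Process the last entity first so that inserting a link never
    // shifts the offsets of entities that are still waiting.
    usort($entities, function ($a, $b) {
        return $b['start'] - $a['start'];
    });

    foreach ($entities as $entity) {
        $link = '<a href="' . $entity['replace_url'] . '" target="_blank">'
              . $entity['replace_text'] . '</a>';

        // mb_substr measures offsets in characters for the given encoding.
        $before = mb_substr($text, 0, $entity['start'], 'UTF-8');
        $after  = mb_substr($text, $entity['end'], null, 'UTF-8');

        $text = $before . $link . $after;
    }

    return $text;
}
```

It would be called as $tweet_text = replace_entities_mb($tweet_text, $tweet_entities); in place of the foreach loop. Sorting in descending order is a design choice that avoids having to track how much longer the string gets after each replacement.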