Tweets containing a retweeted status have wrong indices in entities


#1

Example: tweet ID 570801644809674752, https://snap.apigee.com/1DsLfZy

This tweet contains a retweeted status. The URL entity for https://t.co/oPMN0YQu9F has indices [94, 95], which is clearly wrong, so the text field is garbled after the entities are applied.

Please fix this issue; it affects every endpoint that returns a retweeted status.


#2

You are right, the indices are wrong for the indicated status.
Below is a (truncated) snippet:

{ 
    "id_str": "570801644809674752", 
    "text": "RT @ivenvd: 又来啦,自编译 pentadactyl hg-7157 版,兼容 Firefox 36,比官方 nightly 版新: https://t.co/oPMN0YQu9F",
    "truncated": false,
    "retweeted_status": {
        "id_str": "570769161548201984", 
        "text": "又来啦,自编译 pentadactyl hg-7157 版,兼容 Firefox 36,比官方 nightly 版新: https://t.co/oPMN0YQu9F",
        "truncated": false,
        "retweet_count": 2, 
        "favorite_count": 0, 
        "entities": { 
            "hashtags": [], 
            "symbols": [], 
            "user_mentions": [], 
            "urls": [ 
                { 
                    "url": "https://t.co/oPMN0YQu9F", 
                    "expanded_url": "https://dl.dropboxusercontent.com/u/1126893/pentadactyl-1.2pre.xpi", 
                    "display_url": "dl.dropboxusercontent.com/u/1126893/pent…", 
                    "indices": [ 
                        60, 
                        83 
                    ] 
                } 
            ] 
        }, 
        "favorited": false, 
        "retweeted": false, 
        "possibly_sensitive": false, 
        "lang": "zh" 
    }, 
    "retweet_count": 2, 
    "favorite_count": 0, 
    "entities": { 
        "hashtags": [], 
        "symbols": [], 
        "user_mentions": [ 
            { 
                "screen_name": "ivenvd", 
                "name": "Iven ʕ•̬͡•ʔ✧", 
                "id": 23571467, 
                "id_str": "23571467", 
                "indices": [ 
                    3, 
                    10 
                ] 
            } 
        ], 
        "urls": [ 
            { 
                "url": "https://t.co/oPMN0YQu9F", 
                "expanded_url": "https://dl.dropboxusercontent.com/u/1126893/pentadactyl-1.2pre.xpi", 
                "display_url": "dl.dropboxusercontent.com/u/1126893/pent…", 
                "indices": [ 
                    94, 
                    95 
                ] 
            } 
        ] 
    }, 
    "favorited": false, 
    "retweeted": false, 
    "possibly_sensitive": false, 
    "lang": "zh" 
}

As you can see, there is a retweeted_status (which is what clients should actually use), and it has the correct indices.
Still, the outer status should have correct indices too. Maybe @andypiper can forward this to the dev team so that it can be fixed?
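
For client authors hitting this in the meantime, here is a minimal sketch of that workaround in Python. It assumes the status has already been parsed into a dict shaped like the snippet above. `display_source` and `extract_urls` are hypothetical helper names, and the sample status is made up for illustration, not the tweet discussed here.

```python
def display_source(status):
    """Return (text, entities) to render, preferring the retweeted status."""
    original = status.get("retweeted_status")
    source = original if original is not None else status
    return source["text"], source.get("entities", {})


def extract_urls(status):
    """Slice each URL out of the text using the indices of the chosen source."""
    text, entities = display_source(status)
    return [
        text[url["indices"][0]:url["indices"][1]]
        for url in entities.get("urls", [])
    ]


# Hypothetical retweet: the outer indices are broken, the inner ones are fine.
sample = {
    "text": "RT @user: see https://t.co/abc",
    "entities": {"urls": [{"url": "https://t.co/abc", "indices": [29, 30]}]},
    "retweeted_status": {
        "text": "see https://t.co/abc",
        "entities": {"urls": [{"url": "https://t.co/abc", "indices": [4, 20]}]},
    },
}

urls = extract_urls(sample)
```

Because the text and the entities always come from the same object, the indices line up even when the outer status carries bad ones.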


#3

Any further response?
Here is another tweet containing two links: https://snap.apigee.com/1wBJlRr
In the outer tweet, the two URL entities have the same indices. I think this may help you find the cause.


#4

Nearly three months later, I am seeing the problem again: https://snap.apigee.com/1PYgRu5
@andypiper, will this be fixed?


#5

I can also confirm this is an actual issue.
Devs?


#6

I have a problem with URL indices with this tweet: https://twitter.com/reactjs/status/718191892869984256

Text: "We just shipped React v15! 🎉 Many thanks to everybody in the community who helped get it ready! https://t.co/vV9Xb5IEVk"
URL indices: [97,120] should be [98,121]
Slice: " https://t.co/vV9Xb5IEV" should be "https://t.co/vV9Xb5IEVk"


#7

[97,120] looks correct: you need to treat the emoji as a single character. Emoji are sometimes incorrectly counted as 2 characters (though I didn't check the tweet from the API, just your pasted text).
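
To illustrate the counting difference, here is a minimal Python sketch. It assumes the indices are code-point offsets, as suggested above, and that the mismatch comes from a runtime that counts UTF-16 code units, where an emoji outside the Basic Multilingual Plane takes two units. The sample string and the helper name `utf16_to_code_point_index` are hypothetical; this is not the tweet above.

```python
def utf16_to_code_point_index(text, utf16_index):
    """Convert an offset counted in UTF-16 code units to a code-point offset."""
    units = 0
    for i, ch in enumerate(text):
        if units >= utf16_index:
            return i
        # Characters above U+FFFF are stored as a surrogate pair (2 units).
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


# Hypothetical text with one emoji (U+1F389) before the URL.
text = "ok \U0001F389 https://t.co/abc"

# Python strings index by code point, so code-point indices slice directly.
by_code_points = text[5:21]

# The same span expressed in UTF-16 code units is shifted by one.
start = utf16_to_code_point_index(text, 6)
end = utf16_to_code_point_index(text, 22)
by_utf16_units = text[start:end]
```

Both slices give the full URL here. Applying one kind of offset with the other kind of counting would shift the slice by one position for each such emoji that appears before the entity.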