Full-archive search issue report:
During recent attempts to fetch tweets using the Twitter API v2, we noticed that 'urls' was missing from 'entities' for tweets that retweet another tweet. This happened most of the time, but not always. For example, the tweet with id 1499880727214706692 has these entities:
{'mentions': [{'start': 3, 'end': 19, 'username': 'criticalthreats', 'id': '106738320'}, {'start': 112, 'end': 126, 'username': 'TheStudyofWar', 'id': '71298686'}], 'hashtags': [{'start': 27, 'end': 35, 'tag': 'Russian'}]}
Note that the example tweet, shown in the image below, does contain a URL, yet there is no 'urls' key in the entities above.
See the attached spreadsheet, which includes a subset of retweets for @aei. The majority have a URL in the original tweet, yet these URLs aren't included in entities (column Q is filled where one is included). The green shaded rows, between rows 2 and 21, do not have a URL in the original tweet and therefore should not have one in entities.
It isn't clear why the URL from the original tweet is sometimes included in the retweet's entities but usually isn't.
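One possible workaround, sketched below, is to read the URLs from the original tweet instead of the retweet. This assumes the request was made with expansions=referenced_tweets.id and tweet.fields=entities,referenced_tweets, so that the original tweets come back under includes.tweets. The helper names (index_included, urls_for_tweet) and the sample payload are made up for illustration; they are not part of the API or of our data.

```python
from typing import Any, Dict, List


def index_included(response: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Map tweet id -> tweet object for everything under includes.tweets."""
    tweets = response.get("includes", {}).get("tweets", [])
    return {t["id"]: t for t in tweets}


def urls_for_tweet(
    tweet: Dict[str, Any], included: Dict[str, Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Return URL entities for a tweet.

    Uses the tweet's own entities when present; otherwise falls back to
    the entities of the tweet it retweets, if that tweet was included.
    """
    own = tweet.get("entities", {}).get("urls", [])
    if own:
        return own
    for ref in tweet.get("referenced_tweets", []):
        if ref.get("type") == "retweeted":
            original = included.get(ref.get("id"), {})
            return original.get("entities", {}).get("urls", [])
    return []


# Illustrative payload only (ids and URLs are placeholders, not real data).
sample_response = {
    "data": [
        {
            "id": "2",
            "text": "RT @someone: a long tweet whose link is cut off",
            "referenced_tweets": [{"type": "retweeted", "id": "1"}],
            "entities": {"mentions": [{"start": 3, "end": 11, "username": "someone"}]},
        }
    ],
    "includes": {
        "tweets": [
            {
                "id": "1",
                "text": "a long tweet whose link is at the end https://t.co/placeholder",
                "entities": {"urls": [{"url": "https://t.co/placeholder"}]},
            }
        ]
    },
}

included = index_included(sample_response)
for tweet in sample_response["data"]:
    print(tweet["id"], [u["url"] for u in urls_for_tweet(tweet, included)])
```

This does not explain why the retweet's own entities are incomplete; it only avoids depending on them.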
Getting the expanded URL:
Where tweet authors have used a shortened URL in their original tweet, the API often doesn’t include the fully expanded URL.
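For reference, this is roughly how we would expect to pick the most complete URL from a single item in entities.urls. It is a sketch that assumes the item may carry 'unwound_url', 'expanded_url' and 'url', and that any of them can be absent. The function names are ours, and resolve_redirects is only an optional client-side fallback that follows redirects with the standard library; it is not something the API provides.

```python
import urllib.request
from typing import Any, Dict, Optional


def best_url(url_entity: Dict[str, Any]) -> Optional[str]:
    """Pick the most expanded URL available on one entities.urls item."""
    for key in ("unwound_url", "expanded_url", "url"):
        value = url_entity.get(key)
        if value:
            return value
    return None


def resolve_redirects(url: str, timeout: float = 10.0) -> str:
    """Follow redirects ourselves and return the final URL.

    Fallback for when the API only returns a shortened link. Performs a
    network request; some servers reject HEAD, so callers may need GET.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


# Placeholder entity, not real data.
entity = {
    "url": "https://t.co/placeholder",
    "expanded_url": "https://short.example/abc",
}
print(best_url(entity))
```

In the cases we are reporting, 'expanded_url' itself still points at a shortener, which is why a client-side fallback like resolve_redirects would be needed.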