Every time a retweet is created, Twitter generates a tweet encapsulating the original tweet and automatically adds “RT @author_screen_name:” at the beginning of the retweet's text field. As a result, the author of the original tweet is also added to the retweet's entities.user_mentions field.

Although this mention is involuntary (i.e., Twitter adds it for you) and hidden (you can only see it in the data, not in the web interface), Twitter does not update the mention field when necessary, which leads to inconsistencies. For instance, if the original author changes his/her user.screen_name, Twitter will update the user info in the encapsulated retweeted_status object, but it won’t update the entities.user_mentions field. I imagine this is a rare scenario if you collect tweets using the streaming API; however, if you are using any other search method, this could be problematic.

The figure below shows an example. On the left is a tweet retrieved using the REST API, in which retweeted_status.user.id and entities.user_mentions[].id point to different users; on the right is the same tweet, retrieved a while ago, before user.id=124053720 had changed screen_name.
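For anyone who wants to check their own dataset for this, here is a minimal sketch in Python. It assumes the tweet has already been parsed into a dict with the usual retweeted_status and entities.user_mentions fields, and that the automatically added "RT @author" mention is the first entry in user_mentions. The function name and the sample screen names are made up for illustration; they are not part of any Twitter library.

```python
from typing import Optional


def find_stale_retweet_mention(tweet: dict) -> Optional[dict]:
    """Return the mismatch if the auto-added RT mention disagrees with
    retweeted_status.user, or None if the tweet looks consistent."""
    original = tweet.get("retweeted_status")
    if not original:
        return None  # not a retweet, nothing to compare

    mentions = tweet.get("entities", {}).get("user_mentions", [])
    if not mentions:
        return None  # no mention recorded at all

    author = original["user"]
    first = mentions[0]  # assumption: the "RT @author" mention comes first

    # Compare id_str rather than id so numeric precision cannot interfere.
    same_id = first["id_str"] == author["id_str"]
    same_name = first["screen_name"].lower() == author["screen_name"].lower()
    if same_id and same_name:
        return None

    return {
        "author_id": author["id_str"],
        "author_screen_name": author["screen_name"],
        "mention_id": first["id_str"],
        "mention_screen_name": first["screen_name"],
    }


# Hypothetical retweet: the author renamed the account after the retweet,
# so the embedded user object and the stored mention no longer agree.
sample = {
    "text": "RT @old_name: hello",
    "entities": {"user_mentions": [{"id_str": "999", "screen_name": "old_name"}]},
    "retweeted_status": {
        "user": {"id_str": "124053720", "screen_name": "new_name"}
    },
}
print(find_stale_retweet_mention(sample))
```

When a mismatch is reported, retweeted_status.user is probably the safer field to trust, since that is the object Twitter refreshes.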

Are you certain that you are collecting the data via similar methods in both cases? Looking at the id and id_str fields on the Tweets on both the left and the right, I see that they do not match on the right-hand side, which suggests to me that your code on the right is using JavaScript that cannot handle Snowflake IDs.
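To illustrate the precision point: JavaScript numbers are IEEE 754 doubles, which represent integers exactly only up to 2^53 - 1, so a larger 64-bit ID can be silently rounded while id_str stays intact. A rough sketch of how one might test for this in Python follows. The helper names are hypothetical, and the large value is just the smallest integer that a double cannot hold, not a real Tweet ID.

```python
MAX_SAFE_INTEGER = 2**53 - 1  # largest integer a double is guaranteed to hold exactly


def survives_double_roundtrip(numeric_id: int) -> bool:
    """True if the ID is unchanged after being stored as a 64-bit float,
    which is what a JavaScript Number would do to it."""
    return int(float(numeric_id)) == numeric_id


def id_fields_agree(obj: dict) -> bool:
    """True if the numeric id and the string id_str describe the same value."""
    return str(obj["id"]) == obj["id_str"]


small_id = 124053720   # the user ID mentioned in this thread, well below 2**53
large_id = 2**53 + 1   # illustrative value only, not a real Tweet ID

print(survives_double_roundtrip(small_id))
print(survives_double_roundtrip(large_id))

# What a record might look like after the numeric id was rounded somewhere
# along the way while the string form was left alone.
rounded = {"id": int(float(large_id)), "id_str": str(large_id)}
print(id_fields_agree(rounded))
```

If id_fields_agree returns False for stored data, id_str is the field to rely on.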

They were collected using different methods at different times. The one on the right is very old and was probably retrieved in 2012 using the streaming API. The one on the left was retrieved last week using the REST API. However, the inconsistency between id and id_str is not the problem I’m trying to address here. The problem is the difference between the user in retweeted_status and the user in entities.user_mentions.

I replied to the main post instead of you, just making sure you are notified…
