URL entities sometimes disappear from large accounts


#1

Hi! I query the REST API via the friends endpoint to get a list of the users a given user is following, along with their bio, website, etc. In the bio, URLs are transformed into t.co links, which I then change back into the display_url by looking each link up in entities['description']['urls']. However, for some very large accounts this field seems to blank out for a couple of minutes at a time.
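The substitution described above can be sketched in Python roughly like this. It is a minimal sketch, assuming `user` is the raw v1.1 user dict (for example Tweepy's `user._json`); the name `expand_description` is hypothetical. It swaps each t.co URL for its display_url by string match rather than by `indices`, so a missing entity simply leaves the t.co link in place.

```python
from typing import Any, Dict


def expand_description(user: Dict[str, Any]) -> str:
    """Replace t.co links in a v1.1 user's description with their display_url.

    Assumes `user` is the raw user JSON dict (e.g. Tweepy's `user._json`).
    Links with no matching entity are left as t.co URLs.
    """
    description = user.get("description") or ""
    url_entities = user.get("entities", {}).get("description", {}).get("urls", [])
    for entity in url_entities:
        short = entity.get("url")
        display = entity.get("display_url")
        if short and display:
            # Plain string replacement; avoids relying on index arithmetic.
            description = description.replace(short, display)
    return description
```

When `entities['description']['urls']` comes back empty, this returns the description unchanged, which is exactly the symptom described in this thread.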

95% of the time it looks as expected:

{'description': {'urls': [{'display_url': 'NYTimes.com',
   'expanded_url': 'http://NYTimes.com',
   'indices': [111, 134],
   'url': 'https://t.co/YapuoqX0HS'}]}}

But every couple of hours it becomes:

{'description': {'urls': []}}

Any way to circumvent this issue or resolve it? Thanks!
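One possible client-side workaround (not an official fix) is to remember the last complete set of description entities for each account and reuse it when a response comes back degraded but the description text is unchanged. A minimal sketch, assuming raw v1.1 user dicts; `description_url_entities`, the t.co regex and the in-memory cache are hypothetical choices for illustration, and counting t.co links to detect a degraded response is only a heuristic.

```python
import re
from typing import Any, Dict, List

# Heuristic pattern for t.co links inside a bio (an assumption, not an API rule).
TCO_PATTERN = re.compile(r"https?://t\.co/\w+")

# Last complete description-URL entities seen per user id (in-memory only).
_last_good: Dict[str, Dict[str, Any]] = {}


def description_url_entities(user: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return description URL entities, falling back to a cached copy.

    If the description contains more t.co links than there are entities, the
    response is treated as degraded and the entities cached from the last
    complete response are reused, provided the description text is unchanged.
    """
    user_id = user.get("id_str", "")
    description = user.get("description") or ""
    entities = user.get("entities", {}).get("description", {}).get("urls", [])
    links = TCO_PATTERN.findall(description)

    if len(entities) >= len(links):
        # Looks complete: remember it for later degraded responses.
        _last_good[user_id] = {"description": description, "urls": entities}
        return entities

    cached = _last_good.get(user_id)
    if cached and cached["description"] == description:
        return cached["urls"]
    return entities  # Nothing usable cached; return what we received.
```

Comparing the description text guards against reusing stale entities after the account owner has actually edited their bio.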


#2

This is really odd, and I can’t immediately explain it. My hypothesis is that, on the odd occasion, you hit a different data center while some kind of data sync is still in progress. I know that’s unscientific, but I can’t easily explain it otherwise.
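If the gap really is a transient consistency issue (which is only a hypothesis at this point), re-fetching after a delay may return complete entities. A minimal sketch, assuming a caller-supplied zero-argument `fetch` callable that returns the raw user dict; `fetch_until_complete`, `entities_look_complete` and the 60-second default delay are illustrative choices, not anything the API documents. Each retry is another request against your rate limit.

```python
import re
import time
from typing import Any, Callable, Dict

TCO_PATTERN = re.compile(r"https?://t\.co/\w+")


def entities_look_complete(user: Dict[str, Any]) -> bool:
    """Heuristic: at least one description URL entity per t.co link in the bio."""
    description = user.get("description") or ""
    urls = user.get("entities", {}).get("description", {}).get("urls", [])
    return len(urls) >= len(TCO_PATTERN.findall(description))


def fetch_until_complete(
    fetch: Callable[[], Dict[str, Any]],
    attempts: int = 3,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Re-fetch a user until its description URL entities look complete.

    `fetch` is a placeholder for whatever call you already make. Returns the
    last response even if every attempt looked degraded.
    """
    user = fetch()
    for _ in range(attempts - 1):
        if entities_look_complete(user):
            break
        sleep(delay_seconds)  # Give a lagging replica time to catch up.
        user = fetch()
    return user
```

The `sleep` parameter is injectable so the retry logic can be exercised without actually waiting.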

Can you provide any example accounts? You mention “some very large accounts”; are there any specific metrics we can trace this down to?


#4

Hey @andypiper, thanks for the reply! Some accounts I’ve seen it with are:

  • twitter
  • nytimes
  • TheEconomist

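Here’s an example Tweepy User object for nytimes showing the symptom (the description still contains https://t.co/YapuoqX0HS, but entities['description']['urls'] is empty):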

User(blocking=False, listed_count=183137, url='http://t.co/ahvuWqicF9', id_str='807095', profile_use_background_image=True, location='New York City', profile_background_image_url_https='https://pbs.twimg.com/profile_background_images/736339684/948f072cc2da4e3a5e9f2ebfb3b1a0e7.png', profile_text_color='333333', is_translation_enabled=True, lang='en', muting=False, entities={'description': {'urls': []}, 'url': {'urls': [{'expanded_url': None, 'url': 'http://t.co/ahvuWqicF9', 'indices': [0, 22]}]}}, has_extended_profile=False, profile_sidebar_border_color='FFFFFF', protected=False, status=Status(in_reply_to_status_id=None, id_str='807345352382775296', source='SocialFlow', in_reply_to_screen_name=None, retweet_count=16, retweeted=False, favorited=False, favorite_count=19, lang='en', geo=None, in_reply_to_user_id_str=None, id=807345352382775296, _api=<tweepy.api.API object at 0x7fa851140a58>, in_reply_to_user_id=None, text='Racist objects are not a thing of the past. We asked for your experiences with these objects. Here are your stories: https://t.co/DYSbFahS8y', truncated=False, coordinates=None, entities={'symbols': [], 'user_mentions': [], 'urls': [{'expanded_url': 'http://nyti.ms/2heduZz', 'display_url': 'nyti.ms/2heduZz', 'url': 'https://t.co/DYSbFahS8y', 'indices': [117, 140]}], 'hashtags': []}, source_url='http://www.socialflow.com', _json={'in_reply_to_status_id': None, 'id_str': '807345352382775296', 'source': '<a href="http://www.socialflow.com" rel="nofollow">SocialFlow</a>', 'in_reply_to_screen_name': None, 'retweet_count': 16, 'retweeted': False, 'favorited': False, 'id': 807345352382775296, 'geo': None, 'in_reply_to_user_id_str': None, 'favorite_count': 19, 'in_reply_to_user_id': None, 'text': 'Racist objects are not a thing of the past. We asked for your experiences with these objects. Here are your stories: https://t.co/DYSbFahS8y', 'truncated': False, 'coordinates': None, 'entities': {'symbols': [], 'user_mentions': [], 'urls': [{'expanded_url': 'http://nyti.ms/2heduZz', 'display_url': 'nyti.ms/2heduZz', 'url': 'https://t.co/DYSbFahS8y', 'indices': [117, 140]}], 'hashtags': []}, 'lang': 'en', 'place': None, 'is_quote_status': False, 'contributors': None, 'possibly_sensitive': False, 'created_at': 'Fri Dec 09 22:05:06 +0000 2016', 'in_reply_to_status_id_str': None}, place=None, is_quote_status=False, contributors=None, possibly_sensitive=False, created_at=datetime.datetime(2016, 12, 9, 22, 5, 6), in_reply_to_status_id_str=None), favourites_count=13579, time_zone='Eastern Time (US & Canada)', contributors_enabled=False, profile_link_color='607696', friends_count=966, statuses_count=259016, description='Where the conversation begins. Follow for breaking news, special reports, RTs of our journalists and more from https://t.co/YapuoqX0HS.', translator_type='none', id=807095, follow_request_sent=False, profile_sidebar_fill_color='EFEFEF', verified=True, is_translator=False, notifications=False, name='The New York Times', profile_background_tile=True, following=True, default_profile_image=False, profile_image_url_https='https://pbs.twimg.com/profile_images/758384037589348352/KB3RFwFm_normal.jpg', default_profile=False, profile_background_color='131516', profile_background_image_url='http://pbs.twimg.com/profile_background_images/736339684/948f072cc2da4e3a5e9f2ebfb3b1a0e7.png', live_following=False, utc_offset=-18000, geo_enabled=False, _json={'blocking': False, 'listed_count': 183137, 'url': 'http://t.co/ahvuWqicF9', 'id_str': '807095', 'profile_use_background_image': True, 'location': 'New York City', 'profile_background_image_url_https': 'https://pbs.twimg.com/profile_background_images/736339684/948f072cc2da4e3a5e9f2ebfb3b1a0e7.png', 'profile_text_color': '333333', 'translator_type': 'none', 'utc_offset': -18000, 'muting': False, 'entities': {'description': {'urls': []}, 'url': {'urls':[{'expanded_url': None, 'url': 'http://t.co/ahvuWqicF9', 'indices': [0, 22]}]}}, 'has_extended_profile': False, 'description': 'Where the conversation begins. Follow for breaking news, special reports, RTs of our journalists and more from https://t.co/YapuoqX0HS.', 'status': {'in_reply_to_status_id': None, 'id_str': '807345352382775296', 'source': '<a href="http://www.socialflow.com" rel="nofollow">SocialFlow</a>', 'in_reply_to_screen_name': None, 'retweet_count': 16, 'retweeted': False, 'favorited': False, 'id': 807345352382775296, 'geo': None, 'in_reply_to_user_id_str': None, 'favorite_count': 19, 'in_reply_to_user_id': None, 'text': 'Racist objects are not a thing of the past. We asked for your experiences with these objects. Here are your stories: https://t.co/DYSbFahS8y', 'truncated': False, 'coordinates': None, 'entities': {'symbols': [], 'user_mentions': [], 'urls': [{'expanded_url': 'http://nyti.ms/2heduZz', 'display_url': 'nyti.ms/2heduZz', 'url': 'https://t.co/DYSbFahS8y', 'indices': [117, 140]}], 'hashtags': []}, 'lang': 'en', 'place': None, 'is_quote_status': False, 'contributors': None, 'possibly_sensitive': False, 'created_at': 'Fri Dec 09 22:05:06 +0000 2016', 'in_reply_to_status_id_str': None}, 'favourites_count': 13579, 'time_zone': 'Eastern Time (US & Canada)', 'contributors_enabled': False, 'profile_link_color': '607696', 'friends_count': 966, 'statuses_count': 259016, 'created_at': 'Fri Mar 02 20:41:42 +0000 2007', 'protected': False, 'is_translation_enabled': True, 'id': 807095, 'blocked_by': False, 'profile_sidebar_fill_color': 'EFEFEF', 'verified': True, 'is_translator': False, 'notifications': False, 'name': 'The New York Times', 'profile_background_tile': True, 'following': True, 'default_profile_image': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/758384037589348352/KB3RFwFm_normal.jpg', 'default_profile': False, 'profile_background_color': '131516', 'profile_background_image_url': 'http://pbs.twimg.com/profile_background_images/736339684/948f072cc2da4e3a5e9f2ebfb3b1a0e7.png', 'live_following': False, 'lang': 'en', 'geo_enabled': False, 'screen_name': 'nytimes', 'followers_count': 32058113, 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/807095/1355346050', 'follow_request_sent': False, 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/758384037589348352/KB3RFwFm_normal.jpg', 'profile_sidebar_border_color': 'FFFFFF'}, screen_name='nytimes', _api=<tweepy.api.API object at 0x7fa851140a58>, followers_count=32058113, profile_banner_url='https://pbs.twimg.com/profile_banners/807095/1355346050', blocked_by=False, profile_image_url='http://pbs.twimg.com/profile_images/758384037589348352/KB3RFwFm_normal.jpg', created_at=datetime.datetime(2007, 3, 2, 20, 41, 42))

And here’s the raw JSON:

{'blocking': False, 'listed_count': 183137, 'url': 'http://t.co/ahvuWqicF9', 'id_str': '807095', 'profile_use_background_image': True, 'location': 'New York City', 'profile_background_image_url_https': 'https://pbs.twimg.com/profile_background_images/736339684/948f072cc2da4e3a5e9f2ebfb3b1a0e7.png', 'profile_text_color': '333333', 'translator_type': 'none', 'utc_offset': -18000, 'muting': False, 'entities': {'description': {'urls': []}, 'url': {'urls':[{'expanded_url': None, 'url': 'http://t.co/ahvuWqicF9', 'indices': [0, 22]}]}}, 'has_extended_profile': False, 'description': 'Where the conversation begins. Follow for breaking news, special reports, RTs of our journalists and more from https://t.co/YapuoqX0HS.', 'status': {'in_reply_to_status_id': None, 'id_str': '807345352382775296', 'source': '<a href="http://www.socialflow.com" rel="nofollow">SocialFlow</a>', 'in_reply_to_screen_name': None, 'retweet_count': 16, 'retweeted': False, 'favorited': False, 'id': 807345352382775296, 'geo': None, 'in_reply_to_user_id_str': None, 'favorite_count': 19, 'in_reply_to_user_id': None, 'text': 'Racist objects are not a thing of the past. We asked for your experiences with these objects. Here are your stories: https://t.co/DYSbFahS8y', 'truncated': False, 'coordinates': None, 'entities': {'symbols': [], 'user_mentions': [], 'urls': [{'expanded_url': 'http://nyti.ms/2heduZz', 'display_url': 'nyti.ms/2heduZz', 'url': 'https://t.co/DYSbFahS8y', 'indices': [117, 140]}], 'hashtags': []}, 'lang': 'en', 'place': None, 'is_quote_status': False, 'contributors': None, 'possibly_sensitive': False, 'created_at': 'Fri Dec 09 22:05:06 +0000 2016', 'in_reply_to_status_id_str': None}, 'favourites_count': 13579, 'time_zone': 'Eastern Time (US & Canada)', 'contributors_enabled': False, 'profile_link_color': '607696', 'friends_count': 966, 'statuses_count': 259016, 'created_at': 'Fri Mar 02 20:41:42 +0000 2007', 'protected': False, 'is_translation_enabled': True, 'id': 807095, 'blocked_by': False, 'profile_sidebar_fill_color': 'EFEFEF', 'verified': True, 'is_translator': False, 'notifications': False, 'name': 'The New York Times', 'profile_background_tile': True, 'following': True, 'default_profile_image': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/758384037589348352/KB3RFwFm_normal.jpg', 'default_profile': False, 'profile_background_color': '131516', 'profile_background_image_url': 'http://pbs.twimg.com/profile_background_images/736339684/948f072cc2da4e3a5e9f2ebfb3b1a0e7.png', 'live_following': False, 'lang': 'en', 'geo_enabled': False, 'screen_name': 'nytimes', 'followers_count': 32058113, 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/807095/1355346050', 'follow_request_sent': False, 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/758384037589348352/KB3RFwFm_normal.jpg', 'profile_sidebar_border_color': 'FFFFFF'}
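To make the symptom in a dump like this easy to spot, a small check can list every t.co link in the profile that has no usable entity. A minimal sketch, assuming the raw user dict; `unexpanded_links` and the regex are hypothetical. It also lists profile-URL entities whose `expanded_url` is `None`, which the nytimes sample happens to show; this thread does not establish whether that is part of the same problem.

```python
import re
from typing import Any, Dict, List

TCO_PATTERN = re.compile(r"https?://t\.co/\w+")


def unexpanded_links(user: Dict[str, Any]) -> List[str]:
    """List t.co links in a user's profile that lack a usable URL entity."""
    entities = user.get("entities", {})
    missing: List[str] = []

    # Description: every t.co link should have an entity with a display_url.
    described = {
        e.get("url")
        for e in entities.get("description", {}).get("urls", [])
        if e.get("display_url")
    }
    for link in TCO_PATTERN.findall(user.get("description") or ""):
        if link not in described:
            missing.append(link)

    # Profile URL field: flag entities that carry no expanded_url.
    for e in entities.get("url", {}).get("urls", []):
        if e.get("expanded_url") is None:
            missing.append(e.get("url"))
    return missing
```

Run against the raw JSON above, this should report both `https://t.co/YapuoqX0HS` (from the description) and `http://t.co/ahvuWqicF9` (from the profile URL).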

#5

Pinging this thread as it’s been a month since I’ve heard anything!


#6

Thanks for the ping, and apologies: I dropped this while we were taking some time out.

You originally mentioned that the data looks as expected around 95% of the time. I still think this could be related to eventual consistency and data center caching, but I can’t be sure; I’ll have to dig in to see whether it is related to how the URL is hydrated into the user object. Are you able to provide any more recent examples (the accounts this happens on, frequency of calls and data, header traces including the Server HTTP header in failing cases, etc.)? Thanks.
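To collect the kind of evidence requested here, each response could be logged together with its Server header. A minimal sketch, assuming you can get at the HTTP response headers as a mapping (how you obtain them depends on your client library and is not shown); `trace_record` and its field names are hypothetical, and any other header names you want to capture are passed in by the caller rather than assumed.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Mapping, Optional


def trace_record(
    user: Dict[str, Any],
    headers: Mapping[str, str],
    extra_headers: Iterable[str] = (),
    now: Optional[datetime] = None,
) -> str:
    """Build one JSON log line describing a user object and its response."""
    urls = user.get("entities", {}).get("description", {}).get("urls", [])
    # HTTP header names are case-insensitive, so match them in lower case.
    lowered = {k.lower(): v for k, v in headers.items()}
    record = {
        "time": (now or datetime.now(timezone.utc)).isoformat(),
        "screen_name": user.get("screen_name"),
        "description_url_entities": len(urls),
        "server": lowered.get("server"),
    }
    for name in extra_headers:
        record[name] = lowered.get(name.lower())
    return json.dumps(record, sort_keys=True)
```

Logging every call, not just the failing ones, makes it possible to compare Server values between good and degraded responses.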