Which field to get URLs from: extended_tweet, normal entities, or extended_entities?


#1

I have a use case where I follow tweets from specific Twitter accounts, plus tweets that mention those accounts. I want to extract valid URLs: not `twitter.com/status/…` links, but real article URLs. I have gone through the API documentation thoroughly, and I am now confused about one aspect. There is `['entities']['urls']`; when `truncated: true`, there is also `['extended_entities']['urls']`; and both of these fields also appear inside `['extended_tweet']`. As I understand it, `extended_entities` was introduced for tweets that exceed the character limit, to hold whatever would otherwise go in `entities`, because the normal `entities` field usually contains only the Twitter URL, and that is what shows up in `['entities']['urls']`. I don't need that Twitter URL, only the article URL that may have been part of the tweet text.
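A minimal sketch of the field selection, assuming the v1.1 streaming payload shape shown in the sample tweet in this post: when `truncated` is `true`, the complete entity set sits in `extended_tweet.entities`, while the top-level `entities.urls` only carries the `twitter.com/i/web/status/...` link back to the full tweet. In the sample, `extended_entities` holds only `media`, so it is not consulted for article URLs. The function name `url_entities` is hypothetical.

```python
from __future__ import annotations

from typing import Any


def url_entities(tweet: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the URL entity list that covers the tweet's full text.

    Assumes a v1.1 streaming payload like the sample in this post:
    for truncated tweets, the full entities live under "extended_tweet".
    """
    if tweet.get("truncated") and "extended_tweet" in tweet:
        entities = tweet["extended_tweet"].get("entities", {})
    else:
        entities = tweet.get("entities", {})
    # "extended_entities" is deliberately skipped: in the sample it only has "media".
    return entities.get("urls", [])
```

For a tweet that is not truncated, the top-level `entities` already cover the whole text, so the same call works for both cases.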
I have a case where the actual URL is present in `entities.urls` within `extended_tweet` (with `truncated: true`), but `extended_entities` has no `urls` field at all. Can someone give me a heads-up on how to design my algorithm to extract real article URLs based on which fields are present?
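A possible second step, sketched under the assumption that a "real article URL" means any `expanded_url` whose host is not Twitter itself: filter the URL entities by hostname. The `TWITTER_HOSTS` set and the function name `article_urls` are hypothetical choices, not part of the API.

```python
from __future__ import annotations

from typing import Any
from urllib.parse import urlparse

# Hosts treated as "not an article". This set is an assumption; extend as needed.
TWITTER_HOSTS = {"twitter.com", "www.twitter.com", "mobile.twitter.com", "t.co"}


def article_urls(url_entities: list[dict[str, Any]]) -> list[str]:
    """Keep expanded URLs that do not point back to Twitter itself."""
    result: list[str] = []
    for entity in url_entities:
        # "expanded_url" is the link the user posted; "url" is the t.co wrapper.
        expanded = entity.get("expanded_url")
        if not expanded:
            continue
        # urlparse(...).hostname is already lowercased, or None if absent.
        host = urlparse(expanded).hostname or ""
        if host in TWITTER_HOSTS:
            continue
        if expanded not in result:  # de-duplicate while preserving order
            result.append(expanded)
    return result
```

In the sample tweet, this would keep `http://bit.ly/2hXMWuz` from `extended_tweet.entities.urls` and drop the `twitter.com/i/web/status/...` link from the top-level `entities.urls`. Because that article link is a `bit.ly` short URL, reaching the final article address would take an extra HTTP request that follows redirects, which is not shown here.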
For example, here is a sample tweet:

```json
{
	"filter_level": "low",
	"coordinates": null,
	"favorite_count": 0,
	"contributors": null,
	"in_reply_to_user_id_str": "1917731",
	"favorited": false,
	"possibly_sensitive": false,
	"is_quote_status": false,
	"display_text_range": [9, 140],
	"text": "@thehill You can't have it both ways #MSM! It was portrayed as a snub and disrespectful in 2010. And now, it's what… https://t.co/kfjTYFXvBV",
	"created_at": "Wed Nov 29 00:03:29 +0000 2017",
	"user": {
		"contributors_enabled": false,
		"profile_use_background_image": true,
		"profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
		"default_profile_image": false,
		"id_str": "266851619",
		"lang": "en",
		"profile_background_tile": false,
		"friends_count": 245,
		"follow_request_sent": null,
		"following": null,
		"protected": false,
		"profile_sidebar_fill_color": "DDEEF6",
		"name": "Gary B",
		"default_profile": true,
		"followers_count": 20,
		"profile_text_color": "333333",
		"profile_sidebar_border_color": "C0DEED",
		"profile_image_url": "http://pbs.twimg.com/profile_images/2202107814/DefofJ3_normal.jpg",
		"profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
		"favourites_count": 96,
		"translator_type": "none",
		"url": null,
		"notifications": null,
		"time_zone": "Pacific Time (US & Canada)",
		"screen_name": "gwilliamb",
		"profile_link_color": "1DA1F2",
		"id": 266851619,
		"created_at": "Tue Mar 15 22:58:05 +0000 2011",
		"profile_background_color": "C0DEED",
		"geo_enabled": false,
		"is_translator": false,
		"verified": false,
		"utc_offset": -28800,
		"profile_image_url_https": "https://pbs.twimg.com/profile_images/2202107814/DefofJ3_normal.jpg",
		"description": null,
		"location": null,
		"listed_count": 2,
		"statuses_count": 1228
	},
	"in_reply_to_status_id": 935655312807419904,
	"in_reply_to_screen_name": "thehill",
	"id": 935660449659367425,
	"source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
	"in_reply_to_user_id": 1917731,
	"lang": "en",
	"place": null,
	"retweeted": false,
	"entities": {
		"hashtags": [{
			"text": "MSM",
			"indices": [37, 41]
		}],
		"user_mentions": [{
			"id": 1917731,
			"indices": [0, 8],
			"name": "The Hill",
			"screen_name": "thehill",
			"id_str": "1917731"
		}],
		"symbols": [],
		"urls": [{
			"expanded_url": "https://twitter.com/i/web/status/935660449659367425",
			"indices": [117, 140],
			"url": "https://t.co/kfjTYFXvBV",
			"display_url": "twitter.com/i/web/status/9…"
		}]
	},
	"id_str": "935660449659367425",
	"truncated": true,
	"quote_count": 0,
	"extended_tweet": {
		"entities": {
			"hashtags": [{
				"text": "MSM",
				"indices": [37, 41]
			}],
			"media": [{
				"id": 935660272261283840,
				"expanded_url": "https://twitter.com/gwilliamb/status/935660449659367425/photo/1",
				"sizes": {
					"thumb": {
						"w": 150,
						"resize": "crop",
						"h": 150
					},
					"medium": {
						"h": 789,
						"resize": "fit",
						"w": 996
					},
					"large": {
						"resize": "fit",
						"w": 996,
						"h": 789
					},
					"small": {
						"resize": "fit",
						"w": 680,
						"h": 539
					}
				},
				"media_url_https": "https://pbs.twimg.com/media/DPwiD26UQAAmLKt.jpg",
				"url": "https://t.co/epCeGZPlb0",
				"id_str": "935660272261283840",
				"display_url": "pic.twitter.com/epCeGZPlb0",
				"media_url": "http://pbs.twimg.com/media/DPwiD26UQAAmLKt.jpg",
				"indices": [238, 261],
				"type": "photo"
			}],
			"symbols": [],
			"user_mentions": [{
				"screen_name": "thehill",
				"id": 1917731,
				"indices": [0, 8],
				"name": "The Hill",
				"id_str": "1917731"
			}],
			"urls": [{
				"expanded_url": "http://bit.ly/2hXMWuz",
				"indices": [214, 237],
				"display_url": "bit.ly/2hXMWuz",
				"url": "https://t.co/UMwpCtNgXI"
			}]
		},
		"display_text_range": [9, 237],
		"extended_entities": {
			"media": [{
				"expanded_url": "https://twitter.com/gwilliamb/status/935660449659367425/photo/1",
				"sizes": {
					"small": {
						"h": 539,
						"resize": "fit",
						"w": 680
					},
					"large": {
						"resize": "fit",
						"w": 996,
						"h": 789
					},
					"medium": {
						"h": 789,
						"resize": "fit",
						"w": 996
					},
					"thumb": {
						"resize": "crop",
						"w": 150,
						"h": 150
					}
				},
				"id": 935660272261283840,
				"type": "photo",
				"url": "https://t.co/epCeGZPlb0",
				"media_url_https": "https://pbs.twimg.com/media/DPwiD26UQAAmLKt.jpg",
				"id_str": "935660272261283840",
				"display_url": "pic.twitter.com/epCeGZPlb0",
				"indices": [238, 261],
				"media_url": "http://pbs.twimg.com/media/DPwiD26UQAAmLKt.jpg"
			}]
		},
		"full_text": "@thehill You can't have it both ways #MSM! It was portrayed as a snub and disrespectful in 2010. And now, it's what? Hypocrisy runs deep and rampant when journalist don't do their jobs. Provide historical context. https://t.co/UMwpCtNgXI https://t.co/epCeGZPlb0"
	},
	"timestamp_ms": "1511913809586",
	"in_reply_to_status_id_str": "935655312807419904",
	"reply_count": 0,
	"geo": null,
	"retweet_count": 0
}
```

#3

Do you have some information to share, or what?