Embedded tweet shows video but nothing in JSON references it

streaming
cards

#1

Our use case: we can’t play videos, so we don’t want to show them. We have two HTML templates for showing tweets: one that shows media when the tweet doesn’t contain video, and a second that uses data-cards="hidden" to suppress all media for tweets that include video.

We’re streaming tweets in and using each tweet’s “media” entities to determine whether there is a video. We’ve found some odd cases where the mediaType = “photo” but the “expandedUrl” contains “video”, so we know those tweets have videos too. We decide at this phase of the process which template to use. The expectation is that media that isn’t video should show, but any tweet that includes video should show no media.
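For anyone following along, the check described above could be sketched roughly like this in Python. It is only an illustration, not our production code: it assumes the raw API field names ("type", "expanded_url") rather than the library's property names, treating "animated_gif" as video is an assumption, and the template names are made up.

```python
from typing import Any, Dict, List

# Assumption: GIFs are treated like video for display purposes.
VIDEO_MEDIA_TYPES = {"video", "animated_gif"}


def media_items(tweet: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return media from extended_entities if present, else from entities."""
    for key in ("extended_entities", "entities"):
        container = tweet.get(key) or {}
        media = container.get("media") or []
        if media:
            return media
    return []


def has_attached_video(tweet: Dict[str, Any]) -> bool:
    """True if any attached media item looks like a video."""
    for item in media_items(tweet):
        if item.get("type") in VIDEO_MEDIA_TYPES:
            return True
        # Odd case: the type says "photo" but the expanded URL mentions video.
        if "video" in (item.get("expanded_url") or ""):
            return True
    return False


def choose_template(tweet: Dict[str, Any]) -> str:
    """Pick a (hypothetical) template name for the tweet."""
    return "cards_hidden" if has_attached_video(tweet) else "with_media"
```

Note that a check like this only sees media attached to the tweet itself, so a tweet whose "media" is null always gets the media-showing template.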

But some still slip through. A specific example is the tweet with id 958094354916233219.

I’d love some way while streaming the tweets in to determine that the tweet actually references a video. But somehow the above tweet ‘shadow-references’ a video. Does anyone know how I could reliably determine whether a tweet actually contains a video?

FYI, the JSON I get from requesting details about this tweet reads:

{
  "id": 958094354916233219,
  "id_str": "958094354916233219",
  "text": "\"GOP Rep. Matt Gaetz says there is a \"mosaic of evidence\" that should stop special counsel Robert Mueller's investig. https://t.co/nPF96so1cX\"",
  "full_text": null,
  "display_text_range": null,
  "extended_tweet": null,
  "favorited": false,
  "favorite_count": 345,
  "user": {
    "name": "CNN Politics",
    "status": null,
    "description": "Political news, campaign stories and Washington coverage from CNN's political team.",
    "created_at": "2008-02-22T19:12:49-08:00",
    "location": "Washington, D.C.",
    "geo_enabled": false,
    "url": "https://t.co/KWFMkrEjdY",
    "lang": 42,
    "email": null,
    "statuses_count": 111009,
    "followers_count": 2584816,
    "friends_count": 329,
    "following": true,
    "protected": false,
    "verified": true,
    "entities": {
      "url": {
        "urls": [
          {
            "url": "https://t.co/KWFMkrEjdY",
            "display_url": "cnn.com/politics",
            "expanded_url": "http://cnn.com/politics",
            "indices": [ 0, 23 ]
          }
        ]
      },
      "description": { "urls": [] }
    },
    "notifications": false,
    "profile_image_url": "http://pbs.twimg.com/profile_images/918899077168934912/NrRRE0_b_normal.jpg",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/918899077168934912/NrRRE0_b_normal.jpg",
    "follow_request_sent": false,
    "default_profile": false,
    "default_profile_image": false,
    "favourites_count": 10,
    "listed_count": 14123,
    "profile_sidebar_fill_color": "F1F5E7",
    "profile_sidebar_border_color": "FFFFFF",
    "profile_background_tile": true,
    "profile_background_color": "100F14",
    "profile_background_image_url": "http://pbs.twimg.com/profile_background_images/591311847145406464/NVCpyjyz.png",
    "profile_background_image_url_https": "https://pbs.twimg.com/profile_background_images/591311847145406464/NVCpyjyz.png",
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/13850422/1503504077",
    "profile_text_color": "000000",
    "profile_link_color": "0000FF",
    "profile_use_background_image": true,
    "is_translator": false,
    "contributors_enabled": false,
    "utc_offset": -18000,
    "time_zone": "Eastern Time (US & Canada)",
    "withheld_in_countries": null,
    "withheld_scope": null,
    "Id": 13850422,
    "id_str": "13850422",
    "screen_name": "CNNPolitics"
  },
  "current_user_retweet": null,
  "coordinates": null,
  "entities": {
    "media": null,
    "urls": [
      {
        "url": "https://t.co/nPF96so1cX",
        "display_url": "twitter.com/i/web/status/9.",
        "expanded_url": "https://twitter.com/i/web/status/958094354916233219",
        "indices": [ 117, 140 ]
      }
    ],
    "user_mentions": [],
    "hashtags": [],
    "symbols": []
  },
  "extended_entities": null,
  "created_at": "2018-01-29T13:47:49-08:00",
  "truncated": true,
  "in_reply_to_status_id": null,
  "in_reply_to_status_id_str": null,
  "in_reply_to_user_id": null,
  "in_reply_to_user_id_str": null,
  "in_reply_to_screen_name": null,
  "quoted_status_id": null,
  "quoted_status_id_str": null,
  "quoted_status": null,
  "retweet_count": 198,
  "retweeted": false,
  "retweeted_status": null,
  "possibly_sensitive": false,
  "lang": 42,
  "contributorsIds": null,
  "contributors": null,
  "source": "<a href=\"https://about.twitter.com/products/tweetdeck\" rel=\"nofollow\">TweetDeck</a>",
  "place": null,
  "scopes": null,
  "filter_level": null,
  "withheld_copyright": false,
  "withheld_in_countries": null,
  "withheld_scope": null
}

#2

This is an interesting one!

The Tweet you’ve got here has come in via the streaming API. If I pull the same Tweet ID via the statuses/show endpoint with tweet_mode=extended, I see a slightly different set of metadata. I’m surprised to see

  "full_text": null,
  "display_text_range": null,
  "extended_tweet": null,

This is an extended Tweet, so I would expect those values to be populated by default on the streaming endpoint. Have you post-processed this data and “lost” any values by any chance?
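To make the difference between the payload shapes concrete, here is a rough Python sketch of how a consumer might pick the right text and entities. It assumes the raw JSON shapes (streaming payloads carry an "extended_tweet" object, while statuses/show with tweet_mode=extended puts "full_text" at the top level); the function name is hypothetical.

```python
from typing import Any, Dict, Tuple


def effective_text_and_entities(tweet: Dict[str, Any]) -> Tuple[str, Dict[str, Any], bool]:
    """Return (text, entities, complete).

    complete is False when the payload is truncated and carries no
    extended data, so its entities cannot be trusted to list every URL.
    """
    extended = tweet.get("extended_tweet")
    if extended:
        # Streaming shape: the full data lives inside extended_tweet.
        return extended.get("full_text") or "", extended.get("entities") or {}, True
    if tweet.get("full_text"):
        # REST shape with tweet_mode=extended: full data at the top level.
        return tweet["full_text"], tweet.get("entities") or {}, True
    # Classic shape: only complete if the tweet was not truncated.
    complete = not tweet.get("truncated", False)
    return tweet.get("text") or "", tweet.get("entities") or {}, complete
```

With the JSON posted above ("truncated": true and all three extended fields null), this would report the entities as incomplete, which is a hint that the real link is missing from that payload.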

The actual URL included in the extended format of the Tweet is

    "urls": [
      {
        "url": "https://t.co/UxYW1ZmqrX",
        "expanded_url": "http://snpy.tv/2ElUj9z",
        "display_url": "snpy.tv/2ElUj9z",
        "indices": [
          140,
          163
        ]
      }
    ]

This is a SnappyTV / Amplify video. The video is not attached to the Tweet as media, but it will still be displayed directly as part of the Tweet. I’m fairly sure this is the only other way that a video would be served to the embedded Tweet, beyond a Tweet having a video or GIF directly attached or (for example) a YouTube or other player card.

At the moment there’s no single piece of metadata or field that says “there’s a video to be displayed here”, but that’s an interesting thought that would help to clean up the situation you’re describing.
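In the absence of such a field, one workaround is to look at the hosts of the Tweet's URL entities. The Python sketch below is only a heuristic: the host list is an assumption built from the one example in this thread (snpy.tv) and would need to be maintained by hand, and the function name is made up.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse

# Assumption: hosts known to render as inline video. Only snpy.tv comes
# from this thread; extend the set as other cases turn up.
VIDEO_LINK_HOSTS = {"snpy.tv"}


def links_to_video_host(entities: Optional[Dict[str, Any]]) -> bool:
    """True if any URL entity's expanded_url points at a known video host."""
    for url_entity in (entities or {}).get("urls") or []:
        expanded = url_entity.get("expanded_url") or ""
        host = (urlparse(expanded).hostname or "").lower()
        if host in VIDEO_LINK_HOSTS:
            return True
    return False
```

This needs the entities from the extended form of the Tweet. The truncated payload only links to twitter.com/i/web/status/..., so the check would find nothing there.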


#3

Adding on to this answer. If you were to consume this Tweet from the premium or enterprise API endpoints, you’d see this additional data:

      "urls": [
        {
          "url": "https://t.co/UxYW1ZmqrX",
          "expanded_url": "http://snpy.tv/2ElUj9z",
          "display_url": "snpy.tv/2ElUj9z",
          "unwound": {
            "url": "http://www.snappytv.com/snaps/about-the-lead-with-jake-tapper-on-cnngo_yl--5",
            "status": 200,
            "title": "Video",
            "description": "See more at cnn.com"
          },
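If the "unwound" object is available, the resolved URL can be used in place of the shortened one when checking where a link ends up. A minimal Python sketch, assuming the field layout shown above; the helper name is hypothetical and the fallback to "expanded_url" is a design choice, not documented behaviour.

```python
from typing import Any, Dict
from urllib.parse import urlparse


def resolved_host(url_entity: Dict[str, Any]) -> str:
    """Return the host a URL entity finally resolves to, without 'www.'."""
    unwound = url_entity.get("unwound") or {}
    # Prefer the unwound URL; fall back to expanded_url when it is absent.
    target = unwound.get("url") or url_entity.get("expanded_url") or ""
    host = (urlparse(target).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host
```

For the entity above this gives "snappytv.com"; without "unwound" it gives "snpy.tv".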

#4

We are using the Tweetinvi library and the JSON I posted resulted from simply serializing the Tweet object provided by the library after making a GetTweet(id) request. In short, I’m not positive whether there is some post-processing going on within the library itself. We are using the latest version, 2.1.


#5

OK. So was that from the streaming API or one of the standard REST endpoints? In either case I’m surprised by the data, as I see something different via twurl, so I just wanted to double-check.


#6

That was from one of the REST endpoints. The situation occurs only rarely, and I’m not aware of a way to get a tweet via the streaming API again once it has already been delivered that way (and we’re not saving the raw serialized stream data).