Twitter Card summary_large_image not working for some (not all) URLs

cards

#1

Hello! :slight_smile: I have two URLs: one renders a Twitter card and one does not. I don’t believe there is any difference between the two; both have the required Twitter card markup in the <head> (twitter:card, twitter:site, twitter:title), as well as twitter:description, twitter:image and twitter:creator.

The image size on both URLs is < 2 MB.

For the broken URL, the Twitter card validator shows the warning “WARN: Not whitelisted”, but I understand this message is misleading (see the topic “Not Whitelisted, unable to render, or no image: READ THIS FIRST”).

Requesting each page with curl -A Twitterbot $URL, I can see what I believe to be the correct markup in the <head> section of both pages.
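
For anyone repeating this check, here is a minimal Python sketch that lists the twitter:* meta tags found inside <head> and reports which of the three required ones are missing. The REQUIRED tuple just mirrors the tags named in this post and is not an official list. The names TwitterMetaParser, twitter_tags and missing_required are made up for illustration.

```python
from html.parser import HTMLParser

# Assumption: the "required" set is the one named in this post, not an official list.
REQUIRED = ("twitter:card", "twitter:site", "twitter:title")


class TwitterMetaParser(HTMLParser):
    """Collects <meta> tags whose name starts with "twitter:" inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attr = dict(attrs)
            # Some sites emit property= instead of name=; accept either.
            key = attr.get("name") or attr.get("property") or ""
            if key.startswith("twitter:"):
                self.tags[key] = attr.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def twitter_tags(html):
    """Return a dict of twitter:* meta tags found in the <head> of html."""
    parser = TwitterMetaParser()
    parser.feed(html)
    return parser.tags


def missing_required(html):
    """Return the required tag names that are absent or have empty content."""
    tags = twitter_tags(html)
    return [name for name in REQUIRED if not tags.get(name)]
```

One way to use it: save the curl output to a file (for example page.html, a hypothetical name) and call missing_required(open("page.html").read()). An empty list means all three tags were found in the <head>.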

Please help? :slight_smile:


#2

Hmm. I just tried requesting both URLs using curl -A Twitterbot, and received an HTTP 416 and a “Pardon Our Interruption” page, indicating that either your CDN or the site itself thinks I’m a bot (which, of course, in this case, I am) - so I do not see the Twitter Cards markup in the page header. Twitterbot expects an HTTP 200 and for the tags to be in the page it receives. I don’t know if there’s some explicit rule in place blocking access? It’s possible you don’t see the same thing from within the corporate network, I guess?
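
A rough Python sketch of the same test, in case it helps to run it from a few different networks: it fetches a URL with a Twitterbot User-Agent, then reports whether the response is an HTTP 200 that actually contains a twitter:card tag. The function names and the simple substring checks are illustrative assumptions and say nothing about how Twitterbot itself evaluates a page.

```python
import urllib.error
import urllib.request


def fetch_as_twitterbot(url, timeout=10):
    """Fetch url with a Twitterbot User-Agent; return (status, body)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Twitterbot"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Error statuses such as 416 raise HTTPError, which still carries a body.
        return err.code, err.read().decode("utf-8", "replace")


def diagnose(status, body):
    """Classify a response using the two expectations described above."""
    if status != 200:
        return f"blocked or failing: HTTP {status}"
    if "Pardon Our Interruption" in body:
        return "interstitial page served instead of the article"
    if "twitter:card" not in body:
        return "HTTP 200 but no twitter:card tag in the response"
    return "ok"
```

Calling diagnose(*fetch_as_twitterbot(url)) from inside and outside the corporate network should show whether the two vantage points get different responses.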


#3

Hmm, this is interesting. I do not see this at all from inside our network. I shall try from outside the network and update :slight_smile:

Why do you suppose that one URL worked (specifically, https://www.skyscanner.net/news/14-worlds-most-amazing-abandoned-airports) but the other didn’t despite you receiving the same unexpected HTTP 416 response for both?


#4

So I just checked again, and this doesn’t seem to be related to the user-agent string - it just doesn’t like responding to curl (and our internal crawler is doing basically the same thing and getting the same response). If it helps, the page returned contains references to cdn.distilnetworks.com which I assume is a CDN which fronts your site, and contains a form to fill in to get whitelisted.

My hypothesis is that the working URL was cached at some point in the past week, potentially before this anti-robot measure (or whatever is now in place) was working?

You may also want to check that Twitter’s IP ranges are not blacklisted somehow. We recently changed these - see Update to Twitter outbound IP configuration (may affect Cards)


#5

Yeah, on further inspection I have a working theory that agrees with your suggestion. I suspect the “good” URL was requested at some point previously and cached, so its response is being served from our CDN. The “bad” URL hasn’t been cached, so the page is rendered from scratch and the request passes through our anti-scraping/crawling layer, which seems to be misbehaving here. I’ve been able to reproduce the problem, at least.


#6

OK thanks - and understandable to have some protections in place! Hope you can figure out a way to let the right things happen here :slight_smile:

