I am building a Link Shortener that takes full urls and creates a vanity url for them. One thing that I am seeing is that the Open Graph data for the full url is not being carried over in the tweet, probably because the vanity url has no HTML. Is there a way to put Open Graph data into the tweet object when it is posted? If not, what is the best option to carry this data so that a card or preview is generated?

The only way to add this data is on the server side in the HTML page.
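To illustrate: since the crawler only sees the HTML your shortener returns, the card markup has to be rendered into that page server-side. Here is a minimal sketch in Python (the function name and tag set are illustrative, not the poster's actual code):

```python
# Hypothetical sketch: the short-link page itself must serve the card markup,
# because the crawler reads meta tags from whatever HTML it receives.
from html import escape

def render_card_html(title, description, image, target_url):
    """Build the HTML a shortener could return, embedding both the
    Open Graph tags and the twitter:* tags the Cards crawler looks for."""
    meta = {
        "twitter:card": "summary_large_image",
        "twitter:title": title,
        "twitter:description": description,
        "twitter:image": image,
        "og:title": title,
        "og:description": description,
        "og:image": image,
        "og:url": target_url,
    }
    tags = []
    for key, value in meta.items():
        # og:* tags conventionally use property=, twitter:* tags use name=
        attr = "property" if key.startswith("og:") else "name"
        tags.append(f'<meta {attr}="{key}" content="{escape(value, quote=True)}">')
    return "<html><head>" + "".join(tags) + "</head><body></body></html>"

page = render_card_html(
    "Ahead of the Pack, How Microsoft Told Workers to Stay Home",
    "Its executives were among the first to confront the impact.",
    "https://static01.nyt.com/images/2020/03/15/business/15microsoft/15microsoft-facebookJumbo-v4.jpg",
    "https://www.nytimes.com/2020/03/15/technology/microsoft-coronavirus-response.html",
)
```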

Moving this to the Twitter Cards category.

What about adding a preview of any sort? Also, when Twitter reads the URL, does it follow the redirect or not? I'm curious why it doesn't grab the OG data from the final URL.

Ah, maybe I was misunderstanding here - I assumed that you were asking about directly adding OG data into the Tweet object.

The redirect should be followed. Can you provide me an example of one or two of the shortened links that point to a final URL which does have OG data? I can have a quick poke.

Here is one of the tweets.

curl https://isaacl.dev/biw gets:

< HTTP/1.1 200 OK
< Date: Tue, 21 Apr 2020 10:32:55 GMT
< Content-Type: text/html
...
< 
<html>

<head>
<meta property="og:title" content="Ahead of the Pack, How Microsoft Told Workers to Stay Home">
<meta property="og:type" content="article">
<meta property="og:image" content="https://static01.nyt.com/images/2020/03/15/business/15microsoft/15microsoft-facebookJumbo-v4.jpg">
<meta property="og:url" content="https://www.nytimes.com/2020/03/15/technology/microsoft-coronavirus-response.html">
<meta property="og:description" content="Its executives, with headquarters just a few miles from one of the country’s worst coronavirus outbreaks, were among the first to confront the impact.">
</head>

<body><script>window.location.replace('https://www.nytimes.com/2020/03/15/technology/microsoft-coronavirus-response.html?utm_source=isaacl&utm_medium=twitter&utm_campaign=link&WT.mc_id=link-twitter-isaacl')</script></body>

Your shortener service is not a proper redirect: it returns HTTP 200, so Twitter will never follow the <script> redirect.

Other link shorteners like bit.ly work because they issue HTTP 301 redirects, which Twitter follows to the original site, where it reads the meta tags.

To keep the current HTTP 200 behavior, you'll have to fetch the card meta tags from the original site yourself and include them in your response (currently you only have the og:* tags; for Twitter Cards you also need the twitter:* tags, see Cards markup | Docs | Twitter Developer Platform). Alternatively, you can change how your shortener works to make it a proper 301 redirect.
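A proper-redirect version can be sketched with just the Python standard library (the slug table, class name, and port are made up for illustration):

```python
# Minimal sketch of the "proper redirect" approach: return 301 + Location
# instead of an HTML page with a JavaScript redirect.
from http.server import BaseHTTPRequestHandler, HTTPServer

LINKS = {  # hypothetical slug -> destination table
    "/biw": "https://www.nytimes.com/2020/03/15/technology/microsoft-coronavirus-response.html",
}

def resolve(path):
    """Return (status, location) for a request path."""
    target = LINKS.get(path)
    return (301, target) if target else (404, None)

class Redirector(BaseHTTPRequestHandler):
    def do_GET(self):
        status, location = resolve(self.path)
        self.send_response(status)
        if location:
            self.send_header("Location", location)
        self.end_headers()

# HTTPServer(("", 8080), Redirector).serve_forever()  # run the shortener
```

Because the 301 points straight at the article, the crawler lands on the page that already carries the meta tags, and no tag-copying is needed.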

Or, alternatively, reconsider using a vanity URL shortener: they're pretty much useless on Twitter, where every URL gets wrapped in t.co anyway, and they just make your links more brittle in future.

@IgorBrigadir I changed the code to do a 302 instead of populating OG data and it still does not work. curl returns the full HTML.

Oh, this time it’s robots.txt denying twitterbot: https://isaacl.dev/robots.txt

user-agent: *
disallow: /

See Getting started with Cards | Docs | Twitter Developer Platform to fix that, check with https://cards-dev.twitter.com/validator, and then I think it should work!

So, funny story: I can't update my robots.txt without a redirect, which I have now done. The validator does not like that.

https://isaacl.dev/robots.txt

Any thoughts @IgorBrigadir @andypiper

I don't understand what exactly you mean by not being able to update robots.txt without a redirect. robots.txt can be a static file served at the root; it doesn't need to be more complicated than that as far as I know.

I don't know whether serving robots.txt via a redirect matters, but the file resolves and is reachable, and robots.txt still has:

user-agent: *
disallow: /

This still bars Twitterbot from crawling your site. Unless you add a specific group to that robots.txt that allows Twitterbot, I doubt you'll be able to fix it.
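For example, a robots.txt that keeps other crawlers blocked but explicitly allows Twitterbot could look like this (a sketch, assuming the crawler honors the more specific user-agent group, as the robots exclusion convention prescribes):

```
User-agent: Twitterbot
Disallow:

User-agent: *
Disallow: /
```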

I just tried pulling that URL directly and as @IgorBrigadir mentions, this still blocks all crawlers.

Try now; it is still blocked in the Twitter Card validator.

Ah, your link is actually redirecting to the NYT website, and that's where the robots.txt is also blocking the crawler. That appears to be because of the URL parameters added in the redirect. The original article link https://www.nytimes.com/2020/03/15/technology/microsoft-coronavirus-response.html validates just fine, but with the ?utm_source=isaacl&utm_medium=twitter&utm_campaign=link&WT.mc_id=link-twitter-isaacl URL parameters the NYT site is rejecting the crawler. In their robots.txt file there’s this:

User-agent: *
...
...
...
Disallow: /*?*utm_source=
Disallow: /*?*login=
Disallow: /*?*searchResultPosition=
Disallow: /*?*campaignId=
Disallow: /*?*mcubz=
Disallow: /*?*smprod=
Disallow: /*?*ProfileID=
Disallow: /*?*ListingID=

So, your redirect is not going to work with a Twitter card.
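A rough way to see why the parameterized URL trips that rule, assuming the crawler honors Googlebot-style wildcards (Python's stdlib robotparser does not, so this hypothetical helper translates the pattern into a regex instead):

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a Googlebot-style Disallow pattern ('*' wildcard,
    optional '$' end anchor) into a regex anchored at the path start."""
    escaped = re.escape(pattern)
    escaped = escaped.replace(r"\*", ".*")       # '*' matches any characters
    if escaped.endswith(r"\$"):                  # trailing '$' anchors the end
        escaped = escaped[:-2] + "$"
    return re.compile("^" + escaped)

blocked = robots_pattern_to_regex("/*?*utm_source=")

clean = "/2020/03/15/technology/microsoft-coronavirus-response.html"
tracked = clean + "?utm_source=isaacl&utm_medium=twitter&utm_campaign=link"

print(bool(blocked.match(clean)))    # False: plain article path is crawlable
print(bool(blocked.match(tracked)))  # True: the utm_source param trips the rule
```

So the bare article URL passes, and the same URL with the tracking query string is disallowed, which matches what the validator reported.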

Ah, got it, so it will work for redirects whose targets do not block my QS params. But why does the link show previews in this thread?

The forum software is not using the same technology as the Twitter Cards crawler, so I do not know how it is pulling in the images.

Well, it works for some links and not for others; I guess I will have to live with it. Thanks Andy and Igor.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.