Does Twitter Cards handle hashbang URLs?


#1

I run saavn.com and we use hashbangs for our URLs. Does Twitter Cards support hashbangs? Does the bot convert hashbangs to _escaped_fragment_ requests?
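
To make the question concrete: per Google’s AJAX crawling scheme, a crawler is expected to rewrite the #! fragment into a percent-encoded _escaped_fragment_ query parameter. Here is a minimal Python sketch of that mapping (the example URL is made up, and the exact escaping rules in the spec are a bit more nuanced):

```python
from urllib.parse import quote, urlsplit, urlunsplit

def ajax_crawlable_url(url: str) -> str:
    """Rewrite a #! URL into its _escaped_fragment_ form, roughly per
    Google's AJAX crawling scheme."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    if not fragment.startswith("!"):
        return url  # not a hashbang URL; nothing to rewrite
    # The spec percent-escapes reserved characters (%, &, #, +, ...)
    # while leaving "=" and "/" intact.
    escaped = quote(fragment[1:], safe="=/")
    query = (query + "&" if query else "") + "_escaped_fragment_=" + escaped
    return urlunsplit((scheme, netloc, path, query, ""))

print(ajax_crawlable_url("http://www.example.com/page#!state/42"))
# -> http://www.example.com/page?_escaped_fragment_=state/42
```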


#2

I’m afraid it doesn’t. I just tried the Card Validator with http://www.my-site.com/path/#!dynamic-part and what I got in my access.log was:
199.59.148.209 - - [12/Sep/2013:08:20:19 +0200] "GET /path/ HTTP/1.1" 200 5846 "-" "Twitterbot/1.0"
I’m getting an “Invalid card type” error in the Card Validator, as there is no real content there.

If I try the same against Facebook debug tool, the hit in the log is:
69.171.224.115 - - [12/Sep/2013:09:06:28 +0200] "GET /path/?_escaped_fragment_=dynamic-part HTTP/1.1" 200 648 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"

And when Google spider bot passes, the hit is similar:
66.249.75.158 - - [12/Sep/2013:08:15:38 +0200] "GET /path/?_escaped_fragment_=dynamic-part HTTP/1.1" 200 813 "-" "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"

Even the MSN search bot (probably Bing) does it right:
65.55.213.74 - - [12/Sep/2013:02:06:10 +0200] "GET /path/?_escaped_fragment_=dynamic-part HTTP/1.1" 302 20 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
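
For context, on the server side the convention is simple to honor once the bot sends the parameter. A simplified Flask sketch of the idea (the /path/ route and snapshot HTML are placeholders, not my real code):

```python
from flask import Flask, request

app = Flask(__name__)

# Plain shell for browsers; the JS app routes on the #! fragment itself.
APP_SHELL = "<html><body><script src='/app.js'></script></body></html>"

def render_snapshot(fragment):
    # Placeholder: a real site would return pre-rendered HTML for this
    # fragment (from disk or a headless-browser cache), including the
    # twitter:card / og:* meta tags.
    return ('<html><head><meta name="twitter:card" content="summary">'
            "</head><body>%s</body></html>" % fragment)

@app.route("/path/")
def path():
    fragment = request.args.get("_escaped_fragment_")
    if fragment is not None:
        # Crawler hit: the bot replaced #!... with ?_escaped_fragment_=...
        return render_snapshot(fragment)
    return APP_SHELL
```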

Twitterbot should have better manners and behave like the other bots, but there are a lot of questions here, with no official answer, about Twitterbot not replacing the hashbang with the _escaped_fragment_ query parameter.

I just filed an issue to try to get this fixed:
https://dev.twitter.com/issues/1290


#3

As of today, the team has rolled out a change that will convert #! to _escaped_fragment_. Twitter Cards now supports hashbangs.


#4

Thank you, @jbulava, but it looks like the change hasn’t reached the Card Validator yet. Am I right?

You can try these links, but you need to change your UA to something that matches /facebookexternalhit|Feedfetcher-Google|Googlebot|FlipboardProxy|Twitterbot|msnbot/si to hit the actual content fragment and see the Card metadata (a quick way to fake the UA is sketched after the list).
1. http://www.laurareyero.com/retrato/#!LR_081209_01 is still hitting http://www.laurareyero.com/retrato/ (no Card metadata there) instead of http://www.laurareyero.com/retrato/?_escaped_fragment_=LR_081209_01
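
For convenience, here is roughly how I test it from Python, faking Twitterbot’s UA (a quick sketch, nothing official):

```python
import urllib.request

# Fetch the escaped-fragment URL the way Twitterbot should, faking its UA.
url = "http://www.laurareyero.com/retrato/?_escaped_fragment_=LR_081209_01"
req = urllib.request.Request(url, headers={"User-Agent": "Twitterbot/1.0"})
with urllib.request.urlopen(req) as resp:
    html = resp.read().decode("utf-8", errors="replace")

# The pre-rendered fragment should carry the Card markup; /retrato/ alone doesn't.
print("twitter:card present:", "twitter:card" in html)
```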


#5

I have the same problem with the Card Validator.
When I post a URL from our site with #! on TweetDeck, the backend logs show that everything is OK and Twitterbot fetches the page with “?_escaped_fragment_=”.

But when I try to validate the card, Twitterbot crawls the page without “?_escaped_fragment_=”.


#6

Could you provide an example URL on your domain for us to investigate?


#7

Hi Jorge, we deployed an update this weekend that fixes an indexing and caching issue with #! URLs. Hopefully this resolves the issues you’ve been seeing. If not, let me know and I’ll report back to engineering with your specific case to investigate. Apologies for the slow turnaround time.


#8

TL;DR I think you’re using different ways to retrieve the URLs for the validator preview and for the final approval validation process, missing the _escaped_fragment_ parameter in the latter.


Wow, it’s been a long time, but I was never able to validate my URLs. I tried again today, watching my web server log, and this is what I got:

When I put the URL I want to validate into the validator, it renders the card preview successfully, with these requests:

199.16.156.126 - - [16/Feb/2015:16:59:12 +0100] "GET /robots.txt HTTP/1.1" 200 101 "-" "Twitterbot/1.0"
199.16.156.126 - - [16/Feb/2015:16:59:13 +0100] "GET /something-like-fashion/?_escaped_fragment_=LR_130421_01 HTTP/1.1" 200 544 "-" "Twitterbot/1.0"
199.16.156.124 - - [16/Feb/2015:16:59:29 +0100] "GET /robots.txt HTTP/1.1" 200 101 "-" "Twitterbot/1.0"
199.16.156.124 - - [16/Feb/2015:16:59:30 +0100] "GET /something-like-fashion/?_escaped_fragment_=LR_130421_01 HTTP/1.1" 200 544 "-" "Twitterbot/1.0"
199.16.156.124 - - [16/Feb/2015:16:59:32 +0100] "GET /robots.txt HTTP/1.1" 200 101 "-" "Twitterbot/1.0"
199.16.156.124 - - [16/Feb/2015:16:59:33 +0100] "GET /something-like-fashion/?_escaped_fragment_=LR_130421_01 HTTP/1.1" 200 544 "-" "Twitterbot/1.0"
199.16.156.125 - - [16/Feb/2015:16:59:34 +0100] "GET /robots.txt HTTP/1.1" 200 101 "-" "Twitterbot/1.0"
199.16.156.125 - - [16/Feb/2015:16:59:35 +0100] "GET /something-like-fashion/?_escaped_fragment_=LR_130421_01 HTTP/1.1" 200 544 "-" "Twitterbot/1.0"

But when I request approval, the requests are different: they’re missing the _escaped_fragment_ parameter and hit an irrelevant page that isn’t Card-ready:

199.16.156.124 - - [16/Feb/2015:17:00:37 +0100] "GET /robots.txt HTTP/1.1" 200 101 "-" "Twitterbot/1.0"
199.59.148.209 - - [16/Feb/2015:17:00:37 +0100] "GET /robots.txt HTTP/1.1" 200 101 "-" "Twitterbot/1.0"
199.16.156.124 - - [16/Feb/2015:17:00:39 +0100] "GET /something-like-fashion HTTP/1.1" 200 4745 "-" "Twitterbot/1.0"
199.59.148.209 - - [16/Feb/2015:17:00:39 +0100] "GET /something-like-fashion HTTP/1.1" 200 4745 "-" "Twitterbot/1.0"

I think you’re using different ways to retrieve the URLs for the validator preview and for the final approval validation process.
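
The difference is easy to reproduce from outside. A rough Python sketch that mimics both fetch styles against the page from the logs above (hostname anonymized, as in the logs) and reports which one actually returns the Card markup:

```python
import urllib.request

BASE = "http://www.my-site.com/something-like-fashion"  # anonymized host
urls = {
    "preview-style (with fragment)": BASE + "/?_escaped_fragment_=LR_130421_01",
    "approval-style (no fragment)":  BASE,
}

for label, url in urls.items():
    req = urllib.request.Request(url, headers={"User-Agent": "Twitterbot/1.0"})
    with urllib.request.urlopen(req) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    # Only the _escaped_fragment_ snapshot should contain the Card meta tags.
    print(label, "->", "twitter:card" in html)
```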


#9

Hi @Coquevas, thanks for pointing this out and providing the example. I’ve communicated this to the engineering team and they will investigate. (Internal tracking reference: PREL-12452)

