TwitterBot: does it respect the wildcard character in Allow directives (e.g. /users/*/photo_album/)?


#1

Is there any documentation on what TwitterBot considers a valid robots.txt? Since the format is only loosely standardized, it’s really up to each bot which features it implements (e.g. Googlebot supports a wildcard character when matching URLs against the rules in robots.txt).
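To make the question concrete, here is a minimal Python sketch of the two behaviours I'm trying to tell apart. It assumes Googlebot-style semantics (`*` matches any run of characters, a trailing `$` anchors the end of the path, and a rule otherwise matches as a prefix). The function names are my own, and none of this says anything about what TwitterBot actually does.

```python
import re


def wildcard_match(pattern: str, path: str) -> bool:
    """Googlebot-style matching: '*' is a wildcard, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then let every '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    regex = re.compile(body + ("$" if anchored else ""))
    # re.match anchors at the start of the path, so rules behave as prefixes.
    return regex.match(path) is not None


def literal_prefix_match(pattern: str, path: str) -> bool:
    """What a bot without wildcard support would do: treat '*' as a literal."""
    return path.startswith(pattern)


rule = "/users/*/photo_album/"
print(wildcard_match(rule, "/users/42/photo_album/"))        # wildcard-aware bot
print(literal_prefix_match(rule, "/users/42/photo_album/"))  # literal-only bot
```

If TwitterBot behaves like the second function, a rule such as `Allow: /users/*/photo_album/` would never match a real URL.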

Trial and error is far too slow, since Twitter's validator tool for Twitter Cards caches the robots.txt file for 24 hours. Any help, especially documentation, would be appreciated. Thanks!


#2

As a follow-up, I could just do:

User-agent: TwitterBot
Disallow:

But in that case, what does this mean for my site? Will TwitterBot actually crawl my entire site, or just the specific links users post on Twitter?
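As far as the file itself goes, an empty Disallow means nothing is off limits for that agent. Below is a quick check with Python's standard `urllib.robotparser`, which is only a stand-in and not TwitterBot's own parser. The `User-agent: *` block, the bot name `SomeOtherBot`, and the example.com URL are assumptions added for illustration. Note that the field is spelled `User-agent`; a line starting with `UserAgent:` is not a recognized field. This only shows what the rules permit, not how much of the site TwitterBot will actually fetch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow TwitterBot everywhere, block everyone else.
robots_txt = """\
User-agent: TwitterBot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://example.com/users/42/photo_album/"  # placeholder URL
twitterbot_allowed = parser.can_fetch("TwitterBot", url)
other_bot_allowed = parser.can_fetch("SomeOtherBot", url)

print("TwitterBot allowed:", twitterbot_allowed)
print("SomeOtherBot allowed:", other_bot_allowed)
```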


#3

I’m having the same problem.