Recently we started getting traffic to our shared resources from what looks like a new crawler that identifies itself not as Twitterbot but with a random-looking browser user agent:
::ffff:10.0.3.70 - - [06/Sep/2017:13:42:14 +0000] "GET /nt8T HTTP/1.1" 301 53 "-" "Twitterbot/1.0"
::ffff:10.0.3.70 - - [06/Sep/2017:13:42:14 +0000] "GET /nt8T HTTP/1.1" 301 53 "-" "Mozilla/5.0 (BlackBerry; U; BlackBerry 9800; en-US) AppleWebKit/534.8+ (KHTML, like Gecko) Version/18.104.22.1681 Mobile Safari/534.8+"
As you can see, both requests come from the same IP at the same time. This happens every time we share a new resource, and it seems to be new behaviour that the crawling documentation does not mention (https://dev.twitter.com/cards/getting-started#crawling).
We need to be able to identify the crawler so that we don’t count its requests as visits. Until now that was easy, but with random user agents we’ll need to maintain an IP blacklist (we can’t use robots.txt, because we don’t want to block the crawler).
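One possible stopgap would be to use the pattern in the log sample above: the disguised request shares its IP, timestamp, and path with a request that does identify itself as Twitterbot, so any request matching a Twitterbot hit on those three fields could be excluded from the visit count. The following is only a minimal Python sketch of that idea, not a confirmed way to detect the crawler. The regular expression assumes the log format shown above, the names `parse` and `flag_crawler_hits` are made up for illustration, and it assumes the logged address is the real client address.

```python
import re

# Assumed log format, modelled on the two sample lines above:
# ip - - [timestamp] "METHOD path PROTOCOL" status size "referer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def parse(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def flag_crawler_hits(lines):
    """Pair each parsed entry with True if it looks like a crawler request.

    Heuristic (an assumption, not documented Twitter behaviour): a request
    is treated as a crawler hit when a request with a Twitterbot user agent
    has the same IP, the same timestamp (to the second), and the same path.
    """
    entries = [entry for entry in map(parse, lines) if entry]
    twitterbot_keys = {
        (e["ip"], e["time"], e["path"])
        for e in entries
        if e["agent"].startswith("Twitterbot/")
    }
    return [
        (e, (e["ip"], e["time"], e["path"]) in twitterbot_keys)
        for e in entries
    ]
```

Calling `flag_crawler_hits` on a batch of log lines returns each parsed entry with a boolean, so the flagged ones can be left out of the visit count. The heuristic would miss a disguised request whenever the Twitterbot-identified one is absent or falls in a different second, which is why a reliable identifier would still be preferable.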
Is this new behaviour intended, or is it just a test? Is there a reliable way to identify the crawler now?
Thanks in advance