Does the API remove phishing links before returning the result?


#1

Hello,

I am a researcher looking at the ways users can be phished and how attackers use these methods. I am using a machine learning platform (Orange) to fetch sample tweets.
But I discovered that the REST API removes the backlinks and also converts links to Twitter's short form (i.e., https://t.co/). Unfortunately, this defeats the purpose for me, since I need to examine the links to identify phishing or spam tweets. Is there a way of disabling this feature to get the full link and the full tweet without any cleansing?
Failing that, do you know of a test dataset for Twitter that contains both phishing and legitimate tweets?


#2

You can see how Twitter treats unsafe links in general in this help article. If a link is identified as problematic, users will be prevented from posting it, and Tweets containing it may be withheld from the API responses. There’s no way to disable this.

Talking about URLs in general - from an API perspective, all URLs posted to Twitter are wrapped using our t.co service. In the Tweet objects returned by the API, you can find the unwrapped / expanded URL inside the entities section of the Tweet, e.g.

  "entities": {
    "hashtags": [

    ],
    "symbols": [

    ],
    "user_mentions": [

    ],
    "urls": [
      {
        "url": "https://t.co/wpSGfpPXep",
        "expanded_url": "https://andypiper.co.uk",
        "display_url": "andypiper.co.uk",
        "indices": [
          8,
          31
        ]
      }
    ]
  },
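As a rough illustration of reading the expanded URL out of that structure, here is a minimal Python sketch. It assumes the Tweet has already been parsed into a dict shaped like the fragment above; the helper name `expanded_urls` is hypothetical and not part of any Twitter library.

```python
import json
from typing import Dict, List


def expanded_urls(tweet: Dict) -> List[str]:
    """Collect the expanded form of every URL entity in a Tweet dict.

    Falls back to the wrapped t.co link if "expanded_url" is missing.
    """
    url_entities = tweet.get("entities", {}).get("urls", [])
    return [u.get("expanded_url") or u["url"] for u in url_entities]


# Sample Tweet fragment, trimmed to the fields used here.
sample = json.loads("""
{
  "entities": {
    "urls": [
      {
        "url": "https://t.co/wpSGfpPXep",
        "expanded_url": "https://andypiper.co.uk",
        "display_url": "andypiper.co.uk",
        "indices": [8, 31]
      }
    ]
  }
}
""")

print(expanded_urls(sample))
```

A Tweet with no links simply yields an empty list, so the same helper can be mapped over a whole batch of Tweets.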

If you use the (paid) premium APIs, you can get a URL enrichment in addition to this.

"entities": {
    "hashtags": [],
    "symbols": [],
    "urls": [
        {
            "display_url": "andypiper.co.uk",
            "expanded_url": "https://andypiper.co.uk",
            "indices": [
                16,
                39
            ],
            "unwound": {
                "description": "a weblog by Andy Piper about technology, photography, and life",
                "status": 200,
                "title": "The lost outpost",
                "url": "http://andypiper.co.uk/"
            },
            "url": "https://t.co/sYpHB2sjig"
        }
    ],
    "user_mentions": []
},
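For analysis purposes you would probably want the most fully resolved link available for each URL entity. The sketch below is one way to do that in Python, assuming the entity is a dict shaped like the fragment above; `resolved_url` is a hypothetical helper name. It prefers the `unwound` URL when the enrichment is present and otherwise falls back to `expanded_url`, then to the t.co link.

```python
from typing import Dict


def resolved_url(url_entity: Dict) -> str:
    """Return the most fully resolved link for one URL entity.

    Preference order: unwound.url, then expanded_url, then the t.co url.
    """
    unwound = url_entity.get("unwound") or {}
    return (
        unwound.get("url")
        or url_entity.get("expanded_url")
        or url_entity["url"]
    )


# Entity with the URL enrichment present.
enriched = {
    "display_url": "andypiper.co.uk",
    "expanded_url": "https://andypiper.co.uk",
    "indices": [16, 39],
    "unwound": {
        "status": 200,
        "title": "The lost outpost",
        "url": "http://andypiper.co.uk/",
    },
    "url": "https://t.co/sYpHB2sjig",
}

# Same entity as returned without the enrichment.
plain = {k: v for k, v in enriched.items() if k != "unwound"}

print(resolved_url(enriched))
print(resolved_url(plain))
```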

#3

Thank you. This is a great help.
Are you aware of any Twitter training dataset for machine-learning-based phishing detection?


#4

I’m sorry, I’m not aware of anything like that.


#5

Ok. Thanks. It was worth a shot :)