Search API returns invalid results?

search

#1

Hello all,

I am creating a bug report after a user of the library mentioned that the Search API was not returning the results he was expecting.

So I reread the Search API documentation, and I think he is correct.

Let's take this very simple example: https://api.twitter.com/1.1/search/tweets.json?q=TRUMP

URL Entities Matches

The first thing to note is that Twitter filters not only on the Tweet text but also on the content of URL entities.

For example, the Tweet https://twitter.com/StateOfNetwork/status/862024698091188224 does not contain the TRUMP keyword anywhere, BUT the link description (as shown on twitter.com) does. The JSON returned by the Twitter REST API does not include any metadata containing the TRUMP keyword.

The problem is that the JSON returned does not include this extra link metadata, which makes it impossible for developers to understand why the Tweet was matched.

Image Entities Matches

Another type of matching seems to be based on images. For example, the Tweet https://twitter.com/AndyKroll/status/861688635002941441 contains an image with the text TRUMP.

Same problem: the JSON does not provide any information indicating that the Tweet is related to TRUMP.

Mysterious matches

Finally we have what I call the mysterious matches.

With a search for the keyword ISIS, Twitter returned the Tweet https://twitter.com/DMV_Drummerboyy/status/832426727343271937, which has no text whatsoever related to ISIS.

I checked ALL the comments on this Tweet and none of them mention ISIS either.

Could you please provide more information about what is going on?

Cheers,
Linvi


#2

Just had a quick look at each of these - with such fast-moving topics it is difficult to reproduce the search results themselves.

I cannot find the keyword for Tweet 862024698091188224 in the JSON response, but as you say, it appears in the text in the card (the API does not return cards data and there are no current plans to change this). I agree that this is unexpected as a search result.

If I retrieve 861688635002941441 via the API with ?tweet_mode=extended then the expanded_url field contains a matching term, so this is not related to the image in this case.
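For anyone who wants to run the same check on their own results, here is a minimal sketch in Python. It scans a Tweet payload for a keyword in the text and in the URL and media entities. The field names (`full_text`, `text`, `entities`, `urls`, `media`, `url`, `expanded_url`, `display_url`) are the ones used by the v1.1 Tweet JSON. The function name `find_keyword_fields` and the sample payload are hypothetical and only illustrate the idea; the sample is not the actual response for any Tweet discussed here.

```python
from typing import Any, Dict, List


def find_keyword_fields(tweet: Dict[str, Any], keyword: str) -> List[str]:
    """Return the JSON paths in a Tweet payload whose value contains keyword."""
    needle = keyword.lower()
    hits: List[str] = []

    # With tweet_mode=extended the body is in "full_text", otherwise in "text".
    for field in ("full_text", "text"):
        if needle in str(tweet.get(field) or "").lower():
            hits.append(field)

    # URL and media entities both carry url / expanded_url / display_url.
    entities = tweet.get("entities") or {}
    for kind in ("urls", "media"):
        for i, entity in enumerate(entities.get(kind) or []):
            for field in ("url", "expanded_url", "display_url"):
                if needle in str(entity.get(field) or "").lower():
                    hits.append(f"entities.{kind}[{i}].{field}")
    return hits


# Hypothetical payload for illustration only.
sample = {
    "full_text": "Worth a read https://t.co/abc123",
    "entities": {
        "urls": [
            {
                "url": "https://t.co/abc123",
                "expanded_url": "https://example.com/politics/trump-budget",
                "display_url": "example.com/politics/trump-budget",
            }
        ]
    },
}

print(find_keyword_fields(sample, "TRUMP"))
```

An empty list for a Tweet that the Search API returned would suggest the match came from data that is not in the JSON, such as the card text mentioned above.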

I’m not certain why Tweet 832426727343271937 is matching on that term. It also seems to be outside the 7-day window of the Search API, so I’m not sure why it is being returned at all.
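The age of a Tweet can be estimated from its ID alone, since Tweet IDs of this size are Snowflake IDs that encode a millisecond timestamp in their upper bits. The sketch below is a rough illustration of that check, not how the Search API itself decides what to return. The helper names and the `now` value are assumptions chosen for the example; `now` stands in for roughly when this thread's searches were run.

```python
from datetime import datetime, timedelta, timezone

# Snowflake epoch in milliseconds (2010-11-04 UTC).
TWITTER_EPOCH_MS = 1288834974657


def tweet_created_at(tweet_id: int) -> datetime:
    """Decode the creation time embedded in a Snowflake Tweet ID."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def within_search_window(tweet_id: int, now: datetime, days: int = 7) -> bool:
    """True if the Tweet was created no more than `days` before `now`."""
    age = now - tweet_created_at(tweet_id)
    return timedelta(0) <= age <= timedelta(days=days)


# Assumed search time, for illustration only.
now = datetime(2017, 5, 10, 12, 0, tzinfo=timezone.utc)

for tweet_id in (862024698091188224, 832426727343271937):
    print(tweet_id, tweet_created_at(tweet_id).date(), within_search_window(tweet_id, now))
```

Decoded this way, Tweet 832426727343271937 dates from mid-February 2017, well outside a 7-day window for a search run in May 2017.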

Per the recent announcements related to the APIs, we’re in the process of building a better Search API that should return more consistent results, with a more complete index.