Search API and web search are not returning the most recent tweets for some popular URLs

restapi
search
api
urls

#1

tl;dr: Search API and web search do not return any recent tweets when certain popular domains are used as the search term. What’s happening here?

The search API and web search are showing inconsistent behaviour with several domains as the search term. While “der-postillon.com” and “abc.net.au” only return tweets from over a year ago and over two months ago respectively, “news.com.au” and most other domains work fine.

This behaviour holds true for several tools we are using here, and we first noticed it a few months ago with the “postillon”. Now that it also happens with abc.net.au (the Australian public broadcaster), it is starting to hurt.

The streaming API does return tweets for the URLs in question, though.

We’re wondering whether this is a bug or reflects some “relevance” filtering (hard to believe in the case of abc.net.au) or …


#2

I’m not sure when it changed, but Twitter search indexing of URLs definitely did change at some point… I’d blame that rather than relevance filtering.

You need to use "url:abc url:net.au" to search for any links that contain abc.net.au.

"news.com.au" works as you said, but I’d still use "url:news.com.au" instead.

"der-postillon.com" fails because of the "-", I assume: the dash is a special search operator.
Use "url:der url:postillon.com" instead.

I’m guessing it’s because of the way Twitter tokenizes and indexes tweet text and URL text. I don’t think there’s documentation on exactly how tokenization is handled for search & streaming. But if there is, I’d love to see it.

Unfortunately it’s trial and error to get it to work for these edge cases.
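
To cut down on the trial and error, the splitting pattern from the examples above can be sketched in a few lines of Python. This is only a guess at a workable rule, not documented Twitter behaviour: `url_query_candidates` is a hypothetical helper that splits a domain once at its first dot or dash, which reproduces the "url:abc url:net.au" and "url:der url:postillon.com" forms. Each candidate still has to be checked against real search results.

```python
import re


def url_query_candidates(domain):
    """Return candidate search queries for a domain.

    Assumption: splitting once at the first "." or "-" mirrors the
    hand-written examples in this thread; it is not a documented rule.
    """
    candidates = [f"url:{domain}"]
    parts = re.split(r"[.-]", domain, maxsplit=1)
    if len(parts) == 2 and all(parts):
        head, tail = parts
        candidates.append(f"url:{head} url:{tail}")
    return candidates


for domain in ("abc.net.au", "der-postillon.com", "news.com.au"):
    print(domain, "->", url_query_candidates(domain))
```

Running every candidate and comparing the results seems safer than trusting a single form, since the behaviour differs from domain to domain.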


#3

Thanks! I’ll test that. It’s really worrisome that this is not documented anywhere. I mean, that’s not just a developer problem anymore; it affects the user experience in a very negative way too. What are they thinking at Twitter?


#4

The query operators are all here: https://dev.twitter.com/rest/public/search

But the url: search quirks are just something to do with the tokenizer, or something else in the indexing, that makes it behave that way.


#5

Thanks, I saw that change in October but did not pay attention to it, because I did not expect it to break things that had worked so far, which is what I would consider good practice as long as a change is not announced. So I simply forgot about it. The inconsistency (as I said, most URLs still work the old way) is another annoying surprise. Anyway, I hope I can work out an approach that works reliably. We have to query a lot of URLs. Checking every single one by hand repeatedly, because we have to expect unannounced changes that break everything again, would make our project here unfeasible.


#6

Hi,
I’ve raised a similar issue here: Twitter search for domain name fails

Thank you, Igor, for the nice hint about url:. However, it seems to be bogus too:

If I search for “influencive.com” via the API, I get 160 tweets.
If I search for “url:influencive url:com”, I get a mere 58 tweets.

With other domain names it seems to work as you describe. So, the influencive.com case is either a bug, or we still don’t know a proper way to search for tweets with urls.

I’m inviting @andypiper here; I hope we can get some help.
Thank you.


#7

@ArturBrugeman: Have you made sure that the additional tweets are not false positives that don’t contain a URL? That’s what happened in our case after I switched search terms. Actually a positive side effect. In my experience the "." is equivalent to a space, i.e. an AND operator.
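
A rough way to check for such false positives, assuming the tweets are the JSON objects returned by the REST API and that their `entities.urls` entries carry an `expanded_url` field. `links_to_domain` is a hypothetical helper name, the sample tweets are made up, and links that are still wrapped by a third-party shortener will not match until they are unshortened.

```python
from urllib.parse import urlparse


def links_to_domain(tweet, domain):
    """True if any URL entity of the tweet points at the domain or a subdomain."""
    for entity in tweet.get("entities", {}).get("urls", []):
        host = urlparse(entity.get("expanded_url") or "").hostname or ""
        if host == domain or host.endswith("." + domain):
            return True
    return False


# Made-up sample tweets, for illustration only
tweets = [
    {"id": 1, "entities": {"urls": [{"expanded_url": "http://www.abc.net.au/news"}]}},
    {"id": 2, "entities": {"urls": []}},
]
matching = [t["id"] for t in tweets if links_to_domain(t, "abc.net.au")]
print(matching)
```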


#8

Hi @FlxVctr, thank you for the hint. I hadn’t made sure the additional tweets were not false positives. Now I have looked a little deeper.

Here is an example tweet:

Found with “influencive.com”, not found with “url:influencive url:com”. I will check other tweets later, but it still seems like something is wrong here.


#9

@ArturBrugeman Oh. Yep, indeed it should have caught this one. I’ve tested with about 30 Australian websites, and there the numbers (of Tweets actually pointing to the respective URL, unshortened if necessary) were sometimes lower by less than 5%, but mostly significantly higher. Definitely let us know what you find out on your side.

Just to make sure: were your searches made over exactly the same Tweet ID range?
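
One way to make that comparison fair, sketched under the assumption that both result sets are available as lists of numeric tweet IDs: restrict both to the ID range they share before counting differences. The function name and the sample IDs are made up for illustration.

```python
def compare_in_shared_range(ids_a, ids_b):
    """Compare two collections of tweet IDs within the ID range covered by both.

    Returns (only in a, only in b, in both) as sets.
    """
    if not ids_a or not ids_b:
        return set(), set(), set()
    low = max(min(ids_a), min(ids_b))
    high = min(max(ids_a), max(ids_b))
    a = {i for i in ids_a if low <= i <= high}
    b = {i for i in ids_b if low <= i <= high}
    return a - b, b - a, a & b


# Made-up IDs: first list from the plain query, second from the url: query
only_plain, only_url, both = compare_in_shared_range(
    [100, 105, 110, 120], [105, 110, 115, 130]
)
print(only_plain, only_url, both)
```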


#10

@FlxVctr thanks for the hint. I didn’t specify any tweet ID range, I just collected everything the search API would give. I will try to play around with ID ranges too. Will get back to this when time allows. Thank you!


#11

I’m having this problem too, and again, I first reported it here: Search no longer shows domain-tweets searches from within shortened url’s. And now it’s back again. It is affecting Twitter Search and TweetDeck, but strangely not HootSuite, I hear!?