Searching hashtags in extended tweets

search

#1

I’ve noticed that searching the REST API for a hashtag with and without the # returns different results when the hashtag appears in the extended part of a tweet. This can be problematic when retrieving tweets that use a popular hashtag, since searching with the # can yield significantly fewer results.

I’ve created a test script that demonstrates the problem. Here’s what it does:

  1. Creates a 137-character tweet that ends with a unique hashtag.
  2. Retweets the tweet, which prepends RT @{screen_name}: and pushes the hashtag past the 140-character mark (see extended tweet).
  3. Sleeps for 20 seconds to give Twitter time to update its search index.
  4. Prints the number of search results with and without the ‘#’ prefix.
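
Steps 1 and 2 can be reproduced offline to see why the hashtag ends up beyond the limit. This is only a rough sketch, not the test script itself: the helper names, the example screen name, and the simple cut at 140 characters with a trailing ellipsis are assumptions about how the classic retweet text is formed, and nothing here calls the Twitter API.

```python
import re

LIMIT = 140  # classic tweet length limit (assumed for this sketch)


def make_tweet(tag, length=137):
    """Build a tweet of exactly `length` characters that ends with #tag."""
    suffix = f" #{tag}"
    filler = "x" * (length - len(suffix))
    return filler + suffix


def retweet_text(screen_name, text, limit=LIMIT):
    """Approximate the classic RT text: add the prefix, then cut at the limit."""
    full = f"RT @{screen_name}: {text}"
    if len(full) <= limit:
        return full
    # Assumption: truncation keeps limit - 1 characters plus an ellipsis.
    return full[: limit - 1] + "\u2026"


def hashtags(text):
    """Return the hashtags that are still visible in the given text."""
    return re.findall(r"#(\w+)", text)


tweet = make_tweet("t1503244033")
rt = retweet_text("example_user", tweet)  # hypothetical screen name

print(len(tweet), hashtags(tweet))
print(len(rt), hashtags(rt))
```

With an 18-character prefix the full retweet text would be 155 characters, so the hashtag falls entirely in the part that gets cut off.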

If you don’t want to run the script, you can search for #t150324403 and t1503244033 using your preferred Twitter API client, at least while the tweets are still within the search window.


#2

That seems correct to me. If you look at the RT, it does not have a hashtag entity on it, so there is nothing for the hashtag search to match on. The same thing would happen if you were searching for the last word of a tweet that got truncated in an RT.
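
A rough way to check this on a payload you already have is to compare the hashtag entities on the RT with those on the original it wraps. The sketch below assumes the standard `entities.hashtags` and `retweeted_status` fields of the tweet JSON; the function names are made up, and the sample dictionaries are hand-written for illustration, not real API output.

```python
def hashtag_texts(status):
    """Return the lowercased hashtag entity texts attached to a status."""
    entities = status.get("entities", {})
    return [h["text"].lower() for h in entities.get("hashtags", [])]


def hashtags_missing_from_rt(status):
    """List hashtags present on the original tweet but absent from the RT."""
    original = status.get("retweeted_status")
    if original is None:
        return []  # not a retweet, nothing to compare
    own = set(hashtag_texts(status))
    return [tag for tag in hashtag_texts(original) if tag not in own]


# Hand-written sample payloads (illustrative only).
original = {
    "text": "... #t1503244033",
    "entities": {"hashtags": [{"text": "t1503244033"}]},
}
retweet = {
    "text": "RT @example_user: ...\u2026",
    "truncated": True,
    "entities": {"hashtags": []},
    "retweeted_status": original,
}

print(hashtags_missing_from_rt(retweet))
```

If the list is non-empty, the RT itself carries no entity for those hashtags, even though the original does.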


#3

Thanks for looking into this @abraham. It’s interesting that the hashtag entity isn’t part of the RT. Unfortunately, your theory that the same thing happens when searching for the last word of a tweet isn’t correct. Try searching for t1503272104 and you will get both the original and the RT, even though the term is in the extended text.

Of course you are entitled to your opinion about what is correct. As a user I expect that searching for the hashtag form will return both tweets and retweets that use the hashtag. Currently this is not the behavior when the hashtag is in the extended text, and it feels like a bug that is the result of how Twitter’s indexing pipeline works.