Search API returns tweets that do not contain search keywords!


#1

Hello,

After querying the search API (GET search/tweets), I received tweets that do not contain the keyword I searched for, and others that contain the keyword, but not in the text.

For example, a search for “POF” returned three kinds of tweets:

  1. tweets containing “POF” in the text (that’s OK)
  2. tweets containing “POF” in one of the other fields of the JSON I received (for example, in the screen_name of the user who sent the tweet, or in the expanded URLs in the ‘entities’ field)
  3. tweets that do not contain “POF” anywhere in the JSON I received (that’s my main issue)
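
The three cases above can be told apart with a small script. This is only a minimal sketch, assuming each tweet is the parsed JSON object (a Python dict) returned by GET search/tweets; the helper names `find_keyword_paths` and `classify` are hypothetical and not part of any Twitter library.

```python
def find_keyword_paths(node, keyword, path=""):
    """Return the JSON paths of all string values that contain keyword (case-insensitive)."""
    needle = keyword.lower()
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            hits.extend(find_keyword_paths(value, keyword, child))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            hits.extend(find_keyword_paths(value, keyword, f"{path}[{i}]"))
    elif isinstance(node, str) and needle in node.lower():
        hits.append(path)
    return hits


def classify(tweet, keyword):
    """Map a tweet to case 1, 2 or 3 from the list above."""
    paths = find_keyword_paths(tweet, keyword)
    if "text" in paths:
        return 1  # keyword is in the tweet text
    if paths:
        return 2  # keyword is only in other fields
    return 3      # keyword is nowhere in the JSON
```

Calling `find_keyword_paths(tweet, "POF")` lists the fields that matched (for example `user.screen_name`), which makes it easy to see which case a given tweet falls into.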

For 2):
- Can anyone tell me how tweets are indexed by keyword?
- Are all tweets containing “POF” in any of the JSON fields returned? Or is the search based on only some of the fields (if so, which ones)?
- Is it possible that the search API looks at the meta description of linked pages, even when the URLs are shortened? I got a tweet containing a shortened URL whose final destination (not the expanded_url in entities, because the link was shortened twice) had “POF” in the meta description of the referenced website. Is the search API able to do that kind of processing?

For 3), it’s strange because this issue only happens when the keyword is neither a hashtag (#…) nor an account (@…). Moreover, it seems to happen only with short keywords (which return many results).
- Why does this happen? “POF” does not appear anywhere in the JSON, not even in the URLs used in the tweet (I checked multiple times)!

Finally, when I search for ‘AbcdEfg’ (one word), I get tweets containing ‘abcd’ AND ‘efg’ (two separate words).
- Why do capital letters seem to act like spaces?
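
The observed behaviour is consistent with the query being split at lowercase-to-uppercase boundaries before matching. That is only a guess, not documented behaviour of the search API; the sketch below just illustrates such a split, and the function name `split_on_case_boundaries` is hypothetical.

```python
import re


def split_on_case_boundaries(term):
    """Split a term wherever a lowercase letter is followed by an uppercase one.

    Assumption: this mimics the observed results only; it is not taken
    from any Twitter documentation. Requires Python 3.7+ for splitting
    on a zero-width match.
    """
    parts = re.split(r"(?<=[a-z])(?=[A-Z])", term)
    return [part.lower() for part in parts]
```

With this rule, ‘AbcdEfg’ becomes the two terms ‘abcd’ and ‘efg’, while an all-caps keyword such as ‘POF’ stays a single term.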

Thanks for reading.

@ytktee


#2

Hello,

I forgot to say that I also tried an exact-match search (POF in quotes), but the issue remains.

@ytktee


#3

I have the same issue. I searched for the term “twitter” and got tweets that do not contain it. Same for the term “firefox”.