After querying the search API (GET search/tweets), I received tweets that do not contain the keyword I searched for, and others that contain it only outside the tweet text.
For example, a search for “POF” returned three kinds of tweets:
- tweets containing “POF” in the text (that’s OK)
- tweets containing “POF” in one of the other fields of the JSON I received (for example, in the screen_name of the user who sent the tweet, or in an expanded URL under the ‘entities’ field)
- tweets that do not contain “POF” anywhere in the JSON I received (that’s my main issue)
- Can anyone tell me how tweets are indexed with keywords?
- Are all tweets containing “POF” in any of the JSON fields returned? Or is the search based only on some of the fields (and if so, which ones)?
- Is it possible that the search API looks at the meta description of linked pages, even when the URLs are shortened? I got a tweet containing a shortened URL whose final destination (not the expanded_url in entities, because the link was shortened twice) had “POF” in the meta description of the referenced website. Is the search API able to do this kind of processing?
The third kind of tweet (no “POF” anywhere) is strange, because it only shows up when the keyword is neither a hashtag (#…) nor an account (@…). Moreover, it seems to happen only with short keywords (which return many results).
- Why did this happen? “POF” does not appear anywhere in the JSON, not even in the URLs used in the tweet (I checked multiple times)!
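For anyone who wants to reproduce the check, here is a minimal sketch of how the whole JSON of a tweet can be scanned for the keyword. It assumes the tweet has already been parsed into a Python dict; the function name `find_keyword` and the sample tweet are made up for illustration and are not real API output.

```python
def find_keyword(node, keyword, path=""):
    """Return the paths of all string values under `node` that contain
    `keyword`, ignoring case. `node` is a parsed JSON value."""
    needle = keyword.lower()
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            hits.extend(find_keyword(value, keyword, child))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            hits.extend(find_keyword(value, keyword, f"{path}[{i}]"))
    elif isinstance(node, str) and needle in node.lower():
        hits.append(path)
    return hits


# Hypothetical, heavily trimmed tweet: the keyword is absent from the
# text but present in the screen_name and in an expanded URL.
tweet = {
    "text": "Check this out http://t.co/xyz",
    "user": {"screen_name": "pof_fan"},
    "entities": {"urls": [{"expanded_url": "http://example.com/pof"}]},
}

print(find_keyword(tweet, "POF"))
# ['user.screen_name', 'entities.urls[0].expanded_url']
```

An empty list means the keyword is nowhere in the string values of the JSON, which is the case for the third kind of tweet. Note that this only looks at values, not at key names, and it cannot see anything behind a shortened URL.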
Finally, when I search for ‘AbcdEfg’ (one word), I get tweets containing ‘abcd’ AND ‘efg’ (two separate words).
- Why do capital letters seem to act like spaces?
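To make the observation concrete, here is a small sketch of the behavior I seem to be seeing. This is only a guess that reproduces my results, not something documented for the search API; the function name `split_on_capitals` is made up for illustration.

```python
import re


def split_on_capitals(query):
    """Guess at the observed behavior: insert a space at every
    lowercase-to-uppercase boundary, then lowercase and split."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", query)
    return spaced.lower().split()


print(split_on_capitals("AbcdEfg"))  # ['abcd', 'efg']
print(split_on_capitals("POF"))      # ['pof']
```

Under this guess, an all-caps keyword like “POF” stays a single term, so it would not explain the tweets that do not contain “POF” at all.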
Thanks for reading.