Because the Search API documentation includes the following:
“Before getting involved, it’s important to know that the Search API is
focused on relevance and not completeness. This means that some Tweets
and users may be missing from search results. If you want to match for
completeness you should consider using a Streaming API instead.”
I thought I’d compare the data from the Search API to the data from the Streaming API.
Using Python and the statuses/filter API, I recorded tweets with track=@bostonglobe and left the stream open for 7 hours (9:16 AM EST through 4:16 PM EST). I then queried the search/tweets API with q=@bostonglobe and filtered the results down to only those tweets posted during the window in which the stream was open.
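For reference, the time-window filter applied to the search results can be sketched roughly as below. It assumes each tweet is a dict parsed from the API's JSON and carries the v1.1 created_at string (e.g. "Wed Aug 27 13:08:45 +0000 2008"). The function names and the calendar date are placeholders for illustration only; the actual recording date is not stated here, and this is not the original script.

```python
from datetime import datetime, timedelta, timezone

# Format of the "created_at" field in v1.1 tweet objects,
# e.g. "Wed Aug 27 13:08:45 +0000 2008".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# EST is UTC-5; created_at is reported in UTC, so compare timezone-aware values.
EST = timezone(timedelta(hours=-5))


def in_window(tweet, start, end):
    """Return True if the tweet was created between start and end (inclusive)."""
    created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
    return start <= created <= end


def filter_to_window(tweets, start, end):
    """Keep only the tweets whose created_at falls inside the window."""
    return [t for t in tweets if in_window(t, start, end)]


# Placeholder date (an assumption); only the 9:16 AM - 4:16 PM EST
# times of day come from the description above.
start = datetime(2014, 1, 15, 9, 16, tzinfo=EST)
end = datetime(2014, 1, 15, 16, 16, tzinfo=EST)
```

Calling filter_to_window(search_results, start, end) then leaves only the search tweets posted while the stream was open. Comparing timezone-aware datetimes avoids an off-by-five-hours error between EST and the UTC timestamps.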
statuses/filter provided 1,734 total tweets that mentioned @bostonglobe
search/tweets provided 2,168 total tweets for the query @bostonglobe
There were 248 tweets in the streaming data that were not included in the search data, and there were 701 tweets that were in the search data that were not in the streaming data.
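The comparison behind these counts can be sketched as a set difference on tweet IDs. This is a minimal illustration, assuming both captures are lists of tweet dicts with the v1.1 id_str field; compare_by_id is a hypothetical helper name, not part of any Twitter library.

```python
def compare_by_id(streaming_tweets, search_tweets):
    """Compare two captures by tweet ID and report where they differ."""
    # Building sets also removes any duplicate tweets within a capture.
    streaming_ids = {t["id_str"] for t in streaming_tweets}
    search_ids = {t["id_str"] for t in search_tweets}
    return {
        "streaming_unique": len(streaming_ids),
        "search_unique": len(search_ids),
        "in_both": len(streaming_ids & search_ids),
        "only_streaming": streaming_ids - search_ids,
        "only_search": search_ids - streaming_ids,
    }
```

Checking the unique counts against the raw totals also shows whether duplicates within either capture are inflating the numbers.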
No particular time span contributed to the discrepancy, and I haven’t identified any other pattern among the tweets missing from one source or the other. Any ideas as to why there is such a difference? I was actually expecting the search data to be substantially smaller than the streaming data.