Random sampling in Search API


#1

I know that the streaming API uses random sampling. What about the Search API?

As the Search API returns tweets from the last 7 days (including the day the query is made, I believe), the returned tweets are a combination of current and past tweets. However, since it returns at most 100 tweets per request, what is the logic for picking those 100 tweets? Will it count duplicates from a previous request? Does it use a random function to pick tweets from the data bank?


#2

The Search API is “optimised for relevance” - I’m not sure whether tweets are randomly sampled for inclusion in the search in the first place. This is something I’d like to know more about too.

The 100 Tweets returned for a search are a different matter: this is just the limit on Tweets you get per request, i.e. the first “page” of results. To get more results you need to use the max_id parameter (same as navigating timelines): see “next_results” in “search_metadata” in https://dev.twitter.com/rest/reference/get/search/tweets
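A minimal sketch of following “next_results” from page to page, in Python. It assumes the response is already decoded JSON with “statuses” and “search_metadata” keys; `fetch_page` is a hypothetical callable standing in for whatever authenticated HTTP client you use, and `max_pages` is an arbitrary cap added here to stay within rate limits.

```python
from urllib.parse import parse_qs


def next_page_params(search_metadata):
    """Turn the "next_results" query string into a parameter dict.

    Returns None when "next_results" is missing, which is treated here
    as "no further pages".
    """
    next_results = search_metadata.get("next_results")
    if not next_results:
        return None
    parsed = parse_qs(next_results.lstrip("?"))
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in parsed.items()}


def collect_tweets(fetch_page, query, max_pages=5):
    """Collect tweets page by page.

    fetch_page is a hypothetical callable (not part of any library):
    it takes a dict of request parameters and returns the decoded
    JSON response as a dict.
    """
    params = {"q": query, "count": 100}
    tweets = []
    for _ in range(max_pages):
        response = fetch_page(params)
        tweets.extend(response.get("statuses", []))
        params = next_page_params(response.get("search_metadata", {}))
        if params is None:
            break
    return tweets
```

Each loop iteration costs one request against the rate limit, so the cap on pages is worth keeping.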


#3

Thanks @IgorBrigadir. Yup, the 100 tweets are just the first page, but I don’t think they are using 50 million tweets per day x 7 (the Search API covers tweets from the past 7 days) as the base for the query.

For optimization: I am interested to know what Twitter means by optimization and what the logic behind it is. How is relevance calculated: based on the number of favorites, or on some machine learning algorithm?

For the 100 returned tweets, I wonder if there is a maximum number of additional pages. (Under Twitter’s rules, every extra page counts as 1 request, and there is a rate limit.) But it seems that if I make requests within a short timeframe, the 100 returned tweets differ between requests. Maybe I should make simultaneous requests to understand the logic better. However, I am wondering whether there is randomness or extra logic built in.
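One way to run that experiment is to compare the tweet IDs of two responses. A rough sketch in Python, assuming each response’s “statuses” list is already decoded JSON and each tweet carries the “id_str” field; `compare_pages` is a hypothetical helper name, not part of any API.

```python
def compare_pages(first_statuses, second_statuses):
    """Compare two lists of tweet objects by their "id_str" field.

    Returns the IDs seen in both requests and the IDs unique to each,
    which shows how much two closely spaced requests overlap.
    """
    first_ids = {status["id_str"] for status in first_statuses}
    second_ids = {status["id_str"] for status in second_statuses}
    return {
        "in_both": first_ids & second_ids,
        "only_in_first": first_ids - second_ids,
        "only_in_second": second_ids - first_ids,
    }
```

A large "only_in_second" set made up of newer IDs would suggest the difference is simply new tweets arriving, not randomness, though this check alone cannot prove either.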


#4

Unfortunately we are not able to share the details behind the implementation here.


#5

@andypiper oh…that’s sad to hear. May I know whether the Search API also uses random sampling like the streaming API, then?


#6

There’s no random sample here - it is just that the search index for the API is not a complete one. There’s no connection between the streaming API and the search API (the latter is part of the REST family of APIs on our side). Sorry that we can’t go into detail about the implementation of all of the parts of the system.


#7

@andypiper Hi Andy,
I have a question regarding the 100 tweets bit of this original post.

We have search set up on search/tweets but it seems to be only pulling in a maximum of 100 tweets per request.
Is there a specific reason why only 100 are returned?

Thanks, Luke


#8
count (optional): The number of tweets to return per page, up to a maximum of 100. Defaults to 15.

You can use cursoring to page through additional results.
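A small sketch of what that paging could look like in Python, assuming it means the max_id approach mentioned earlier in this thread and that each tweet object has a numeric “id” field. `build_search_params` and `next_max_id` are hypothetical helper names; sending the actual request is left to your HTTP client.

```python
MAX_COUNT = 100  # documented upper bound for the "count" parameter


def build_search_params(query, count=15, max_id=None):
    """Build request parameters, clamping count to the 1..100 range."""
    params = {"q": query, "count": max(1, min(count, MAX_COUNT))}
    if max_id is not None:
        params["max_id"] = max_id
    return params


def next_max_id(statuses):
    """Return the max_id for the next (older) page.

    Subtracting 1 from the lowest ID on the current page avoids
    getting that tweet back again. Returns None for an empty page,
    treated here as the end of the results.
    """
    if not statuses:
        return None
    return min(status["id"] for status in statuses) - 1
```

Asking for more than 100 per page does not help; the way to get more tweets is to issue further requests with a lower max_id each time.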