Empty search result while requests and tweets are consumed, according to dashboard

user
python
premium
dashboard
search-tweets

#1

Hi,

I ran three queries on Friday to collect tweets containing certain keywords from a list of users over a period of about 4 years. Strangely (and sadly), the search result is empty, with 0 tweets in it. Meanwhile, according to my dashboard, all of my requests (>100) for this billing cycle (starting Feb 14) on the basic paid premium full-archive account have been consumed, and about 48.5K tweets have been consumed as well. These numbers certainly went up after I submitted the queries that brought back empty results. I am also pretty sure that some (or indeed many) tweets match my query: the Twitter search box shows them, as do the numbers on my Twitter account dashboard.

I am using the Python searchtweets package and its collect_results function to collect the tweets. Any thoughts as to why this is happening would be much appreciated.

Also, given that >100 requests and thousands of tweets have been consumed, is there any way to access and retrieve those search results? This matters especially because my allowed quota for this month is already used up, without my having a single data point.
(* Paid premium returns up to 500 results per request, so the numbers of tweets (48.5K) and requests (over 100) on my dashboard are consistent with each other.)

Also, a couple of related questions (on auth) that I would appreciate some help with:
1- If I run the exact same query again, does it consume new requests and tweets, or not?
2- If I renew my bearer token and run the exact same query, does it consume new requests and tweets? (please see below for a related note)
3- Can the empty search result be related to an invalid bearer token? Note that I did not receive any error message about an invalid token when I ran the queries. In fact I got no error messages at all; just an empty search result was returned. (The reason I am suspicious is that at some point the week before, I got the following message after sending a (different) query, though there might have been a problem with my request at that time …)
Error message:
```json
{
  "error": {
    "message": "Invalid or expired token.",
    "sent": "2019-02-06T16:22:05+00:00",
    "transactionId": " … "
  }
}
```
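One way to tell an auth problem apart from a genuinely empty result is to look at the decoded response body itself. The sketch below assumes the error payload has the shape shown above (an "error" object with a "message") and that a successful premium search page carries its tweets in a "results" list; the helper name classify_page is hypothetical.

```python
from typing import Any, Dict


def classify_page(payload: Dict[str, Any]) -> str:
    """Label one decoded premium search response body (hypothetical helper).

    Assumes errors look like {"error": {"message": ...}} and that
    successful pages carry their tweets in a "results" list.
    """
    if "error" in payload:
        # e.g. "Invalid or expired token." is an auth problem, not "no tweets"
        return "error: " + str(payload["error"].get("message", "unknown"))
    if not payload.get("results"):
        return "empty"
    return "ok ({} results)".format(len(payload["results"]))


print(classify_page({"error": {"message": "Invalid or expired token."}}))
print(classify_page({"results": []}))
print(classify_page({"results": [{"id": 1}, {"id": 2}]}))
```

Under these assumptions, an invalid token would surface as an "error" body rather than as an empty "results" list.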

An update: I used the same bearer token with a 30-day sandbox account and it seems to be valid. So I assume it is unlikely that the empty search result is due to bearer token validation, especially since I did not get an error.

Thank you very much for your time and attention.


#2

If you have a paid account, do you have access to the Counts endpoint? I wonder whether that matches the results: https://developer.twitter.com/en/docs/tweets/search/api-reference/premium-search.html#CountsEndpoint (the counts are estimated, but still).


#3

Thank you very much for your reply and the great point. I do have access to Counts with my plan, but I have not used it so far. I took a look at the documentation, and a couple of points seem to stand in the way of running a counts query:
1- First of all, I have run out of requests for this billing cycle. Since I assume counts and data requests are counted together, I would most probably get the same 429 error that I get when submitting data requests (HTTP Error code: 429: Request exceeds account’s current package request limits. Please upgrade your package and retry or contact Twitter about enterprise access.) (I will test.)

2- I am looking at a period of about 5 years, and the Counts endpoint’s largest bucket seems to be ‘day’, so two things:

a) If I understand correctly, “the ‘counts’ API endpoint will return a timestamped array of counts for a maximum of a 31-day payload of counts.” https://developer.twitter.com/en/docs/tweets/search/api-reference/premium-search#Pagination
So, given that I am collecting data for a roughly 5-year period, I assume I would need to submit about 5 × 12 months = 60 separate requests to get the total count for the whole period. Please let me know if this is not the case and a counts request can be submitted for any period of time. (I could still try to run one for an example 30-day period to probe, if/once I can submit new requests …)
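The request estimate above can be made exact by splitting the date range into 31-day windows. This is a sketch assuming each counts request covers at most 31 days; it uses the date range from the code in #6 (2015-06-01 to 2019-02-15) rather than a full 5 years, and the helper name split_windows is hypothetical. Because a heavy query can return less than 31 days of counts plus a `next` token, the window count is a lower bound on requests, not an exact figure.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def split_windows(start: datetime, end: datetime,
                  max_days: int = 31) -> List[Tuple[datetime, datetime]]:
    """Split [start, end) into consecutive windows of at most max_days days."""
    windows = []
    step = timedelta(days=max_days)
    cursor = start
    while cursor < end:
        nxt = min(cursor + step, end)
        windows.append((cursor, nxt))
        cursor = nxt
    return windows


windows = split_windows(datetime(2015, 6, 1), datetime(2019, 2, 15))
print(len(windows))  # 44 (1,355 days / 31, rounded up)
# Premium search dates are written as YYYYMMDDhhmm
print(windows[0][0].strftime("%Y%m%d%H%M"), windows[-1][1].strftime("%Y%m%d%H%M"))
```

So this range needs at least 44 counts requests, rather than the 60 estimated for a full 5 years.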

b) Now, just curious: how does pagination for the ‘counts’ endpoint work? Say the bucket is set to ‘day’: is each day’s tweet count one ‘result’? So if I run a counts request for a period of 30 days, will there be 30 result points in the response? https://developer.twitter.com/en/docs/tweets/search/api-reference/premium-search.html#CountsEndpointExample%20counts%20responses

So if my plan allows 500 tweets/request and I use minute-level granularity (assuming the allowed time buckets are day, minute, seconds), would the above request include 30 d × 24 h × 60 min = 43,200 results and consume 43,200 / 500 ≈ 86.4, i.e. 87 requests?
(I am not interested in minute granularity; I am just using this example to get my question across and to understand how counts pagination works.)

Thank you for your help.


#4

I don’t have a paid account to try, unfortunately, but I know that requests are limited to 31 days and to 100 (free) or 500 (paid) tweets per request, so a 31-day period with 501 tweets “costs” 2 requests to process.
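The cost rule above is a rounded-up division per 31-day window. A minimal sketch, assuming one request is always sent even when nothing matches (which matches the sandbox test reported in #12); the helper name min_requests is hypothetical.

```python
import math


def min_requests(n_tweets: int, per_request: int = 500) -> int:
    """Fewest data requests needed to page through n_tweets matches.

    At least one request is made even when nothing matches.
    """
    return max(1, math.ceil(n_tweets / per_request))


print(min_requests(501))       # 2 (paid: 500 per request)
print(min_requests(501, 100))  # 6 (sandbox/free: 100 per request)
print(min_requests(48_500))    # 97
```

The last line shows why the dashboard numbers in #1 hang together: about 48.5K tweets at 500 per request is roughly 100 requests.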


#5

That’s right for the data endpoint. I was not too sure how pagination for the ‘counts’ endpoint worked. Thank you for your suggestion to look at counts in the first place.


#6

I still hope to figure out why my search result was empty while my requests and tweets were consumed, and of course to find a way to retrieve the results or to reset my request quota to zero.

I am now suspicious about rate limiting and whether the collect_results function of searchtweets handles it properly. I believe I set a very high max_results (maybe 500,000) when I ran the first query, and then ran the second one without setting max_results. (Unfortunately I did not keep a record of the values I used for max_results, but I think that was the case.) I wonder whether this is related to getting zero tweets back. (In both cases results_per_call=500, as allowed by paid premium.)

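```python
from searchtweets import gen_rule_payload, collect_results

# `query` (the rule string) and `premium_search_args` (credentials for the
# premium full-archive endpoint) are defined earlier in the script.
rule = gen_rule_payload(query,
                        from_date="2015-06-01 00:00",  # UTC 2017-09-01 00:00
                        to_date="2019-02-15 00:00",    # UTC 2017-10-30 00:00
                        results_per_call=500)

# First query: very high cap on the total number of tweets
tweets = collect_results(rule,
                         max_results=500000,
                         result_stream_args=premium_search_args)

# Second query: max_results not set, so the package default applies
tweets = collect_results(rule,
                         result_stream_args=premium_search_args)
```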
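Since collect_results only hands back a list once it finishes, one way to avoid losing already-paid-for pages is to write tweets to disk as they arrive. The helper save_stream below is hypothetical and works on any iterable of dict-like tweets. The commented usage assumes searchtweets' ResultStream class and its stream() generator, as shown in the package README; treat that part as a sketch rather than a tested recipe.

```python
import json
from typing import Iterable, Mapping


def save_stream(tweets: Iterable[Mapping], path: str) -> int:
    """Write each tweet to a JSON-lines file as soon as it arrives.

    Returns the number of tweets written; 0 means the stream was empty.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for tweet in tweets:
            fh.write(json.dumps(dict(tweet)) + "\n")
            fh.flush()  # keep earlier pages on disk even if a later page fails
            count += 1
    return count

# Hypothetical usage with searchtweets (not run here):
# from searchtweets import ResultStream
# rs = ResultStream(rule_payload=rule, max_results=500000, **premium_search_args)
# n = save_stream(rs.stream(), "tweets.jsonl")
# if n == 0:
#     print("empty result; compare request usage on the dashboard")
```

A zero return value then points unambiguously at an empty result, not at a crash that threw away collected pages.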


#7

Following the discussion on the ‘counts’ endpoint, I just wanted to add that each ‘counts’ request seems to return up to 30 results (https://github.com/twitterdev/search-tweets-python#counts-endpoint).


#8

This is not necessarily true. You can find the following in our docs:
https://developer.twitter.com/en/docs/tweets/search/api-reference/premium-search.html#CountsEndpoint

For higher volume queries, there is the potential that the generation of counts will take long enough to potentially trigger a response timeout. When this occurs you will receive less than 31 days of counts but will be provided a ‘next’ token in order to continue making requests for the entire payload of counts.
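The passage above implies that counts pages are followed with the `next` token, not split by a fixed number of results. A minimal sketch of that loop, assuming the decoded counts body has a `results` list of buckets, each with a `count` field, plus an optional `next` token. `fetch_page` is a hypothetical stand-in for one counts request, and the loop also tallies how many requests were spent.

```python
from typing import Callable, Optional, Tuple


def total_count(fetch_page: Callable[[Optional[str]], dict]) -> Tuple[int, int]:
    """Follow 'next' tokens until the counts payload is exhausted.

    fetch_page(token) stands in for one counts request and returns the
    decoded JSON body. Returns (total_count, requests_used).
    """
    total, requests_used, token = 0, 0, None
    while True:
        page = fetch_page(token)
        requests_used += 1
        total += sum(bucket["count"] for bucket in page.get("results", []))
        token = page.get("next")
        if not token:
            return total, requests_used


# Fake pages: a timeout cut the first page short, so a 'next' token came back.
pages = {
    None: {"results": [{"timePeriod": "201501010000", "count": 120}], "next": "abc"},
    "abc": {"results": [{"timePeriod": "201501020000", "count": 80}]},
}
print(total_count(lambda token: pages[token]))  # (200, 2)
```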


On the empty search results: this seems odd. Can you please provide the query you used so that I can investigate?
Also, what is the @handle associated with your dev portal account?

On question 1 (running the exact same query again): it will consume new requests.

On question 2 (renewing the bearer token and rerunning the query): this will also consume new requests.

On question 3 (an invalid bearer token causing the empty result): this should not be the case.


#9

Thank you so much for replying to my questions and for helping with the problem. Is there a way to send you this information privately? I would rather not share it publicly. Thank you so much.


#10

Yes, there is. I will initiate a PM now.


#11

Thank you so much. I am sending you the info.


#12

I did a couple of tests with my 30-day sandbox account to examine how the results_per_call and max_results parameters (of the gen_rule_payload and collect_results functions, respectively) affect the request usage and tweet usage numbers on my dashboard. The main observation relevant to the above issue was:

1- Assume there are 0 results that match your query, results_per_call = 10, and max_results = 45. If you run such a query, only one request is consumed (not 5), and tweet usage does not increase (as there are 0 matching results). This is what you would expect, and the tests with the 30-day sandbox account confirmed it.

So in my case, even though I used a very large max_results, if fewer tweets matched (which is almost certainly the case), only the number of requests required to pull that data from Twitter should have been consumed, and tweet usage should only have increased by that number of tweets. In other words, if there were no matching results, no extra requests would have been used.
This is what I had in mind when I set a large max_results, and this toy example seems to confirm it (unless the behaviour changes with very large numbers).
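The reasoning above can be written as a small model of dashboard usage. This is a simplified sketch, not documented billing logic: it assumes paging stops when matches run out or max_results is reached, that at least one request is always sent, and that each page is billed for the tweets the API delivers (so a capped run may bill up to one page past max_results). The helper name expected_usage is hypothetical.

```python
import math
from typing import Tuple


def expected_usage(matching: int, results_per_call: int,
                   max_results: int) -> Tuple[int, int]:
    """Simplified model of (requests, tweets) consumed by one collection run."""
    pages_for_matches = math.ceil(matching / results_per_call)
    pages_for_cap = math.ceil(max_results / results_per_call)
    requests = max(1, min(pages_for_matches, pages_for_cap))
    tweets = min(matching, requests * results_per_call)
    return requests, tweets


print(expected_usage(0, 10, 45))             # (1, 0): the sandbox test above
print(expected_usage(30, 10, 45))            # (3, 30)
print(expected_usage(48_500, 500, 500_000))  # (97, 48500) if ~48.5K tweets matched
```

Under this model a huge max_results costs nothing extra by itself; usage tracks the number of matching tweets.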

So now the only remaining suspect is rate limiting, assuming a large amount of data matches the query. I am not sure that the searchtweets collect_results function takes care of rate limiting (indeed, as far as I can see in the underlying code, I do not think it does). So any thoughts or experiences would be much appreciated, especially on whether this could be related to the empty results.
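If rate limiting is the concern, a common client-side remedy is exponential backoff around each page request. This sketch does not describe what searchtweets does internally. The RateLimited exception and the fetch callable are hypothetical stand-ins for a request that hit HTTP 429. Note that the 429 quoted in #3 comes from an exhausted monthly package, and waiting will not clear that kind of limit; backoff only helps with short-term rate limits.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical exception a fetch function raises on HTTP 429."""


def with_backoff(fetch: Callable[[], T], max_tries: int = 4,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(); after each RateLimited wait 1x, 2x, 4x ... base_delay."""
    attempt = 0
    while True:
        try:
            return fetch()
        except RateLimited:
            attempt += 1
            if attempt >= max_tries:
                raise  # give up and let the caller see the error
            sleep(base_delay * 2 ** (attempt - 1))


calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited()
    return ["tweet"]


waits = []
print(with_backoff(flaky, sleep=waits.append), waits)  # ['tweet'] [1.0, 2.0]
```

Injecting `sleep` keeps the example instant; in real use the default `time.sleep` applies.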

Note: the collect_results function takes care of pagination and handles next tokens to get all of the results up to max_results in one shot.


closed #13

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.