Get Retweets of a tweet from 30 day search using searchtweets


#1

Trying to retrieve retweets from the 30 day premium archive using searchtweets is resulting in:


HTTP Error code: 422: Unprocessable Entity: This is returned due to invalid parameters in a query or when a query is too complex for us to process. –e.g. invalid PowerTrack rules or too many phrase operators, rendering a query too complex.

This is my workflow:

  1. Installed the python3 version of searchtweets using pip
  2. Cloned the repo and changed directory to the tools folder.
  3. Created a ~/.twitter_keys.yaml:
  4. Ran the command:
python3 search_tweets.py --filter-rule "972890661115457537 is:retweet" --print-stream --debug

Here 972890661115457537 is the ID of a Tweet by Bernie Sanders, posted on 11 March 2018.

This is the output:

DEBUG:root:{
    "results_per_file": 0,
    "config_filename": null,
    "max_results": 500,
    "account_type": null,
    "from_date": null,
    "credential_yaml_key": null,
    "pt_rule": "972890661115457537 is:retweet",
    "credential_file": null,
    "debug": true,
    "env_overwrite": true,
    "max_pages": null,
    "to_date": null,
    "results_per_call": 100,
    "count_bucket": null,
    "filename_prefix": null,
    "print_stream": true
}
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.twitter.com
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "POST /oauth2/token HTTP/1.1" 200 152
WARNING:searchtweets.credentials:Grabbing bearer token from OAUTH
DEBUG:root:{
    "pt_rule": "972890661115457537 is:retweet",
    "results_per_file": 0,
    "bearer_token": "<Token>",
    "env_overwrite": true,
    "max_results": 500,
    "results_per_call": 100,
    "debug": true,
    "print_stream": true,
    "endpoint": "https://api.twitter.com/1.1/tweets/search/30day/TwitteRT.json"
}
DEBUG:root:{
    "pt_rule": "972890661115457537 is:retweet",
    "results_per_file": 0,
    "bearer_token": "<Token>",
    "env_overwrite": true,
    "max_results": 500,
    "results_per_call": 100,
    "debug": true,
    "print_stream": true,
    "endpoint": "https://api.twitter.com/1.1/tweets/search/30day/TwitteRT.json"
}
DEBUG:root:ResultStream: 
	{
    "rule_payload":{
        "maxResults":100,
        "query":"972890661115457537 is:retweet"
    },
    "tweetify":false,
    "username":null,
    "endpoint":"https:\/\/api.twitter.com\/1.1\/tweets\/search\/30day\/TwitteRT.json",
    "max_results":500
}
INFO:searchtweets.result_stream:using bearer token for authentication
DEBUG:searchtweets.result_stream:sending request
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.twitter.com
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "POST /1.1/tweets/search/30day/TwitteRT.json HTTP/1.1" 422 318
WARNING:searchtweets.result_stream:retrying request; current status code: 422
DEBUG:searchtweets.result_stream:sending request
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "POST /1.1/tweets/search/30day/TwitteRT.json HTTP/1.1" 422 317
WARNING:searchtweets.result_stream:retrying request; current status code: 422
DEBUG:searchtweets.result_stream:sending request
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "POST /1.1/tweets/search/30day/TwitteRT.json HTTP/1.1" 422 318
WARNING:searchtweets.result_stream:retrying request; current status code: 422
DEBUG:searchtweets.result_stream:sending request
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "POST /1.1/tweets/search/30day/TwitteRT.json HTTP/1.1" 422 317
ERROR:searchtweets.result_stream:HTTP Error code: 422: Unprocessable Entity: This is returned due to invalid parameters in a query or when a query is too complex for us to process. –e.g. invalid PowerTrack rules or too many phrase operators, rendering a query too complex.
ERROR:searchtweets.result_stream:rule payload: {'maxResults': 100, 'query': '972890661115457537 is:retweet'}
Traceback (most recent call last):
  File "search_tweets.py", line 190, in <module>
    main()
  File "search_tweets.py", line 184, in main
    for tweet in stream:
  File "/usr/local/lib/python3.5/dist-packages/searchtweets/result_stream.py", line 202, in stream
    self.execute_request()
  File "/usr/local/lib/python3.5/dist-packages/searchtweets/result_stream.py", line 253, in execute_request
    rule_payload=self.rule_payload)
  File "/usr/local/lib/python3.5/dist-packages/searchtweets/result_stream.py", line 101, in retried_func
    raise requests.exceptions.HTTPError
requests.exceptions.HTTPError

Also, is it possible to get the names of all the users who retweeted a Tweet? I need this to build a network showing how the information is disseminated.


#2

Hi @ItsMotwani, thanks for the question. I have a few pieces of information for you here.

First, the specific rule you’ve shared (972890661115457537 is:retweet) will not match Retweets of that Tweet id. Instead, it will match Retweets (is:retweet) that have that long number (literally) in the Tweet text. While there isn’t currently an operator for “Retweets of a specific Tweet,” a common strategy is to match on Retweets that include a significant portion of the original Tweet’s text as a quoted phrase, something like is:retweet "some literal text from the original tweet" (plus or minus the character escaping needed to maintain the quoted phrase on your system).
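
As a rough illustration of the text-match strategy above, here is a minimal Python sketch. The helper names (build_retweet_rule, retweeters_of) are hypothetical and not part of searchtweets. It assumes that a backslash escapes a double quote inside a quoted phrase, and that results arrive in the standard Tweet JSON layout, where a Retweet carries a retweeted_status object with an id_str field and the retweeting account sits under user.screen_name. Because a phrase match can also pick up Retweets of other Tweets with similar text, the second helper filters results by the original Tweet id.

```python
def build_retweet_rule(original_text, max_chars=100):
    """Build a rule matching Retweets that contain a phrase from the original Tweet.

    Hypothetical helper: keeps whole words only, up to max_chars, so the
    quoted phrase never ends in the middle of a word.
    """
    words = []
    length = 0
    for word in original_text.split():
        extra = len(word) + (1 if words else 0)  # +1 for the joining space
        if length + extra > max_chars:
            break
        words.append(word)
        length += extra
    if not words:
        raise ValueError("no usable text to build a phrase from")
    # Assumption: embedded double quotes are escaped with a backslash.
    phrase = " ".join(words).replace('"', '\\"')
    return 'is:retweet "{}"'.format(phrase)


def retweeters_of(tweets, original_id):
    """Return screen names of accounts that retweeted the Tweet with original_id.

    tweets is an iterable of Tweet dicts (assumed standard Tweet JSON).
    """
    names = []
    for tweet in tweets:
        original = tweet.get("retweeted_status") or {}
        if original.get("id_str") == str(original_id):
            names.append(tweet["user"]["screen_name"])
    return names
```

The string returned by build_retweet_rule could be passed as the --filter-rule argument in place of the id-based rule, and retweeters_of could be run over the returned Tweets to collect the retweeting accounts for a network.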

Next, the error message you’re seeing (Error code: 422: Unprocessable Entity: ...) often means a rule is invalid. In this case, however, your rule looks ok and I suspect that the issue is your API package. The 30-day Search API can be used with a Sandbox or Premium package, and the is:retweet operator is not available in the Sandbox package.

Finally, the usage pattern that you’ve described here helped us update the library (and its documentation) to be clearer about the best ways to use the code. Ideally you shouldn’t need to clone the repo. In the latest release (available via pip as of earlier today!), when you pip install the code, the main script (search_tweets.py) should be available anywhere on your system via the environment path. If you have tab completion enabled in your shell, you can check this by starting to type out search_tweets.py (without the $ python part) and tabbing to see if it pops up. I encourage you to update your installation to this latest release so you can benefit from this simplification to your usage.
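
If tab completion isn’t handy, a small Python check can do the same job. This is only a sketch: find_cli is a hypothetical helper name, and it uses the standard library’s shutil.which to look for the script on the environment path.

```python
import shutil


def find_cli(script_name="search_tweets.py"):
    """Return the full path of script_name if it is on PATH, else None."""
    return shutil.which(script_name)


if __name__ == "__main__":
    location = find_cli()
    if location:
        print("search_tweets.py found at", location)
    else:
        print("search_tweets.py is not on PATH; upgrading via pip may fix this")
```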