Hello!
I am working with the searchtweets (v2) Python module to extract multiple tweets about a topic of interest. Using the ResultStream class, I was able to generate a list of tweets of interest, including a "next token" that I can use to get the next set of results. However, I am unclear as to how to pass this "next token" so that I can get the corresponding tweets and iteratively produce the next token. I have tried looking through the solutions here as well as the documentation, but I am still uncertain as to where it goes (e.g., gen_request_parameters vs. ResultStream vs. somewhere else).
Any guidance would be greatly appreciated! Thank you!
ResultStream returns a generator and handles pagination for you: search-tweets-python/result_stream.py at v2 · twitterdev/search-tweets-python · GitHub, so you do not need to worry about the next token. The result stream will run until you have hit one of the specified limits for tweets or requests, or until there are no more results. So to get all tweets from a result stream, simply iterate over it.
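In case it helps to see what the generator does under the hood, here is a minimal, self-contained sketch of the pagination loop. `fetch_page` and the canned `PAGES` dict are stand-ins for the real API call, not part of searchtweets; the point is only where the next_token goes on each subsequent request:

```python
# Canned "API responses": each page carries its data plus an optional
# next_token in meta, exactly like the v2 search payload shape.
PAGES = {
    None: {"data": ["tweet-1", "tweet-2"], "meta": {"next_token": "t2"}},
    "t2": {"data": ["tweet-3", "tweet-4"], "meta": {"next_token": "t3"}},
    "t3": {"data": ["tweet-5"], "meta": {}},  # no next_token: last page
}

def fetch_page(params):
    # Stand-in for an HTTP request to the search endpoint.
    return PAGES[params.get("next_token")]

def stream(params):
    # Conceptually what ResultStream.stream() does: follow next_token
    # until a page comes back without one.
    while True:
        page = fetch_page(params)
        yield from page["data"]
        token = page.get("meta", {}).get("next_token")
        if token is None:
            return
        # On the real API, the token is sent back as the next_token
        # request parameter on the following call.
        params = {**params, "next_token": token}

tweets = list(stream({"query": "from:example"}))
print(tweets)  # all five tweets, in page order
```

Plain iteration over the generator walks every page, which is why you never touch the token yourself.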
Wonderful! Thank you so much!
Hi, I was able to do the same with ResultStream, but I do not understand how to save the generated list of tweets in .csv format. Can I directly save it as a pandas DataFrame?
Here is my code:
rs = ResultStream(request_parameters=query,
                  max_results=500,
                  max_pages=1,
                  **search_args)
print(rs)
tweets = list(rs.stream())
for tweet in tweets[0:10]:
    print(tweet)
Any help will be much appreciated. Thanks
The latest development version includes an optional output_format argument: GitHub - twitterdev/search-tweets-python at v2
rs = ResultStream(request_parameters=query,
                  max_results=500,
                  max_pages=1,
                  output_format="a",
                  **search_args)
And then use pandas.json_normalize on the JSON objects to get a DataFrame that you can export as CSV.
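For example, with a couple of made-up tweet objects (the field values here are invented for illustration, but the shape matches the v2 tweet payload):

```python
import pandas as pd

# Two hypothetical tweet objects shaped like Twitter API v2 tweet JSON.
tweets = [
    {"id": "1", "text": "hello", "created_at": "2021-01-01T00:00:00.000Z",
     "public_metrics": {"retweet_count": 2, "like_count": 5}},
    {"id": "2", "text": "world", "created_at": "2021-01-02T00:00:00.000Z",
     "public_metrics": {"retweet_count": 0, "like_count": 1}},
]

# json_normalize flattens nested objects into dotted column names,
# e.g. public_metrics.like_count.
df = pd.json_normalize(tweets)

# to_csv with no path returns the CSV as a string; pass a filename
# (e.g. df.to_csv("tweets.csv", index=False)) to write a file instead.
csv_text = df.to_csv(index=False)
print(csv_text)
```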
Thanks Igor. Using output_format="a", I am getting the following error:
TypeError                                 Traceback (most recent call last)
in
      2                   max_results=500,
      3                   max_pages=10,
----> 4                   output_format="a"
      5                   **search_args)
      6

TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'dict'

Am I missing something in the library calls?
Sorry, that was a syntax error: a , is missing after output_format="a". It should be:
rs = ResultStream(request_parameters=query,
                  max_results=500,
                  max_pages=1,
                  output_format="a",
                  **search_args)
(but make sure that this is the latest v2 branch version, not the pypi latest version)
Thanks Igor for the response, I should have seen that syntax error. My bad for overthinking it!
Meanwhile, if you come across any robust json to csv example for the v2 twitter datasets, please do share the code link here. It will be of much-needed help.
I have a plugin for twarc2 that does this: GitHub - DocNow/twarc-csv: A plugin for twarc2 for converting tweet JSON into DataFrames and exporting to CSV. It won't run on very large datasets yet; I'll get around to that later.
Thanks again. I have another question concerning ResultStream. I am trying to get historical tweets by 'Shell' from 2007 until 2021, following the exact code from the [Python searchtweets v2] tutorial. I changed the endpoint in the YAML to endpoint: "https://api.twitter.com/2/tweets/search/all". However, I see that only the first 500 tweets are getting pulled. Here is the code:
# import packages
import pandas as pd
import requests
import os
import json
import time
import ast
import yaml
import logging
import sys
import numpy as np
import math

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)

from searchtweets import (ResultStream, gen_request_parameters,
                          load_credentials, write_result_stream)

search_args = load_credentials(filename="./search_tweets_creds_example.yaml",
                               yaml_key="search_tweets_v2_example",
                               env_overwrite=False)

query = gen_request_parameters("from:Shell",
                               results_per_call=500,
                               tweet_fields="created_at",
                               start_time="2010-01-01",
                               end_time="2021-03-30")

rs = ResultStream(request_parameters=query,
                  max_results=500,
                  max_pages=20,
                  output_format="a",
                  **search_args)
print(rs)

stream = write_result_stream(rs, filename_prefix="test")
result = [tweet for tweet in stream]
As a workaround, I am using Postman to pull the historical tweets manually by passing next_token in the query parameters. Is this good practice? I am gathering this data for a paper on climate change, but this is very exhausting.
Could you please check why ResultStream is not retrieving all the historical tweets of @Shell?
Thank you, Igor, hopefully, it will be my last question.
Hi tweet4epi,
I was wondering if you could share your ResultStream code. Somehow, I cannot pull the entire tweet history; the results stop at 500 tweets. You can see my code in my last comment to Igor. Your help will be much appreciated.
Thanks, Ramit
max_results=500 limits the result stream to 500 tweets.
max_pages=20 will also limit the results, to under 10,000, because that is 20 calls at a maximum of 500 tweet results each.
You can remove both to get all tweets, but make sure your query is correct and your output works, because this can use up nearly all of your quota for the month.
results_per_call=500 in gen_request_parameters sets 500 tweet results per call to the API; you should leave that in.
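To make that arithmetic concrete, here is a tiny hypothetical helper (not part of searchtweets) that computes the cap implied by those settings, following the semantics described above: the stream stops at whichever limit is hit first, and with no limits set only your monthly quota applies:

```python
def effective_cap(results_per_call, max_pages=None, max_results=None):
    """Smallest tweet cap implied by the stream settings (illustrative only)."""
    caps = []
    if max_pages is not None:
        # max_pages calls at up to results_per_call tweets each.
        caps.append(max_pages * results_per_call)
    if max_results is not None:
        caps.append(max_results)
    return min(caps) if caps else None  # None: only quota limits apply

print(effective_cap(500, max_pages=20))                   # 10000
print(effective_cap(500, max_pages=20, max_results=500))  # 500
print(effective_cap(500))                                 # None
```

This is why the stream above stopped at 500 tweets even though max_pages allowed up to 10,000.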