Hi,
I am working on an academic project and have full academic access. I am trying to pull tweets and then save them into a csv file. I am able to get tweets and save them into a jsonl file without an issue. Below is the code I am using:
```python
import datetime
import json

from twarc.client2 import Twarc2

# Replace with your own Academic Research bearer token
client = Twarc2(bearer_token="YOUR_BEARER_TOKEN")


def main():
    # Specify the start time in UTC for the time period you want Tweets from
    start_time = datetime.datetime(2021, 6, 1, 0, 0, 0, 0, datetime.timezone.utc)

    # Specify the end time in UTC for the time period you want Tweets from
    end_time = datetime.datetime(2021, 6, 2, 0, 0, 0, 0, datetime.timezone.utc)

    # This is where we specify our query as discussed in module 5
    query = "(\"IPCC\" OR (\"Intergovernmental Panel\" (\"Climate change\" OR \"Global warming\")) OR (\"Intergovernmental Report\" (\"Climate change\" OR \"Global warming\")) OR ((\"United Nations Panel\" OR \"UN Panel\") (\"Climate change\" OR \"Global warming\")) OR ((\"United Nations Report\" OR \"UN Report\") (\"Climate change\" OR \"Global warming\"))) lang:en -is:retweet"

    # The search_all method calls the full-archive search endpoint to get Tweets based on the query, start and end times
    search_results = client.search_all(query=query, start_time=start_time, end_time=end_time, max_results=10)

    # Twarc returns all Tweets for the criteria set above, so we page through the results
    for page in search_results:
        with open("results.jsonl", "a", encoding="utf8") as json_file:
            json.dump(page, json_file)
            json_file.write("\n")


if __name__ == "__main__":
    main()
```
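Before looking at the conversion, it can help to confirm what results.jsonl actually holds. `json.dump` escapes non-ASCII characters by default (`ensure_ascii=True`), so a curly apostrophe is stored in the file as `\u2019` and only turns back into ’ when the line is parsed. The minimal sketch below assumes each line is one v2 API response page with its tweets under "data"; `summarize_jsonl` is a hypothetical helper, not part of twarc.

```python
import json


def summarize_jsonl(lines):
    """Return (page_count, tweets) for JSONL lines holding v2 response pages.

    `lines` can be an open file or any iterable of strings.
    Hypothetical helper for checking the file, not part of twarc.
    """
    page_count = 0
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # json.loads turns escapes such as \u2019 back into the real character
        page = json.loads(line)
        page_count += 1
        # Pages with no matching tweets may have no "data" key
        tweets.extend(page.get("data", []))
    return page_count, tweets
```

Passing in `open("results.jsonl", encoding="utf8")` and printing a few `tweet["text"]` values should show apostrophes and quotes intact if the jsonl side is fine.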
The jsonl file looks fine. Then I use this code to convert the jsonl file to csv:
```python
from twarc_csv import CSVConverter

# Converting to csv from json, specifying the csv output columns through: output_columns="id,text,created_at,author_id"
with open("results.jsonl", "r", encoding="utf8") as infile:
    with open("output.csv", "w", encoding="utf8") as outfile:
        converter = CSVConverter(infile, outfile, json_encode_all=False, json_encode_lists=True, json_encode_text=False, inline_referenced_tweets=True, allow_duplicates=False, batch_size=1000, output_columns="id,text,created_at,author_id")
        converter.process()
```
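For comparison, the same four-column conversion can be sketched with only the standard library. This is not how twarc-csv works internally: it assumes each line of results.jsonl is one v2 response page with a "data" list, and it skips referenced tweets, expansions and duplicate handling. The name `jsonl_to_csv` is hypothetical. Opening the output with `newline=""` lets the csv module quote commas and line breaks inside tweet text correctly.

```python
import csv
import json

# The four columns requested via output_columns in the twarc-csv call
COLUMNS = ("id", "text", "created_at", "author_id")


def jsonl_to_csv(infile, outfile, columns=COLUMNS):
    """Write one csv row per tweet in the "data" list of each JSONL page.

    Hypothetical helper: ignores includes/expansions and referenced tweets.
    Returns the number of tweet rows written.
    """
    writer = csv.writer(outfile)
    writer.writerow(columns)
    count = 0
    for line in infile:
        if not line.strip():
            continue  # skip blank lines
        page = json.loads(line)
        for tweet in page.get("data", []):
            # Missing fields become empty cells instead of shifting columns
            writer.writerow([tweet.get(col, "") for col in columns])
            count += 1
    return count
```

Used inside the same two `with open(...)` blocks (adding `newline=""` to the output `open`), `jsonl_to_csv(infile, outfile)` would take the place of the `CSVConverter` call.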
I get a csv file, but the text in it is altered: it is no longer as clearly readable as it is in the jsonl file.
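The pattern reported here looks like mojibake: UTF-8 bytes decoded with a legacy single-byte encoding. The sketch below reproduces it under the assumption (not confirmed in the post) that the csv is opened by a program that reads it as Mac Roman, which is commonly reported for Excel on macOS with csv files that lack a byte-order mark. If that is the cause, the file itself would be fine and only the viewer misreads it. Both function names are hypothetical.

```python
def misread_as_mac_roman(text):
    """Simulate a viewer that decodes UTF-8 bytes as Mac Roman."""
    return text.encode("utf-8").decode("mac_roman")


def repair_mac_roman(garbled):
    """Reverse that misreading: recover the original bytes, then decode as UTF-8."""
    return garbled.encode("mac_roman").decode("utf-8")


original = "It\u2019s a \u201cUN Report\u201d"
# garbled is 'It‚Äôs a ‚ÄúUN Report‚Äù'
garbled = misread_as_mac_roman(original)
```

The right single quote ’ is the three bytes E2 80 99 in UTF-8, which Mac Roman shows as ‚, Ä and ô, matching the text in the question.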

Does anyone have suggestions for how I can avoid getting text like “Äôs” or “‚Äú”?
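One commonly suggested workaround, assuming the garbling comes from the spreadsheet app rather than from the file, is to write the csv as UTF-8 with a byte-order mark (Python's "utf-8-sig" codec), which Excel uses to detect UTF-8. The sketch uses the stdlib csv module; presumably the same change could be made to the outfile passed to CSVConverter (`open("output.csv", "w", encoding="utf-8-sig")`), but that has not been checked against twarc-csv. Importing the file through the spreadsheet's text-import dialog with UTF-8 selected is another route. `write_csv_with_bom` is a hypothetical name.

```python
import csv


def write_csv_with_bom(path, header, rows):
    """Write rows to a csv file as UTF-8 with a BOM so Excel can detect the encoding."""
    # newline="" lets the csv module control line endings inside quoted fields
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```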
Also, when I don’t specify output_columns in the jsonl-to-csv code, the csv data is mangled. For example, the ID column sometimes contains a YouTube link, or just says mDFw/featured. Similarly, in some rows, other columns contain information that belongs in a different column.
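To tell whether the csv itself is malformed or only the spreadsheet view is off, one quick check is to parse it with Python's csv module and flag records whose field count differs from the header. Tweet text and JSON-encoded columns can contain commas, quotes and line breaks, which are legal inside quoted csv fields but which some viewers handle poorly; Excel also caps a cell at 32,767 characters, so very long JSON-encoded columns are another possible suspect. `find_ragged_rows` is a hypothetical helper.

```python
import csv


def find_ragged_rows(lines):
    """Return (record_number, field_count) for records whose width differs from the header.

    `lines` can be a file opened with newline="" or any iterable of strings.
    Record 1 is the header; a quoted field spanning lines still counts as one record.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return []  # empty input
    expected = len(header)
    bad = []
    for record_number, row in enumerate(reader, start=2):
        if len(row) != expected:
            bad.append((record_number, len(row)))
    return bad
```

An empty result on `open("output.csv", encoding="utf8", newline="")` would suggest the file is consistent and the shifted columns come from how it is being displayed.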
Thank you!