Hello,

I am taking the course “Getting started with the Twitter API v2 for academic research”, specifically the part on writing Python code to get Twitter data (module 6).

I installed twarc and imported Twarc2 and expansions from the twarc library. Then I wrote the following in IDLE (Python 3.9.6):

from twarc import Twarc2, expansions
import datetime
import json

# Replace your bearer token below (I replaced it)
client = Twarc2(bearer_token="XXXXX")


def main():
    # Specify the start time in UTC for the time period you want Tweets from
    start_time = datetime.datetime(2021, 1, 1, 0, 0, 0, 0, datetime.timezone.utc)

    # Specify the end time in UTC for the time period you want Tweets from
    end_time = datetime.datetime(2021, 5, 30, 0, 0, 0, 0, datetime.timezone.utc)

    # This is where we specify our query as discussed in module 5
    query = "from:twitterdev -is:retweet"

    # The search_all method calls the full-archive search endpoint to get Tweets based on the query, start and end times
    search_results = client.search_all(query=query, start_time=start_time, end_time=end_time, max_results=100)

    # Twarc returns all Tweets for the criteria set above, so we page through the results
    for page in search_results:
        # The Twitter API v2 returns the Tweet information and the user, media etc.  separately
        # so we use expansions.flatten to get all the information in a single JSON
        result = expansions.flatten(page)
        for tweet in result:
            # Here we are printing the full Tweet object JSON to the console
            print(json.dumps(tweet))


if __name__ == "__main__":
    main()

How can I continue this code to save results in a JSON file on my hard drive?

Please, could someone help me?

Thanks!


twarc was actually designed to be used as a command line tool, without having to write code, but this wasn’t documented in the Twitter course. The twarc docs cover it under "twarc2 (en) - twarc" and "Tutorials - twarc". So if you like, you can skip the Python entirely and do this in the terminal:

Set up twarc with a useful CSV plugin:

pip install --upgrade twarc twarc-csv

Add your bearer token:

twarc2 configure

Run a search using the academic access endpoint (--archive) and save it as results.jsonl, a JSON file with 1 API response per line:

twarc2 search --archive --start-time "2021-01-01" --end-time "2021-05-30" "from:twitterdev -is:retweet" results.jsonl

Then convert the JSON to CSV to explore:

twarc2 csv results.jsonl results.csv

Alternatively, if you want to stick with the code: since this example iterates over each tweet, printing it to the screen,

    # Twarc returns all Tweets for the criteria set above, so we page through the results
    for page in search_results:
        # The Twitter API v2 returns the Tweet information and the user, media etc.  separately
        # so we use expansions.flatten to get all the information in a single JSON
        result = expansions.flatten(page)
        for tweet in result:
            # Here we are printing the full Tweet object JSON to the console
            print(json.dumps(tweet))

you can replace the print with a write to a file, writing 1 tweet per line. Open the file once, before the loop, because opening it with "w" inside the loop would overwrite it on every iteration:

    # Open the output file once; "w" creates it or truncates an existing one
    with open("results.jsonl", "w") as f:
        # Twarc returns all Tweets for the criteria set above, so we page through the results
        for page in search_results:
            # The Twitter API v2 returns the Tweet information and the user, media etc.  separately
            # so we use expansions.flatten to get all the information in a single JSON
            result = expansions.flatten(page)
            for tweet in result:
                # Here we are writing 1 Tweet object JSON per line
                f.write(json.dumps(tweet) + "\n")

or write 1 API response per line:

    # Open the output file once, before the loop, so earlier pages are not overwritten
    with open("results.jsonl", "w") as f:
        # Twarc returns all Tweets for the criteria set above, so we page through the results
        for page in search_results:
            # The Twitter API v2 returns the Tweet information and the user, media etc.  separately
            # Here we are writing 1 of these self-contained responses with all metadata per line:
            f.write(json.dumps(page) + "\n")

By default, this is exactly what the twarc2 command line tool does (it writes 1 original response per line). If you want 1 tweet per line, the command:

twarc2 flatten results.jsonl tweet_per_line.jsonl

will convert this to 1 tweet per line
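
Once you have a file with one tweet per line, you can load it back in Python with the standard json module. Here is a minimal sketch, assuming one JSON object per line. The helper names write_jsonl and read_jsonl and the two sample records are made up for illustration; they are not part of twarc, and real tweets have many more fields:

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write each record as one JSON object per line (hypothetical helper)."""
    # Open once, before the loop: mode "w" truncates the file each time it is opened.
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Return a list with one parsed object per non-empty line (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Stand-in data instead of real API results.
sample = [{"id": "1", "text": "first"}, {"id": "2", "text": "second"}]
path = os.path.join(tempfile.mkdtemp(), "tweet_per_line.jsonl")

write_jsonl(sample, path)
tweets = read_jsonl(path)
print(len(tweets), tweets[0]["text"])
```

With a real file you would only need read_jsonl("tweet_per_line.jsonl"); the write step is there so the sketch runs on its own.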


@IgorBrigadir

I tried the code and it works. Next time I will try the command line tool.

Thanks a lot! Cheers!
