Hi everyone, I am an Italian university student working on my Master's degree dissertation, which will analyse how Twitter was used during the COVID-19 pandemic. For this, I need to build a database of tweets that match specific keywords and were published within a specific period of time.
Could someone help me build this database? I need the text of the tweets so that I can run a discourse analysis on the corpus. I am interested only in the text, not in the identity, location, or other information about the users. Can anyone use the Twitter API for this purpose? Or does anyone already have a corpus that meets these criteria?
Thank you for your time and attention, please don’t hesitate to contact me!

Francesca Braglia

This is a good tool to use: Collect Twitter Data with Twarc! · Learn Twarc!

And a good place to start with Twitter corpora is Beginner's Guide to Twitter Data | Programming Historian

Depending on what kind of discourse analysis you’re planning on doing, it may be better to take an existing corpus of tweets, and “hydrate” it.


Yes, but I haven't found any corpora that include the text of the tweets… could you point me to some?

If Google Translate serves me right:

Tweet datasets are usually distributed as a list of tweet IDs, so you will have to get this list and “hydrate” it: download the full JSON of each tweet and extract the tweet text yourself.

This way, any tweets that have been deleted, or that belong to profiles that have since become private, will simply not be retrieved.
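To give an idea of the "extract the text yourself" step, here is a minimal Python sketch. It assumes the hydrated output is a file with one JSON object per line, and that the tweet text sits under a `full_text` key (or `text` in older payloads). Both are assumptions about your data, so check a line of your own file first. The sample tweets are made up for illustration.

```python
import json


def extract_texts(lines):
    """Yield the text of each tweet from an iterable of JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        # Assumption: extended tweets keep their text under "full_text",
        # older payloads under "text". Adjust to match your own data.
        text = tweet.get("full_text") or tweet.get("text")
        if text:
            yield text


# Made-up sample standing in for a hydrated file (one JSON object per line).
sample = [
    '{"id_str": "1", "full_text": "Stay home, stay safe #covid19"}',
    '{"id_str": "2", "text": "Lockdown day 10"}',
    "",
]

texts = list(extract_texts(sample))
print(texts)
```

For a real file you would pass an open file handle instead of the sample list, for example `open("hydrated.jsonl", encoding="utf-8")`, where the filename is only a placeholder.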

Twarc can export this data as CSV, which you should be able to import into other corpus analysis tools: Twarc Utilities for Windows · Learn Twarc!
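If you would rather build the CSV yourself, something like the following Python sketch might do, using only the standard library. It is not Twarc's own export. It assumes one JSON object per line with `id_str` and `full_text` (or `text`) keys, which you should verify against your data. The function name and the sample tweets are just for illustration.

```python
import csv
import io
import json


def tweets_to_csv(lines, out):
    """Write an id,text CSV to `out` and return the number of tweets written."""
    writer = csv.writer(out)
    writer.writerow(["id", "text"])
    count = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        tweet = json.loads(line)
        # Assumption: text is under "full_text" or "text"; id under "id_str".
        text = tweet.get("full_text") or tweet.get("text") or ""
        # Collapse line breaks and repeated spaces so each tweet stays on one row.
        writer.writerow([tweet.get("id_str", ""), " ".join(text.split())])
        count += 1
    return count


# Made-up sample; the first tweet contains a line break inside its text.
sample = [
    '{"id_str": "1", "full_text": "Line one\\nline two"}',
    '{"id_str": "2", "text": "Lockdown day 10"}',
]

buffer = io.StringIO()
written = tweets_to_csv(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, open the output with `open("tweets.csv", "w", newline="", encoding="utf-8")` (placeholder filename) and pass that as `out`.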
