Hi all, I am a PhD student and have obtained a large set of tweets through the Enterprise Historical PowerTrack, with the data in JSON format. I am struggling to get the JSON file into a DataFrame for analysis. The file has roughly 60,000 lines, and I can open and read it, but I cannot get much further without a "Trailing data" or "Extra data" error. I am using a Jupyter notebook for the analysis rather than the command line, and I attached a screenshot of the tweet structure. Any guidance would be appreciated.

import pandas as pd
import json
import numpy as np

data = []  # list to collect the parsed tweets (not used yet)
with open('job1.json', 'r') as f:
    line = f.readline()  # read only the first tweet/line
    tweet = json.loads(line)  # load it as a Python dict
    print(json.dumps(tweet, indent=4))  # pretty-print
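For reference: "Extra data" is the error Python's json module raises when it is handed several JSON documents in a row, which is exactly what a file with one tweet per line (often called JSON Lines) looks like; pandas.read_json without lines=True usually fails on the same input with "Trailing data". A minimal sketch, assuming job1.json really holds one tweet object per line (as the readline() approach above assumes); the two-tweet sample string is invented and stands in for the file:

```python
import io
import json

import pandas as pd

# Invented stand-in for job1.json: one JSON object (tweet) per line.
sample = '{"id_str": "1", "text": "first"}\n{"id_str": "2", "text": "second"}\n'

# Parsing the whole file as one JSON document stops after the first object.
try:
    json.loads(sample)
    error_msg = None
except json.JSONDecodeError as err:
    error_msg = err.msg
print(error_msg)  # Extra data

# Fix 1: parse each non-empty line on its own, then build the DataFrame.
tweets = [json.loads(line) for line in io.StringIO(sample) if line.strip()]
df = pd.DataFrame(tweets)

# Fix 2: tell pandas the input is line-delimited.
# dtype=False stops pandas from guessing types, so id_str stays a string.
df2 = pd.read_json(io.StringIO(sample), lines=True, dtype=False)
print(df.shape, df2.shape)  # (2, 2) (2, 2)
```

On the real file, replace io.StringIO(sample) with open('job1.json', encoding='utf-8'), or pass the path straight to pd.read_json(..., lines=True). Nested fields such as user or retweeted_status stay as dict-valued columns until they are flattened, for example with pd.json_normalize.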

How different is the Enterprise Historical PowerTrack tweet format from the v1.1 format? (It looks like it's the same, but I can't be sure.) Can you upload a small sample of tweets somewhere?

I think there's a quick and hacky way to do this with twarc-csv, which is designed for v2 format responses but might work for v1.1 with minimal changes.

Use twarc-csv (GitHub: DocNow/twarc-csv, a plugin for twarc2 that converts tweet JSON into DataFrames and exports it to CSV) as a command-line tool in the notebook:

In a cell:

!pip install twarc twarc-csv==0.3.1

You may need to restart the kernel after this. (Note: this only works with an earlier version of twarc-csv, not the latest.)

Now:

!twarc2 csv --batch-size 1000 --no-input-tweet-columns job1.json > /dev/null

When you run that, you should get back an error:

💔 ERROR: Unexpected Data: 
"retweeted_status.quoted_status.user.listed_count,retweeted_status.user.created_at,retweeted_status.place.bounding_box.type,quoted_status.user.location,retweeted_status.quoted_status.is_quote_status,retweeted_status.quoted_status_permalink.url,retweeted_status.quoted_status.source,retweeted_status.favorited,quoted_status.user.profile_background_image_url,in_reply_to_user_id,quoted_status.user.profile_link_color,retweeted_status.quoted_status.user.following,retweeted_status.quoted_status.user.follow_request_sent,quoted_status.extended_tweet.entities.user_mentions,quoted_status.user.default_profile,retweeted_status.quoted_status.id,user.is_translator,retweeted_status.user.profile_background_image_url_https,retweeted_status.quoted_status.user.time_zone,quoted_status.user.withheld_in_countries,quoted_status.quoted_status_id_str,retweeted_status.extended_entities.media,retweeted_status.user.contributors_enabled,user.profile_sidebar_fill_color,retweeted_status.user.utc_offset,retweeted_status.user.lang,retweeted_status.user.profile_sidebar_fill_color,quoted_status.quote_count,retweeted_status.quoted_status.extended_tweet.display_text_range,geo.type,user.profile_text_color,retweeted_status.quoted_status.user.screen_name,retweeted_status.place.full_name,retweeted_status.quoted_status.extended_entities.media,retweeted_status.is_quote_status,retweeted_status.quoted_status.in_reply_to_user_id,created_at,retweeted_status.filter_level,coordinates,retweeted_status.quoted_status.quoted_status_id,retweeted_status.user.followers_count,retweeted_status.user.profile_link_color,user.utc_offset,retweeted_status.quoted_status.entities.symbols,quoted_status.user.profile_text_color,user.name,user.verified,retweeted_status.user.profile_image_url_https,retweeted_status.quoted_status.user.url,quoted_status.user.followers_count,quoted_status.user.favourites_count,retweeted_status.extended_tweet.display_text_range,quoted_status.favorited,quoted_status.entities.symbols,user.url,retweeted_status.entities.media,retweeted_status.quoted_status.geo,retweeted_status.quoted_status.user.profile_sidebar_fill_color,retweeted_status.quoted_status.user.name,quoted_status.reply_count,retweeted_status.quoted_status.reply_count,quoted_status.id_str,user.followers_count,quoted_status.possibly_sensitive,retweeted_status.user.friends_count,quoted_status_permalink.expanded,entities.symbols,retweeted_status.user.translator_type,user.location,user.statuses_count,retweeted_status.quoted_status.lang,retweeted_status.quoted_status.entities.user_mentions,retweeted_status.quoted_status.user.profile_sidebar_border_color,user.profile_background_image_url_https,retweeted_status.quoted_status.user.profile_background_image_url,retweeted_status.quoted_status.user.location,retweeted_status.source,extended_tweet.full_text,retweeted_status.id_str,quoted_status.entities.hashtags,coordinates.type,retweeted_status.user.default_profile,user.friends_count,retweeted_status.quoted_status.favorited,retweeted_status.quoted_status_permalink.expanded,place.place_type,quoted_status.user.default_profile_image,quoted_status.is_quote_status,retweeted_status.quoted_status.user.profile_text_color,extended_tweet.entities.hashtags,quoted_status.lang,retweeted_status.truncated,retweeted_status.in_reply_to_user_id_str,retweeted_status.display_text_range,quoted_status.user.profile_image_url,quoted_status.text,quoted_status.user.following,retweeted_status.user.is_translator,quoted_status.user.created_at,user.profile_background_color,quoted_status_id_str,retweeted_status.retweeted,retweeted_status.quoted_status.filter_level,quoted_status.user.friends_count,in_reply_to_status_id_str,retweeted_status.quoted_status.user.id_str,quoted_status.extended_tweet.full_text,favorited,quoted_status.user.name,retweeted_status.quoted_status.user.is_translator,user.translator_type,retweeted_status.quoted_status.truncated,retweeted_status.entities.urls,retweeted_status.quoted_status.entities.media,retweeted_status.quoted_status.user.profile_banner_url,retweeted_status.extended_tweet.extended_entities.media,place.country_code,retweeted_status.place.name,quoted_status.user.notifications,retweeted_status.user.default_profile_image,quoted_status.user.verified,geo.coordinates,quoted_status.extended_tweet.entities.symbols,user.created_at,quoted_status.user.profile_image_url_https,retweeted_status.quoted_status.user.id,quoted_status.retweeted,place.name,quoted_status.user.url,quoted_status.entities.media,retweeted_status.quoted_status.user.geo_enabled,user.notifications,retweeted_status.quoted_status.user.followers_count,retweeted_status.user.url,retweeted_status.quoted_status.user.friends_count,extended_tweet.entities.urls,retweeted,retweeted_status.quoted_status.user.default_profile_image,retweet_count,retweeted_status.quoted_status.user.translator_type,place.bounding_box.type,quoted_status_permalink.url,retweeted_status.user.listed_count,retweeted_status.quoted_status.entities.hashtags,entities.urls,user.lang,retweeted_status.quoted_status.text,quoted_status.extended_tweet.entities.media,retweeted_status.retweet_count,is_quote_status,user.profile_background_image_url,user.screen_name,retweeted_status.entities.hashtags,quoted_status.user.description,retweeted_status.user.profile_background_color,user.default_profile,in_reply_to_status_id,quoted_status.extended_tweet.display_text_range,retweeted_status.place.bounding_box.coordinates,retweeted_status.user.id_str,quoted_status.filter_level,retweeted_status.favorite_count,quoted_status.user.geo_enabled,extended_tweet.entities.symbols,user.geo_enabled,retweeted_status.extended_tweet.entities.urls,retweeted_status.quoted_status_permalink.display,retweeted_status.user.geo_enabled,entities.media,retweeted_status.user.profile_text_color,timestamp_ms,retweeted_status.quote_count,retweeted_status.user.protected,place.url,id,retweeted_status.quoted_status.user.statuses_count,quoted_status_permalink.display,retweeted_status.quoted_status.contributors,retweeted_status.geo,quoted_status.user.time_zone,in_reply_to_screen_name,retweeted_status.extended_tweet.entities.media,entities.user_mentions,extended_tweet.entities.media,quoted_status.source,retweeted_status.quoted_status.favorite_count,retweeted_status.contributors,retweeted_status.user.id,quoted_status.user.profile_banner_url,user.profile_sidebar_border_color,user.follow_request_sent,retweeted_status.lang,user.description,filter_level,user.id,quoted_status.user.statuses_count,quoted_status.user.is_translator,user.profile_link_color,retweeted_status.quoted_status.user.profile_use_background_image,quoted_status.user.profile_sidebar_fill_color,retweeted_status.quoted_status.user.profile_background_image_url_https,retweeted_status.user.profile_sidebar_border_color,user.profile_image_url,retweeted_status.extended_tweet.full_text,retweeted_status.quoted_status.extended_tweet.entities.urls,retweeted_status.in_reply_to_status_id_str,quoted_status.in_reply_to_status_id,retweeted_status.user.screen_name,retweeted_status.in_reply_to_screen_name,quoted_status.user.follow_request_sent,quoted_status.user.profile_background_image_url_https,quoted_status.coordinates,quoted_status.retweet_count,id_str,quoted_status.in_reply_to_screen_name,retweeted_status.extended_tweet.entities.hashtags,retweeted_status.quoted_status.user.withheld_in_countries,withheld_in_countries,quoted_status.extended_tweet.extended_entities.media,quoted_status.in_reply_to_status_id_str,favorite_count,user.time_zone,quoted_status.user.protected,retweeted_status.quoted_status.user.description,retweeted_status.user.location,retweeted_status.reply_count,user.profile_image_url_https,retweeted_status.quoted_status.user.profile_background_color,retweeted_status.entities.symbols,quoted_status.user.utc_offset,retweeted_status.place.id,user.default_profile_image,quoted_status.extended_tweet.entities.urls,extended_entities.media,place,coordinates.coordinates,retweeted_status.place.url,quoted_status.id,retweeted_status.user.notifications,retweeted_status.quoted_status.user.profile_image_url_https,retweeted_status.quoted_status_id_str,retweeted_status.quoted_status.id_str,quote_count,quoted_status.user.profile_background_tile,place.country,source,retweeted_status.quoted_status.in_reply_to_screen_name,quoted_status.favorite_count,retweeted_status.quoted_status.in_reply_to_user_id_str,quoted_status.geo,retweeted_status.quoted_status.user.profile_image_url,user.contributors_enabled,user.profile_banner_url,retweeted_status.quoted_status.user.lang,retweeted_status.id,retweeted_status.quoted_status.created_at,retweeted_status.quoted_status.retweet_count,display_text_range,retweeted_status.quoted_status.extended_tweet.extended_entities.media,retweeted_status.in_reply_to_user_id,retweeted_status.user.profile_use_background_image,quoted_status.user.lang,quoted_status.created_at,quoted_status.user.id,retweeted_status.place.country_code,retweeted_status.user.profile_background_image_url,quoted_status.user.translator_type,quoted_status.user.id_str,retweeted_status.quoted_status.extended_tweet.full_text,extended_tweet.entities.user_mentions,retweeted_status.user.follow_request_sent,retweeted_status.quoted_status.retweeted,retweeted_status.user.statuses_count,extended_tweet.display_text_range,retweeted_status.user.following,place.id,retweeted_status.quoted_status.user.created_at,quoted_status.user.profile_background_color,reply_count,quoted_status_id,retweeted_status.quoted_status.user.profile_background_tile,quoted_status.quoted_status_id,retweeted_status.quoted_status.user.notifications,retweeted_status.entities.user_mentions,retweeted_status.place,quoted_status.user.profile_sidebar_border_color,retweeted_status.quoted_status.in_reply_to_status_id_str,quoted_status.extended_entities.media,possibly_sensitive,retweeted_status.extended_tweet.entities.symbols,retweeted_status.place.place_type,user.profile_use_background_image,retweeted_status.quoted_status.quote_count,lang,retweeted_status.quoted_status.display_text_range,retweeted_status.user.profile_banner_url,user.id_str,retweeted_status.user.favourites_count,retweeted_status.possibly_sensitive,retweeted_status.user.verified,extended_tweet.extended_entities.media,geo,user.favourites_count,entities.hashtags,quoted_status.place,place.bounding_box.coordinates,retweeted_status.quoted_status.user.protected,user.listed_count,retweeted_status.created_at,quoted_status.contributors,retweeted_status.quoted_status.user.default_profile,retweeted_status.quoted_status.extended_tweet.entities.symbols,retweeted_status.quoted_status.entities.urls,quoted_status.user.profile_use_background_image,quoted_status.user.listed_count,retweeted_status.quoted_status_id,quoted_status.in_reply_to_user_id,user.protected,retweeted_status.coordinates,quoted_status.extended_tweet.entities.hashtags,retweeted_status.text,retweeted_status.quoted_status.extended_tweet.entities.user_mentions,retweeted_status.quoted_status.extended_tweet.entities.hashtags,in_reply_to_user_id_str,retweeted_status.quoted_status.place,retweeted_status.quoted_status.user.profile_link_color,retweeted_status.quoted_status.in_reply_to_status_id,truncated,user.withheld_in_countries,user.profile_background_tile,retweeted_status.quoted_status.user.verified,quoted_status.truncated,retweeted_status.user.description,quoted_status.user.screen_name,retweeted_status.user.name,quoted_status.in_reply_to_user_id_str,quoted_status.display_text_range,user.following,retweeted_status.extended_tweet.entities.user_mentions,retweeted_status.user.profile_background_tile,retweeted_status.in_reply_to_status_id,retweeted_status.user.profile_image_url,retweeted_status.quoted_status.user.contributors_enabled,retweeted_status.quoted_status.extended_tweet.entities.media,quoted_status.entities.urls,quoted_status.user.contributors_enabled,retweeted_status.quoted_status.quoted_status_id_str,quoted_status.entities.user_mentions,retweeted_status.place.country,retweeted_status.user.withheld_in_countries,retweeted_status.user.time_zone,contributors,place.full_name,retweeted_status.quoted_status.possibly_sensitive,retweeted_status.quoted_status.user.favourites_count,retweeted_status.quoted_status.coordinates,text,retweeted_status.quoted_status.user.utc_offset"
 to fix, add these with --extra-input-columns. Skipping entire batch of 722 tweets!

You'll get a really long error listing the unexpected v1.1-format columns.

Now copy and paste this long string of column names and run the following to get a CSV:

!twarc2 csv --batch-size 1000 --no-input-tweet-columns --extra-input-columns "retweeted_status.user.time_zone,coordinates,user.followers_count,extended_tweet.entities.user_mentions,quoted_status.user.name,retweeted_status.quoted_status.user.translator_type,quoted_status.user.friends_count,quoted_status.retweeted,retweeted_status.quoted_status.user.utc_offset,user.default_profile,in_reply_to_screen_name,retweeted_status.user.following,coordinates.type,user.statuses_count,retweeted_status.entities.hashtags,retweeted_status.quoted_status.user.listed_count,retweeted_status.user.statuses_count,place.bounding_box.coordinates,in_reply_to_status_id,quoted_status_permalink.display,id,retweeted_status.quoted_status_permalink.expanded,user.friends_count,retweeted_status.user.protected,quoted_status.user.id,in_reply_to_status_id_str,retweeted_status.user.follow_request_sent,quoted_status.contributors,user.profile_banner_url,retweeted_status.entities.user_mentions,user.lang,retweeted_status.quoted_status.entities.urls,retweeted_status.extended_tweet.full_text,user.profile_background_image_url,quoted_status.user.profile_background_image_url,retweeted_status.user.url,retweeted_status.user.id_str,retweeted_status.user.screen_name,retweeted_status.user.profile_image_url,user.location,quoted_status.user.profile_background_color,quoted_status.reply_count,quoted_status.user.verified,quoted_status.extended_tweet.entities.hashtags,quoted_status.user.followers_count,user.listed_count,quoted_status.coordinates,quoted_status.possibly_sensitive,user.url,quoted_status.user.contributors_enabled,retweeted_status.extended_tweet.entities.symbols,user.profile_background_color,retweeted_status.in_reply_to_user_id_str,in_reply_to_user_id,retweeted_status.quoted_status.extended_entities.media,retweeted_status.quoted_status.user.time_zone,quoted_status.favorite_count,retweeted_status.is_quote_status,retweeted_status.retweeted,quoted_status.entities.hashtags,retweeted_status.quoted_status.retweet_count,user.time_zone,retweet_count,retweeted_status.quoted_status.extended_tweet.extended_entities.media,in_reply_to_user_id_str,place.country_code,user.translator_type,retweeted_status.in_reply_to_status_id_str,quoted_status.extended_tweet.entities.symbols,quoted_status_id_str,retweeted_status.quoted_status.user.followers_count,quoted_status.in_reply_to_user_id,retweeted_status.user.id,retweeted_status.user.geo_enabled,quoted_status.text,quoted_status.user.created_at,retweeted_status.quoted_status.user.name,retweeted_status.user.friends_count,quoted_status.quoted_status_id,retweeted_status.user.profile_sidebar_border_color,retweeted_status.quoted_status.filter_level,quoted_status.quote_count,quoted_status.user.profile_image_url,retweeted_status.quoted_status.user.notifications,favorite_count,retweeted_status.user.favourites_count,quoted_status.user.screen_name,place.id,user.created_at,user.name,display_text_range,user.id,retweeted_status.extended_tweet.entities.media,retweeted_status.user.listed_count,user.contributors_enabled,retweeted_status.user.default_profile_image,retweeted_status.quoted_status.user.friends_count,extended_tweet.full_text,retweeted_status.coordinates,entities.symbols,possibly_sensitive,retweeted_status.quoted_status.user.follow_request_sent,favorited,user.withheld_in_countries,quoted_status.geo,quoted_status.user.translator_type,retweeted_status.quoted_status.truncated,extended_tweet.entities.hashtags,retweeted_status.place,quoted_status.extended_tweet.display_text_range,quoted_status.favorited,retweeted_status.quoted_status.source,coordinates.coordinates,user.geo_enabled,retweeted_status.quoted_status.user.profile_text_color,retweeted_status.quoted_status.possibly_sensitive,place.url,user.protected,retweeted_status.quoted_status.user.favourites_count,quoted_status.place,filter_level,retweeted_status.filter_level,retweeted_status.quoted_status.display_text_range,quoted_status_permalink.expanded,quoted_status.user.id_str,extended_tweet.extended_entities.media,retweeted_status.quoted_status.extended_tweet.display_text_range,user.following,retweeted_status.quoted_status.user.created_at,quoted_status.in_reply_to_screen_name,quoted_status.is_quote_status,retweeted_status.in_reply_to_status_id,quoted_status.retweet_count,retweeted_status.quoted_status.extended_tweet.entities.media,retweeted_status.extended_tweet.entities.hashtags,retweeted_status.favorited,retweeted_status.quoted_status.entities.user_mentions,retweeted_status.display_text_range,retweeted_status.quoted_status.entities.symbols,quoted_status.user.utc_offset,user.profile_image_url_https,retweeted_status.extended_tweet.entities.urls,retweeted_status.user.profile_use_background_image,retweeted_status.entities.symbols,retweeted_status.user.default_profile,retweeted_status.entities.urls,retweeted_status.quoted_status.place,retweeted_status.quoted_status.extended_tweet.entities.hashtags,retweeted_status.quoted_status.user.profile_background_image_url_https,quoted_status.truncated,extended_tweet.entities.symbols,retweeted_status.quoted_status_permalink.display,retweeted_status.quoted_status.in_reply_to_screen_name,retweeted_status.user.profile_banner_url,retweeted_status.quoted_status.user.is_translator,retweeted_status.quoted_status.is_quote_status,retweeted_status.quoted_status.entities.media,quoted_status.user.listed_count,retweeted_status.quoted_status.favorite_count,retweeted_status.user.profile_image_url_https,timestamp_ms,is_quote_status,retweeted_status.quoted_status_id_str,retweeted_status.id_str,quoted_status.user.default_profile,retweeted_status.quoted_status.entities.hashtags,retweeted_status.place.bounding_box.coordinates,retweeted_status.user.description,retweeted_status.quoted_status.user.profile_image_url_https,place.full_name,retweeted_status.place.country_code,id_str,withheld_in_countries,quoted_status.in_reply_to_status_id,retweeted,user.follow_request_sent,retweeted_status.place.id,geo.type,quoted_status.entities.symbols,user.profile_link_color,quoted_status.entities.media,retweeted_status.quoted_status.user.geo_enabled,retweeted_status.quoted_status.in_reply_to_status_id_str,retweeted_status.user.lang,quoted_status.user.default_profile_image,retweeted_status.created_at,quoted_status.user.profile_text_color,retweeted_status.extended_tweet.extended_entities.media,retweeted_status.quoted_status.user.following,quoted_status.id_str,entities.media,retweeted_status.user.translator_type,lang,retweeted_status.quoted_status.user.statuses_count,retweeted_status.quoted_status.extended_tweet.full_text,user.default_profile_image,retweeted_status.quoted_status.user.profile_sidebar_border_color,quoted_status.user.profile_sidebar_fill_color,quoted_status.user.is_translator,retweeted_status.user.is_translator,reply_count,retweeted_status.quoted_status.in_reply_to_user_id_str,entities.user_mentions,retweeted_status.quoted_status.user.profile_image_url,quote_count,retweeted_status.user.created_at,retweeted_status.lang,retweeted_status.quoted_status.contributors,retweeted_status.place.country,retweeted_status.quoted_status.user.withheld_in_countries,quoted_status.user.time_zone,retweeted_status.user.profile_link_color,retweeted_status.quoted_status_permalink.url,retweeted_status.geo,retweeted_status.quoted_status.extended_tweet.entities.symbols,quoted_status_id,user.description,retweeted_status.quoted_status.favorited,retweeted_status.place.bounding_box.type,quoted_status.user.follow_request_sent,retweeted_status.quoted_status.text,quoted_status.user.lang,retweeted_status.quoted_status.created_at,user.verified,extended_tweet.entities.urls,quoted_status.user.following,retweeted_status.quoted_status.quoted_status_id,quoted_status.user.location,retweeted_status.in_reply_to_user_id,quoted_status.user.profile_image_url_https,quoted_status.user.profile_background_tile,retweeted_status.quoted_status.retweeted,quoted_status.lang,truncated,retweeted_status.entities.media,user.profile_image_url,retweeted_status.quoted_status.user.default_profile,quoted_status.created_at,quoted_status.extended_tweet.full_text,retweeted_status.extended_entities.media,user.id_str,user.favourites_count,retweeted_status.contributors,quoted_status.user.profile_use_background_image,created_at,retweeted_status.in_reply_to_screen_name,place.place_type,retweeted_status.place.url,retweeted_status.extended_tweet.display_text_range,quoted_status.extended_tweet.entities.urls,retweeted_status.quoted_status.reply_count,retweeted_status.quoted_status.user.lang,quoted_status.user.description,quoted_status.filter_level,quoted_status.user.statuses_count,retweeted_status.quoted_status.user.location,source,quoted_status.user.profile_background_image_url_https,quoted_status.quoted_status_id_str,retweeted_status.quoted_status.user.screen_name,quoted_status_permalink.url,place.name,retweeted_status.quoted_status.user.profile_banner_url,retweeted_status.quoted_status.extended_tweet.entities.user_mentions,quoted_status.extended_tweet.extended_entities.media,quoted_status.id,retweeted_status.user.utc_offset,retweeted_status.quoted_status.user.profile_background_image_url,quoted_status.user.notifications,user.notifications,quoted_status.user.profile_banner_url,retweeted_status.quoted_status.user.id,retweeted_status.user.notifications,retweeted_status.quote_count,contributors,place.country,retweeted_status.user.withheld_in_countries,retweeted_status.user.profile_background_image_url,quoted_status.user.url,retweeted_status.retweet_count,retweeted_status.user.profile_background_tile,quoted_status.in_reply_to_status_id_str,user.utc_offset,retweeted_status.user.profile_background_image_url_https,retweeted_status.quoted_status.user.profile_background_color,quoted_status.user.protected,entities.hashtags,retweeted_status.user.profile_text_color,retweeted_status.user.verified,quoted_status.display_text_range,retweeted_status.extended_tweet.entities.user_mentions,retweeted_status.quoted_status.quoted_status_id_str,text,quoted_status.user.favourites_count,retweeted_status.quoted_status.user.contributors_enabled,retweeted_status.source,quoted_status.extended_entities.media,retweeted_status.id,retweeted_status.reply_count,retweeted_status.place.name,quoted_status.user.geo_enabled,quoted_status.source,retweeted_status.quoted_status.id,user.profile_sidebar_fill_color,retweeted_status.user.name,retweeted_status.quoted_status.in_reply_to_user_id,retweeted_status.quoted_status.lang,quoted_status.entities.urls,retweeted_status.quoted_status.in_reply_to_status_id,retweeted_status.user.profile_background_color,quoted_status.entities.user_mentions,retweeted_status.quoted_status.user.profile_use_background_image,retweeted_status.user.profile_sidebar_fill_color,retweeted_status.quoted_status.id_str,retweeted_status.quoted_status.user.default_profile_image,user.profile_sidebar_border_color,retweeted_status.quoted_status.user.profile_link_color,retweeted_status.possibly_sensitive,user.profile_background_tile,extended_tweet.entities.media,retweeted_status.text,retweeted_status.user.followers_count,geo.coordinates,quoted_status.user.profile_sidebar_border_color,retweeted_status.quoted_status.user.profile_background_tile,geo,quoted_status.extended_tweet.entities.user_mentions,user.profile_background_image_url_https,retweeted_status.quoted_status.geo,retweeted_status.favorite_count,retweeted_status.user.contributors_enabled,retweeted_status.quoted_status.extended_tweet.entities.urls,retweeted_status.quoted_status_id,extended_entities.media,retweeted_status.place.place_type,retweeted_status.quoted_status.coordinates,place,retweeted_status.quoted_status.quote_count,retweeted_status.quoted_status.user.url,retweeted_status.user.location,retweeted_status.quoted_status.user.profile_sidebar_fill_color,quoted_status.user.profile_link_color,quoted_status.in_reply_to_user_id_str,extended_tweet.display_text_range,entities.urls,user.is_translator,user.profile_text_color,retweeted_status.quoted_status.user.verified,quoted_status.user.withheld_in_countries,retweeted_status.place.full_name,user.profile_use_background_image,user.screen_name,retweeted_status.quoted_status.user.id_str,retweeted_status.quoted_status.user.description,quoted_status.extended_tweet.entities.media,retweeted_status.quoted_status.user.protected,place.bounding_box.type,retweeted_status.truncated" job1.json job1.csv

You should now have a job1.csv, and the command will print some statistics about the number of tweets, etc.

So the command above, without the long list, is:

twarc2 csv --batch-size 1000 --no-input-tweet-columns --extra-input-columns "....long list of columns copied directly from the error above here...." input.jsonl output.csv
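Copying that string by hand is easy to get wrong, especially if the notebook wraps it and a column name gets split across lines. A sketch of pulling the list out of the error text and building the second command from it; extract_extra_columns and build_csv_command are hypothetical helper names, and the regular expression assumes the columns always appear in double quotes right after "Unexpected Data:", which is based only on the single error message shown above. The flags are exactly the ones used in the commands in this thread.

```python
import re


def extract_extra_columns(error_text):
    """Hypothetical helper: return the column names twarc-csv lists after
    'Unexpected Data:' (assumed to be a double-quoted, comma-separated string)."""
    match = re.search(r'Unexpected Data:\s*"([^"]*)"', error_text)
    if match is None:
        return []
    # Remove all whitespace (column names contain none); this also repairs
    # names that were broken across lines when the output wrapped.
    joined = "".join(match.group(1).split())
    return [col for col in joined.split(",") if col]


def build_csv_command(columns, infile, outfile, batch_size=1000):
    """Hypothetical helper: build the argument list for the second
    twarc2 csv run, using only the flags shown in this thread."""
    return [
        "twarc2", "csv",
        "--batch-size", str(batch_size),
        "--no-input-tweet-columns",
        "--extra-input-columns", ",".join(columns),
        infile, outfile,
    ]


# Shortened, invented example of the error output.
error_text = (
    'ERROR: Unexpected Data: \n'
    '"created_at,user.screen_name,retweeted_status.text"\n'
    ' to fix, add these with --extra-input-columns.'
)
cols = extract_extra_columns(error_text)
cmd = build_csv_command(cols, "job1.json", "job1.csv")
print(cmd)
```

The resulting list can be run with subprocess.run(cmd, check=True); passing a list rather than one long string avoids shell quoting problems with the huge argument.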

If you keep getting unknown-column errors, add those columns to --extra-input-columns as well, and adjust --batch-size 1000 if needed. There are other options that may be useful; see twarc2 csv --help.

I have to point out that this is not the intended use of this tool. twarc2 is for the v2 API ONLY; you have v1.1 API data, which is not supported, so this is just a hacky workaround. It will not process quoted tweets or retweets as individual tweets; instead, it treats them as entirely contained within the tweet that retweets or quotes them (for example, a retweet's original tweet only appears in the retweeted_status.* columns of the retweet's row).
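To see why the columns look like retweeted_status.user.screen_name, here is a tiny, invented v1.1-style retweet flattened with pandas json_normalize, which twarc-csv uses internally. The field values are made up and almost all fields are omitted; only the key names follow the layout visible in the error above:

```python
import pandas as pd

# Invented, heavily trimmed v1.1-style retweet: the original tweet is nested
# under "retweeted_status" instead of being a separate record.
retweet = {
    "id_str": "200",
    "text": "RT @alice: hello",
    "user": {"screen_name": "bob"},
    "retweeted_status": {
        "id_str": "100",
        "text": "hello",
        "user": {"screen_name": "alice"},
    },
}

# json_normalize joins nested keys with "." by default.
flat = pd.json_normalize([retweet])
print(sorted(flat.columns))

# Still one row: the retweeted tweet lives in the retweeted_status.* columns.
print(len(flat))  # 1
```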

Either way, feel free to reuse the code from twarc-csv (GitHub: DocNow/twarc-csv) if you need to dive deeper. Internally it uses pandas json_normalize, which may work for you, but you'll definitely have to process the file in chunks.
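A minimal sketch of the chunked json_normalize approach, assuming one tweet per line in job1.json; normalized_chunks is a hypothetical helper, and the default chunk size of 5,000 is an arbitrary choice rather than anything twarc-csv prescribes:

```python
import itertools
import json

import pandas as pd


def normalized_chunks(path, chunk_size=5000):
    """Hypothetical helper: yield flattened DataFrames of up to chunk_size
    lines from a file with one JSON tweet per line (blank lines skipped)."""
    with open(path, "r", encoding="utf-8") as f:
        while True:
            # Take the next chunk_size lines; an empty list means end of file.
            lines = list(itertools.islice(f, chunk_size))
            if not lines:
                break
            tweets = [json.loads(line) for line in lines if line.strip()]
            if tweets:
                yield pd.json_normalize(tweets)


# Usage (assuming job1.json is in the notebook's working directory):
# df = pd.concat(normalized_chunks("job1.json"), ignore_index=True)
```

Chunks can end up with different columns (a chunk with no quote tweets has no quoted_status.* columns, for example); pd.concat aligns them and fills the gaps with NaN. If the combined frame is too big for memory, process or save each chunk as it is yielded instead.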

Alternatively, you could:

Process your v1.1 file to extract the IDs:

twarc dehydrate job1.json > job1_ids.txt
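If you'd rather do the dehydration step from Python in the notebook, a rough equivalent is to write out each tweet's id_str, one per line. This is a sketch assuming one v1.1 tweet per line, not twarc's own code, and dehydrate here is a hypothetical function name:

```python
import json


def dehydrate(in_path, out_path):
    """Hypothetical helper: write the id_str of every tweet in a
    one-tweet-per-line JSON file to out_path, one ID per line.
    Returns the number of IDs written."""
    count = 0
    with open(in_path, "r", encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            tweet = json.loads(line)
            # Use id_str rather than id: 64-bit tweet IDs can lose precision
            # in tools that read JSON numbers as floating point.
            dst.write(tweet["id_str"] + "\n")
            count += 1
    return count


# Usage: dehydrate("job1.json", "job1_ids.txt")
```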

Hydrate using v2 format (this will take a while):

twarc2 hydrate job1_ids.txt job1_v2.json

Now twarc2 csv will work properly. For this you do need the latest version (pip install --upgrade twarc-csv):

twarc2 csv job1_v2.json job1_v2.csv

(Remember that the v2 format is totally different from v1.1.)

Hi Igor, thank you for your help. I will try it out and let you know how I go.

Sure. Just don't expect that exact command to work: you'll have to follow the process described above to let the script figure out the correct columns for your dataset.