How different is the enterprise Historical PowerTrack tweet format from the v1.1 format (it looks the same, but I can’t be sure)? Can you upload a small sample of tweets somewhere?
I think there’s a quick and hacky way to do this with twarc-csv (which is designed for v2 format responses but might work for v1.1 with minimal changes).
Using twarc-csv (DocNow/twarc-csv on GitHub, a plugin for twarc2 that converts tweet JSON into DataFrames and exports to CSV) in a notebook as a command-line tool:
In a cell:
!pip install twarc twarc-csv==0.3.1
You may need to restart the kernel after this. (Note: this only works with an earlier version of twarc-csv, not the latest.)
Now run:
!twarc2 csv --batch-size 1000 --no-input-tweet-columns job1.json > /dev/null
When you run that, you should get back an error:
💔 ERROR: Unexpected Data:
"retweeted_status.quoted_status.user.listed_count,retweeted_status.user.created_at,retweeted_status.place.bounding_box.type,quoted_status.user.location,retweeted_status.quoted_status.is_quote_status,retweeted_status.quoted_status_permalink.url,retweeted_status.quoted_status.source,retweeted_status.favorited,quoted_status.user.profile_background_image_url,in_reply_to_user_id,quoted_status.user.profile_link_color,retweeted_status.quoted_status.user.following,retweeted_status.quoted_status.user.follow_request_sent,quoted_status.extended_tweet.entities.user_mentions,quoted_status.user.default_profile,retweeted_status.quoted_status.id,user.is_translator,retweeted_status.user.profile_background_image_url_https,retweeted_status.quoted_status.user.time_zone,quoted_status.user.withheld_in_countries,quoted_status.quoted_status_id_str,retweeted_status.extended_entities.media,retweeted_status.user.contributors_enabled,user.profile_sidebar_fill_color,retweeted_status.user.utc_offset,retweeted_status.user.lang,retweeted_status.user.profile_sidebar_fill_color,quoted_status.quote_count,retweeted_status.quoted_status.extended_tweet.display_text_range,geo.type,user.profile_text_color,retweeted_status.quoted_status.user.screen_name,retweeted_status.place.full_name,retweeted_status.quoted_status.extended_entities.media,retweeted_status.is_quote_status,retweeted_status.quoted_status.in_reply_to_user_id,created_at,retweeted_status.filter_level,coordinates,retweeted_status.quoted_status.quoted_status_id,retweeted_status.user.followers_count,retweeted_status.user.profile_link_color,user.utc_offset,retweeted_status.quoted_status.entities.symbols,quoted_status.user.profile_text_color,user.name,user.verified,retweeted_status.user.profile_image_url_https,retweeted_status.quoted_status.user.url,quoted_status.user.followers_count,quoted_status.user.favourites_count,retweeted_status.extended_tweet.display_text_range,quoted_status.favorited,quoted_status.entities.symbols,user.url,retweeted_status.
entities.media,retweeted_status.quoted_status.geo,retweeted_status.quoted_status.user.profile_sidebar_fill_color,retweeted_status.quoted_status.user.name,quoted_status.reply_count,retweeted_status.quoted_status.reply_count,quoted_status.id_str,user.followers_count,quoted_status.possibly_sensitive,retweeted_status.user.friends_count,quoted_status_permalink.expanded,entities.symbols,retweeted_status.user.translator_type,user.location,user.statuses_count,retweeted_status.quoted_status.lang,retweeted_status.quoted_status.entities.user_mentions,retweeted_status.quoted_status.user.profile_sidebar_border_color,user.profile_background_image_url_https,retweeted_status.quoted_status.user.profile_background_image_url,retweeted_status.quoted_status.user.location,retweeted_status.source,extended_tweet.full_text,retweeted_status.id_str,quoted_status.entities.hashtags,coordinates.type,retweeted_status.user.default_profile,user.friends_count,retweeted_status.quoted_status.favorited,retweeted_status.quoted_status_permalink.expanded,place.place_type,quoted_status.user.default_profile_image,quoted_status.is_quote_status,retweeted_status.quoted_status.user.profile_text_color,extended_tweet.entities.hashtags,quoted_status.lang,retweeted_status.truncated,retweeted_status.in_reply_to_user_id_str,retweeted_status.display_text_range,quoted_status.user.profile_image_url,quoted_status.text,quoted_status.user.following,retweeted_status.user.is_translator,quoted_status.user.created_at,user.profile_background_color,quoted_status_id_str,retweeted_status.retweeted,retweeted_status.quoted_status.filter_level,quoted_status.user.friends_count,in_reply_to_status_id_str,retweeted_status.quoted_status.user.id_str,quoted_status.extended_tweet.full_text,favorited,quoted_status.user.name,retweeted_status.quoted_status.user.is_translator,user.translator_type,retweeted_status.quoted_status.truncated,retweeted_status.entities.urls,retweeted_status.quoted_status.entities.media,retweeted_status.quoted_status.us
er.profile_banner_url,retweeted_status.extended_tweet.extended_entities.media,place.country_code,retweeted_status.place.name,quoted_status.user.notifications,retweeted_status.user.default_profile_image,quoted_status.user.verified,geo.coordinates,quoted_status.extended_tweet.entities.symbols,user.created_at,quoted_status.user.profile_image_url_https,retweeted_status.quoted_status.user.id,quoted_status.retweeted,place.name,quoted_status.user.url,quoted_status.entities.media,retweeted_status.quoted_status.user.geo_enabled,user.notifications,retweeted_status.quoted_status.user.followers_count,retweeted_status.user.url,retweeted_status.quoted_status.user.friends_count,extended_tweet.entities.urls,retweeted,retweeted_status.quoted_status.user.default_profile_image,retweet_count,retweeted_status.quoted_status.user.translator_type,place.bounding_box.type,quoted_status_permalink.url,retweeted_status.user.listed_count,retweeted_status.quoted_status.entities.hashtags,entities.urls,user.lang,retweeted_status.quoted_status.text,quoted_status.extended_tweet.entities.media,retweeted_status.retweet_count,is_quote_status,user.profile_background_image_url,user.screen_name,retweeted_status.entities.hashtags,quoted_status.user.description,retweeted_status.user.profile_background_color,user.default_profile,in_reply_to_status_id,quoted_status.extended_tweet.display_text_range,retweeted_status.place.bounding_box.coordinates,retweeted_status.user.id_str,quoted_status.filter_level,retweeted_status.favorite_count,quoted_status.user.geo_enabled,extended_tweet.entities.symbols,user.geo_enabled,retweeted_status.extended_tweet.entities.urls,retweeted_status.quoted_status_permalink.display,retweeted_status.user.geo_enabled,entities.media,retweeted_status.user.profile_text_color,timestamp_ms,retweeted_status.quote_count,retweeted_status.user.protected,place.url,id,retweeted_status.quoted_status.user.statuses_count,quoted_status_permalink.display,retweeted_status.quoted_status.contributors,retweete
d_status.geo,quoted_status.user.time_zone,in_reply_to_screen_name,retweeted_status.extended_tweet.entities.media,entities.user_mentions,extended_tweet.entities.media,quoted_status.source,retweeted_status.quoted_status.favorite_count,retweeted_status.contributors,retweeted_status.user.id,quoted_status.user.profile_banner_url,user.profile_sidebar_border_color,user.follow_request_sent,retweeted_status.lang,user.description,filter_level,user.id,quoted_status.user.statuses_count,quoted_status.user.is_translator,user.profile_link_color,retweeted_status.quoted_status.user.profile_use_background_image,quoted_status.user.profile_sidebar_fill_color,retweeted_status.quoted_status.user.profile_background_image_url_https,retweeted_status.user.profile_sidebar_border_color,user.profile_image_url,retweeted_status.extended_tweet.full_text,retweeted_status.quoted_status.extended_tweet.entities.urls,retweeted_status.in_reply_to_status_id_str,quoted_status.in_reply_to_status_id,retweeted_status.user.screen_name,retweeted_status.in_reply_to_screen_name,quoted_status.user.follow_request_sent,quoted_status.user.profile_background_image_url_https,quoted_status.coordinates,quoted_status.retweet_count,id_str,quoted_status.in_reply_to_screen_name,retweeted_status.extended_tweet.entities.hashtags,retweeted_status.quoted_status.user.withheld_in_countries,withheld_in_countries,quoted_status.extended_tweet.extended_entities.media,quoted_status.in_reply_to_status_id_str,favorite_count,user.time_zone,quoted_status.user.protected,retweeted_status.quoted_status.user.description,retweeted_status.user.location,retweeted_status.reply_count,user.profile_image_url_https,retweeted_status.quoted_status.user.profile_background_color,retweeted_status.entities.symbols,quoted_status.user.utc_offset,retweeted_status.place.id,user.default_profile_image,quoted_status.extended_tweet.entities.urls,extended_entities.media,place,coordinates.coordinates,retweeted_status.place.url,quoted_status.id,retweeted_status.user.
notifications,retweeted_status.quoted_status.user.profile_image_url_https,retweeted_status.quoted_status_id_str,retweeted_status.quoted_status.id_str,quote_count,quoted_status.user.profile_background_tile,place.country,source,retweeted_status.quoted_status.in_reply_to_screen_name,quoted_status.favorite_count,retweeted_status.quoted_status.in_reply_to_user_id_str,quoted_status.geo,retweeted_status.quoted_status.user.profile_image_url,user.contributors_enabled,user.profile_banner_url,retweeted_status.quoted_status.user.lang,retweeted_status.id,retweeted_status.quoted_status.created_at,retweeted_status.quoted_status.retweet_count,display_text_range,retweeted_status.quoted_status.extended_tweet.extended_entities.media,retweeted_status.in_reply_to_user_id,retweeted_status.user.profile_use_background_image,quoted_status.user.lang,quoted_status.created_at,quoted_status.user.id,retweeted_status.place.country_code,retweeted_status.user.profile_background_image_url,quoted_status.user.translator_type,quoted_status.user.id_str,retweeted_status.quoted_status.extended_tweet.full_text,extended_tweet.entities.user_mentions,retweeted_status.user.follow_request_sent,retweeted_status.quoted_status.retweeted,retweeted_status.user.statuses_count,extended_tweet.display_text_range,retweeted_status.user.following,place.id,retweeted_status.quoted_status.user.created_at,quoted_status.user.profile_background_color,reply_count,quoted_status_id,retweeted_status.quoted_status.user.profile_background_tile,quoted_status.quoted_status_id,retweeted_status.quoted_status.user.notifications,retweeted_status.entities.user_mentions,retweeted_status.place,quoted_status.user.profile_sidebar_border_color,retweeted_status.quoted_status.in_reply_to_status_id_str,quoted_status.extended_entities.media,possibly_sensitive,retweeted_status.extended_tweet.entities.symbols,retweeted_status.place.place_type,user.profile_use_background_image,retweeted_status.quoted_status.quote_count,lang,retweeted_status.quoted_statu
s.display_text_range,retweeted_status.user.profile_banner_url,user.id_str,retweeted_status.user.favourites_count,retweeted_status.possibly_sensitive,retweeted_status.user.verified,extended_tweet.extended_entities.media,geo,user.favourites_count,entities.hashtags,quoted_status.place,place.bounding_box.coordinates,retweeted_status.quoted_status.user.protected,user.listed_count,retweeted_status.created_at,quoted_status.contributors,retweeted_status.quoted_status.user.default_profile,retweeted_status.quoted_status.extended_tweet.entities.symbols,retweeted_status.quoted_status.entities.urls,quoted_status.user.profile_use_background_image,quoted_status.user.listed_count,retweeted_status.quoted_status_id,quoted_status.in_reply_to_user_id,user.protected,retweeted_status.coordinates,quoted_status.extended_tweet.entities.hashtags,retweeted_status.text,retweeted_status.quoted_status.extended_tweet.entities.user_mentions,retweeted_status.quoted_status.extended_tweet.entities.hashtags,in_reply_to_user_id_str,retweeted_status.quoted_status.place,retweeted_status.quoted_status.user.profile_link_color,retweeted_status.quoted_status.in_reply_to_status_id,truncated,user.withheld_in_countries,user.profile_background_tile,retweeted_status.quoted_status.user.verified,quoted_status.truncated,retweeted_status.user.description,quoted_status.user.screen_name,retweeted_status.user.name,quoted_status.in_reply_to_user_id_str,quoted_status.display_text_range,user.following,retweeted_status.extended_tweet.entities.user_mentions,retweeted_status.user.profile_background_tile,retweeted_status.in_reply_to_status_id,retweeted_status.user.profile_image_url,retweeted_status.quoted_status.user.contributors_enabled,retweeted_status.quoted_status.extended_tweet.entities.media,quoted_status.entities.urls,quoted_status.user.contributors_enabled,retweeted_status.quoted_status.quoted_status_id_str,quoted_status.entities.user_mentions,retweeted_status.place.country,retweeted_status.user.withheld_in_countries,r
etweeted_status.user.time_zone,contributors,place.full_name,retweeted_status.quoted_status.possibly_sensitive,retweeted_status.quoted_status.user.favourites_count,retweeted_status.quoted_status.coordinates,text,retweeted_status.quoted_status.user.utc_offset"
to fix, add these with --extra-input-columns. Skipping entire batch of 722 tweets!
That is a really long error listing the unexpected v1.1 format columns.
Now you can copy and paste this long string of column names and run the following to get a CSV:
!twarc2 csv --batch-size 1000 --no-input-tweet-columns --extra-input-columns "retweeted_status.user.time_zone,coordinates,user.followers_count,extended_tweet.entities.user_mentions,quoted_status.user.name,retweeted_status.quoted_status.user.translator_type,quoted_status.user.friends_count,quoted_status.retweeted,retweeted_status.quoted_status.user.utc_offset,user.default_profile,in_reply_to_screen_name,retweeted_status.user.following,coordinates.type,user.statuses_count,retweeted_status.entities.hashtags,retweeted_status.quoted_status.user.listed_count,retweeted_status.user.statuses_count,place.bounding_box.coordinates,in_reply_to_status_id,quoted_status_permalink.display,id,retweeted_status.quoted_status_permalink.expanded,user.friends_count,retweeted_status.user.protected,quoted_status.user.id,in_reply_to_status_id_str,retweeted_status.user.follow_request_sent,quoted_status.contributors,user.profile_banner_url,retweeted_status.entities.user_mentions,user.lang,retweeted_status.quoted_status.entities.urls,retweeted_status.extended_tweet.full_text,user.profile_background_image_url,quoted_status.user.profile_background_image_url,retweeted_status.user.url,retweeted_status.user.id_str,retweeted_status.user.screen_name,retweeted_status.user.profile_image_url,user.location,quoted_status.user.profile_background_color,quoted_status.reply_count,quoted_status.user.verified,quoted_status.extended_tweet.entities.hashtags,quoted_status.user.followers_count,user.listed_count,quoted_status.coordinates,quoted_status.possibly_sensitive,user.url,quoted_status.user.contributors_enabled,retweeted_status.extended_tweet.entities.symbols,user.profile_background_color,retweeted_status.in_reply_to_user_id_str,in_reply_to_user_id,retweeted_status.quoted_status.extended_entities.media,retweeted_status.quoted_status.user.time_zone,quoted_status.favorite_count,retweeted_status.is_quote_status,retweeted_status.retweeted,quoted_status.entities.hashtags,retweeted_status.quoted_status.retweet_count
,user.time_zone,retweet_count,retweeted_status.quoted_status.extended_tweet.extended_entities.media,in_reply_to_user_id_str,place.country_code,user.translator_type,retweeted_status.in_reply_to_status_id_str,quoted_status.extended_tweet.entities.symbols,quoted_status_id_str,retweeted_status.quoted_status.user.followers_count,quoted_status.in_reply_to_user_id,retweeted_status.user.id,retweeted_status.user.geo_enabled,quoted_status.text,quoted_status.user.created_at,retweeted_status.quoted_status.user.name,retweeted_status.user.friends_count,quoted_status.quoted_status_id,retweeted_status.user.profile_sidebar_border_color,retweeted_status.quoted_status.filter_level,quoted_status.quote_count,quoted_status.user.profile_image_url,retweeted_status.quoted_status.user.notifications,favorite_count,retweeted_status.user.favourites_count,quoted_status.user.screen_name,place.id,user.created_at,user.name,display_text_range,user.id,retweeted_status.extended_tweet.entities.media,retweeted_status.user.listed_count,user.contributors_enabled,retweeted_status.user.default_profile_image,retweeted_status.quoted_status.user.friends_count,extended_tweet.full_text,retweeted_status.coordinates,entities.symbols,possibly_sensitive,retweeted_status.quoted_status.user.follow_request_sent,favorited,user.withheld_in_countries,quoted_status.geo,quoted_status.user.translator_type,retweeted_status.quoted_status.truncated,extended_tweet.entities.hashtags,retweeted_status.place,quoted_status.extended_tweet.display_text_range,quoted_status.favorited,retweeted_status.quoted_status.source,coordinates.coordinates,user.geo_enabled,retweeted_status.quoted_status.user.profile_text_color,retweeted_status.quoted_status.possibly_sensitive,place.url,user.protected,retweeted_status.quoted_status.user.favourites_count,quoted_status.place,filter_level,retweeted_status.filter_level,retweeted_status.quoted_status.display_text_range,quoted_status_permalink.expanded,quoted_status.user.id_str,extended_tweet.extended_enti
ties.media,retweeted_status.quoted_status.extended_tweet.display_text_range,user.following,retweeted_status.quoted_status.user.created_at,quoted_status.in_reply_to_screen_name,quoted_status.is_quote_status,retweeted_status.in_reply_to_status_id,quoted_status.retweet_count,retweeted_status.quoted_status.extended_tweet.entities.media,retweeted_status.extended_tweet.entities.hashtags,retweeted_status.favorited,retweeted_status.quoted_status.entities.user_mentions,retweeted_status.display_text_range,retweeted_status.quoted_status.entities.symbols,quoted_status.user.utc_offset,user.profile_image_url_https,retweeted_status.extended_tweet.entities.urls,retweeted_status.user.profile_use_background_image,retweeted_status.entities.symbols,retweeted_status.user.default_profile,retweeted_status.entities.urls,retweeted_status.quoted_status.place,retweeted_status.quoted_status.extended_tweet.entities.hashtags,retweeted_status.quoted_status.user.profile_background_image_url_https,quoted_status.truncated,extended_tweet.entities.symbols,retweeted_status.quoted_status_permalink.display,retweeted_status.quoted_status.in_reply_to_screen_name,retweeted_status.user.profile_banner_url,retweeted_status.quoted_status.user.is_translator,retweeted_status.quoted_status.is_quote_status,retweeted_status.quoted_status.entities.media,quoted_status.user.listed_count,retweeted_status.quoted_status.favorite_count,retweeted_status.user.profile_image_url_https,timestamp_ms,is_quote_status,retweeted_status.quoted_status_id_str,retweeted_status.id_str,quoted_status.user.default_profile,retweeted_status.quoted_status.entities.hashtags,retweeted_status.place.bounding_box.coordinates,retweeted_status.user.description,retweeted_status.quoted_status.user.profile_image_url_https,place.full_name,retweeted_status.place.country_code,id_str,withheld_in_countries,quoted_status.in_reply_to_status_id,retweeted,user.follow_request_sent,retweeted_status.place.id,geo.type,quoted_status.entities.symbols,user.profile_link
_color,quoted_status.entities.media,retweeted_status.quoted_status.user.geo_enabled,retweeted_status.quoted_status.in_reply_to_status_id_str,retweeted_status.user.lang,quoted_status.user.default_profile_image,retweeted_status.created_at,quoted_status.user.profile_text_color,retweeted_status.extended_tweet.extended_entities.media,retweeted_status.quoted_status.user.following,quoted_status.id_str,entities.media,retweeted_status.user.translator_type,lang,retweeted_status.quoted_status.user.statuses_count,retweeted_status.quoted_status.extended_tweet.full_text,user.default_profile_image,retweeted_status.quoted_status.user.profile_sidebar_border_color,quoted_status.user.profile_sidebar_fill_color,quoted_status.user.is_translator,retweeted_status.user.is_translator,reply_count,retweeted_status.quoted_status.in_reply_to_user_id_str,entities.user_mentions,retweeted_status.quoted_status.user.profile_image_url,quote_count,retweeted_status.user.created_at,retweeted_status.lang,retweeted_status.quoted_status.contributors,retweeted_status.place.country,retweeted_status.quoted_status.user.withheld_in_countries,quoted_status.user.time_zone,retweeted_status.user.profile_link_color,retweeted_status.quoted_status_permalink.url,retweeted_status.geo,retweeted_status.quoted_status.extended_tweet.entities.symbols,quoted_status_id,user.description,retweeted_status.quoted_status.favorited,retweeted_status.place.bounding_box.type,quoted_status.user.follow_request_sent,retweeted_status.quoted_status.text,quoted_status.user.lang,retweeted_status.quoted_status.created_at,user.verified,extended_tweet.entities.urls,quoted_status.user.following,retweeted_status.quoted_status.quoted_status_id,quoted_status.user.location,retweeted_status.in_reply_to_user_id,quoted_status.user.profile_image_url_https,quoted_status.user.profile_background_tile,retweeted_status.quoted_status.retweeted,quoted_status.lang,truncated,retweeted_status.entities.media,user.profile_image_url,retweeted_status.quoted_status.use
r.default_profile,quoted_status.created_at,quoted_status.extended_tweet.full_text,retweeted_status.extended_entities.media,user.id_str,user.favourites_count,retweeted_status.contributors,quoted_status.user.profile_use_background_image,created_at,retweeted_status.in_reply_to_screen_name,place.place_type,retweeted_status.place.url,retweeted_status.extended_tweet.display_text_range,quoted_status.extended_tweet.entities.urls,retweeted_status.quoted_status.reply_count,retweeted_status.quoted_status.user.lang,quoted_status.user.description,quoted_status.filter_level,quoted_status.user.statuses_count,retweeted_status.quoted_status.user.location,source,quoted_status.user.profile_background_image_url_https,quoted_status.quoted_status_id_str,retweeted_status.quoted_status.user.screen_name,quoted_status_permalink.url,place.name,retweeted_status.quoted_status.user.profile_banner_url,retweeted_status.quoted_status.extended_tweet.entities.user_mentions,quoted_status.extended_tweet.extended_entities.media,quoted_status.id,retweeted_status.user.utc_offset,retweeted_status.quoted_status.user.profile_background_image_url,quoted_status.user.notifications,user.notifications,quoted_status.user.profile_banner_url,retweeted_status.quoted_status.user.id,retweeted_status.user.notifications,retweeted_status.quote_count,contributors,place.country,retweeted_status.user.withheld_in_countries,retweeted_status.user.profile_background_image_url,quoted_status.user.url,retweeted_status.retweet_count,retweeted_status.user.profile_background_tile,quoted_status.in_reply_to_status_id_str,user.utc_offset,retweeted_status.user.profile_background_image_url_https,retweeted_status.quoted_status.user.profile_background_color,quoted_status.user.protected,entities.hashtags,retweeted_status.user.profile_text_color,retweeted_status.user.verified,quoted_status.display_text_range,retweeted_status.extended_tweet.entities.user_mentions,retweeted_status.quoted_status.quoted_status_id_str,text,quoted_status.user.favour
ites_count,retweeted_status.quoted_status.user.contributors_enabled,retweeted_status.source,quoted_status.extended_entities.media,retweeted_status.id,retweeted_status.reply_count,retweeted_status.place.name,quoted_status.user.geo_enabled,quoted_status.source,retweeted_status.quoted_status.id,user.profile_sidebar_fill_color,retweeted_status.user.name,retweeted_status.quoted_status.in_reply_to_user_id,retweeted_status.quoted_status.lang,quoted_status.entities.urls,retweeted_status.quoted_status.in_reply_to_status_id,retweeted_status.user.profile_background_color,quoted_status.entities.user_mentions,retweeted_status.quoted_status.user.profile_use_background_image,retweeted_status.user.profile_sidebar_fill_color,retweeted_status.quoted_status.id_str,retweeted_status.quoted_status.user.default_profile_image,user.profile_sidebar_border_color,retweeted_status.quoted_status.user.profile_link_color,retweeted_status.possibly_sensitive,user.profile_background_tile,extended_tweet.entities.media,retweeted_status.text,retweeted_status.user.followers_count,geo.coordinates,quoted_status.user.profile_sidebar_border_color,retweeted_status.quoted_status.user.profile_background_tile,geo,quoted_status.extended_tweet.entities.user_mentions,user.profile_background_image_url_https,retweeted_status.quoted_status.geo,retweeted_status.favorite_count,retweeted_status.user.contributors_enabled,retweeted_status.quoted_status.extended_tweet.entities.urls,retweeted_status.quoted_status_id,extended_entities.media,retweeted_status.place.place_type,retweeted_status.quoted_status.coordinates,place,retweeted_status.quoted_status.quote_count,retweeted_status.quoted_status.user.url,retweeted_status.user.location,retweeted_status.quoted_status.user.profile_sidebar_fill_color,quoted_status.user.profile_link_color,quoted_status.in_reply_to_user_id_str,extended_tweet.display_text_range,entities.urls,user.is_translator,user.profile_text_color,retweeted_status.quoted_status.user.verified,quoted_status.user.wit
hheld_in_countries,retweeted_status.place.full_name,user.profile_use_background_image,user.screen_name,retweeted_status.quoted_status.user.id_str,retweeted_status.quoted_status.user.description,quoted_status.extended_tweet.entities.media,retweeted_status.quoted_status.user.protected,place.bounding_box.type,retweeted_status.truncated" job1.json job1.csv
You should now have a job1.csv, and the command should print some stats about the number of tweets etc.
So the command above, without the long list, is:
twarc2 csv --batch-size 1000 --no-input-tweet-columns --extra-input-columns "....long list of columns copied directly from the error above here...." input.jsonl output.csv
If you keep getting unknown column errors, add the missing columns to --extra-input-columns and maybe alter --batch-size 1000 if needed. There are other options that may be useful; see twarc2 csv --help.
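If copying the list by hand gets tedious, here is a minimal Python sketch that pulls the quoted column list out of the error text. It assumes you have already saved the error message into a string yourself; the function name columns_from_error is made up for illustration and is not part of twarc-csv. It also assumes column names never contain whitespace, so any line wrapping inside the list can be removed safely.

```python
import re


def columns_from_error(error_text: str) -> str:
    """Return the comma-separated column list quoted in the error message."""
    # The list sits between double quotes right after "Unexpected Data:".
    match = re.search(r'Unexpected Data:\s*"(.*?)"', error_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no quoted column list found in the error text")
    # Terminal or notebook wrapping can split a name across lines,
    # so drop all whitespace before splitting on commas.
    joined = re.sub(r"\s+", "", match.group(1))
    # Deduplicate while keeping first-seen order.
    columns = list(dict.fromkeys(name for name in joined.split(",") if name))
    return ",".join(columns)
```

The returned string can be pasted as the value of --extra-input-columns. If a later batch reports more unknown columns, run it again on the new error and append the result.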
I have to point out that this is not the intended use of this tool: twarc2 is for the v2 API only, and you have v1.1 API data, which is not supported. This is just a hacky way to do it. It will not process quoted tweets or retweets as individual tweets; a retweeted tweet, for example, is treated as being entirely contained within the retweet.
Either way, feel free to reuse the code if you need to dive deeper: DocNow/twarc-csv on GitHub. Internally it uses pandas json_normalize, which may work for you, but you’ll definitely have to process the file in chunks.
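If you would rather not depend on twarc-csv at all, here is a rough standard-library sketch of the same idea. To be clear, this is not the twarc-csv code and it does not call pandas json_normalize; it is a hand-rolled flattener that produces similar dotted column names (user.id, place.country_code and so on). It assumes one JSON tweet per line, and instead of fixed-size chunks it streams the file twice (once to collect column names, once to write rows) so memory use stays small. The function names are made up for illustration.

```python
import csv
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into {"a.b.c": value}; lists are kept as JSON strings."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def iter_tweets(path):
    """Yield one parsed tweet per non-empty line."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def jsonl_to_csv(in_path, out_path):
    """Write a flattened CSV and return the number of rows written."""
    # Pass 1: collect every column name, in first-seen order.
    columns = {}
    for tweet in iter_tweets(in_path):
        columns.update(dict.fromkeys(flatten(tweet)))
    # Pass 2: write the rows; columns missing from a tweet are left blank.
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=list(columns), restval="")
        writer.writeheader()
        for tweet in iter_tweets(in_path):
            writer.writerow(flatten(tweet))
            count += 1
    return count
```

Something like jsonl_to_csv("job1.json", "job1_flat.csv") would be the call. Like the twarc2 csv workaround, this leaves retweets and quoted tweets nested inside their parent row rather than splitting them out.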
Alternatively, you could:
Process your v1.1 file to extract the IDs:
twarc dehydrate job1.json > job1_ids.txt
Hydrate using v2 format (this will take a while):
twarc2 hydrate job1_ids.txt job1_v2.json
Now twarc2 csv will work properly (for this you do need the latest version: pip install --upgrade twarc-csv):
twarc2 csv job1_v2.json job1_v2.csv
(Remember that the v2 format is totally different from v1.1.)
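If you want more control over which IDs get rehydrated, the ID extraction step can also be sketched in plain Python. This is only an illustration and not how twarc dehydrate is implemented. It assumes one v1.1 tweet per line, uses the id_str, retweeted_status and quoted_status fields that show up in the column list above, and the include_nested switch is my own addition for also collecting the IDs of retweeted and quoted tweets.

```python
import json


def _walk(tweet):
    """Yield a tweet and any tweets nested under retweeted_status / quoted_status."""
    yield tweet
    for key in ("retweeted_status", "quoted_status"):
        nested = tweet.get(key)
        if isinstance(nested, dict):
            yield from _walk(nested)


def tweet_ids(lines, include_nested=False):
    """Return unique tweet IDs as strings, in first-seen order."""
    ids = {}
    for line in lines:
        if not line.strip():
            continue
        tweet = json.loads(line)
        candidates = _walk(tweet) if include_nested else [tweet]
        for item in candidates:
            # Prefer id_str; fall back to the numeric id if it is missing.
            tweet_id = item.get("id_str") or item.get("id")
            if tweet_id is not None:
                ids[str(tweet_id)] = None
    return list(ids)
```

Open job1.json, pass the file object to tweet_ids, and write one ID per line to job1_ids.txt; that file can then go to twarc2 hydrate as above.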