I’m conducting a scientific research project lasting roughly 3 months, using content from the streaming API. I was wondering whether I can recover all the captured data if I save only the IDs of the gathered tweets instead of the whole tweet content and creation time. Will I be able to get the full tweet status from the IDs after 3 months, including the content, the creation time, and possibly location data if it was included?



Yes, storing Tweet IDs alone should be fine. The created_at field in each tweet holds the creation time, so you don’t need to store that separately either.

You can “hydrate” the tweets later with statuses/lookup, which returns the full tweet objects, metadata included.
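As a rough sketch of the hydration step, the snippet below splits saved IDs into batches and builds the query parameters for each statuses/lookup call. It assumes the endpoint accepts up to 100 comma-separated IDs per request in an `id` parameter; the helper names `batch_ids` and `lookup_params` are made up for illustration, and the authenticated HTTP request itself is left out.

```python
from typing import Dict, Iterable, Iterator, List

# Assumption: statuses/lookup accepts at most 100 IDs per request.
MAX_IDS_PER_REQUEST = 100


def batch_ids(tweet_ids: Iterable, size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield lists of at most `size` tweet IDs, converted to strings."""
    batch: List[str] = []
    for tweet_id in tweet_ids:
        batch.append(str(tweet_id))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def lookup_params(batch: List[str]) -> Dict[str, str]:
    """Build the query parameters for one lookup request (hypothetical helper)."""
    return {"id": ",".join(batch)}
```

Each dictionary returned by `lookup_params` would then be sent with whatever authenticated HTTP client you already use for the API.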

Some things to keep in mind:
- Some tweets get deleted. You will not be able to recover these.
- Users sometimes delete their location information. This removes geo metadata from tweets that may have had it when you captured them from the stream, so it will no longer be there later.
- User metadata changes over time too, so the user object embedded in a tweet may differ from what it was 3 months earlier (follow, favorite, and tweet counts, possibly the description, etc.).
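To get a feel for how much was lost, you could compare the IDs you saved against what comes back when you hydrate them. This is only a sketch: it assumes the hydrated tweets are dictionaries parsed from the JSON response, with the standard `id_str`, `coordinates`, and `place` fields, and the function names are hypothetical.

```python
from typing import Iterable, List


def find_missing_ids(saved_ids: Iterable, hydrated_tweets: List[dict]) -> List[str]:
    """Return saved IDs that were not returned, e.g. deleted or protected tweets."""
    returned = {tweet["id_str"] for tweet in hydrated_tweets}
    return [str(i) for i in saved_ids if str(i) not in returned]


def lacks_geo(tweet: dict) -> bool:
    """True if the hydrated tweet carries neither exact coordinates nor a place."""
    return tweet.get("coordinates") is None and tweet.get("place") is None
```

Note that `lacks_geo` cannot tell whether location data was removed later or was never there, so if location matters for your study it may be safer to store it at capture time.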


