I want to run a nightly cron job that fetches and saves all tweets matching a search term since the last run. Based on the documentation, this seems like the right approach:
– Query my DB for the last stored tweet id and use it as the since_id in my initial Twitter request. This says, “I only want tweets newer than this one.”
– Twitter then returns paginated tweets, starting with the most recent and working back toward my since_id. For each new page, I use the value from [next_results] => ?max_id=########## to request the next page of tweets.
This should then keep returning pages of tweets, working toward my since_id. Is this correct?
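In code, I picture the loop roughly like this. It is a minimal Python sketch, not working client code: `search_page` is a hypothetical stand-in for one search request, `make_fake_search` is an in-memory fake I made up so the loop can run, and I am assuming max_id is inclusive, which is why the sketch subtracts 1 from the oldest id on each page.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for one search request. A real version would call
# the Twitter search endpoint with the since_id and max_id parameters.
SearchPage = Callable[[str, int, Optional[int]], List[Dict]]


def fetch_since(search_page: SearchPage, query: str, since_id: int) -> List[Dict]:
    """Collect every tweet with id > since_id, newest first."""
    collected: List[Dict] = []
    max_id: Optional[int] = None  # None means "start from the newest tweet"
    while True:
        page = search_page(query, since_id, max_id)
        if not page:
            break  # nothing left between max_id and since_id
        collected.extend(page)
        # Assumption: max_id is inclusive, so step just below the oldest id seen.
        max_id = min(t["id"] for t in page) - 1
    return collected


def make_fake_search(ids: List[int], page_size: int = 3) -> SearchPage:
    """In-memory fake that mimics newest-first paging over tweet ids."""
    def search_page(query: str, since_id: int, max_id: Optional[int]) -> List[Dict]:
        matching = [i for i in sorted(ids, reverse=True)
                    if i > since_id and (max_id is None or i <= max_id)]
        return [{"id": i} for i in matching[:page_size]]
    return search_page


# Tweets 101..110 exist; 103 is the last one already stored.
fake = make_fake_search(list(range(101, 111)))
tweets = fetch_since(fake, "some search term", since_id=103)
fetched_ids = [t["id"] for t in tweets]
print(fetched_ids)
```

With the fake data, the loop walks from the newest id down to the first id above since_id and then stops on an empty page.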
The issue I see with this approach is that if something interrupts the cron job, or if I run out of requests before reaching my since_id, there will be a gap of tweets that were never retrieved. The next run will then set since_id to the highest stored id and never fetch that missing set.
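To make the concern concrete, here is a small Python simulation of an interrupted run. The ids, page size, and the `interrupted_run` helper are all made up for illustration; no real API is involved.

```python
from typing import List


def interrupted_run(all_ids: List[int], since_id: int,
                    page_size: int, max_pages: int) -> List[int]:
    """Page newest-first from the top, but stop after max_pages requests."""
    pending = sorted((i for i in all_ids if i > since_id), reverse=True)
    return pending[:page_size * max_pages]


all_ids = list(range(1, 21))   # tweets 1..20 exist
stored = {1, 2, 3, 4, 5}       # saved by earlier runs
since_id = max(stored)         # 5

# Tonight's run is cut off after 2 pages of 3 tweets each.
stored.update(interrupted_run(all_ids, since_id, page_size=3, max_pages=2))

# The next run takes the highest stored id as since_id,
# so it never looks below that id again.
next_since_id = max(stored)
gap = [i for i in all_ids if i < next_since_id and i not in stored]
print(next_since_id, gap)
```

Here the interrupted run stores ids 15 to 20, the next since_id becomes 20, and ids 6 to 14 are never requested again.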
I’m mainly trying to check whether I am understanding this correctly, and whether there is a better way to accomplish what I want without missing tweets.