Hi, I’m running a full archive search with a premium account. I am attempting to archive all tweets from a user account that was created in Dec 2017, with about 16k tweets in total. The user is a bot and active on a regular basis (dozens of tweets per week). A search of their tweets on the full archive endpoint with a premium credential yields about 6k tweets from Nov 2019 until the present day, but absolutely zero before then, for any range of days and times between 2017 and Nov 2019. Searches with dates starting before Nov 2019 and ending after Nov 2019 yield the same tweets up until this weird barrier. An advanced search on the website yields the same thing.
It seems contradictory that Twitter reports 16k tweets for the user and yet only 6k come up, all from the most recent 7 months. Am I doing something wrong here?
That definitely appears to be strange. Can you share the account you’re searching on? Is there a chance the username changed over time?
Hi, thanks for responding. The account is @checkMarkAlert. The available history ends on 15 Nov 2019, even though it should continue backwards to Dec 2017.
I am running the API call with the user ID number, not the username, in case it matters. I don’t know if the username changed over time - is there a way to do that, and do user IDs change at the same time? The Twitter profile of checkMarkAlert indicates that it’s been live since December 2017.
You’re right to use the ID, IDs do not change over time.
I think that specific account was something else, but then got repurposed as the bot. There are some replies to the account that point to since-removed tweets https://twitter.com/search?q=to%3AcheckMarkAlert%20until%3A2019-02-01&src=typed_query which would suggest it wasn’t always the bot.
Thank you. I’m new to the API and I blew almost an entire month’s queries trying to make sure that I was doing it right, using the right endpoint, etc. One more question: The profile page for checkMarkAlert states that it has 16k tweets. Does that number include potentially deleted tweets? That number was what had me convinced that I was wrong: I thought I had identified 6k tweets back to 15 Nov 2019 and that I had missed the remaining tweets back to Dec 2017 (since we are talking about what I thought was always a bot, since 11/15/2019-present is about a third of the account’s 12/2017-present lifetime, and since I was not aware that deleted tweets might be counted on a user’s profile).
a) Does the number of tweets on a user’s profile include deleted tweets? I guess this means yes.
b) I read somewhere that a full archive request either returns tweets for up to a 30 day range or it returns up to the maximum number of results for that request, whichever comes first. Is that correct? I think I saw a request return results for a longer range, but I could be mistaken. I was struggling with the API at that point. If so, then can you verify that it is pointless to ever make a historical query with a range longer than 30 days?
c) If I make a request in a date range, and there are more tweets that satisfy the search terms than my maximum request size, does the system guarantee that it returns tweets from the newer end of that range backwards, consecutively? It does not skip matching tweets, right? I am referring to historical searches and not searches up to the present, which might include new statuses that are tweeted as the search is running.
Does that number include potentially deleted tweets?
No, but this number can be off when an account has recently deleted a large number of tweets, e.g. it could still show a count when there are 0 tweets, or some other value, because the count is only eventually consistent. Another way this count can differ from search is if the account is made private; in that case its tweets aren’t included in the index.
full archive request either returns tweets for up to a 30 day range or it returns up to the maximum number of results for that request, whichever comes first. Is that correct?
Yes, for the fullarchive endpoint, each “page” (one request) will only ever return 100 tweets in sandbox, or 500 tweets on a paid plan, or 30 days’ worth of tweets if that’s smaller. Every successful HTTP request counts, so it’s important not to waste requests by making the same request twice or by failing to save the next tokens to continue later.
So if an account tweeted 5 times every month for 12 months, it would need 12 requests to return them all. Also, if you don’t set the fromDate and toDate parameters for fullarchive, it behaves like 30day.
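To make the request budgeting above concrete, here is a rough sketch of the arithmetic. It assumes the simplified model described in this reply (every 30-day window costs at least one request, plus extra pages when a window holds more tweets than the page size). The function name `estimate_requests` and its inputs are made up for illustration, and real usage may differ.

```python
import math
from typing import Iterable


def estimate_requests(tweets_per_window: Iterable[int], page_size: int = 500) -> int:
    """Rough request estimate (hypothetical helper, not part of any Twitter library).

    Assumes each 30-day window needs at least one request, and one more
    request for every additional `page_size` tweets inside that window.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return sum(max(1, math.ceil(n / page_size)) for n in tweets_per_window)


# 5 tweets a month for 12 months, paid plan (500 per page)
print(estimate_requests([5] * 12))  # 12
```

With the sandbox page size you would pass `page_size=100` instead.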
If there are more tweets than the max, you need to use the next token to paginate, and:
Tweet data are served in reverse chronological order, starting with the most recent Tweet that matches your query.
It won’t skip tweets (unless they were deleted). Hope that helps!
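For what it's worth, the pagination described above could be sketched roughly like this. It assumes each response is a JSON object with a `results` list and, when more pages exist, a `next` token. `fetch_page` is a hypothetical callable you would write yourself to wrap the actual HTTP request and pass the token back as the `next` parameter; it is not part of any library.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def collect_tweets(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
    max_requests: int,
    start_token: Optional[str] = None,
) -> Tuple[List[Any], Optional[str]]:
    """Follow `next` tokens for at most `max_requests` requests.

    Returns the tweets collected so far (newest first, as served) and the
    token to resume from, or None once the last page has been read.
    """
    tweets: List[Any] = []
    token = start_token
    for _ in range(max_requests):
        page = fetch_page(token)  # hypothetical wrapper around one HTTP request
        tweets.extend(page.get("results", []))
        token = page.get("next")  # assumed to be absent on the final page
        if token is None:
            break
    return tweets, token
```

Saving the returned token (to a file, for example) lets you pick up later from the same point without spending requests on pages you already have.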