I’ve recently started using the Twitter premium API and I’m having problems getting the data I need.

I’m trying to access the full archive endpoint to get all tweets for the named user (XXXXX in my code below.)

Twitter specifies that the premium sandbox allows 100 results per request, but for some reason I’m getting only 36, regardless of the username I choose.

I’m assuming that the problem is within my code below but I’m struggling to see where!

Many thanks,
Kieran

from searchtweets import load_credentials, gen_rule_payload, ResultStream
import json
import yaml
from datetime import datetime
API_KEY = "XXXXXXXXXXXXXX"
API_SECRET_KEY = "XXXXXXXXXXXXXXXXXX"
DEV_ENVIRONMENT_LABEL = 'prod'
API_SCOPE = 'fullarchive'  # 'fullarchive' for full archive, '30day' for last 31 days

SEARCH_QUERY = 'from: XXXXX'
RESULTS_PER_CALL = 100  # 100 for sandbox, 500 for paid tiers
# format YYYY-MM-DD HH:MM (hour and minutes optional)
FROM_DATE = '2008-01-01'
TO_DATE = '2021-09-12'

MAX_RESULTS = 8000  # Number of Tweets you want to collect

FILENAME = f"twitter_premium_api_tests/{datetime.now()}.jsonl"  # Where the Tweets should be saved

# Script prints an update to the CLI every time it collected another X Tweets
PRINT_AFTER_X = 10


config = dict(
    search_tweets_api=dict(
        account_type='premium',
        endpoint=f"https://api.twitter.com/1.1/tweets/search/{API_SCOPE}/{DEV_ENVIRONMENT_LABEL}.json",
        consumer_key=API_KEY,
        consumer_secret=API_SECRET_KEY
    )
)

with open('twitter_keys.yaml', 'w') as config_file:
    yaml.dump(config, config_file, default_flow_style=False)

premium_search_args = load_credentials("twitter_keys.yaml",
                                       yaml_key="search_tweets_api",
                                       env_overwrite=False)

rule = gen_rule_payload(SEARCH_QUERY,
                        results_per_call=RESULTS_PER_CALL,
                        from_date=FROM_DATE,
                        to_date=TO_DATE
                        )

rs = ResultStream(rule_payload=rule,
                  max_results=MAX_RESULTS,
                  **premium_search_args)

with open(FILENAME, 'a', encoding='utf-8') as f:
    n = 0
    for tweet in rs.stream():
        n += 1
        if n % PRINT_AFTER_X == 0:
            print('{0}: {1}'.format(str(n), tweet['created_at']))
        json.dump(tweet, f)
        f.write('\n')
print('done')

Each request returns 100 tweets or 30 days’ worth of tweets, whichever comes first, so that could maybe explain it? Your code appears correct. Another reason I can think of is deletions or suspended content (country-withheld tweets or something similar); maybe that affects it too.


Thanks for checking over our code and sharing your thoughts, IgorBrigadir. Very much appreciated. On further testing, the programme works on other accounts, but not on this one.

I wonder if there is someone at Twitter that can help us to understand this a bit further?
If we are paying the premium fee but then burn a load of requests on an account like this, which doesn’t return the data requested (even though the account shows 8K tweets in the user’s history), does Twitter provide refunds for what are essentially failed requests?

Or do they help look into the source of the issue and why an account is not returning data? This would be preferable, as a client has asked us to look at their data and we don’t seem to be able to access it.

Many thanks,
Kieran


Just as a bit more context: when we try to go back to the beginning of the person’s history, our request (in sandbox mode) returns the most recent 36 tweets and then stops, every time.
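To pin down exactly where the stream stops, it can help to summarise what was collected: how many tweets, and the oldest and newest timestamps. Below is a minimal sketch, assuming each tweet is a dict with a v1.1-style `created_at` string (for example `Wed Oct 10 20:19:24 +0000 2018`). The `summarize` helper and the sample tweets are hypothetical and only for illustration.

```python
from datetime import datetime

# Timestamp layout used by v1.1-style payloads, e.g. "Wed Oct 10 20:19:24 +0000 2018"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def summarize(tweets):
    """Return (count, oldest, newest) for a list of tweet dicts.

    Hypothetical helper: assumes every tweet has a 'created_at' field.
    """
    stamps = sorted(
        datetime.strptime(t["created_at"], TWITTER_TIME_FORMAT) for t in tweets
    )
    if not stamps:
        return 0, None, None
    return len(stamps), stamps[0], stamps[-1]


# Made-up sample data standing in for the collected tweets
sample = [
    {"created_at": "Mon Sep 06 09:00:00 +0000 2021"},
    {"created_at": "Wed Oct 10 20:19:24 +0000 2018"},
]
count, oldest, newest = summarize(sample)
print(count, oldest.isoformat(), newest.isoformat())
```

Running this over the lines of the saved `.jsonl` file (one `json.loads` per line) would show the date at which the results end.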

We tried searching for a small sample of 100 tweets at various stages prior to 2014 and it returned 0 tweets.

Something seems to go wrong when we try to request data that lies deeper in the user’s history.
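One way to narrow down where the data goes missing is to probe the history in fixed date windows rather than in one long range. The sketch below only generates the `(from_date, to_date)` pairs; `date_windows` is a hypothetical helper, and the 30-day window size is an assumption. Each pair could then be passed as `from_date` and `to_date` to `gen_rule_payload`, bearing in mind that every probe uses up at least one request.

```python
from datetime import date, timedelta


def date_windows(start, end, days=30):
    """Yield (from_date, to_date) strings covering [start, end) in chunks of `days`.

    Hypothetical helper; dates are formatted as YYYY-MM-DD.
    """
    cursor = start
    while cursor < end:
        upper = min(cursor + timedelta(days=days), end)
        yield cursor.strftime("%Y-%m-%d"), upper.strftime("%Y-%m-%d")
        cursor = upper


# Example: three windows covering the first quarter of 2013
windows = list(date_windows(date(2013, 1, 1), date(2013, 4, 1)))
for from_date, to_date in windows:
    print(from_date, to_date)
```

Recording the number of tweets returned per window would show whether the gap starts at a particular date or affects the whole older history.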


What’s the user? I’d like to try using v2. What I suspect is happening is that there was a block of deleted tweets, so you get few results and it breaks pagination.


Hi IgorBrigadir, thanks again for your support.

I’ve sent a DM to you on Twitter with the username.

Feel free to post any answer here so others can learn from what you find, but if possible, could you keep the user’s name confidential?


I tried the same thing with v2 full-archive search and got the same result: 37 tweets, when the user clearly has many more. Pagination just doesn’t happen, and the counts endpoint reports the same 37 tweets, while the count on the web is a few thousand and the timeline loads dozens more.
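For anyone reproducing this, here is a rough sketch of the pagination loop involved. v2 search responses carry tweets under `data` and, when more pages exist, a `meta.next_token`; the loop ends as soon as that token is missing, which is what "pagination doesn't happen" looks like from the client side. `collect_all`, `fetch_page`, and the fake pages are hypothetical stand-ins so the logic can run without network access or credentials.

```python
def collect_all(fetch_page, max_pages=50):
    """Follow next_token until the API stops returning one.

    `fetch_page(token)` is a hypothetical callable that returns one parsed
    response page as a dict; `max_pages` is a safety cap on requests.
    """
    tweets, token, pages = [], None, 0
    while pages < max_pages:
        page = fetch_page(token)
        tweets.extend(page.get("data", []))
        pages += 1
        token = page.get("meta", {}).get("next_token")
        if token is None:
            break  # no further pages advertised by the API
    return tweets, pages


# Fake two-page response used only to exercise the loop
fake_pages = {
    None: {"data": [{"id": "1"}, {"id": "2"}], "meta": {"next_token": "abc"}},
    "abc": {"data": [{"id": "3"}], "meta": {}},
}
tweets, pages = collect_all(lambda token: fake_pages[token])
print(len(tweets), pages)
```

Logging `pages` alongside the tweet count makes it easy to tell a short first page with no token apart from a stream that was cut off by a request limit.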

Interestingly, using GET /2/users/by/username/:username/tweets I can get 3179 of the 7822 tweets reported in the user object, which is as expected, since the user timeline only goes back about 3,200 tweets.
