Hello to everyone!

This problem appeared out of nowhere:
rate limit exceeded: sleeping 837.8894987106323 secs

Why do I see this message?

  1. I did not exceed the rate limit.
  2. I had not used Twitter to download anything for two days.

Can I try again to download tweets or will it block me?

Thank you in advance!
Sofia

What library / code are you using, and what endpoint are you calling? This looks like a message from a library, not the API itself; sometimes one call to a function can make many calls to the API.

1 Like

@IgorBrigadir

I am using Twarc2.

Academic Research product track, V2.

Ah, what exact twarc command did you run to get that rate limit error?

In case it’s an older version: there was a bug that caused twarc to use more calls than needed, but the latest version fixed it. Run

pip install --upgrade twarc

to upgrade.

1 Like

@IgorBrigadir
I don’t think so… I use the latest version in Python code, not from the command line.

I am doing:

import datetime
import itertools

from twarc.client2 import Twarc2
from twarc.expansions import flatten

# Your bearer token here
t = Twarc2(bearer_token="AAAAA...zzzzz")

# Start and end times must be in UTC
start_time = datetime.datetime(2006, 3, 21, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2021, 4, 27, 0, 0, 0, 0, datetime.timezone.utc)

# search_results is a generator, max_results is max tweets per page, 500 max for full archive search.
search_results = t.search_all(query="tatemodern lang:en -is:retweet", start_time=start_time,
                              end_time=end_time, max_results=10)

# Get just 1 page of results instead of iterating over everything in search_results:
for page in itertools.islice(search_results, 1):
    for tweet in flatten(page)['data']:
        # Do something with the tweet
        print(tweet)

Output:

Rate Limit:
rate limit exceeded: sleeping 882.8679494857788 secs

It downloads a number of tweets and stops at this point.
It also does not store them in a JSON file; it only creates an empty JSON file, because it stops.

What can I do?
Can you help me?

Thank you!
Sofia

You would still need to update the twarc package: both the command-line tool and twarc used as a library share the same package version. These two commands should show the same version, v2.1.1.

It appears you have a slightly older version, 2.0.13. To upgrade, run:

pip install --upgrade twarc

then check:

pip show twarc

should show:

Name: twarc
Version: 2.1.1
...
twarc2 version

should show:

twarc v2.1.1

The reason there is no output file is that the code you’re running does not save any files; it only prints tweets to standard output. Also, you are requesting only 10 results per call, which is fine for a test but not optimal: you will use many calls to cover the time range (which, with those dates, is currently all of Twitter’s history).
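The effect of the page size on the number of calls can be illustrated with a quick back-of-the-envelope sketch (the tweet count is a hypothetical number, purely for illustration):

```python
# Rough estimate of full-archive search calls needed for a query.
total_tweets = 5000  # hypothetical number of matching tweets

# With max_results=10, each API call returns at most 10 tweets:
calls_small_pages = -(-total_tweets // 10)   # ceiling division

# With max_results=500 (the full-archive maximum), far fewer calls are needed:
calls_large_pages = -(-total_tweets // 500)

print(calls_small_pages, calls_large_pages)  # → 500 10
```

Fifty times fewer calls means far less chance of hitting the request rate limit for the same data.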

This is a more optimal code snippet that should work for you:

import datetime
import itertools
import json

from twarc.client2 import Twarc2
from twarc.expansions import flatten

# Your bearer token here
t = Twarc2(bearer_token="AAA...zzz")

# Start and end times must be in UTC
start_time = datetime.datetime(2006, 3, 21, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2021, 4, 27, 0, 0, 0, 0, datetime.timezone.utc)

query = "tatemodern lang:en -is:retweet"

# search_results is a generator, max_results is max tweets per page, 500 max for full archive search.
search_results = t.search_all(query=query, start_time=start_time,
                              end_time=end_time, max_results=500)

# Write all results to `results.jsonl`, one response per line:
for page in search_results:
    with open("results.jsonl", "a", encoding="utf8") as json_file:
        json.dump(page, json_file)
        json_file.write("\n")

This will write a file results.jsonl that you can later process to extract whatever you need.

2 Likes

It works!!! Thank you so much for your help!!! :slightly_smiling_face: :slightly_smiling_face:

Sofia!

1 Like

@IgorBrigadir Thanks to your team I continue to collect data successfully for my PhD thesis!!

I get the data I need, but I want it in a more pretty-printed JSON format than the one my script currently produces.

I use an automated JSON-to-CSV converter desktop app, and it needs a better-structured JSON file as input.

After that, the converter will create a clearer, editable CSV file with proper columns.

Here is my code:

import datetime
import itertools
import json

from twarc.client2 import Twarc2
from twarc.expansions import flatten

# Your bearer token here
t = Twarc2(bearer_token="AA..zzzz")

# Start and end times must be in UTC
start_time = datetime.datetime(2006, 3, 21, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2021, 4, 27, 0, 0, 0, 0, datetime.timezone.utc)

query = "olafureliasson lang:en -is:retweet"

# search_results is a generator, max_results is max tweets per page, 500 max for full archive search.
search_results = t.search_all(query=query, start_time=start_time,
                              end_time=end_time, max_results=100)


# Get just 1 page of results instead of iterating over everything in search_results:
for page in itertools.islice(search_results, 1):
    for tweet in flatten(page)['data']:
        # Do something with the tweet
        print(tweet)

# Write all results to `Olafur.json`, one response per line:
for page in search_results:
    with open("Olafur.json", "a", encoding="utf8") as json_file:
        json.dump(page, json_file)
        json_file.write("\n")

How can I create a pretty print json file with my code??
I want to use this json file as input to my converter.

I hope I have described it clearly!

Thank you in advance… :slightly_smiling_face:
Sofia

Great to hear the data collection works!

I do not recommend processing the json to “pretty print” format. It makes working with the data significantly more difficult.

twarc can convert the tweet specific data much more effectively using twarc-csv

pip install twarc-csv
twarc2 csv Olafur.json Olafur.csv

or to run it in a script:

from twarc_csv import CSVConverter

with open("input.jsonl", "r") as infile:
    with open("output.csv", "w") as outfile:
        converter = CSVConverter(
            infile,
            outfile,
            json_encode_all=False,
            json_encode_lists=True,
            json_encode_text=False,
            inline_referenced_tweets=True,
            allow_duplicates=False,
            batch_size=1000,
        )
        converter.process()

This should give you the results in a CSV you can read.

If you need to preview the data just to look at it, I can recommend jq, or copy-pasting a random line of JSON into the https://jsoneditoronline.org/ tree view to explore. But these are for debugging only. You should use twarc-csv for converting to CSV, not any other third-party tool, as twarc-csv does some tweet-data-specific processing steps.

1 Like

Why does this error appear? I just copied the script and adapted it to my data… :neutral_face:

Oh, sorry - my snippet was for an older version of twarc-csv.

This should work instead:

from twarc_csv import CSVConverter

with open("input.jsonl", "r") as infile:
    with open("output.csv", "w") as outfile:
        converter = CSVConverter(
            infile,
            outfile,
            json_encode_all=False,
            json_encode_lists=True,
            json_encode_text=False,
            inline_referenced_tweets=True,
            allow_duplicates=False,
            batch_size=1000,
        )
        converter.process()

(edited above post too)

2 Likes

Oh thank you!
It works, but it doesn’t create a clean CSV like:

created_at, author.description, author.entities.description, url … (etc)
17-8-2021, sofia v, abababab, http\…
17-8-2021, sofia a, vbvbvbvb, http\…

The current output is not editable in Excel as CSV… :neutral_face:
It is confusing.
Is it possible to produce a better CSV format with code?

Sofia

That looks like it’s OK, but maybe there are still some issues with formats in Excel. Does it work if you import the CSV into Google Sheets?

By default it outputs everything, so it may help to specify a subset of columns. In code, that would be specified with output_columns:

from twarc_csv import CSVConverter

with open("input.jsonl", "r") as infile:
    with open("output.csv", "w") as outfile:
        converter = CSVConverter(
            infile,
            outfile,
            json_encode_all=False,
            json_encode_lists=True,
            json_encode_text=False,
            inline_referenced_tweets=True,
            allow_duplicates=False,
            batch_size=1000,
            output_columns="id,text,created_at,author_id",
        )
        converter.process()

These are all the possible output columns: twarc-csv/dataframe_converter.py at main · DocNow/twarc-csv · GitHub

2 Likes

Good idea! Very helpful answer!

I’ll try to specify everything I need with output_columns=...
I hope it works! :blush:

Thanks once again!

Hi Igor,

I’m using twarc to make ~5000 calls (each call fetching about 100 tweets for a different handle) to Twitter’s Academic Research V2 API, using the full-archive search feature. I am having a somewhat similar issue to what Sofia experienced.

According to Twitter’s V2 documentation, I should be able to make 300 requests in any 15-minute period, at a top speed of at most 1 request per second. However, as I run my program, I regularly see only about 10-50 requests made before I get error messages like ‘rate limit exceeded: sleeping 499 secs’, followed by ‘rate limit exceeded: sleeping 901 secs’, which is sometimes (and very inconsistently) followed by additional ‘rate limit exceeded: sleeping 901 secs’ messages. So on more than one occasion, twarc has halted my calls for over an hour while apparently making far fewer requests than the 15-minute rate limit allows. I’m quite uncertain why this is the case or what I can do to address it. I have already checked that I have the most up-to-date twarc installed, and I have scoured online resources about this issue without finding any working solutions. I hope you might be able to point me in the right direction.

Thanks for all the time and effort you put into answering these questions!

-Kaden

1 Like

We did see some inconsistency with the rate-limit responses from Twitter a while ago, and never managed to resolve it. If this is on the API side, there is little you or we can do, because twarc takes those values directly from Twitter’s responses. One thing that may help is running the command on a server with a good, fast connection, as opposed to your laptop or workstation: not because of your connection, but because in a data center the request may hit a different Twitter server and get a more consistent response. That’s all I can think of right now.

1 Like

Hi Igor,

Thanks for that clarification! Definitely helps alleviate my concern that I was making some silly mistake. I’ll try using some google cloud credits I have available and hope that works faster and more consistently.

Thanks,
Kaden

1 Like

Hello everyone, I face this issue again and again:
[2022-12-01 10:05:53,820: WARNING/ForkPoolWorker-4] save for one time for testing
[2022-12-01 10:05:54,113: WARNING/ForkPoolWorker-3] Rate limit reached. Sleeping for: 892
[2022-12-01 10:05:54,152: WARNING/ForkPoolWorker-2] Rate limit reached. Sleeping for: 892
[2022-12-01 10:05:54,216: WARNING/ForkPoolWorker-1] Rate limit reached. Sleeping for: 892
[2022-12-01 10:05:54,330: WARNING/ForkPoolWorker-4] Rate limit reached. Sleeping for: 892

I use Tweepy with the Twitter API. Please tell me a solution for this error.

Calling Twitter endpoints from multiple threads generally makes rate limits very unpredictable. I would recommend calling the API serially, one call after another, so that rate limits can be tracked and updated more exactly.

1 Like