Hello,
I’m new to the forum so apologies if this question has already been addressed.
I recently applied for and received approval to access the Academic Research product track - it was approved in a couple of days; very impressive Twitter.
In my developer account it shows my monthly cap as 10 million Tweets. However, when I click on the Subscriptions tab, a dashboard appears that shows “Search Tweets - Full Archive / Sandbox”. On this dashboard the limit is shown as 5K Tweets, with an option next to it to upgrade.
In order to access a higher volume of Tweets, is it necessary to “upgrade” even if I am using the Academic Research product track?
Any help would be greatly appreciated. Many thanks in advance.
Jason
Hi @jasonomcd
If your academic application was approved you should be able to see your academic project in your developer portal under: https://developer.twitter.com/en/portal/projects-and-apps
This article shows you how you can get started: Getting historical Tweets using the full-archive search endpoint (DEV Community)
Hello @suhemparack,
Thanks so much for replying; didn’t expect to hear anything so quickly. 
I’ve seen that article before and it was VERY helpful; however, I’m a little confused by the “Subscription” tab. So I guess the question is: do I need to purchase a subscription in order to begin collecting larger volumes (5K+) of Tweets?
Thanks for your help.
Jason
Yeah, it is a bit confusing with two different APIs - maybe this can clarify:
Academic Access is like v2 Recent Search, except you can go back to any point in time rather than being limited to the last 7 days. And it’s totally separate from the v1.1 Premium subscriptions you’re seeing.
The UI should probably separate those two more clearly.
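To make the distinction concrete, the two products live at completely different endpoints. A minimal sketch (the URLs are from the public Premium and v2 API docs; `endpoint_for` is just an illustrative helper, and `LABEL` is the dev environment label you set up in the dashboard):

```python
# v1.1 Premium full-archive endpoint - this is what the "Subscriptions"
# dashboard and its 5K sandbox cap refer to.  {label} is your dev
# environment label from the dashboard.
PREMIUM_FULL_ARCHIVE = "https://api.twitter.com/1.1/tweets/search/fullarchive/{label}.json"

# v2 full-archive search endpoint used by Academic Access - this is
# the one covered by the 10 million monthly Tweet cap.
ACADEMIC_FULL_ARCHIVE = "https://api.twitter.com/2/tweets/search/all"

def endpoint_for(product: str, label: str = "dev") -> str:
    """Return the full-archive search URL for a given product track."""
    if product == "premium":
        return PREMIUM_FULL_ARCHIVE.format(label=label)
    return ACADEMIC_FULL_ARCHIVE
```

So an "upgrade" prompt next to the 5K sandbox limit only concerns the first URL; Academic queries never touch it.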
Hello everyone,
Just another issue related to the original question.
When I run my query the following error is returned:
{
  "error": {
    "message": "Request exceeds account\u2019s current package request limits. Please upgrade your package and retry or contact Twitter about enterprise access.",
    "sent": "2021-03-29T16:30:09+00:00"
  }
}
As I mentioned before, I am signed up to the Academic API so, as I understood it, there is no limit on the number of requests? If there is a limit, does this mean I need to sign up for a subscription package?
I have already scraped several tweets (using my Academic Bearer token) so I am not sure why the number of tweets pulled is zero in both dashboards?
I’m very confused - any help would be greatly appreciated.
Thanks,
Jason
I have a feeling this is an error message from the Premium API - how exactly are you calling it in your code? If you’re using tweepy: tweepy does not support v2 endpoints, and its full-archive search is the v1.1 Premium full archive, not v2 Academic full-archive search.
I recommend using twarc to get started with the Academic Access API (it’s not yet officially released, but fully functional at this point).
Hello,
Thank you for the replies.
I am using a website to access the Twitter API and paginate the Twitter data; the website is https://stevesie.com/.
The operator of the website has confirmed I should be able to set up a new endpoint to use v2. The website allows new endpoints to be created from its dashboard (login required).
I have very limited knowledge of how to do this so perhaps someone could help?
I have set up the endpoint as follows:
I think I have included the correct data for the “Host” and “Path” fields, but I am not sure how to change the “Slug” field, or whether the “Editor” section needs to change?
Any help @IgorBrigadir would be very much appreciated - thank you in advance.
Best wishes,
Jason
If you’re using a third-party tool to access the API, you may have to specify more things manually - I’ve no way of knowing how stevesie.com works, but here is the API reference you need: GET /2/tweets/search/all | Docs | Twitter Developer Platform
I’d still recommend trying to use twarc2 for calling the Academic API instead.
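If it helps to see which pieces a third-party tool needs, here is a stdlib-only sketch of how a v2 full-archive request is assembled but not sent. The parameter names (`query`, `start_time`, `end_time`, `max_results`) are from the GET /2/tweets/search/all reference; `build_search_all_request` is my own illustrative helper and the bearer token is a placeholder:

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_search_all_request(bearer_token: str, query: str,
                             start_time: str = None, end_time: str = None,
                             max_results: int = 10) -> Request:
    """Assemble (but don't send) a GET /2/tweets/search/all request."""
    params = {"query": query, "max_results": max_results}
    if start_time:
        params["start_time"] = start_time  # RFC 3339, e.g. "2020-12-01T00:00:00Z"
    if end_time:
        params["end_time"] = end_time
    url = "https://api.twitter.com/2/tweets/search/all?" + urlencode(params)
    # Academic Access authenticates with the app's Bearer Token.
    return Request(url, headers={"Authorization": f"Bearer {bearer_token}"})

req = build_search_all_request("YOUR_BEARER_TOKEN", "(boeing max) lang:en",
                               start_time="2020-12-01T00:00:00Z")
```

Whatever the tool calls "Host", "Path", and query parameters should map onto those three pieces of the URL plus the Authorization header.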
Hello,
Thanks for the advice.
I decided to try twarc and I’ve been able to run a simple query and generate a .json file. It surprised me how easy it was, as I have no experience of Python or coding, or anything like that.
The instructions you sent were spot on!
Would you be able to offer any advice on the next steps to run a full-archive search using twarc, and perhaps how to transfer the data from the .json file to Excel? I’m guessing it will be far more complex than running a simple search.
Thanks for your help!
Jason
Sure, I’m still writing the documentation for this.
One way to do it with twarc2 is with twarc-csv, then import the CSV into Excel (but I’d warn of potential data corruption when working with Excel and Twitter data)
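For reference, the CSV step itself is conceptually simple. A stdlib-only sketch of what a converter does - twarc-csv handles far more fields and nested columns than this, and the sample tweet here is made up:

```python
import csv
import io
import json

def tweets_to_csv(jsonl_lines, fieldnames):
    """Convert flattened, line-delimited tweet JSON to CSV text.

    Keeping the id as a string matters: Excel silently rounds long
    numeric IDs if it parses the column as a number - one reason to
    be careful combining Excel and Twitter data.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for line in jsonl_lines:
        writer.writerow(json.loads(line))
    return out.getvalue()

# A made-up, minimal flattened tweet for illustration.
sample = [
    '{"id": "1376166126197764097", "text": "example tweet", "lang": "en"}',
]
csv_text = tweets_to_csv(sample, ["id", "text", "lang"])
```

In practice you would just run the twarc-csv command rather than write this yourself.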
Hello @IgorBrigadir,
Thanks for pointing me in the right direction. I have run a search which seems to have been successful, as a .json file with data was created in the folder I set up. However, I wasn’t sure how to introduce the ‘twarc2 flatten’ command. Please see my attempt below, which failed; I was prompted to enter my access credentials again?
In addition to this question: if I want to run a full-archive search for a defined period of time, what is the command I should use?
Thank you very much for your help.
Thanks,
Jason
Sure:
twarc search "query" will use the v1.1 API,
twarc2 search "query" will use the v2 Standard API
twarc2 search --archive "query" will use the Academic Access.
In your query, AND is not required - a space is an implicit logical AND. See here for more on how to build queries: Search Tweets - How to build a query | Docs | Twitter Developer Platform
To define the time, it’s --start-time and --end-time and there’s more options listed in
twarc2 search --help
So, after it’s all configured (run twarc2 configure once to enter your credentials), the commands would be:
twarc2 search --archive --start-time "2020-12-01" --end-time "2021-01-01" --limit 500 "(boeing max) (lang:en)" boeing_search_output.json
twarc2 flatten boeing_search_output.json boeing_search_output_flat.json
twarc2 csv boeing_search_output_flat.json boeing_results.csv
(remove --limit 500 from the command to get all the Tweets - I added it there just because it’s easy to burn through your limits with test calls)
Oh wow - that worked and was so simple to do. I have spent a lot of time trying to do this over the past few weeks. Really appreciate your help.
Just another quick question with respect to building the query.
I am looking to gather public opinion on a certain topic; however, when I run my test search, the majority of the results are from news handles. Is there a way to include something in the query that excludes such Tweets and focuses on personal handles instead?
One idea I had was to maybe exclude verified handles?
Is it possible to just negate “is:verified” to read “-is:verified”?
Thanks for your help and any other tips are welcome! 
Jason
Yes, you can add - to negate most operators, and also use () to group them; -is:verified will work. Unfortunately, beyond that there’s no other parameter that would reliably exclude company/media accounts, but -is:verified is a very good heuristic. I’d add -is:nullcast too - these are ads. Incidentally, -is:nullcast is the only operator that must always be negated.
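To make the negation rule concrete, here is a tiny sketch - `negate` and `build_query` are my own illustrative helpers, not part of twarc or any Twitter library:

```python
def negate(operator: str) -> str:
    """Prefix an operator with '-' to exclude matching Tweets."""
    return "-" + operator

def build_query(*clauses: str) -> str:
    """Join clauses with spaces: a space is an implicit logical AND."""
    return " ".join(clauses)

# Exclude retweets, ads (nullcast), and verified accounts.
query = build_query("(boeing max)", "lang:en",
                    negate("is:retweet"), negate("is:nullcast"),
                    negate("is:verified"))
```

The resulting string is exactly what you would paste into the quoted query argument of a twarc2 search command.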
Great, thank you. So, if I apply this to my query the command would read as follows - is this correct?
twarc2 search --archive --start-time "2020-12-01" --end-time "2021-01-01" "(boeing max) (lang:en) (-is:retweet) (-is:nullcast) (-is:verified)" boeing_search_output.json
Does every negated operator need to be grouped with ()?
Thanks again! 
No, but remember that a space is an implicit AND. Also, AND logic is applied before OR logic, so I’d always use parentheses to group ORs to make sure they’re evaluated correctly: Search Tweets - How to build a query | Docs | Twitter Developer Platform
Also, when running twarc queries in the command line, be sure to escape the " inside the query, for example:
The query:
lang:en "live laugh love" -is:retweet
should be run as:
twarc2 search "lang:en \"live laugh love\" -is:retweet"
Hello @IgorBrigadir,
Hope you are well.
I’ve just finished running my query and generated the CSV file. In the CSV file I can see dozens of columns with labels - many of which are self-explanatory, like “text”, “created at” and “source”. However, others seem a little more obscure, like “possibly.sensitive” and “author.sensitive”. Is there a link with info you can send me that explains each of these fields?
In addition, a quick question about pagination. I am running my query without any pagination token and it seems to be working fine - the maximum number of Tweets I pulled in one search was around 11,000. Is there a reason I should be using pagination, or am I OK to continue as I am? What would you recommend - please bear in mind I don’t know how to set up a pagination token…
Sorry to ask more questions and thanks as always.
Thanks,
Jason
Sure - the fields are documented here: Tweet object | Docs | Twitter Developer Platform and in those linked object pages for users and places etc.
twarc2 does pagination for you, so you don’t need to do anything - but if you want to stop collection after a set number of results, you can add --limit 500 to the command, like this:
twarc2 search --archive --start-time "2020-12-01" --end-time "2021-01-01" --limit 500 "(boeing max) (lang:en) (-is:retweet) (-is:nullcast) (-is:verified)" boeing_search_output.json
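For the curious: what twarc2 automates is the next_token loop. A self-contained sketch using fake response pages - the data/meta/next_token shape matches the v2 search response, but `paginate`, `fake_fetch`, and the token value are illustrative stand-ins for the real HTTP calls:

```python
def paginate(fetch, query):
    """Yield tweets across pages, following meta.next_token until absent.

    `fetch(query, next_token)` stands in for the real HTTP call to
    GET /2/tweets/search/all.
    """
    next_token = None
    while True:
        page = fetch(query, next_token)
        yield from page.get("data", [])
        next_token = page.get("meta", {}).get("next_token")
        if not next_token:
            break  # no next_token means this was the last page

# Two fake pages shaped like v2 search responses (token is made up).
PAGES = {
    None: {"data": [{"id": "1"}, {"id": "2"}],
           "meta": {"next_token": "b26v89c19zqg8o3f"}},
    "b26v89c19zqg8o3f": {"data": [{"id": "3"}], "meta": {}},
}

def fake_fetch(query, next_token):
    return PAGES[next_token]

tweets = list(paginate(fake_fetch, "(boeing max) lang:en"))
```

Each real request also counts against your monthly Tweet cap, which is why --limit is handy for tests.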
Hello @IgorBrigadir,
Hope all is well. Just another quick related question.
If I wanted to limit my results to 20,000 tweets how would I do this? Is the below correct:
twarc2 search --archive --start-time "2019-03-10" --end-time "2019-04-24" --limit 20000 "(boeing max) (lang:en) (-is:retweet) (-is:nullcast)" boeing_search_output.json
Thanks in advance.
Jason