Please help. I am doing my final college project, which is research on the 2019 Presidential Election in Indonesia. I use data from Twitter to predict the results of the election. I want to extract tweets every 2 weeks from September 2018 to January 2019, but I cannot collect tweets older than 7 days. So I applied for a developer account to use the full-archive search API to collect tweets from September 2018. My developer account has already been approved, I have created an app, and I have attached a dev environment to the app. But I still cannot extract tweets older than 7 days using my old code. So I searched for a solution and found this code:

library(jsonlite)

#Create your own application key at https://dev.twitter.com/apps
consumer_key = "insert_consumer_key";
consumer_secret = "insert_consumer_secret";

#Use basic auth
secret <- jsonlite::base64_enc(paste(consumer_key, consumer_secret, sep = ":"))
req <- httr::POST("https://api.twitter.com/oauth2/token",
  httr::add_headers(
    "Authorization" = paste("Basic", gsub("\n", "", secret)),
    "Content-Type" = "application/x-www-form-urlencoded;charset=UTF-8"
  ),
  body = "grant_type=client_credentials"
);

#Extract the access token
httr::stop_for_status(req, "authenticate with twitter")
token <- paste("Bearer", httr::content(req)$access_token)

#Actual API call
url <- "https://api.twitter.com/1.1/statuses/user_timeline.json?count=10&screen_name=Rbloggers"
req <- httr::GET(url, httr::add_headers(Authorization = token))
json <- httr::content(req, as = "text")
tweets <- fromJSON(json)
substring(tweets$text, 1, 100)

So I replaced the consumer key and secret with my own and changed the URL, so it looks like this:

library(jsonlite)

#Create your own application key at https://dev.twitter.com/apps
consumer_key = "my_consumer_key";
consumer_secret = "my_consumer_secret";

#Use basic auth
secret <- jsonlite::base64_enc(paste(consumer_key, consumer_secret, sep = ":"))
req <- httr::POST("https://api.twitter.com/oauth2/token",
                  httr::add_headers(
                    "Authorization" = paste("Basic", gsub("\n", "", secret)),
                    "Content-Type" = "application/x-www-form-urlencoded;charset=UTF-8"
                  ),
                  body = "grant_type=client_credentials"
);

#Extract the access token
httr::stop_for_status(req, "authenticate with twitter")
token <- paste("Bearer", httr::content(req)$access_token)

#Actual API call
url <- "https://api.twitter.com/1.1/tweets/search/fullarchive/my_env_label.json"
req <- httr::GET(url, httr::add_headers(Authorization = token))
json <- httr::content(req, as = "text")
tweets <- fromJSON(json)
substring(tweets$text, 1, 100)

But when I run the code, nothing happens and it only shows this:

character(0)

So I tried changing the URL to include a query, like this:

url <- "https://api.twitter.com/1.1/tweets/search/fullarchive/my_env_label.json?query=%23JokowiLagi&fromDate=201809270000&toDate=201810010000"

But still nothing happens and the result is the same as before. The only difference is that this time my subscription's remaining request count has gone down.

So please help me and tell me what I should do to solve this problem.
Thank you.

(Note: I am using the R programming language to extract the data from Twitter and to run the code above.)

Unfortunately there’s no R implementation of the full-archive search, and writing your own R code for it might get a bit involved. (Incidentally, the premium endpoints return a JSON object whose tweets live in a results array, so tweets$text in your script is NULL — hence the character(0) — the tweet objects would be under tweets$results.) Also, you don’t need to request the access token every time; you can reuse the same one over and over.

If you just need tweets loaded into R for analysis give the command line version of this a try: GitHub - twitterdev/search-tweets-python: Python client for the Twitter 'search Tweets' and 'count Tweets' endpoints (v2/Labs/premium/enterprise). Now supports Twitter API v2 /recent and /all search endpoints.

eg:

search_tweets.py \
--max-results 100 \
--results-per-call 100 \
--filter-rule "#JokowiLagi" \
--start-datetime 2018-09-27 \
--end-datetime 2018-10-01 \
--filename-prefix test_search \
--print-stream

If you have a ~/.twitter_keys.yaml file like the readme describes (endpoint will be https://api.twitter.com/1.1/tweets/search/fullarchive/my_env_label.json), a file called test_search.json will be created with 100 tweets for #JokowiLagi for that date range in that example call.
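Once that file exists, loading it back for analysis is straightforward: the output is newline-delimited JSON, one tweet object per line. A small hypothetical Python helper (the function name `load_tweets` and the field access are my own illustration; the fields follow the standard v1.1 tweet payload):

```python
import json

def load_tweets(path):
    """Read a newline-delimited JSON file (one tweet object per line)."""
    tweets = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip any blank lines
                tweets.append(json.loads(line))
    return tweets

# e.g. pull out just the text of each tweet for analysis:
# texts = [t["text"] for t in load_tweets("test_search.json")]
```

From there you can write the texts to CSV and carry on in R if you prefer.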

Thanks for your reply. So do you mean that I can’t extract data from Twitter older than 7 days using R, even though I am using the full-archive search?

Oh no, you can still do it in R but you’ll have to implement it yourself - I’m only suggesting the Python way because it seems like less effort to get the data.

7 days is the rough limit for the standard REST search API; using the full-archive search properly will give you everything, but you’ll need to authenticate, paginate, and manage your own rate limits, etc.: Premium search APIs | Docs | Twitter Developer Platform
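To give a feel for the work involved, here is a rough sketch of the pagination loop the premium endpoints expect. The response shape (a `results` array plus an optional `next` token) follows the premium search docs, but the `fetch` callable here is a placeholder you would implement yourself with a Bearer token and rate-limit handling — this is not a drop-in client:

```python
def collect_all(fetch, query, max_pages=10):
    """Page through a premium-search-style endpoint.

    `fetch(query, next_token)` must return a dict shaped like the
    premium search response: {"results": [...], "next": "..."},
    where "next" is absent on the final page.
    """
    tweets, next_token = [], None
    for _ in range(max_pages):
        page = fetch(query, next_token)
        tweets.extend(page.get("results", []))
        next_token = page.get("next")
        if next_token is None:
            break  # no more pages to request
    return tweets
```

In a real implementation, `fetch` would POST to your `.../fullarchive/<label>.json` endpoint with the `query`, `fromDate`/`toDate`, and (when present) the `next` token in the JSON body.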

OK, I am trying Python now, and I am following the GitHub repo:
https://github.com/twitterdev/search-tweets-python

I have installed searchtweets and created a twitter_keys.yaml file for the credentials, just like in the GitHub readme. The .yaml file contains the same content as the readme (except for my own endpoint, consumer key, and secret).

After that I tried this code from the readme:

from searchtweets import load_credentials

load_credentials(filename="./search_tweets_creds_example.yaml",
                 yaml_key="search_tweets_premium_example",
                 env_overwrite=False)

But I am stuck at this point. I don’t understand the filename and yaml_key parameters of load_credentials. What should I pass for filename and yaml_key? Can you give me an example?

Thanks for your help.

Have a look at Problem with .twitter_keys.yaml - #2 by jrmontag to see if it helps.

filename is the path to the actual yaml file, and yaml_key specifies which configuration to use, because one yaml file can hold multiple configurations for multiple endpoints and apps.
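For example, a minimal file might look like this (following the layout in the readme — the top-level key `search_tweets_premium` and the `my_env_label` environment name are just placeholders mirroring your setup):

```yaml
search_tweets_premium:
  account_type: premium
  endpoint: https://api.twitter.com/1.1/tweets/search/fullarchive/my_env_label.json
  consumer_key: <YOUR_CONSUMER_KEY>
  consumer_secret: <YOUR_CONSUMER_SECRET>
```

With that file you would call load_credentials(filename="./twitter_keys.yaml", yaml_key="search_tweets_premium").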

Also, passing --credential-file twitter_keys.yaml to that search_tweets.py command will work too, without having to modify the underlying code.

OK, I finally found the search_tweets.py file and tried to run it. But I am confused about what I should change in that code:

#!C:\Python\python.exe
# Copyright 2017 Twitter, Inc.
# Licensed under the Apache License, Version 2.0
# http://www.apache.org/licenses/LICENSE-2.0
import os
import argparse
import json
import sys
import logging
from searchtweets import (ResultStream,
                          load_credentials,
                          merge_dicts,
                          read_config,
                          write_result_stream,
                          gen_params_from_config)

logger = logging.getLogger()
# we want to leave this here and have it command-line configurable via the
# --debug flag
logging.basicConfig(level=os.environ.get("LOGLEVEL", "ERROR"))


REQUIRED_KEYS = {"pt_rule", "endpoint"}


def parse_cmd_args():
    argparser = argparse.ArgumentParser()
    help_msg = """configuration file with all parameters. Far,
          easier to use than the command-line args version.,
          If a valid file is found, all args will be populated,
          from there. Remaining command-line args,
          will overrule args found in the config,
          file."""

    argparser.add_argument("--credential-file",
                           dest="credential_file",
                           default=None,
                           help=("Location of the yaml file used to hold "
                                 "your credentials."))

    argparser.add_argument("--credential-file-key",
                           dest="credential_yaml_key",
                           default=None,
                           help=("the key in the credential file used "
                                 "for this session's credentials. "
                                 "Defaults to search_tweets_api"))

    argparser.add_argument("--env-overwrite",
                           dest="env_overwrite",
                           default=True,
                           help=("""Overwrite YAML-parsed credentials with
                                 any set environment variables. See API docs or
                                 readme for details."""))

    argparser.add_argument("--config-file",
                           dest="config_filename",
                           default=None,
                           help=help_msg)

    argparser.add_argument("--account-type",
                           dest="account_type",
                           default=None,
                           choices=["premium", "enterprise"],
                           help="The account type you are using")

    argparser.add_argument("--count-bucket",
                           dest="count_bucket",
                           default=None,
                           help=("""Bucket size for counts API. Options:,
                                 day, hour, minute (default is 'day')."""))

    argparser.add_argument("--start-datetime",
                           dest="from_date",
                           default=None,
                           help="""Start of datetime window, format
                                'YYYY-mm-DDTHH:MM' (default: -30 days)""")

    argparser.add_argument("--end-datetime",
                           dest="to_date",
                           default=None,
                           help="""End of datetime window, format
                                 'YYYY-mm-DDTHH:MM' (default: most recent
                                 date)""")

    argparser.add_argument("--filter-rule",
                           dest="pt_rule",
                           default=None,
                           help="PowerTrack filter rule (See: http://support.gnip.com/customer/portal/articles/901152-powertrack-operators)")

    argparser.add_argument("--results-per-call",
                           dest="results_per_call",
                           help="Number of results to return per call "
                                "(default 100; max 500) - corresponds to "
                                "'maxResults' in the API")

    argparser.add_argument("--max-results", dest="max_results",
                           type=int,
                           help="Maximum number of Tweets or Counts to return for this session")

    argparser.add_argument("--max-pages",
                           dest="max_pages",
                           type=int,
                           default=None,
                           help="Maximum number of pages/API calls to "
                           "use for this session.")

    argparser.add_argument("--results-per-file", dest="results_per_file",
                           default=None,
                           type=int,
                           help="Maximum tweets to save per file.")

    argparser.add_argument("--filename-prefix",
                           dest="filename_prefix",
                           default=None,
                           help="prefix for the filename where tweet "
                           " json data will be stored.")

    argparser.add_argument("--no-print-stream",
                           dest="print_stream",
                           action="store_false",
                           help="disable print streaming")

    argparser.add_argument("--print-stream",
                           dest="print_stream",
                           action="store_true",
                           default=True,
                           help="Print tweet stream to stdout")

    argparser.add_argument("--extra-headers",
                           dest="extra_headers",
                           type=str,
                           default=None,
                           help="JSON-formatted str representing a dict of additional request headers")

    argparser.add_argument("--debug",
                           dest="debug",
                           action="store_true",
                           default=False,
                           help="print all info and warning messages")
    return argparser


def _filter_sensitive_args(dict_):
    sens_args = ("password", "consumer_key", "consumer_secret", "bearer_token")
    return {k: v for k, v in dict_.items() if k not in sens_args}

def main():
    args_dict = vars(parse_cmd_args().parse_args())
    if args_dict.get("debug") is True:
        logger.setLevel(logging.DEBUG)
        logger.debug("command line args dict:")
        logger.debug(json.dumps(args_dict, indent=4))

    if args_dict.get("config_filename") is not None:
        configfile_dict = read_config(args_dict["config_filename"])
    else:
        configfile_dict = {}
    
    extra_headers_str = args_dict.get("extra_headers")
    if extra_headers_str is not None:
        args_dict['extra_headers_dict'] = json.loads(extra_headers_str)
        del args_dict['extra_headers']

    logger.debug("config file ({}) arguments sans sensitive args:".format(args_dict["config_filename"]))
    logger.debug(json.dumps(_filter_sensitive_args(configfile_dict), indent=4))

    creds_dict = load_credentials(filename=args_dict["credential_file"],
                                  account_type=args_dict["account_type"],
                                  yaml_key=args_dict["credential_yaml_key"],
                                  env_overwrite=args_dict["env_overwrite"])

    dict_filter = lambda x: {k: v for k, v in x.items() if v is not None}

    config_dict = merge_dicts(dict_filter(configfile_dict),
                              dict_filter(creds_dict),
                              dict_filter(args_dict))

    logger.debug("combined dict (cli, config, creds) sans password:")
    logger.debug(json.dumps(_filter_sensitive_args(config_dict), indent=4))

    if len(dict_filter(config_dict).keys() & REQUIRED_KEYS) < len(REQUIRED_KEYS):
        print(REQUIRED_KEYS - dict_filter(config_dict).keys())
        logger.error("ERROR: not enough arguments for the program to work")
        sys.exit(1)

    stream_params = gen_params_from_config(config_dict)
    logger.debug("full arguments passed to the ResultStream object sans password")
    logger.debug(json.dumps(_filter_sensitive_args(stream_params), indent=4))

    rs = ResultStream(tweetify=False, **stream_params)

    logger.debug(str(rs))

    if config_dict.get("filename_prefix") is not None:
        stream = write_result_stream(rs,
                                     filename_prefix=config_dict.get("filename_prefix"),
                                     results_per_file=config_dict.get("results_per_file"))
    else:
        stream = rs.stream()

    for tweet in stream:
        if config_dict["print_stream"] is True:
            print(json.dumps(tweet))


if __name__ == '__main__':
    main()

I tried changing some parts of that code as you suggested, and now it looks like this:

#!C:\Python\python.exe
# Copyright 2017 Twitter, Inc.
# Licensed under the Apache License, Version 2.0
# http://www.apache.org/licenses/LICENSE-2.0
import os
import argparse
import json
import sys
import logging
from searchtweets import (ResultStream,
                          load_credentials,
                          merge_dicts,
                          read_config,
                          write_result_stream,
                          gen_params_from_config)

logger = logging.getLogger()
# we want to leave this here and have it command-line configurable via the
# --debug flag
logging.basicConfig(level=os.environ.get("LOGLEVEL", "ERROR"))


REQUIRED_KEYS = {"pt_rule", "endpoint"}


def parse_cmd_args():
    argparser = argparse.ArgumentParser()
    help_msg = """configuration file with all parameters. Far,
          easier to use than the command-line args version.,
          If a valid file is found, all args will be populated,
          from there. Remaining command-line args,
          will overrule args found in the config,
          file."""

    argparser.add_argument(".twitter_keys.yaml",
                           dest="credential_file",
                           default=None,
                           help=("Location of the yaml file used to hold "
                                 "your credentials."))

    argparser.add_argument("search_tweets_premium",
                           dest="credential_yaml_key",
                           default=None,
                           help=("the key in the credential file used "
                                 "for this session's credentials. "
                                 "Defaults to search_tweets_api"))

    argparser.add_argument("--env-overwrite",
                           dest="env_overwrite",
                           default=True,
                           help=("""Overwrite YAML-parsed credentials with
                                 any set environment variables. See API docs or
                                 readme for details."""))

    argparser.add_argument("--config-file",
                           dest="config_filename",
                           default=None,
                           help=help_msg)

    argparser.add_argument("premium",
                           dest="account_type",
                           default=None,
                           choices=["premium", "enterprise"],
                           help="The account type you are using")

    argparser.add_argument("--count-bucket",
                           dest="count_bucket",
                           default=None,
                           help=("""Bucket size for counts API. Options:,
                                 day, hour, minute (default is 'day')."""))

    argparser.add_argument("2018-09-27",
                           dest="from_date",
                           default=None,
                           help="""Start of datetime window, format
                                'YYYY-mm-DDTHH:MM' (default: -30 days)""")

    argparser.add_argument("2018-10-01",
                           dest="to_date",
                           default=None,
                           help="""End of datetime window, format
                                 'YYYY-mm-DDTHH:MM' (default: most recent
                                 date)""")

    argparser.add_argument("#JokowiLagi",
                           dest="pt_rule",
                           default=None,
                           help="PowerTrack filter rule (See: http://support.gnip.com/customer/portal/articles/901152-powertrack-operators)")

    argparser.add_argument("100",
                           dest="results_per_call",
                           help="Number of results to return per call "
                                "(default 100; max 500) - corresponds to "
                                "'maxResults' in the API")

    argparser.add_argument("100", dest="max_results",
                           type=int,
                           help="Maximum number of Tweets or Counts to return for this session")

    argparser.add_argument("--max-pages",
                           dest="max_pages",
                           type=int,
                           default=None,
                           help="Maximum number of pages/API calls to "
                           "use for this session.")

    argparser.add_argument("--results-per-file", dest="results_per_file",
                           default=None,
                           type=int,
                           help="Maximum tweets to save per file.")

    argparser.add_argument("#JokowiLagi",
                           dest="filename_prefix",
                           default=None,
                           help="prefix for the filename where tweet "
                           " json data will be stored.")

    argparser.add_argument("--no-print-stream",
                           dest="print_stream",
                           action="store_false",
                           help="disable print streaming")

    argparser.add_argument("--print-stream",
                           dest="print_stream",
                           action="store_true",
                           default=True,
                           help="Print tweet stream to stdout")

    argparser.add_argument("--extra-headers",
                           dest="extra_headers",
                           type=str,
                           default=None,
                           help="JSON-formatted str representing a dict of additional request headers")

    argparser.add_argument("--debug",
                           dest="debug",
                           action="store_true",
                           default=False,
                           help="print all info and warning messages")
    return argparser


def _filter_sensitive_args(dict_):
    sens_args = ("password", "consumer_key", "consumer_secret", "bearer_token")
    return {k: v for k, v in dict_.items() if k not in sens_args}

def main():
    args_dict = vars(parse_cmd_args().parse_args())
    if args_dict.get("debug") is True:
        logger.setLevel(logging.DEBUG)
        logger.debug("command line args dict:")
        logger.debug(json.dumps(args_dict, indent=4))

    if args_dict.get("config_filename") is not None:
        configfile_dict = read_config(args_dict["config_filename"])
    else:
        configfile_dict = {}
    
    extra_headers_str = args_dict.get("extra_headers")
    if extra_headers_str is not None:
        args_dict['extra_headers_dict'] = json.loads(extra_headers_str)
        del args_dict['extra_headers']

    logger.debug("config file ({}) arguments sans sensitive args:".format(args_dict["config_filename"]))
    logger.debug(json.dumps(_filter_sensitive_args(configfile_dict), indent=4))

    creds_dict = load_credentials(filename=args_dict["credential_file"],
                                  account_type=args_dict["account_type"],
                                  yaml_key=args_dict["credential_yaml_key"],
                                  env_overwrite=args_dict["env_overwrite"])

    dict_filter = lambda x: {k: v for k, v in x.items() if v is not None}

    config_dict = merge_dicts(dict_filter(configfile_dict),
                              dict_filter(creds_dict),
                              dict_filter(args_dict))

    logger.debug("combined dict (cli, config, creds) sans password:")
    logger.debug(json.dumps(_filter_sensitive_args(config_dict), indent=4))

    if len(dict_filter(config_dict).keys() & REQUIRED_KEYS) < len(REQUIRED_KEYS):
        print(REQUIRED_KEYS - dict_filter(config_dict).keys())
        logger.error("ERROR: not enough arguments for the program to work")
        sys.exit(1)

    stream_params = gen_params_from_config(config_dict)
    logger.debug("full arguments passed to the ResultStream object sans password")
    logger.debug(json.dumps(_filter_sensitive_args(stream_params), indent=4))

    rs = ResultStream(tweetify=False, **stream_params)

    logger.debug(str(rs))

    if config_dict.get("filename_prefix") is not None:
        stream = write_result_stream(rs,
                                     filename_prefix=config_dict.get("filename_prefix"),
                                     results_per_file=config_dict.get("results_per_file"))
    else:
        stream = rs.stream()

    for tweet in stream:
        if config_dict["print_stream"] is True:
            print(json.dumps(tweet))


if __name__ == '__main__':
    main()

But when I run it, this message appears:

Traceback (most recent call last):
  File "c:/Python Code/search_tweets.py", line 207, in <module>
    main()
  File "c:/Python Code/search_tweets.py", line 148, in main
    args_dict = vars(parse_cmd_args().parse_args())
  File "c:/Python Code/search_tweets.py", line 38, in parse_cmd_args
    help=("Location of the yaml file used to hold "
  File "C:\Python\lib\argparse.py", line 1334, in add_argument
    raise ValueError('dest supplied twice for positional argument')
ValueError: dest supplied twice for positional argument

What should I change in the search_tweets.py code?
What does that ValueError mean?
Is there a mistake in the changes I made above?
It would be helpful if you could provide some examples of the changes to search_tweets.py.

Thanks for your help so far.

I want to extract data from 2017, and in real time as well, and I ran into the same problem you did: the 7-day limitation.
So I will get the premium APIs soon, but I have some questions, if you can answer them please.
Can you show me an example of the JSON a Twitter search returns?

Are those the columns of the data I will get?
@mentions, Replies, Retweets, Quote Tweets, Retweets of Quoted Tweets, Likes, Direct Messages Sent, Direct Messages Received, Follows, Blocks, Mutes,Typing indicators and Read receipts.

Is there any GitHub code that shows me how to implement it myself in R?

Oh, there’s no need to edit the Python code. You were changing the argument-parsing code of a command-line utility that you can just run as-is. The command is:

python search_tweets.py \
--credential-file twitter_keys.yaml \
--max-results 100 \
--results-per-call 100 \
--filter-rule "#JokowiLagi" \
--start-datetime 2018-09-27 \
--end-datetime 2018-10-01 \
--filename-prefix test_search \
--print-stream

This assumes there is a twitter_keys.yaml file in the current directory. It will print out 100 tweets for #JokowiLagi from the date range specified, and create a file called test_search.json with the tweets.
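As for the ValueError: argparse derives dest automatically from a positional argument’s name, so supplying dest= for a positional argument — which is what happened when the option strings like --credential-file were replaced with plain values — raises exactly the error in your traceback. A minimal reproduction:

```python
import argparse

parser = argparse.ArgumentParser()
try:
    # A name without a leading "--" is treated as positional; argparse
    # already derives its dest from the name, so passing dest= is an error.
    parser.add_argument(".twitter_keys.yaml", dest="credential_file")
except ValueError as err:
    print(err)  # dest supplied twice for positional argument
```

The values like .twitter_keys.yaml belong on the command line after the flags, not inside add_argument.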

As for real-time, I’d suggest starting with something like Overview of Social Feed Manager • Social Feed Manager, which is more user-friendly — and once you identify what’s available and what exactly you need to do, you can build your own data-collecting approach. Introduction | Docs | Twitter Developer Platform


Thank you…
I have used the standard Twitter API in R via the twitteR package for almost a month, but as I mentioned before, because of the limitations I want to use the premium API. It is my first time working with JSON and the premium API, so I am a little confused. Is there any advice or are there steps that can help me?

I’d suggest starting with the docs Twitter API | Products | Twitter Developer Platform and tools like GitHub - twitterdev/search-tweets-python: Python client for the Twitter 'search Tweets' and 'count Tweets' endpoints (v2/Labs/premium/enterprise). Now supports Twitter API v2 /recent and /all search endpoints. to see what’s possible and whether it fits what you need to do.

I’ve no doubt R can effectively use the Premium search APIs, but since there’s no library for that yet it’ll be a good chunk of work to implement.

Also, the GitHub - geoffjentry/twitteR: R based twitter client twitteR library is no longer maintained in favour of GitHub - mkearney/rtweet: 🐦 R client for interacting with Twitter's [stream and REST] APIs — I’d strongly suggest switching to rtweet in R.


Thank you so much

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.