TwitteR Loop Problems


#1

Hi Everyone,

We are having a few problems using a loop to retrieve tweets with twitteR.
We are trying to collect all tweets that mention a brand on a single day. Our aim is to retrieve every single tweet from that day.
For this, we are using the following code:

Search_Text = "_Name of the Brand_"

Day = "2016-11-13"
Next_Day = "2016-11-14"

Filename = paste0("Tweets_", Search_Text, "_of_", Day, ".RData")

New_Tweets = 1


while (length(New_Tweets) > 0)
{
 if (file.exists(Filename)) {
   
   load(Filename)
   
   Oldest_ID = tail(Tweets, n = 1) [[1]]$id
   
   New_Tweets = searchTwitteR(Search_Text, n=5000, lang = "en", maxID = Oldest_ID, since = Day, resultType = "recent", retryOnRateLimit = 1)
   
   New_Tweets[[1]] = NULL
   
   print(length(New_Tweets))
   
   print(head(New_Tweets, n=1)[[1]]$created)
   print(tail(New_Tweets, n=1)[[1]]$created)
   
   Tweets = append(Tweets, New_Tweets)
   
 } else { 
   
   Tweets = searchTwitteR(Search_Text, n = 5000, lang = "en", since = Day, until = Next_Day, resultType = "recent", retryOnRateLimit = 1)
   print(head(Tweets, n=1)[[1]]$created)
   print(tail(Tweets, n=1)[[1]]$created)
 }
  save(Tweets, file = Filename)
}

Unfortunately, it isn’t working at all. The following errors occur:

Error in head(New_Tweets, n = 1)[[1]] : subscript out of bounds

In addition: Warning messages:
1: In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, :
  5000 tweets were requested but the API can only return 614
2: In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, :
  5000 tweets were requested but the API can only return 1

Error in head(New_Tweets, n = 1)[[1]] : subscript out of bounds
In addition: Warning messages:
1: In doAPICall(cmd, params = params, method = method, retryCount = retryCount, :
  Rate limit encountered & retry limit reached - returning partial results
2: In doAPICall(cmd, params = params, method = method, retryCount = retryCount, :
  Rate limit encountered & retry limit reached - returning partial results

The problem is that it retrieves only part of the day’s tweets, for example only half a day. This happens when we search for a brand name with around 10,000 tweets a day. With a search text that has fewer tweets (around 2,000), it works.

Do you think it’s because of our laptop’s performance, the API restrictions, or a mistake in the code?
We have already tried changing the parameters n and retryOnRateLimit, but nothing helped.
We are total beginners with RStudio and would appreciate your help.

Thank you,
Kevin


#2

I’m not an expert in R so I’m only able to figure this out based on what I’m seeing in the error messages you’ve posted.

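“subscript out of bounds” implies that the code is asking for an element that doesn’t exist. Most likely head(New_Tweets, n = 1)[[1]] is being run on a result that came back empty, so there is no first element to read.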
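
A minimal sketch of how that error can arise and one way to guard against it. It uses an empty list as a stand-in for a search that returned nothing; first_or_null is a hypothetical helper name, not part of twitteR.

```r
# An empty list stands in for a search that returned no tweets.
New_Tweets <- list()

# head() of an empty list is still an empty list, so [[1]] has nothing
# to return and R raises an error. tryCatch captures the message.
result <- tryCatch(
  head(New_Tweets, n = 1)[[1]],
  error = function(e) conditionMessage(e)
)
print(result)

# Guarded access (hypothetical helper): only read the first element
# when the list actually has one.
first_or_null <- function(x) {
  if (length(x) > 0) x[[1]] else NULL
}
```

In the loop from the question, a check like length(New_Tweets) > 0 before the two print() calls would serve the same purpose.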

“Rate limit encountered & retry limit reached - returning partial results” suggests that you’re hitting the search/tweets rate limit.
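
One way to react to that warning is to wait until the rate-limit window resets before calling again. The sketch below assumes a data frame with the columns resource, remaining and reset, which is what I would expect twitteR’s getCurRateLimitInfo() to return. The column names and the function name seconds_until_reset are assumptions, so check them against the real output first.

```r
# Assumption: `limits` is a data frame with columns `resource`,
# `remaining` (calls left in the window) and `reset` (POSIXct time
# at which the window resets).
seconds_until_reset <- function(limits, resource = "/search/tweets",
                                now = Sys.time()) {
  row <- limits[limits$resource == resource, ]
  # Unknown resource: nothing sensible to report.
  if (nrow(row) == 0) return(NA_real_)
  # Calls are still available, so no waiting is needed.
  if (as.numeric(row$remaining[1]) > 0) return(0)
  # Otherwise wait until the reset time (never a negative duration).
  max(0, as.numeric(difftime(row$reset[1], now, units = "secs")))
}
```

The returned value could be passed to Sys.sleep() before the next search call.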

Since you’re seeing this on high-traffic terms, I expect the fundamental problem is rate limiting.
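
If rate limiting is the cause, smaller pages with a pause between calls may behave better than one request for 5000 tweets. The following is only a sketch of such a paging loop. fetch_page is a hypothetical function you would write yourself (for example a thin wrapper around searchTwitteR with maxID = max_id), and collect_tweets, max_pages and pause_seconds are illustrative names.

```r
# fetch_page(max_id) is hypothetical: it returns a list of tweets, each
# with an $id field, newest first. max_id is NULL on the first call.
collect_tweets <- function(fetch_page, max_pages = 100, pause_seconds = 0) {
  tweets <- list()
  max_id <- NULL
  for (page in seq_len(max_pages)) {
    new_tweets <- fetch_page(max_id)
    # maxID is inclusive, so drop tweets that were already collected.
    if (length(tweets) > 0) {
      seen <- vapply(tweets, function(t) as.character(t$id), character(1))
      is_new <- vapply(new_tweets,
                       function(t) !(as.character(t$id) %in% seen),
                       logical(1))
      new_tweets <- new_tweets[is_new]
    }
    # Stop cleanly when a page adds nothing, instead of indexing into it.
    if (length(new_tweets) == 0) break
    tweets <- c(tweets, new_tweets)
    # The oldest tweet so far becomes the upper bound for the next page.
    max_id <- tweets[[length(tweets)]]$id
    # Pause between calls to stay under the rate limit.
    Sys.sleep(pause_seconds)
  }
  tweets
}
```

Passing the fetch function in as an argument keeps the loop logic separate from the API call, so the loop can be tried out with fake data before it touches the real endpoint.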

You’d probably be better off using the streaming API to capture the data, instead of the search endpoint. I don’t have any information about how you would do that using R, but I’m sure the available libraries have documentation that could help.