Checking If A Large Number of Users Exist At Once


#1

Hello everyone,

I am working on an application that requires me to check whether a large number of Twitter users exist, roughly 87,000. I had a previous program that worked with about 1,000 users, and I ran the method LookUpUsers() to get them back in increments of 100. This worked great because I split up my calls: for 1,000 users I would only need 10 calls, and so on. But for a large number of users such as 87,000, I would need to make about 870 calls. Before I called this method, I checked each user’s existence with a call to checkID, and if a 404 came back I would remove that user from my ArrayList.

So is there an easy way to check whether a user actually exists? Why isn’t there a method called exists() in the API? Any suggestions or tips on how to go forward would be a great help!

Thank you very much.


#2

users/lookup is the method to use. You can look them up in increments of 100. It’s the only way to accomplish your goals. Due to rate limiting, it will just take you time (and patience) to work through your data set.

What’s the use case you’re trying to solve for? (As in, why do you have 87,000 user IDs or screen names that you need to either map to Twitter accounts or to confirm the existence of?)


#3

I am doing research at a local university and have built a classifier and algorithm to figure out whether a user is male or female. It worked great for around 1,000 users. I have a method called CallTwitterAPI that ultimately returns an ArrayList of twitter4j.User objects for each batch of 100 users. I then add these to an existing ArrayList each time to keep track of all the users. I need information about each user such as real name, profile description, and more.

Also, if one of the 100 users that I give LookUpUsers is null, what happens then? Does the method behave in a certain way?

Lastly, when you say it will take time, what are you alluding to? Is there a good way to delay my calls to the API? Any help would be great.

Thanks again!!!


#4

When presented with a list of criteria, users/lookup will respond with objects that have a match. If an object has no match, it just won’t be returned. So if you provide 100 user names, but only 50 of them match Twitter accounts, you’ll get 50 records back and no indication as to the other 50 you were looking for.
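Since the response gives no indication of which entries had no match, one way to recover that is to compare the IDs you requested with the IDs that came back. The following is a minimal sketch, not part of any library: the class and method names are made up for illustration, and it assumes you have already collected the returned IDs (for example from getId() on each twitter4j.User).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: works on plain ID lists, independent of any Twitter client.
class MissingUsers {

    /** Returns the requested IDs that are absent from the returned IDs, in request order. */
    static List<Long> findMissing(List<Long> requested, List<Long> returned) {
        // A set gives constant-time membership checks for the returned IDs.
        Set<Long> found = new HashSet<>(returned);
        List<Long> missing = new ArrayList<>();
        for (Long id : requested) {
            if (!found.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }
}
```

Running this once per batch of 100 gives the list of IDs to drop, which replaces the separate per-user existence check.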

By time I mean:

The REST API is rate limited. Each API method has a different rate limit. Using OAuth 1.0a, you can use users/lookup 180 times per 15 minutes. With 100 lookups per call, that allows you 18,000 lookups per 15 minutes. Once you’ve looked up 18,000, you’ll have to wait until the 15-minute window is complete before resuming.
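Applied to the 87,000 users from the original question, those limits work out as follows. This is only a back-of-the-envelope sketch with made-up class and method names. It assumes every call carries a full batch and ignores the time the requests themselves take.

```java
// Hypothetical calculator for the numbers quoted in this thread.
class LookupBudget {

    /** Integer division rounded up, for positive arguments. */
    static int ceilDiv(int a, int b) {
        return (a + b - 1) / b;
    }

    /** Number of API calls needed to look up all users. */
    static int callsNeeded(int users, int usersPerCall) {
        return ceilDiv(users, usersPerCall);
    }

    /** Number of rate-limit windows those calls span. */
    static int windowsNeeded(int calls, int callsPerWindow) {
        return ceilDiv(calls, callsPerWindow);
    }

    /** Minimum waiting time: every window except the first must be waited for. */
    static int minWaitMinutes(int windows, int windowMinutes) {
        return Math.max(0, windows - 1) * windowMinutes;
    }
}
```

With 87,000 users, 100 users per call, and 180 calls per 15 minutes, this gives 870 calls across 5 windows, so roughly an hour of waiting in total.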

All the information around rate limiting is communicated via HTTP headers. You can code your application to listen to these HTTP headers and throttle requests appropriately.
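As a rough illustration of throttling on those headers, the sketch below decides how long to pause from the x-rate-limit-remaining and x-rate-limit-reset headers (the reset value is a Unix timestamp in seconds). The class name, the method name, and the use of a plain Map with lowercase keys are assumptions made for the example. If you are using a client library, it may already expose the same values for you.

```java
import java.util.Map;

// Hypothetical helper: the caller supplies response headers as a map with lowercase keys.
class RateLimitHeaders {

    /**
     * Returns how many seconds to wait before the next call.
     * Returns 0 if calls remain in the current window or the headers are absent.
     */
    static long secondsToWait(Map<String, String> headers, long nowEpochSeconds) {
        String remaining = headers.get("x-rate-limit-remaining");
        String reset = headers.get("x-rate-limit-reset");
        if (remaining == null || reset == null) {
            return 0; // no rate-limit information in this response
        }
        if (Integer.parseInt(remaining.trim()) > 0) {
            return 0; // still have calls left in this window
        }
        // Window exhausted: wait until the reset time (never a negative duration).
        return Math.max(0, Long.parseLong(reset.trim()) - nowEpochSeconds);
    }
}
```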

When performing bulk user lookup like this, it’s best to be especially polite with the API – respond to error codes and rate limiting conditions… if the API tells you it’s having trouble processing your requests, exponentially back off.
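Exponential backoff can be sketched as below: after each failed attempt, the wait doubles up to a cap. This is a generic illustration rather than anything provided by the API or by a client library. The class name, the parameters, and the choice to retry on any exception are assumptions, and real code would likely retry only on specific error codes.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper using exponential backoff.
class Backoff {

    /** Delay for a zero-based attempt: base, 2*base, 4*base, ... capped at maxMillis. */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        // Cap the shift so the multiplication cannot overflow for sane base values.
        long delay = baseMillis << Math.min(attempt, 20);
        return Math.min(delay, maxMillis);
    }

    /** Runs the call, sleeping with growing delays between failed attempts. */
    static <T> T retry(Callable<T> call, int maxAttempts, long baseMillis, long maxMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                // Do not sleep after the final attempt; just report the failure.
                if (attempt < maxAttempts - 1) {
                    Thread.sleep(delayMillis(attempt, baseMillis, maxMillis));
                }
            }
        }
        throw last;
    }
}
```

With a base of 1 second and a cap of 60 seconds, the waits would be 1, 2, 4, 8, ... seconds until they level off at 60.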


#5

Thanks for your help. I truly appreciate it.

I am using Twitter4J, by the way. So, if I’m able to perform 180 queries to the API every 15 minutes with 100 users each time using LookUpUsers, then that means I can still get a large amount of information, right? I do have to wait for the rate limit window to reset before my RateLimitExceeded variable goes back to normal, don’t I? So perhaps I could just do a Thread.sleep() during that time to cope with it. That gives me 18,000 users each time before I hit my rate limit again. Maybe it’s doable, but I wish Twitter would grant better limits for people conducting research.
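For the batching side of this plan, a minimal sketch of splitting the full ID list into groups of 100 might look like the following. The class and method names are invented for illustration. Each resulting batch would then go to one lookup call, with a Thread.sleep() once the 180 calls in a window are used up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for splitting a long list into fixed-size batches.
class Batches {

    /** Splits items into consecutive batches of at most size elements each. */
    static <T> List<List<T>> partition(List<T> items, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size) {
            int end = Math.min(start + size, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```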

Thank you again.