How to get follower IDs for a user with a large number of followers without crossing the API rate limit


#1

Hi all
I am trying to get the followers/friends for a given user name. It works fine; however, when we try to get the followers/friends for a user with a large number of followers/friends, say 35,000 or more, the code breaks, which seems to mean we cross the rate limits. Since getting all follower/friend IDs requires cursored calls, and the number of calls depends on the follower/friend count, I am not sure how we can get them all without crossing the rate limits. Thanks in advance


#2

What I did was go through all the returned pages, summing up the number of follower IDs. When I hit the threshold I made the script sleep for 15 minutes and then continue to the next page.
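
A minimal sketch of that page-and-sleep approach, assuming Twython and an already-authenticated client named “twitter” (the function and parameter names are illustrative):

import time
from twython import TwythonRateLimitError

def fetch_all_follower_ids(twitter, screen_name):
    # Page through followers/ids, sleeping out the window on a 429.
    ids, cursor = [], -1            # -1 requests the first page
    while cursor != 0:              # 0 means there are no more pages
        try:
            page = twitter.get_followers_ids(screen_name=screen_name,
                                             cursor=cursor, count=5000)
        except TwythonRateLimitError:
            time.sleep(15 * 60)     # wait out the 15-minute window, then retry
            continue
        ids.extend(page['ids'])
        cursor = page['next_cursor']
    return ids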

This is pretty slow, though: it took around 30 minutes to fetch a total of 102k follower IDs. However, this number matched exactly what the Twitter analytics page showed :smile:

If anyone knows a better way to do this please let me know.


#3

@drupalopenid You can make the API call by passing arguments like user_id=123456, cursor=cursor, count=5000.
Initialize the cursor to -1 and loop with while (cursor != 0), updating the cursor inside the loop. By the way, for how many users are you crawling the friend ID data? I am doing the same thing and am also facing some trouble. Which Twitter library and which programming language are you using?
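
A minimal sketch of that cursored loop, assuming an already-authenticated Twython client named “twitter” (the user ID is illustrative):

cursor = -1              # -1 asks for the first page
while cursor != 0:       # 0 means there are no more pages
    response = twitter.get_friends_ids(user_id=123456, cursor=cursor, count=5000)
    print(response['ids'])
    cursor = response['next_cursor']   # advance to the next page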


#4

@jonataschagas3 I am afraid Twitter only lets us extract data at this rate in order to avoid abuse of its servers. By the way, for how many users are you crawling friend IDs? I am doing the same for a large number of user IDs, around 100K users. If you are able to do that, please help me.
Thank you in advance.


#5

@PaulBachu I’m crawling around 100k or more friend IDs. I’ve used the same “while (cursor != 0)” approach you are using. I understand that the API rate limits are meant to avoid abuse; I just wish there were API calls to pull aggregates such as totals of followers per day, etc. Other public APIs such as the YouTube Analytics API and the Facebook Insights API offer aggregated data, but with this API we have to “work around” it…
I’m using Python to pull this data. Since we’re using the same approach, I wonder if there is anything I can help you with?

Cheers


#6

@jonataschagas3 This sounds great. I am also using Python and the Twython library for making calls to the Twitter API, but I have run into a problem. I am able to generate friend lists for only around 70 users, and then this message displays every time: Error 429 (Rate limit exceeded). I know this is a common error that occurs when we hit the rate limit (15 calls per 15-minute window), but in my case the error continues to display even after 15 minutes. I have no idea why this is happening. Also, how do I check the rate-limit-remaining value? I have searched the internet but have been unable to find valid syntax. If you could please help me. Thanks again for your response.


#7

It’s a rolling 15-minute window, counted from the time you made your first request.

The best way to approach this is:

  • call followers/ids with count = 5000 and cursor = -1
  • loop 15 times, setting the cursor to the returned value “next_cursor” each time
  • store the very last “next_cursor” and start with this cursor in the next 15-minute window (instead of -1)

Think of a cursor as a bookmark: since you can’t read a whole book in one sitting, you bookmark your page to save your spot for the next time you have time to read.
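
A sketch of that bookmark approach, assuming an authenticated Twython client named “twitter” (the helper name and IDs are illustrative; get_lastfunction_header is Twython’s way to read the rate-limit headers asked about above):

CALLS_PER_WINDOW = 15

def fetch_window(twitter, user_id, start_cursor):
    # Make up to 15 followers/ids calls, returning the ids plus the
    # cursor to resume from in the next 15-minute window.
    ids, cursor = [], start_cursor
    for _ in range(CALLS_PER_WINDOW):
        page = twitter.get_followers_ids(user_id=user_id, cursor=cursor, count=5000)
        ids.extend(page['ids'])
        cursor = page['next_cursor']
        if cursor == 0:          # reached the last page
            break
    remaining = twitter.get_lastfunction_header('x-rate-limit-remaining')
    print('calls left this window:', remaining)
    return ids, cursor           # persist cursor; pass it back in 15 minutes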


#8

@trace I am not able to follow that. Could you please be more specific, or provide the whole code for it? Thanks for your immediate response.


#9

@PaulBachu If you show me what you have in Python now, and which Twitter account you’re trying to get the follower IDs for, I can help you further. In PHP it’d be something ROUGHLY similar to the cursored loop described above.


#10

@trace This is my program. Please review it.

#!/usr/bin/env python

from twython import Twython
import time
import sys

argList = sys.argv

fp1 = open('error23.txt', 'w')    # error log
fp2 = open(argList[4], 'w')       # output file containing the list of all friends
fp3 = open(argList[3], 'r')       # input file containing the list of specified user IDs
APP_KEY = argList[1]              # App Key or Consumer Key (API Key)
APP_SECRET = argList[2]           # App Secret or Consumer Secret (API Secret)

# Application-only (OAuth 2) authentication
twitter = Twython(APP_KEY, APP_SECRET, oauth_version=2)
ACCESS_TOKEN = twitter.obtain_access_token()
print('access_token =', ACCESS_TOKEN)
twitter = Twython(APP_KEY, access_token=ACCESS_TOKEN)

for user in fp3:
    user = user.rstrip()
    fp2.write(user + ',')
    cursor = -1                   # -1 requests the first page
    while cursor != 0:            # 0 means there are no more pages
        try:
            time.sleep(100)       # space out calls to stay under the rate limit
            response = twitter.get_friends_ids(user_id=int(user), cursor=cursor, count=5000)
            ids = response['ids']
            cursor = response['next_cursor']
            print(cursor)
            for x in ids:
                fp2.write(str(x) + ',')
            if cursor == 0:
                fp2.write('\n')
        except Exception as e:
            print(e)
            print('exception occurs for', user)
            fp1.write('\nexception occurs for ' + user + '\n' + str(e))

fp1.close()
fp2.close()
fp3.close()

#11

I’m not a Python programmer, but it looks like it will work correctly. How many followers does the user you’re trying to parse have? If it’s over 75,000 (15 calls × 5,000 IDs per 15-minute window), then you need to split this across more than one window due to the API limits.

To do this:

  • Change the loop condition “while cursor != 0:” to “while cursor != 0 and i < 15:”.
  • Initialize i to 0 just before the loop, and increment it inside the loop.
  • Save the last “next_cursor” to a file after the while loop.
  • On the next run, load the saved “next_cursor” value instead of starting from -1 (a sketch follows this list).
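
A sketch of that cursor persistence between windows, reusing the authenticated “twitter” client and “user” variable from the program above (the file name “cursor.txt” is illustrative):

import os

def load_cursor(path='cursor.txt'):
    # Resume from a saved cursor, or start at -1 (the first page).
    if os.path.exists(path):
        with open(path) as f:
            return int(f.read().strip())
    return -1

def save_cursor(cursor, path='cursor.txt'):
    with open(path, 'w') as f:
        f.write(str(cursor))

cursor = load_cursor()
i = 0
while cursor != 0 and i < 15:     # at most 15 calls per 15-minute window
    response = twitter.get_friends_ids(user_id=int(user), cursor=cursor, count=5000)
    cursor = response['next_cursor']
    i += 1
save_cursor(cursor)               # resume from here on the next run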

#12

Can you please send me the code of the loop in which you use (cursor != 0)?
Please!


#13

Paul, please help me! I want to know whether the ID passed in the query, such as user_id:12345, is an integer or a string??

:’( :’( :’(


#14

Generally these parameters are represented as strings in JSON objects. What issue are you having? You might want to open a new topic with some example code and explain your specific problem; this is an old thread.


#15

Hi all! I’m trying to do the same thing with the same approach!
I’m using the while loop with next_cursor != 0 and making the program sleep, but it seems that it never comes out of sleep!
Can you take a look at my code and tell me what’s wrong?

#sleep between each api call        
next_cursor = "-1"     
while (next_cursor != "0"):
    r['followers'] = twapi.get_followers_list(screen_name=r['screen_name'],count=200,cursor=next_cursor)
    next_cursor = r['followers']["next_cursor"]
    sleep(901) #15 min + 1 sec

This is the critical piece.
Help me if you can!
Thanks!!!


#16

I think you should put it inside a while loop with an if … else.


#17

The best way, in my opinion, would be to use a combination of methods.
If you are only processing the follower list for one account, then there is little you can do to speed this up.

If you are running lists for several accounts, there are some workarounds.

Rate limiting is on a per-access-token basis. If you hit a wall with an account access token, you can switch over to your application token. This gives you double the processing power.

When processing multiple accounts’ follow lists and you hit a rate limit, you can set the current one to sleep and spawn a new process that continues with the next account.

You can check the total follower count and divide it by the 5,000-ID page limit to see whether you have more than 15 pages of results.

If you are under 15 pages, then you can perform the requests without hitting the rate limits.
If you know you have more than 15 pages, then you can sleep between each request instead of batch-processing the maximum and then sleeping for 15 minutes.

So I would set it to sleep 61 seconds between requests (15 calls per 15-minute window works out to one call per minute).
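
A sketch of that pages-versus-sleep decision, assuming an authenticated Twython client named “twitter” (the function name and screen name are illustrative):

import math
import time

def fetch_follower_ids(twitter, screen_name):
    # Check the total follower count against the 5,000-ID page limit.
    profile = twitter.show_user(screen_name=screen_name)
    pages = math.ceil(profile['followers_count'] / 5000.0)

    ids, cursor = [], -1
    while cursor != 0:
        page = twitter.get_followers_ids(screen_name=screen_name,
                                         cursor=cursor, count=5000)
        ids.extend(page['ids'])
        cursor = page['next_cursor']
        if pages > 15 and cursor != 0:
            time.sleep(61)    # spread calls out: 15 calls per 15-minute window
    return ids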


#18

Make sure you are running from the command line, or set your max execution time to 0 so the script isn’t killed mid-run.