I want to conduct a study, and I am not sure whether Twitter's rate limits will allow me to do what I have in mind.
Here is the idea:
I want to collect a social network graph of 600K+ Twitter users. This means determining the presence or absence of a directed edge between every pair of these 600K+ users. I would estimate that each user has an average of 200 followers (indegree edges) and, let’s say, 500 friends (outdegree edges). Many of these users are expected to have followers in common.
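Determining every edge does not require checking the roughly 3.6 × 10^11 ordered pairs among 600,600 accounts one by one. If the friend (following) list of every account in the set is available, each list can be filtered down to the IDs that are also in the set, and the follower edges inside the set are the same edges reversed. A minimal Python sketch of that step, assuming the ID lists have already been downloaded into a dict (the function names and input shapes are hypothetical, not part of any Twitter library):

```python
def build_adjacency(friend_ids: dict[int, list[int]], nodes: set[int]) -> dict[int, set[int]]:
    """Directed adjacency list restricted to `nodes`.

    friend_ids[u] lists the accounts u follows (u's outgoing edges), e.g. as
    collected page by page from a friends-list endpoint (hypothetical input).
    An edge u -> v is kept only if both u and v are in `nodes`.
    """
    adjacency: dict[int, set[int]] = {u: set() for u in nodes}
    for u, followed in friend_ids.items():
        if u in nodes:
            adjacency[u].update(v for v in followed if v in nodes and v != u)
    return adjacency


def followers_within(adjacency: dict[int, set[int]], v: int) -> set[int]:
    """In-edges of v, recovered by reversing the out-edges (no extra API calls)."""
    return {u for u, out in adjacency.items() if v in out}


nodes = {1, 2, 3}
friend_ids = {1: [2, 99], 2: [1, 3], 3: []}  # 99 is outside the node set
adj = build_adjacency(friend_ids, nodes)
print(adj)                       # {1: {2}, 2: {1, 3}, 3: set()}
print(followers_within(adj, 1))  # {2}
```

Accounts whose lists could not be fetched (for example, protected accounts) end up with an empty set here, which looks the same as "follows no one"; a real collection would need to track them separately.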
The edges/ties/relations in the network will be follower/followed relationships.
Start with a specific list of approximately 600 Twitter users, chosen because they are all from the news outlets in a large city.
Collect all of the followers and friends (people they follow) of all 600 users. These users probably have an average of 2,000 followers and 500 friends each.
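For this first step, a rough call count follows from the page size and the request limit. The sketch below assumes v1.1-style limits for the followers/ids and friends/ids endpoints: up to 5,000 IDs per request and 15 requests per 15-minute window per token, with each endpoint limited separately. These numbers are assumptions to check against the current API documentation, since limits differ by API version and access level and have changed over time.

```python
import math

IDS_PER_PAGE = 5_000      # assumption: max IDs per followers/ids or friends/ids request
REQUESTS_PER_WINDOW = 15  # assumption: requests per window, per endpoint, per token
WINDOW_MINUTES = 15       # assumption: rate-limit window length


def calls_for(count: int) -> int:
    """Paged requests needed to list `count` IDs (one call even for an empty list)."""
    return max(1, math.ceil(count / IDS_PER_PAGE))


def hours_for_calls(calls: int, tokens: int = 1) -> float:
    """Hours to issue `calls` requests to one endpoint, rounded up to whole windows."""
    windows = math.ceil(calls / (REQUESTS_PER_WINDOW * tokens))
    return windows * WINDOW_MINUTES / 60


# Averages from the estimate above; real per-account counts would be better.
follower_calls = sum(calls_for(c) for c in [2_000] * 600)
friend_calls = sum(calls_for(c) for c in [500] * 600)
print(follower_calls, hours_for_calls(follower_calls))  # 600 10.0
print(friend_calls, hours_for_calls(friend_calls))      # 600 10.0
```

Under these assumptions the 600 follower-list calls and the 600 friend-list calls take about 10 hours each: roughly 10 hours if the two endpoints are polled side by side, or 20 hours back to back. The average hides skew, though: an outlet account with 1,000,000 followers alone needs 200 follower-list calls, so summing `calls_for` over each account's actual follower and friend counts gives a better estimate.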
Since the followers of the 600 are all in the same city, many of them are expected to follow several of these 600 people. Before duplicates are removed, that is 600 × (2,000 + 500) = 1,500,000 follower and friend entries, so let’s guess that these 600 users have approximately 600,000 unique followers and friends in total. This would give a subgraph of 600,600 Twitter users. Once I have collected the 600,000 followers and friends of these 600 people, I want to construct a social network of all of these 600,600 people AND their followers. At a minimum, this requires finding all of the directed edges among these 600,600 users (i.e., for each pair, whether one follows the other).
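The 600,000 figure depends on that overlap. Building the node set as a set union makes the deduplication explicit; the toy IDs and the input dicts below are hypothetical and stand in for the downloaded follower and friend lists.

```python
def collect_nodes(seeds: list[int], followers: dict[int, list[int]],
                  friends: dict[int, list[int]]) -> set[int]:
    """Union of the seed accounts and every follower/friend ID fetched for them.

    A set keeps each account once, however many seeds it is connected to.
    """
    nodes = set(seeds)
    for s in seeds:
        nodes.update(followers.get(s, []))
        nodes.update(friends.get(s, []))
    return nodes


seeds = [1, 2]
followers = {1: [10, 11, 12], 2: [11, 12, 13]}  # 11 and 12 follow both seeds
friends = {1: [2, 20], 2: [1, 20]}              # the seeds follow each other
nodes = collect_nodes(seeds, followers, friends)
raw = sum(len(followers[s]) + len(friends[s]) for s in seeds)
print(raw, len(nodes))  # 10 7
```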
Given Twitter's rate limits, is this kind of data mining feasible? How long, approximately, would it take to gather the follower/friend information for 600 users to put into an adjacency list?
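A back-of-envelope estimate for the full graph, under the same assumed limits (5,000 IDs per request and 15 requests per 15-minute window per token; these should be checked against the current documentation). Only the friend list of each of the 600,600 accounts is needed, because every edge inside the set is some account's outgoing edge; with about 500 friends each, that is one request per account.

```python
import math

IDS_PER_PAGE = 5_000      # assumption: max IDs returned per friends-list request
REQUESTS_PER_WINDOW = 15  # assumption: requests per window, per token
WINDOW_MINUTES = 15       # assumption: rate-limit window length


def days_to_fetch_friend_lists(friend_counts: list[int], tokens: int = 1) -> tuple[int, float]:
    """Total requests and approximate days to page through every friend list."""
    calls = sum(max(1, math.ceil(c / IDS_PER_PAGE)) for c in friend_counts)
    minutes = calls / (REQUESTS_PER_WINDOW * tokens) * WINDOW_MINUTES
    return calls, minutes / (60 * 24)


n = 600_600
calls, days = days_to_fetch_friend_lists([500] * n)
print(calls, round(days, 1))  # 600600 417.1
print(round(days_to_fetch_friend_lists([500] * n, tokens=10)[1], 1))  # 41.7
print(n * (n - 1))  # ordered pairs a pair-by-pair check would cover: 360719759400
```

That works out to roughly 417 days with a single token, or about 42 days if ten tokens could be used in parallel where that is permitted. Accounts with more than 5,000 friends need extra pages and protected accounts cannot be read at all, so the real figure would differ.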