Best way to obtain tweets from the past month from multiple users


#1

I’m using GET statuses/user_timeline to obtain tweets for a single user for the past month or so. This works great. However, I need to extend this to obtain tweets from the past month for a series of users, theoretically up to 100 of them. What is the best way to do this? To add more complexity, they need to be georeferenced tweets (containing coordinates) so I can place them on a map. Is there a better REST request for this purpose? Or is it a matter of sending multiple GET statuses/user_timeline requests, one per user? That seems time-consuming, and I’ll quickly hit my rate limit.

I tried mucking around with the Search API, but the data it returns seems very inconsistent, and it seems to only return tweets from the past week or so?

Ideas?


#2

@robine_k, doing this retroactively is a bit difficult, as you describe, but you have the general idea. You will need to page through statuses/user_timeline for each user until you have the Tweets you require. To work within rate limits, I usually do some calculations up front, insert sleeps in my loops, and just let it run as I go about my day.
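A minimal Python sketch of that paging loop and the up-front arithmetic. It assumes a hypothetical `fetch_page(screen_name, max_id)` callable that wraps one authenticated GET statuses/user_timeline request (for example with `count=200`, plus `max_id` when it is not None) and returns the decoded list of tweets; the HTTP and OAuth plumbing is left out. Paging walks backwards by setting `max_id` to one less than the oldest ID already seen, stops once tweets are older than the cutoff, and keeps only tweets whose `coordinates` field is non-null. The `requests_per_window` value is an assumption; check the current rate-limit documentation for the endpoint.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List, Optional

# v1.1 created_at format, e.g. "Wed Aug 27 13:08:45 +0000 2008"
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value: str) -> datetime:
    """Parse a tweet's created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, CREATED_AT_FORMAT)


def one_month_ago() -> datetime:
    """Cutoff for 'the past month', approximated here as 30 days."""
    return datetime.now(timezone.utc) - timedelta(days=30)


def seconds_between_requests(requests_per_window: int, window_seconds: int = 15 * 60) -> float:
    """Sleep needed between calls to spread them evenly over a rate-limit window."""
    return window_seconds / requests_per_window


def estimate_run_seconds(users: int, pages_per_user: int, requests_per_window: int,
                         window_seconds: int = 15 * 60) -> float:
    """Rough wall-clock time for the whole job at that even pace."""
    return users * pages_per_user * seconds_between_requests(requests_per_window, window_seconds)


def fetch_recent_geotagged(
    fetch_page: Callable[[str, Optional[int]], List[Dict]],
    screen_name: str,
    since: datetime,
    delay: float = 0.0,
) -> List[Dict]:
    """Page backwards through one user's timeline, keeping geotagged tweets newer than `since`."""
    collected: List[Dict] = []
    max_id: Optional[int] = None
    while True:
        # HYPOTHETICAL: fetch_page wraps one GET statuses/user_timeline call.
        page = fetch_page(screen_name, max_id)
        if not page:
            break  # nothing older is available
        for tweet in page:  # timelines come back newest first
            if parse_created_at(tweet["created_at"]) < since:
                return collected  # everything after this is older still
            if tweet.get("coordinates"):  # null unless the tweet is geotagged
                collected.append(tweet)
        # Ask only for tweets strictly older than the oldest one on this page.
        max_id = min(tweet["id"] for tweet in page) - 1
        if delay:
            time.sleep(delay)  # stay under the rate limit
    return collected
```

For each user you would call something like `fetch_recent_geotagged(my_fetch_page, name, one_month_ago(), delay=seconds_between_requests(180))`. As a worst case, the endpoint only reaches back about 3,200 tweets at 200 per page, so 100 users at 16 pages each is 1,600 requests; at an assumed 180 requests per 15 minutes, `estimate_run_seconds(100, 16, 180)` gives 8,000 seconds, a little over two hours.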

The best approach, and the one we usually suggest, is to collect the data in real time rather than retroactively. Use the statuses/filter API to pull in Tweets in real time and save the relevant data to a database. You won’t have a large data set right away, but you will eventually. This removes any concern over rate limits. I’m not absolutely sure whether it will support 100 users, but I just asked someone on my team and will find out.
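A minimal Python sketch of the storage side. It assumes the stream is consumed as newline-delimited JSON from statuses/filter with the `follow` parameter set to the users' numeric IDs; opening the authenticated streaming connection is left out, so `store_geotagged` simply takes an iterable of lines. Because `follow` also delivers retweets of and replies to those users' Tweets, and because `follow` and `locations` are combined with OR rather than AND, the sketch filters client-side for Tweets authored by a followed user that carry coordinates. The SQLite table layout is a hypothetical choice for illustration.

```python
import json
import sqlite3
from typing import Dict, Iterable, Set


def build_filter_params(user_ids: Iterable[str]) -> Dict[str, str]:
    """statuses/filter expects 'follow' as a comma-separated list of numeric user IDs."""
    return {"follow": ",".join(user_ids)}


def store_geotagged(lines: Iterable[str], followed: Set[str], db: sqlite3.Connection) -> int:
    """Save geotagged tweets authored by followed users; return how many were kept."""
    # HYPOTHETICAL schema: one row per tweet with its point.
    db.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id TEXT PRIMARY KEY, user_id TEXT, lon REAL, lat REAL, text TEXT, created_at TEXT)"
    )
    kept = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank keep-alive line
        msg = json.loads(line)
        if "user" not in msg or "id_str" not in msg:
            continue  # not a tweet, e.g. a delete or limit notice
        if msg["user"]["id_str"] not in followed:
            continue  # 'follow' also delivers replies/retweets by other accounts
        coords = msg.get("coordinates")
        if not coords:
            continue  # no point to put on the map
        lon, lat = coords["coordinates"]  # GeoJSON order: longitude first
        db.execute(
            "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?, ?, ?)",
            (msg["id_str"], msg["user"]["id_str"], lon, lat,
             msg.get("text", ""), msg.get("created_at", "")),
        )
        kept += 1
    db.commit()
    return kept
```

Keeping the filtering on the client side means the same code works whether or not the stream is also restricted by `locations`.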

If you are building a product or a business around this, just make sure you have reviewed the developer terms of use carefully. If you absolutely need all-inclusive historical search and you have a healthy budget, check out Gnip for enterprise-grade Twitter APIs.


#3

Thanks for the information. I actually learned about using user lists to group users and then querying against the list. I think that’s the plan for now. But you’re right about adding a sleep and then resending the request to obtain more data. I’m guessing I’ll have to write some code to generate a JSON file.
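If the JSON file is meant to feed a map, one option (an assumption, not something stated in the thread) is to write the Tweets out as a GeoJSON FeatureCollection, which mapping libraries such as Leaflet can load directly. A minimal Python sketch, assuming each item is a decoded v1.1 Tweet object, whether it came from statuses/user_timeline or from a list timeline via GET lists/statuses. A Tweet's `coordinates` field is already a GeoJSON Point in longitude, latitude order, so it can serve as the feature geometry unchanged; the chosen `properties` are illustrative.

```python
import json
from typing import Dict, Iterable


def tweets_to_geojson(tweets: Iterable[Dict]) -> Dict:
    """Turn geotagged tweets into a GeoJSON FeatureCollection, skipping ungeotagged ones."""
    features = []
    for tweet in tweets:
        coords = tweet.get("coordinates")
        if not coords:
            continue  # nothing to plot
        features.append({
            "type": "Feature",
            "geometry": coords,  # already {"type": "Point", "coordinates": [lon, lat]}
            "properties": {
                "id": tweet["id_str"],
                "user": tweet["user"]["screen_name"],
                "text": tweet.get("text", ""),
                "created_at": tweet.get("created_at", ""),
            },
        })
    return {"type": "FeatureCollection", "features": features}


def write_geojson(tweets: Iterable[Dict], path: str) -> None:
    """Write the FeatureCollection to disk as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tweets_to_geojson(tweets), f, ensure_ascii=False, indent=2)
```

Calling `write_geojson(all_tweets, "tweets.geojson")` once the collection run finishes produces a single file the map page can fetch.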