Hi there. I want to store all incoming tweets for a whole day in a MySQL database. I have already written the PHP code for this and can save tweets to the DB. I only want to keep the text of tweets from the last 24 hours, so that I can analyze them and search for keywords. Is this the best way? I mean, the amount of data is huge. Can I stay connected all day, saving tweets to the DB, without restrictions?


Yes, as far as I know you can stay connected all day with the Streaming API without restriction.


Yes, once you connect using the Streaming API, you can stay connected as long as your own application does not drop the connection. I have had streaming connections using Phirehose stay connected for weeks straight with no problems.


OK, good to know. And what about saving all tweets to the database every day? I mean creating a table, putting tweets in rows, and keeping only the last 24 hours of activity. Is this the way to do it? Then I could search for keywords and run analyses such as how many tweets containing a given keyword were posted between 3 pm and 4 pm.

I just want to know the best way to do this, especially how to store the tweets.


One way is to use the Phirehose library to fetch a sample (GET statuses/sample) of public tweets and save them into your database. After your PHP script stops running (you can set the maximum execution time), you can do whatever you want with the data.

The “best” way to store tweets in a database (MySQL, PostgreSQL, etc.) is plain and simple: insert into…
First of all you need to make a model (E-R model) of your tables and the associations between them, e.g. a user can post many tweets, and a tweet belongs to one user.
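
To make the table layout and the keyword question more concrete, here is a minimal sketch. It is written in Python with an in-memory SQLite database standing in for MySQL, and every table, column, and function name in it is an assumption made for illustration, not something taken from Phirehose or from the code in this thread. It models the "a user has many tweets" association, counts keyword matches inside a time window, and deletes rows older than a cutoff.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all names below are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id          INTEGER PRIMARY KEY,
        screen_name TEXT NOT NULL
    );
    CREATE TABLE tweets (
        id         INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL REFERENCES users(id),
        text       TEXT NOT NULL,
        created_at TEXT NOT NULL  -- 'YYYY-MM-DD HH:MM:SS', so strings sort by time
    );
    CREATE INDEX idx_tweets_created_at ON tweets(created_at);
""")


def save_tweet(tweet_id, user_id, screen_name, text, created_at):
    """Insert the user (if new) and the tweet; duplicate ids are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO users (id, screen_name) VALUES (?, ?)",
        (user_id, screen_name),
    )
    conn.execute(
        "INSERT OR IGNORE INTO tweets (id, user_id, text, created_at) "
        "VALUES (?, ?, ?, ?)",
        (tweet_id, user_id, text, created_at),
    )


def count_keyword(keyword, start, end):
    """Count tweets containing keyword with start <= created_at < end."""
    row = conn.execute(
        "SELECT COUNT(*) FROM tweets "
        "WHERE created_at >= ? AND created_at < ? AND text LIKE ?",
        (start, end, "%" + keyword + "%"),
    ).fetchone()
    return row[0]


def purge_older_than(cutoff):
    """Delete tweets created before cutoff and return how many were removed."""
    cur = conn.execute("DELETE FROM tweets WHERE created_at < ?", (cutoff,))
    return cur.rowcount
```

For the "between 3 pm and 4 pm" question you would call something like `count_keyword("coffee", "2013-05-01 15:00:00", "2013-05-01 16:00:00")`, and run `purge_older_than(...)` periodically with a cutoff 24 hours in the past. Note that a `LIKE '%keyword%'` search cannot use an ordinary index, so it scans every row in the time window; on a large table a full-text index is probably worth looking at.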


I have used that library and it already works, no problem there; I can save tweets in my DB.
Your advice about the E-R model is very good. So let me ask: approximately how much would all the tweets from a single day “weigh” (in MB)? I want to know how much capacity I need from my hosting service. Thanks, theochry.


It depends on what you want to store: entire tweets, or only some objects (e.g. the user object, tweet object, place object, etc.). In my research I save ENTIRE tweets (~8 GB per day), so I can't answer this question for your case. Instead, you can try saving tweets for one day and then check the size.
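
If you only need part of each tweet, you can cut it down before inserting. Below is a minimal sketch in Python (the function name is made up for illustration) that assumes each message arrives as one JSON line and that a tweet carries the `id_str`, `created_at`, and `text` fields; anything without those fields, such as a blank keep-alive line or a delete notice, is skipped.

```python
import json


def extract_text_row(raw_line):
    """Return (id_str, created_at, text) for a tweet line, else None."""
    raw_line = raw_line.strip()
    if not raw_line:
        return None  # blank keep-alive line, nothing to store
    message = json.loads(raw_line)
    if "text" not in message or "id_str" not in message:
        return None  # not a tweet (for example a delete notice)
    return message["id_str"], message.get("created_at"), message["text"]
```

Storing only these three values instead of the whole JSON document should shrink each row considerably, but measure it on your own data before sizing your hosting plan.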


OK, I will keep just the tweet text; I am testing the storage capacity now. What about rate limits on this? I run a cron job for this every hour.


Here is what happens. I have been monitoring the number of tweets stored per hour. Every hour a cron job runs the PHP code, which connects to the Streaming API and starts saving tweets. But I have noticed that it stays connected for less than 30 minutes, sometimes much less. For example: PHP connects at 2:00 and starts saving tweets to the database, then disconnects at 2:24, and nothing more happens until 3:00, when the cron job starts the PHP script again. So from 2:24 to 2:59 no tweets are stored. What is going on, and why does it disconnect? What if I run the cron job every half hour or less? What about rate limits? Or is there a function to reconnect? Thanks!


Storing usually isn’t the problem. Searching and presenting the data is where you will need to keep track of your resources, and where you may want to look at cloud infrastructure. A script I have been running for a week that stores tweets has about 120,000 rows of tweet data and takes up 18 MB, so it does build up quickly. Searching those rows for specific values is where you will have problems, and that would be the concern depending on what you are running it on.
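
To turn those numbers into a rough sizing rule: 18 MB over 120,000 rows is about 150 bytes per row (taking 1 MB as 1,000,000 bytes). A small Python sketch of the arithmetic, where the function names and the one-million-rows-per-day figure are hypothetical and only there to show the scaling:

```python
def bytes_per_row(total_bytes, rows):
    """Average stored size of one row."""
    return total_bytes / rows


def estimate_daily_mb(rows_per_day, row_bytes):
    """Estimated table growth per day, in MB (1 MB = 1,000,000 bytes)."""
    return rows_per_day * row_bytes / 1_000_000


# Figures from the post above: 18 MB for 120,000 rows.
per_row = bytes_per_row(18 * 1_000_000, 120_000)

# Hypothetical volume, for illustration only: one million tweets per day.
daily_mb = estimate_daily_mb(1_000_000, per_row)
print(per_row, daily_mb)
```

Your own bytes-per-row figure will differ with the columns and indexes you keep, so treat this as an order-of-magnitude estimate and replace the inputs with what you measure after a day of collecting.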


Good advice, boxxa, but I still have the disconnect problem, and I don't know why. Even if I run the cron job to connect to the API every 15 minutes (just as a test), the result is the same, because it can only connect once an hour. At 22:00 it was connected for only 8 minutes and then disconnected. If I force the code to restart, nothing happens, but at 23:00 it can reconnect and starts storing tweets again, and I never know for how long: 8 minutes, 14 minutes, 26 minutes (the longest I have stayed connected).
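
On the "is there a function to reconnect" question, one common pattern is to keep a single long-running process that reconnects by itself with an increasing delay, rather than relying on cron to restart the script. This does not explain why the connection drops in the first place. The following is only a sketch in Python: `consume_stream` is a hypothetical function standing in for whatever opens the stream and blocks until it disconnects, and the delay values are assumptions, not documented limits.

```python
import time


def run_with_reconnect(consume_stream, max_attempts=None,
                       base_delay=1.0, max_delay=300.0, sleep=time.sleep):
    """Call consume_stream repeatedly, backing off between attempts.

    consume_stream is expected to block while connected and to return
    or raise when the connection ends. Returns the number of attempts.
    """
    attempt = 0
    delay = base_delay
    while True:
        attempt += 1
        try:
            consume_stream()
        except Exception as exc:
            print("stream dropped: %r" % (exc,))
        if max_attempts is not None and attempt >= max_attempts:
            return attempt
        sleep(delay)                        # wait before reconnecting
        delay = min(delay * 2, max_delay)   # exponential backoff, capped
```

With `max_attempts=None` the loop runs until the process is stopped. Waiting longer after each failure avoids hammering the API with rapid reconnects, which is one thing that can get a client temporarily refused.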




