Tweet reference corpus


Twitter and tweets are a great subject for (linguistic) research, and the API allows me to stream tweets and focus my research on those that match my criteria. For example, I could hook into the stream and calculate the average length of English-language tweets sent on Thanksgiving around 8 pm. I think that linguists have far more interesting questions, but I also think that using the API is a big hurdle for some non-digital natives. What they need is a large corpus with a decent UI for filtering relevant tweets. I understand and agree that I am not allowed to stream and save tweets and their texts. But saving the tweet IDs for later re-retrieval through the API should be fine? If so, can I save anonymous data together with the tweet IDs (e.g. date, language, word count)? That would make it possible to filter out a subset of relevant tweets without having to store the tweets themselves, which would be fetched afterwards.
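To make the idea more concrete, here is a minimal Python sketch of what I have in mind. Everything in it is an assumption for illustration: the record name `TweetRef`, its fields, the helper `select_ids`, and the sample IDs and dates are made up, and no call to the Twitter API is shown. It only demonstrates that filtering can work on the anonymous metadata alone.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class TweetRef:
    """Hypothetical record: anonymous metadata only, the tweet text is never stored."""
    tweet_id: int
    created_at: datetime
    lang: str
    word_count: int


def select_ids(corpus, lang, day, hour):
    """Return the IDs of tweets in `lang` that were posted on `day` during `hour`."""
    return [
        ref.tweet_id
        for ref in corpus
        if ref.lang == lang
        and ref.created_at.date() == day
        and ref.created_at.hour == hour
    ]


# Made-up sample records, for illustration only.
corpus = [
    TweetRef(1001, datetime(2012, 11, 22, 20, 5), "en", 12),
    TweetRef(1002, datetime(2012, 11, 22, 20, 40), "de", 9),
    TweetRef(1003, datetime(2012, 11, 22, 14, 15), "en", 20),
    TweetRef(1004, datetime(2012, 11, 22, 20, 59), "en", 7),
]

# English tweets posted on the example day between 8 pm and 9 pm.
ids_to_fetch = select_ids(corpus, "en", date(2012, 11, 22), 20)
print(ids_to_fetch)  # [1001, 1004]
```

The resulting list of IDs is all a researcher would take away from the corpus; the actual tweets would then be re-retrieved through the API, a step I have left out here on purpose.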

I am really looking forward to building such a UI so that others can create their own tweet samples, and I would appreciate hearing whether such a system would work within the rules of the road.