My current project uses Druid (https://github.com/metamx/druid/). We are currently in the benchmarking phase and would like to use tweets as sample data.
The firehose is relevant because Druid handles realtime ingestion of data through a firehose data stream (which gets pushed onto Kafka queues). I was wondering whether Twitter would allow people such as myself access to the Firehose API to run these benchmarks. Is there some way I can connect Twitter’s firehose to Druid? If not, what is the best way (if there is one) to get access to a data set of tweets (~50 GB)?
Thank you very much for your time.
Jay