Use of Twitter Firehose API?


My current project uses Druid. We are currently in the benchmarking phase and would like to use tweets as sample data.

Firehose is relevant because Druid handles real-time ingestion of data through a firehose data stream (which gets pushed onto Kafka queues). Ultimately, I was wondering whether Twitter would allow people such as myself access to the Firehose API to run these benchmarks. Is there some way I can connect Twitter’s firehose to Druid? If not, what is the best way (if there is one) to get access to a data set of tweets (~50 GB)?

Thank you very much for your time.



It can be difficult to obtain Firehose permissions.

We do have the sample hose, which is a great alternative for a project like this. See [node:10390].
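
For building up a benchmark data set of a given size, here is a rough Python sketch of the collection step. It is not an official client. It assumes the stream arrives as one JSON message per line, with blank keep-alive lines and occasional non-tweet messages mixed in, and that real tweets carry a `text` field. The function name `collect_tweets` and its parameters are made up for illustration, and connecting to the stream itself (authentication, HTTP) is left out.

```python
import io
import json
from typing import IO, Iterable


def collect_tweets(lines: Iterable[str], out: IO[str], max_bytes: int) -> int:
    """Copy tweets from `lines` to `out` until about `max_bytes` are written.

    Hypothetical helper: `lines` is any iterable of raw stream lines.
    Returns the number of tweets written.
    """
    written = 0
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive line (assumed stream behaviour)
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if not isinstance(message, dict) or "text" not in message:
            continue  # assumed: non-tweet messages lack a "text" field
        record = json.dumps(message) + "\n"
        size = len(record.encode("utf-8"))
        if written + size > max_bytes:
            break  # size budget reached
        out.write(record)
        written += size
        count += 1
    return count


if __name__ == "__main__":
    # Stand-in for the live stream: two tweets, a keep-alive, and a delete notice.
    fake_stream = [
        '{"id": 1, "text": "hello"}',
        "",
        '{"delete": {"status": {"id": 2}}}',
        '{"id": 3, "text": "world"}',
    ]
    buffer = io.StringIO()
    kept = collect_tweets(fake_stream, buffer, max_bytes=10_000)
    print(kept, "tweets kept")
```

For a ~50 GB set you would pass an open file instead of the `StringIO` buffer and set `max_bytes` to `50 * 1024**3`. The resulting newline-delimited JSON file can then be replayed into whatever queue feeds Druid.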