Wednesday, February 11, 2015

Apache Storm: Integration with Kafka using Kafka Spout

If you have been playing with either Apache Kafka or Apache Storm, you have probably read many articles about integrating the two. From my experience, reading too much can be a bad thing sometimes (pun intended :). In this case, there were multiple efforts to offer such an integration, which may have caused confusion about which is the best or standard way to do it.

It is good to know that starting from version 0.9.2-incubating, Apache Storm has decided to include such support officially. Read more here.

Anyway, how does such integration work?

In this blog entry, I am only going to share information about using Kafka as a Storm spout. Yes, starting from Storm version 0.9.3, you can use Kafka from a bolt too. If you want to know more about Topology, Spout and Bolt, read this.

Basically, the classes you need for the Storm-Kafka integration are available under the storm.kafka.* package.
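To give a feel for how those classes fit together, here is a minimal sketch of wiring a Kafka spout into a topology. It assumes Storm 0.9.x with storm-core and storm-kafka on the classpath, and a ZooKeeper (the one Kafka registers its brokers in) reachable at localhost:2181. The topic name, ZooKeeper root path, and component ids are placeholders I made up for illustration, so treat this as a starting point rather than a reference setup.

```java
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class KafkaSpoutSketch {
    public static void main(String[] args) throws Exception {
        // Assumption: ZooKeeper used by the Kafka brokers runs locally.
        BrokerHosts hosts = new ZkHosts("localhost:2181");

        // Placeholder values: topic, ZooKeeper root for offsets, consumer id.
        SpoutConfig spoutConfig =
                new SpoutConfig(hosts, "test-topic", "/kafka-spout", "kafka-spout-sketch");

        // Decode each Kafka message as a single string field.
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
        // Your bolts would be attached here with builder.setBolt(...).

        // Run briefly in local mode, then shut down.
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("kafka-spout-sketch", new Config(), builder.createTopology());
        Thread.sleep(10000);
        cluster.shutdown();
    }
}
```

The spout stores its consumed offsets in ZooKeeper under the root path and id given to SpoutConfig, so keeping those two values stable across restarts lets the spout resume where it left off.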

If you want to get up to speed quickly, try out the sandbox offered by Hortonworks here. After you have downloaded the sandbox (or if you are gutsy enough to install the system through Ambari), it is advisable to try out the tutorial too. If you want to jump straight to the tutorial related to the Storm-Kafka integration, you can go here. Please take note that the tutorial includes the source code too, so make sure you check it out!

Once you get the hang of it, you can move over to this website to learn more about Storm Kafka.

If you do not want to compile the Storm Kafka package yourself, you can download it from the Hortonworks Maven repository.

The information offered here should get you going for a while, and I will share some tips and traps regarding the integration in future entries.


Happy hacking!
