Thursday, February 27, 2014

Apache Camel meets Kafka!

 In [1], LinkedIn engineers introduced Kafka, a distributed messaging system initially developed for collecting and delivering high volumes of log data with low latency, which was later donated to the Apache Software Foundation (ASF).

 Kafka, due to its architecture, performance and scalability characteristics, has proven to be revolutionary among today's messaging technologies and has been used with success in several domains and projects, most notably LinkedIn's real-time activity data pipeline [2].

 It was Kafka's design (more details and background are given in the relevant paper [1]), along with the benchmarks, comparisons and discussions found in the community, that made me want to get involved and learn more about this technology. In the process of learning how Kafka works and creating some toy projects, I also started working on an Apache Camel component.

 Apache Camel is another ASF project, one of my favourites, which I consider an absolute must in many cases, especially when it comes to distributed systems and, even more so, integration projects.

 So I decided to bring Apache Kafka and Apache Camel together, much as I did in my previous work on the Camel-Gora component for NoSQL databases.

 All in all, this post announces camel-kafka and provides links to the source code and wiki pages. The camel-kafka component lets you use Apache Kafka through Camel and therefore integrate it into your stack.

Note: documentation and examples to come soon!

References 

[1] Kreps, Jay, Neha Narkhede, and Jun Rao. "Kafka: A distributed messaging system for log processing." Proceedings of the NetDB. 2011.

[2] Goodhope, Ken, et al. "Building LinkedIn's Real-time Activity Data Pipeline." IEEE Data Eng. Bull. 35.2 (2012): 33-45.


3 comments:

  1. Hi Ioannis
    I was trying to use your component in one of the example projects I am working on. I am very new to Kafka and Camel, and I am not sure what the syntax is. I used the route .to("kafka:localhost:9092?topic=kafkatopic") and I get the error No Partition Key set. Am I missing something here? Thanks
    Lakchani

  2. Apologies for the delay in answering.

    This is not the official Apache Camel component (I have contributed the code, so let's hope, i.e. CAMEL-7339 :D). The syntax is a bit different from the official one.

    The official component's syntax is: kafka:{SERVER}?zkConnect={ZOOKEEPER}&topic={TOPIC}
    While mine is: kafka:{TOPIC}?metadataBrokerList={SERVER}&zkConnect={ZOOKEEPER}

    This means that after the from/to you need to include the topic name and not the server. For examples and use cases you could refer to the integration tests @ https://github.com/ipolyzos/camel-kafka/tree/master/src/test/java/org/apache/camel/component/kafka/itests.

    Hope you enjoy using it!

  3. Documentation updated, you could refer to https://github.com/ipolyzos/camel-kafka/wiki for more.

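
To make the difference between the two endpoint syntaxes from the comments above concrete, here is a minimal sketch that only builds the URI strings. It does not depend on Camel or Kafka. The class and method names are hypothetical helpers made up for illustration, and the ZooKeeper address localhost:2181 is an assumption (it is not given in the comments); the server and topic values are the ones from the question.

```java
// Hypothetical helper, not part of camel-kafka or Apache Camel.
// It only formats the two endpoint URI patterns quoted in the comments.
class KafkaEndpointUris {

    // Official component pattern: kafka:{SERVER}?zkConnect={ZOOKEEPER}&topic={TOPIC}
    public static String officialStyle(String server, String zookeeper, String topic) {
        return "kafka:" + server + "?zkConnect=" + zookeeper + "&topic=" + topic;
    }

    // camel-kafka (this project) pattern:
    // kafka:{TOPIC}?metadataBrokerList={SERVER}&zkConnect={ZOOKEEPER}
    public static String camelKafkaStyle(String topic, String server, String zookeeper) {
        return "kafka:" + topic + "?metadataBrokerList=" + server + "&zkConnect=" + zookeeper;
    }

    public static void main(String[] args) {
        String server = "localhost:9092";    // broker address from the question
        String zookeeper = "localhost:2181"; // assumed ZooKeeper address
        String topic = "kafkatopic";         // topic name from the question

        System.out.println(officialStyle(server, zookeeper, topic));
        System.out.println(camelKafkaStyle(topic, server, zookeeper));
    }
}
```

With this component, the second string is the kind of value that would go inside from(...) or to(...): the topic name comes right after kafka: and the broker moves into the metadataBrokerList option.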