
What is Apache Kafka?

No, Kafka is not only the famous author (en.wikipedia.org/wiki/Franz_Kafka): it is also an open-source distributed publish-subscribe messaging system with strong scalability and fault-tolerance capabilities, as well as a (near real-time) stream processing platform for streaming data sources. The design of Apache Kafka is strongly influenced by commit logs. Apache Kafka was originally developed by LinkedIn and was subsequently open sourced in early 2011.

The installation is pretty simple, but you need to be rigorous.

Binaries installation

    • Prerequisites
      Get a Linux server (I have chosen CentOS 7.3.1611); it can run on a small configuration (1 GB of memory minimum). A quick check is sketched after the update step below.
      Connect as a sudo user or as root.
    • Update your system and reboot
      [root@osboxes ~]# yum update
      Loaded plugins: fastestmirror, langpacks
      Loading mirror speeds from cached hostfile
       * base: mirror.switch.ch
       * epel: mirror.uni-trier.de
       * extras: mirror.switch.ch
       * updates: mirror.switch.ch
      No packages marked for update
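    • To quickly double-check the prerequisites (only a sanity check, nothing here is specific to Kafka):
      # Distribution release (CentOS 7.3.1611 was used for this post)
      cat /etc/centos-release
      # Available memory -- roughly 1 GB should be enough for this small sandbox
      free -m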
  • Install the latest OpenJDK and set up your environment
    [root@osboxes ~]# yum install java-1.8.0-openjdk
    Loaded plugins: fastestmirror, langpacks
    Loading mirror speeds from cached hostfile
     * base: mirror.switch.ch
     * epel: mirror.imt-systems.com
     * extras: mirror.switch.ch
     * updates: mirror.switch.ch
    Package 1:java-1.8.0-openjdk-1.8.0.131-3.b12.el7_3.x86_64 already installed and latest version
    Nothing to do
    
    #Check it:
    [root@osboxes ~]# java -version
    openjdk version "1.8.0_131"
    OpenJDK Runtime Environment (build 1.8.0_131-b12)
    OpenJDK 64-Bit Server VM (build 25.131-b12, mixed mode)
    
    
    # Update your .bash_profile:
    export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
    export JRE_HOME=/usr/lib/jvm/jre
    # and source your profile:
    [root@osboxes ~]# . ./.bash_profile
    [root@osboxes ~]# echo $JAVA_HOME
    /usr/lib/jvm/jre-1.8.0-openjdk
    [root@osboxes ~]# echo $JRE_HOME
    /usr/lib/jvm/jre
    
  • The Confluent Platform is an open source platform that contains all the components you need
    to create a scalable data platform built around Apache Kafka.
    Confluent Open Source is freely downloadable.
    Install the public key from Confluent:

    rpm --import http://packages.confluent.io/rpm/3.2/archive.key
  • Add a confluent.repo file to /etc/yum.repos.d with the following content:
    [Confluent.dist]
    name=Confluent repository (dist)
    baseurl=http://packages.confluent.io/rpm/3.2/7
    gpgcheck=1
    gpgkey=http://packages.confluent.io/rpm/3.2/archive.key
    enabled=1
    
    [Confluent]
    name=Confluent repository
    baseurl=http://packages.confluent.io/rpm/3.2
    gpgcheck=1
    gpgkey=http://packages.confluent.io/rpm/3.2/archive.key
    enabled=1
  • Clean your yum caches
    yum clean all
  • And finally install the open source version of Confluent
    yum install confluent-platform-oss-2.11
    Transaction Summary
    ============================================================================================================================================================================
    Install  1 Package (+11 Dependent packages)
    
    Total download size: 391 M
    Installed size: 446 M
    Is this ok [y/d/N]: y
    Downloading packages:
    (1/12): confluent-common-3.2.1-1.noarch.rpm                                                                                                          | 2.0 MB  00:00:06
    (2/12): confluent-camus-3.2.1-1.noarch.rpm                                                                                                           |  20 MB  00:00:28
    (3/12): confluent-kafka-connect-elasticsearch-3.2.1-1.noarch.rpm                                                                                     | 4.3 MB  00:00:06
    (4/12): confluent-kafka-2.11-0.10.2.1-1.noarch.rpm                                                                                                   |  38 MB  00:00:28
    (5/12): confluent-kafka-connect-jdbc-3.2.1-1.noarch.rpm                                                                                              | 6.0 MB  00:00:07
    (6/12): confluent-kafka-connect-hdfs-3.2.1-1.noarch.rpm                                                                                              |  91 MB  00:01:17
    (7/12): confluent-kafka-connect-s3-3.2.1-1.noarch.rpm                                                                                                |  92 MB  00:01:18
    (8/12): confluent-kafka-rest-3.2.1-1.noarch.rpm                                                                                                      |  16 MB  00:00:16
    (9/12): confluent-platform-oss-2.11-3.2.1-1.noarch.rpm                                                                                               | 6.7 kB  00:00:00
    (10/12): confluent-rest-utils-3.2.1-1.noarch.rpm                                                                                                     | 7.1 MB  00:00:06
    (11/12): confluent-schema-registry-3.2.1-1.noarch.rpm                                                                                                |  27 MB  00:00:23
    (12/12): confluent-kafka-connect-storage-common-3.2.1-1.noarch.rpm                                                                                   |  89 MB  00:01:08
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Total                                                                                                                                       2.2 MB/s | 391 MB  00:03:00
    Running transaction check
    Running transaction test
    Transaction test succeeded
    Running transaction
      Installing : confluent-common-3.2.1-1.noarch                                                                                                                         1/12
      Installing : confluent-kafka-connect-storage-common-3.2.1-1.noarch                                                                                                   2/12
      Installing : confluent-rest-utils-3.2.1-1.noarch                                                                                                                     3/12
      Installing : confluent-kafka-rest-3.2.1-1.noarch                                                                                                                     4/12
      Installing : confluent-schema-registry-3.2.1-1.noarch                                                                                                                5/12
      Installing : confluent-kafka-connect-s3-3.2.1-1.noarch                                                                                                               6/12
      Installing : confluent-kafka-connect-elasticsearch-3.2.1-1.noarch                                                                                                    7/12
      Installing : confluent-kafka-connect-jdbc-3.2.1-1.noarch                                                                                                             8/12
      Installing : confluent-kafka-connect-hdfs-3.2.1-1.noarch                                                                                                             9/12
      Installing : confluent-kafka-2.11-0.10.2.1-1.noarch                                                                                                                 10/12
      Installing : confluent-camus-3.2.1-1.noarch                                                                                                                         11/12
      Installing : confluent-platform-oss-2.11-3.2.1-1.noarch                                                                                                             12/12
      Verifying  : confluent-kafka-connect-storage-common-3.2.1-1.noarch                                                                                                   1/12
      Verifying  : confluent-platform-oss-2.11-3.2.1-1.noarch                                                                                                              2/12
      Verifying  : confluent-rest-utils-3.2.1-1.noarch                                                                                                                     3/12
      Verifying  : confluent-kafka-connect-elasticsearch-3.2.1-1.noarch                                                                                                    4/12
      Verifying  : confluent-kafka-connect-s3-3.2.1-1.noarch                                                                                                               5/12
      Verifying  : confluent-kafka-rest-3.2.1-1.noarch                                                                                                                     6/12
      Verifying  : confluent-camus-3.2.1-1.noarch                                                                                                                          7/12
      Verifying  : confluent-kafka-connect-jdbc-3.2.1-1.noarch                                                                                                             8/12
      Verifying  : confluent-schema-registry-3.2.1-1.noarch                                                                                                                9/12
      Verifying  : confluent-kafka-2.11-0.10.2.1-1.noarch                                                                                                                 10/12
      Verifying  : confluent-kafka-connect-hdfs-3.2.1-1.noarch                                                                                                            11/12
      Verifying  : confluent-common-3.2.1-1.noarch                                                                                                                        12/12
    
    Installed:
      confluent-platform-oss-2.11.noarch 0:3.2.1-1
    
    Dependency Installed:
      confluent-camus.noarch 0:3.2.1-1                           confluent-common.noarch 0:3.2.1-1                           confluent-kafka-2.11.noarch 0:0.10.2.1-1
      confluent-kafka-connect-elasticsearch.noarch 0:3.2.1-1     confluent-kafka-connect-hdfs.noarch 0:3.2.1-1               confluent-kafka-connect-jdbc.noarch 0:3.2.1-1
      confluent-kafka-connect-s3.noarch 0:3.2.1-1                confluent-kafka-connect-storage-common.noarch 0:3.2.1-1     confluent-kafka-rest.noarch 0:3.2.1-1
      confluent-rest-utils.noarch 0:3.2.1-1                      confluent-schema-registry.noarch 0:3.2.1-1
    
    Complete!

OK, the binaries are now installed (a quick verification is sketched below). The next step is to configure and launch ZooKeeper and Kafka itself!
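A simple way to verify the installation is to list the Confluent packages and wrapper scripts that were just deployed (the package names match the 3.2.1 output above and may differ with other versions):

    # List the Confluent packages installed by yum
    rpm -qa | grep -i confluent
    # The Kafka and ZooKeeper wrapper scripts should now be available in /usr/bin
    ls /usr/bin/kafka-* /usr/bin/zookeeper-*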

    • First, take a look at the ZooKeeper configuration:
      [root@osboxes kafka]# cat /etc/kafka/zookeeper.properties
      # Licensed to the Apache Software Foundation (ASF) under one or more
      # contributor license agreements.  See the NOTICE file distributed with
      # this work for additional information regarding copyright ownership.
      # The ASF licenses this file to You under the Apache License, Version 2.0
      # (the "License"); you may not use this file except in compliance with
      # the License.  You may obtain a copy of the License at
      #
      #    http://www.apache.org/licenses/LICENSE-2.0
      #
      # Unless required by applicable law or agreed to in writing, software
      # distributed under the License is distributed on an "AS IS" BASIS,
      # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      # See the License for the specific language governing permissions and
      # limitations under the License.
      # the directory where the snapshot is stored.
      dataDir=/var/lib/zookeeper
      # the port at which the clients will connect
      clientPort=2181
      # disable the per-ip limit on the number of connections since this is a non-production config
      maxClientCnxns=0
    • Don’t change the configuration file (the default values are fine to start with) and launch ZooKeeper:
      /usr/bin/zookeeper-server-start /etc/kafka/zookeeper.properties
      ...
      [2017-06-08 14:05:02,051] INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)
    • Keep the ZooKeeper session open, and open a new terminal for the Kafka part:
      /usr/bin/kafka-server-start /etc/kafka/server.properties
      ...
      [2017-06-08 14:11:31,333] INFO Kafka version : 0.10.2.1-cp1 (org.apache.kafka.common.utils.AppInfoParser)
      [2017-06-08 14:11:31,334] INFO Kafka commitId : 80ff5014b9e74a45 (org.apache.kafka.common.utils.AppInfoParser)
      [2017-06-08 14:11:31,335] INFO [Kafka Server 0], started (kafka.server.KafkaServer)
      [2017-06-08 14:11:31,350] INFO Waiting 10062 ms for the monitored broker to finish starting up... (io.confluent.support.
      metrics.MetricsReporter)
      [2017-06-08 14:11:41,413] INFO Monitored broker is now ready (io.confluent.support.metrics.MetricsReporter)
      [2017-06-08 14:11:41,413] INFO Starting metrics collection from monitored broker... (io.confluent.support.metrics.Metric
      sReporter)
    • As with ZooKeeper, leave the Kafka terminal open and launch a new session for the topic creation (an optional port check is sketched below).
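    • Optional check: both services should now be listening on their default ports (2181 for ZooKeeper, 9092 for Kafka). The -daemon flag shown in the comments should also work with these wrapper scripts if you prefer background processes, but that is not the setup used in this post.
      ss -tlnp | egrep '2181|9092'
      # Alternative (untested here): run both services in the background
      # /usr/bin/zookeeper-server-start -daemon /etc/kafka/zookeeper.properties
      # /usr/bin/kafka-server-start -daemon /etc/kafka/server.properties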

Topic creation

  • Messages in Kafka are categorized into topics; a topic is comparable to a database table or to a directory in a file system.
    First, we are going to create a new topic.

    [root@osboxes ~]# /usr/bin/kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic dbi
    Created topic "dbi".
  • Check that the topic has actually been created:
    [root@osboxes ~]# /usr/bin/kafka-topics --list --zookeeper localhost:2181
    dbi
  • Nice, we can now produce some messages using the topic “dbi”:

    [root@osboxes ~]# kafka-console-producer --broker-list localhost:9092 --topic dbi
    be passionate
    be successful
    be responsible
    be sharing
  • Open a new terminal and act as a consumer with the console consumer:
    /usr/bin/kafka-console-consumer --zookeeper localhost:2181 --topic dbi --from-beginning
    be passionate
    be successful
    be responsible
    be sharing
  • Et voilà! The messages produced by the producer now appear in the consumer window. If you type a new message in the producer console, it is displayed immediately in the other terminal. To stop the consoles, press Ctrl-C.
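  • If you want more details about the topic (partition count, replication factor, partition leader), the same kafka-topics tool can describe it:
    /usr/bin/kafka-topics --describe --zookeeper localhost:2181 --topic dbi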

Now the most difficult part remains to be done: configuring Kafka with multiple producers and consumers within a more complex broker topology.
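As a rough, untested sketch of what adding a second broker on the same host could look like (the file name, port and log directory below are arbitrary examples, not values taken from this setup):

    # Copy the default broker configuration and adapt it for a second broker
    cp /etc/kafka/server.properties /etc/kafka/server-1.properties
    mkdir -p /var/lib/kafka-1
    # In /etc/kafka/server-1.properties, change at least:
    #   broker.id=1
    #   listeners=PLAINTEXT://:9093
    #   log.dirs=/var/lib/kafka-1
    /usr/bin/kafka-server-start /etc/kafka/server-1.properties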

[Figure: Kafka cluster architecture]