Apache Kafka Part II – Ubuntu Installation

Introduction

In the first part of this series, I provided a general overview of Kafka. In this post, we get a bit more hands-on and install Kafka on an Ubuntu virtual machine.

With Kafka, we can create multiple types of clusters, such as the following:

  • A single node cluster with a single broker
  • A single node cluster with multiple brokers
  • A multiple node cluster with multiple brokers

We will start by implementing the first configuration (single node, single broker), and will then migrate our setup to a single node, multiple broker configuration.

Operating System

The configuration I used for this post is very simple:

  • My host machine is a Dell laptop with 32 GB of memory.
  • I am using Oracle VirtualBox. The guest OS is Ubuntu 14.04 LTS.
  • I allocated 12 GB to the guest OS, which should be plenty for running Kafka in both single- and multi-broker setups.

This post picks up right after the OS installation; no additional software was installed at this point. As a first step, we will need to install Java, after which we can proceed to the Kafka install.

Installing Java

In this case we will install Java SE, since the enterprise edition is not required for Kafka. We will use apt-get for the installation.

First, we need to make sure that the package index is up to date:
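
    sudo apt-get update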

Let’s first check whether Java is already installed by running java -version:
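
    java -version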

If this returns “The program ‘java’ can be found in the following packages…”, Java has not been installed yet, in which case we can install it as follows (shown here using the OpenJDK packages, a common choice on Ubuntu):
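
    sudo apt-get install default-jre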

This will install the Java Runtime Environment (JRE). If you also need the Java Development Kit (JDK), which is usually required to compile Java applications (for example, when working with Apache Ant, Apache Maven, Eclipse, or IntelliJ), then you can issue the following command:
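
    sudo apt-get install default-jdk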

At this point, Java should be fully installed and ready to go. We need to ensure that we have our JAVA_HOME environment variable set correctly, which is covered in the next section.

Adding the JAVA_HOME environment variable

We want to make sure that we set the JAVA_HOME variable in a persistent way, so we don’t have to reset it after every login or reboot. Follow these steps to set your JAVA_HOME environment variable:

  • Edit the /etc/environment file with your favorite editor
  • Add a JAVA_HOME entry and point it to the correct location. In this case we want to use /usr/bin, since this is where apt-get places the java executable:

[Figure: JAVA_HOME entry in /etc/environment]
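
The entry should look something like this:

    JAVA_HOME="/usr/bin"

(Strictly speaking, many tools expect JAVA_HOME to point at a JDK root under /usr/lib/jvm, but /usr/bin works for our purposes here, since that is where the java binary is linked.)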

  • Use source to re-load the variables, as is shown in the figure below:

[Figure: reloading /etc/environment and verifying JAVA_HOME]
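
In a terminal, that amounts to:

    source /etc/environment
    echo $JAVA_HOME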

At this point our Java installation is complete, and we are ready to download Kafka.

Downloading Kafka

To download Kafka, go to http://kafka.apache.org/downloads.html. I downloaded version 0.8.2.1, built for Scala 2.10, as is shown in the figure below:

[Figure: Kafka downloads page]

On the next page, use the default mirror site. Once the download starts, the file will be saved to your ~/Downloads directory.

We will install Kafka under the /opt directory. Follow these steps (the corresponding commands are shown after the list):

  • Go to the /opt directory.
  • Copy the Kafka .tgz file from your Downloads directory.
  • Extract the archive.
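
Assuming the 0.8.2.1 / Scala 2.10 archive, the commands look like this:

    cd /opt
    sudo cp ~/Downloads/kafka_2.10-0.8.2.1.tgz .
    sudo tar -xvzf kafka_2.10-0.8.2.1.tgz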

After this, the kafka directory will look as follows:

[Figure: contents of the Kafka installation directory]

Finally, we need to update the /etc/environment file to set the KAFKA_HOME path, and to add Kafka’s bin directory to the PATH environment variable:

[Figure: KAFKA_HOME and PATH entries in /etc/environment]
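
The entries should look roughly like this (adjust the directory name to the version you extracted; note that /etc/environment does not expand variables, so the Kafka bin path is written out in full):

    JAVA_HOME="/usr/bin"
    KAFKA_HOME="/opt/kafka_2.10-0.8.2.1"
    PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/kafka_2.10-0.8.2.1/bin"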

Once we source the /etc/environment file, we should be ready to go:
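
    source /etc/environment
    echo $KAFKA_HOME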

[Figure: verifying KAFKA_HOME after sourcing /etc/environment]

Setting up a Single Node, Single Broker Cluster

Introduction

In the previous section, we installed Kafka on a single machine. In this section we will set up a single node, single broker Kafka cluster, as is shown in the figure below:

[Figure: single node, single broker Kafka cluster]

Starting the ZooKeeper Service

As I mentioned in the previous post, Kafka uses ZooKeeper, so we first need to start a ZooKeeper server. The settings for ZooKeeper are maintained in the config/zookeeper.properties file. The most relevant parts of this file (the defaults that ship with the distribution) are shown below:
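
    # the directory where the snapshot is stored
    dataDir=/tmp/zookeeper
    # the port at which the clients will connect
    clientPort=2181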

By default, the ZooKeeper server will listen on TCP port 2181. For detailed information on setting up ZooKeeper servers, please visit http://zookeeper.apache.org. From the Kafka installation directory, start the ZooKeeper server as follows:
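
    cd $KAFKA_HOME
    bin/zookeeper-server-start.sh config/zookeeper.properties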

The ZooKeeper server will start up, as is shown in the figure below:

[Figure: ZooKeeper server startup output]

I recommend that you use a separate terminal window for each server; that way you can keep track of what is happening across the different servers that you will be starting as part of this post.

Starting the Kafka Broker

Now we are ready to start the Kafka broker. The configuration properties of the broker are defined in the config/server.properties file. The most relevant portions (with their default values) are shown below:
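
    # the id of the broker; must be unique for each broker in the cluster
    broker.id=0
    # the port the socket server listens on
    port=9092
    # the directory under which the message logs are stored
    log.dirs=/tmp/kafka-logs
    # the ZooKeeper connection string
    zookeeper.connect=localhost:2181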

Here, we see that the id of the broker is set to zero. All brokers in a cluster are assigned sequential ids, starting with zero. The broker itself will listen on port 9092, and it will store its logs in the /tmp/kafka-logs directory. We also see the connection string for ZooKeeper, which uses the same client port, 2181.

To start the broker, launch a new terminal window, and issue the following command:
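
    bin/kafka-server-start.sh config/server.properties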

The broker will start, showing its output in the terminal window:

[Figure: Kafka broker startup output]

Creating a Kafka Topic

Kafka provides a command line utility named kafka-topics.sh to create topics on the server. To create a topic named telemetry with a single partition and only one replica, start a new terminal instance and issue the following command:
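
    bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic telemetry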

The kafka-topics.sh utility will create the topic, and show a successful creation message:

[Figure: topic creation confirmation]

In the broker terminal window, we can watch the topic being created:

[Figure: broker log output as the topic is created]

The log for the new topic has been created in /tmp/kafka-logs/ as was specified in the config/server.properties file:

[Figure: topic log directory under /tmp/kafka-logs]

To get a list of topics on any Kafka server, we can use the following command in our terminal window:
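
    bin/kafka-topics.sh --list --zookeeper localhost:2181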

The utility will output the name of our telemetry topic as is shown below:

[Figure: topic list output]

At this point, we have:

  1. Started the ZooKeeper server, which provides the coordination services for our Kafka Cluster.
  2. Started a single Kafka broker.
  3. Created a topic.

We are now ready to start sending and receiving messages. Kafka provides some out-of-the-box utilities that can act as message producers and consumers; let’s move on to those next.

Starting a Producer to send messages

Kafka provides users with a command line producer client that accepts input from the command line and publishes each line as a message to the Kafka cluster. By default, each new line is published as a new message. Start (yet another) new terminal window, and issue the following command to start the producer:
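
    bin/kafka-console-producer.sh --broker-list localhost:9092 --topic telemetry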

The following parameters are required for the producer command line client:

  • The broker list. This is the list of brokers that we want to send the messages to. In this case we only have one broker. Each broker is specified in the format <node_address:port>, which in our case is localhost:9092, since we know our broker is listening on port 9092, as specified in config/server.properties.
  • The second parameter is the name of the topic, which in our case is telemetry.

The producer will wait for input on stdin. Go ahead and type a few lines of messages, as is shown below:

[Figure: typing messages into the console producer]

The default properties for the producer are defined in the config/producer.properties file. The most important properties are listed below:
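
    # list of brokers used for bootstrapping knowledge about the rest of the cluster
    metadata.broker.list=localhost:9092
    # the compression codec for all data generated: none, gzip, snappy, lz4
    compression.codec=none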

The metadata.broker.list property allows us to specify the list of brokers. In our case we only have one broker at port 9092. At the moment, we are not using any compression, so we specify none as the compression codec.

In later posts, we will cover writing Kafka producers in detail.

Starting a consumer to consume messages

Kafka also provides a command line consumer client for message consumption. The following command starts a console-based consumer that shows output in the terminal window as soon as it subscribes to the telemetry topic on the Kafka broker (the --from-beginning flag replays any messages already published to the topic):
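
    bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic telemetry --from-beginning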

If you typed messages into the producer in the previous section, you should see output like the following:

[Figure: console consumer output]

As one can see, the consumer echoes the text typed into the producer. The default properties for the consumer are defined in the config/consumer.properties file. The most important properties are shown below:
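
    # the ZooKeeper connection string
    zookeeper.connect=127.0.0.1:2181
    # the timeout, in milliseconds, for connecting to ZooKeeper
    zookeeper.connection.timeout.ms=6000
    # the consumer group id
    group.id=test-consumer-group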

We will cover consumer groups and the details of writing consumers in later posts.

Conclusion

By running all four components (ZooKeeper, broker, producer, and consumer) in different terminals, we are able to enter messages from the producer’s terminal and see them appear in the consumer’s terminal. We now have a good understanding of the basics of Kafka.

In the next post, we will modify our configuration to use a single node, multiple broker setup.
