Friday, February 27, 2015

Apache Mesos: When All Becomes One

First of all, for the benefit of those unfamiliar with Apache Mesos, here is the "what is" taken from its official website:

What is Mesos?

A distributed systems kernel

Mesos is built using the same principles as the Linux kernel, only at a different level of abstraction. The Mesos kernel runs on every machine and provides applications (e.g., Hadoop, Spark, Kafka, Elastic Search) with API’s for resource management and scheduling across entire datacenter and cloud environments.

Besides the fact that I have always wanted to learn a technology that can merge all the resources available on my multiple machines into one, I am out to learn Apache Mesos for the following two reasons (currently):
(1) Google Kubernetes
(2) Docker

Installing Apache Mesos isn't too hard if you follow the instructions available on its official website, but I would like to share an alternative installation method that I find more straightforward:

* These instructions are only suitable for RHEL/CentOS 6 (tested on CentOS 6.6). For other platforms, refer here.
** Run all instructions as 'root' user for simplicity.

(1) On the Master node and all the slave nodes (or the VM image they will be based on), execute the following command to set up the Mesosphere repository:

rpm -Uvh

(2) On the Master and all the slave nodes, install Mesos:

yum -y install mesos

(3) Even if you only plan to have a single Master node, it is advisable to install Zookeeper (in case you want to expand in the future):

rpm -Uvh 

yum -y install zookeeper-server

* You can install the Zookeeper server on either the Master node (preferred for ease of maintenance) or any of the slave nodes.
** You need to have Java installed for Zookeeper to work properly.

(4) On the Master node, initialize Zookeeper:

service zookeeper-server init 

echo 1 | sudo tee -a /var/lib/zookeeper/myid >/dev/null
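The single "myid" of 1 above is enough for a one-node setup. If you later grow into a multi-node ensemble, each ZooKeeper server also needs the full server list in its configuration, along the lines of the fragment below (the hostnames are examples, not part of this setup):

```
# /etc/zookeeper/conf/zoo.cfg (three-node ensemble)
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```

Each host's /var/lib/zookeeper/myid must then contain the number matching its own server.N line (eg. 2 on zk2.example.com).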

(5) On the Master node, stop and disable mesos-slave:

initctl stop mesos-slave

cd /etc/init/ 

mv mesos-slave.conf mesos-slave.disable

(6) On all the slave nodes, stop and disable mesos-master:

initctl stop mesos-master

cd /etc/init/ 

mv mesos-master.conf mesos-master.disable

(7) On the Master node, set the IP address:

echo <IP of the Master node> | sudo tee /etc/mesos-master/ip

(8) On the Master node, set the name of the cluster:

echo <cluster name> | sudo tee /etc/mesos-master/cluster 

(9) On the Master and all slave nodes, set the URL of the Zookeeper server:

echo zk://<IP of the Zookeeper server>:2181/mesos | sudo tee /etc/mesos/zk
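For reference, a highly-available setup would list every ZooKeeper server in that file, comma-separated. A quick sketch that just builds the string (the IPs are placeholders):

```shell
# Build the ZooKeeper connection string Mesos expects in /etc/mesos/zk.
# The three IPs below are hypothetical ensemble members.
ZK1=10.0.0.1; ZK2=10.0.0.2; ZK3=10.0.0.3
ZK_URL="zk://${ZK1}:2181,${ZK2}:2181,${ZK3}:2181/mesos"
echo "$ZK_URL"
# You would then write it on every node with:
#   echo "$ZK_URL" | sudo tee /etc/mesos/zk
```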

(10) On all the slave nodes, set their respective IP address:

echo <IP of the Slave node>  | sudo tee /etc/mesos-slave/ip  

(11) On the Master node, restart mesos-master and Zookeeper (if it is installed there):

service zookeeper-server restart

initctl restart mesos-master

(12) On all the slave nodes, restart mesos-slave:

initctl restart mesos-slave

(13) Verify that the Master is running and all slaves are registered with it:

http://<IP of the Master node>:5050

(Screenshots: the Master when first initialized; after the first, second, and third slaves joined; and finally all the slaves registered.)
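Besides the web UI, the master also exposes its state as JSON, which is handy for scripted checks. In the sketch below a canned sample response stands in for the real output; in practice you would pipe `curl -s http://<IP of the Master node>:5050/master/state.json` into the same filter:

```shell
# Sample of the master's state.json (real responses contain many more
# fields); extract the number of registered (activated) slaves.
STATE='{"cluster":"mycluster","activated_slaves":3,"deactivated_slaves":0}'
SLAVES=$(echo "$STATE" | sed 's/.*"activated_slaves":\([0-9]*\).*/\1/')
echo "registered slaves: $SLAVES"
```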

Monday, February 16, 2015

Docker: How to Create Your Own Base Image (CentOS/RHEL)

Normally, we will pull base images from the Docker Hub registry to build our own images.

However, there might be times when you want more control over the base image (size, packages, etc.). Luckily, Docker provides a way to do that.

For general information about building your own image, refer here. Since I am a fan of CentOS/RHEL, I would normally go here. For a more "friendly" version of the script, go here.

To create your own CentOS/RHEL image, follow these instructions:
(1) Copy or download the script to a running CentOS/RHEL system with yum properly set up.
NOTE: I tested the script on CentOS 6.5.

#!/usr/bin/env bash
# Create a base CentOS Docker image.
# This script is useful on systems with yum installed (e.g., building
# a CentOS image on CentOS). See contrib/ for a way
# to build CentOS images on other systems.

usage() {
    cat <<EOOPTS
$(basename $0) [OPTIONS] <name>
OPTIONS:
  -y <yumconf>  The path to the yum config to install packages from. The
                default is /etc/yum.conf.
EOOPTS
    exit 1
}

# option defaults
yum_config=/etc/yum.conf
while getopts ":y:h" opt; do
    case $opt in
        y)
            yum_config=$OPTARG
            ;;
        h)
            usage
            ;;
        \?)
            echo "Invalid option: -$OPTARG"
            usage
            ;;
    esac
done
shift $((OPTIND - 1))

name=$1

if [[ -z $name ]]; then
    usage
fi

target=$(mktemp -d --tmpdir $(basename $0).XXXXXX)

set -x

mkdir -m 755 "$target"/dev
mknod -m 600 "$target"/dev/console c 5 1
mknod -m 600 "$target"/dev/initctl p
mknod -m 666 "$target"/dev/full c 1 7
mknod -m 666 "$target"/dev/null c 1 3
mknod -m 666 "$target"/dev/ptmx c 5 2
mknod -m 666 "$target"/dev/random c 1 8
mknod -m 666 "$target"/dev/tty c 5 0
mknod -m 666 "$target"/dev/tty0 c 4 0
mknod -m 666 "$target"/dev/urandom c 1 9
mknod -m 666 "$target"/dev/zero c 1 5

yum -c "$yum_config" --installroot="$target" --releasever=/ --setopt=tsflags=nodocs \
    --setopt=group_package_types=mandatory -y groupinstall Core
yum -c "$yum_config" --installroot="$target" -y clean all

cat > "$target"/etc/sysconfig/network <<EOF
NETWORKING=yes
HOSTNAME=localhost.localdomain
EOF

# effectively: febootstrap-minimize --keep-zoneinfo --keep-rpmdb
# --keep-services "$target".
#  locales
rm -rf "$target"/usr/{{lib,share}/locale,{lib,lib64}/gconv,bin/localedef,sbin/build-locale-archive}
#  docs
rm -rf "$target"/usr/share/{man,doc,info,gnome/help}
#  cracklib
rm -rf "$target"/usr/share/cracklib
#  i18n
rm -rf "$target"/usr/share/i18n
#  sln
rm -rf "$target"/sbin/sln
#  ldconfig
rm -rf "$target"/etc/ld.so.cache
rm -rf "$target"/var/cache/ldconfig/*

version=
if [ -r "$target"/etc/redhat-release ]; then
    version="$(sed 's/^[^0-9\]*\([0-9.]\+\).*$/\1/' "$target"/etc/redhat-release)"
fi

if [ -z "$version" ]; then
    echo >&2 "warning: cannot autodetect OS version, using '$name' as tag"
    version=$name
fi

tar --numeric-owner -c -C "$target" . | docker import - $name:$version
docker run -i -t $name:$version echo success

rm -rf "$target"

(2) Save the script and give it execute permission (chmod 700 <script>).

(3) Login as 'root'.

(4) Amend the script as necessary to suit your purpose (eg. add more packages, etc).

(5) Execute the script, passing the desired image name as an argument (eg. ./<script> centos),
WHERE "centos" is the name that you would like the resultant image to have.
NOTE: The script will tag the image using the version of the OS (or the "name" provided if the version cannot be determined).

(6) Once the script returns, run "docker images" to confirm the image is created.

(7) Verify that the image is ok by launching a container using "docker run -t -i <image> /bin/bash".
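As a further check, you can use the freshly imported image as the base of a Dockerfile. A sketch, assuming the script produced an image tagged centos:6.5 (your name:version may differ):

```shell
# Write a minimal Dockerfile that builds on the custom base image.
mkdir -p /tmp/mybase-test && cd /tmp/mybase-test
cat > Dockerfile <<'EOF'
FROM centos:6.5
RUN yum -y install httpd && yum -y clean all
CMD ["/usr/sbin/httpd", "-D", "FOREGROUND"]
EOF
head -1 Dockerfile
```

Running `docker build -t my-httpd /tmp/mybase-test` against this Dockerfile should then succeed, provided the base image exists locally.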

If everything works out fine, you now have your own custom-built base image!

Wednesday, February 11, 2015

Apache Storm: Integration with Kafka using Kafka Spout

If you have been playing with either Apache Kafka or Apache Storm, you would have read many articles about integration between the two. From my experience, reading too much can be a bad thing sometimes (pun intended :). In this case, there have been multiple efforts to offer such integration, which might have caused confusion about which is the best or standard way to do it.

It is good to know that starting from version 0.9.2-incubating, Apache Storm has decided to include such support officially. Read more here.

Anyway, how does such integration work?

In this blog entry, I am only going to share information about using Kafka as a Storm spout. Yes, starting from Storm version 0.9.3, you can use Kafka as a bolt too. If you want to know more about Topology, Spout and Bolt, read this.

Basically, the classes you need for the Storm-Kafka integration are available under storm.kafka.* package. 

If you want to get up to speed quickly, try out the sandbox offered by Hortonworks here. After you have downloaded the sandbox (or if you are gutsy enough to install the system through Ambari), it is advisable to try out the tutorial too. If you want to jump straight to the tutorial on the Storm-Kafka integration, go here. Please note that the tutorial contains the source code too, so make sure you check it out!

Once you get the hang of it, you can move over to this website to learn more about Storm Kafka.

If you do not want to compile the Storm Kafka package yourself, you can download it from the Hortonworks maven repository.
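If you do pull it from a Maven repository, the dependency coordinates look roughly like this (the version shown is only an example; match it to your Storm release):

```xml
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-kafka</artifactId>
    <version>0.9.3</version>
</dependency>
```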

The information offered here should get you going for a while, and I will share some tips and traps regarding the integration in future entries.

Happy hacking!

Monday, February 9, 2015

Docker: Why Doesn't Delete Reduce the Image Size?

If you wonder why your Docker image does not shrink in size after you have deleted some very large files, then you might want to read further for a brief explanation and solution!

In short, the cause of all this drama is the union filesystem (read here). If you have built an image through multiple commits (either via a Dockerfile or manually), you might run into this problem at a later stage: the large files may have been committed into one of the layers early on.

(1) We have this image called "centos63:omnibus731fp8-hm2000" which is about 2.11GB in size.

(2) Next, let us run the image and delete a very large directory in there.

(3) To persist the change, I committed the image and tagged the new image as "centos63:omnibus731fp8-hm2000-rm".

(4) Surprise, surprise! The size is still 2.11GB!?!?

(5) What happened? Let's check it out. Is the large directory still there? Nope!!!

(6) Then why didn't the "rm" work?

You can see that the large directory was most probably added in image "28729cfb27de", which was created very early on (sorry, I can't tell for sure because the image was built with my old, very bad habit of manual commits :). Since then, there were multiple commits (represented by the multiple different images in the history).

Another thing to pay attention to is the "rm" command that was persisted in image "ae11f957b955" (the latest, highlighted in the BLUE box): its layer is only 7 bytes! This shows clearly that Docker recorded only the "rm" command, not the result of it.

That is the "power" of a union filesystem. But sorry, your large directory is technically still in there.
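To see why the layers keep their size, here is a small stand-alone simulation, with plain directories standing in for layers (the ".wh." whiteout naming convention is borrowed from AUFS):

```shell
# Layer 1 adds a 10MB file; layer 2 "deletes" it by recording a tiny
# whiteout marker. The combined view hides the file, yet both layers
# together still occupy the original space on disk.
work=$(mktemp -d)
mkdir -p "$work/layer1" "$work/layer2"
dd if=/dev/zero of="$work/layer1/bigfile" bs=1024 count=10240 2>/dev/null
touch "$work/layer2/.wh.bigfile"   # whiteout marker: ~0 bytes
total=$(du -sk "$work" | awk '{print $1}')
echo "total KB across layers: $total"
```

Tools that squash an image work around this by exporting the combined (visible) view and re-importing it as a single flat layer.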

(7) So, what now? How do you trim those extra fat off? The answer is with this awesome tool called docker-squash!

I executed the tool with an option to name the output image "centos63:omnibus731fp8-hm2000-squashed".

You can see how the tool successfully determined all the layers belonging to the image (check the UUID).

(8) From the output of the tool, it seems that it has successfully trimmed down the image.

(9) Let's verify whether the new image "omnibus731fp8-hm2000-squashed" is significantly smaller in size. 

It seems that the tool does deliver after all! :)

I have read that Docker Inc. is working on building this feature (shrinking the image size) in. So, while waiting for it, you have this tool!

Thursday, February 5, 2015

Java: Telnet Client

I was going through my personal projects in search of a topic to write about for this blog when I stumbled upon this very old code.

I wrote this back in Year 2003 and that was 12 years ago!

Back then, I was learning Java and somehow got hooked on reading RFCs. So, I guess it was natural for me to write a Telnet Client based on Telnet RFCs. I have to admit upfront that I did not implement all of them, but basic things do work :)


Just for the fun of it, I installed a telnet server on a CentOS 7 VM image and had the Telnet Client connect to it.

Guess what? It still works (phew :).


If you want to have a good laugh at some code written by a Java newbie back in 2003, you can download it here.

Happy hacking!

Monday, February 2, 2015

Apache Kafka: A simple producer

If you are into big data and analytics, you must have heard of Apache Kafka lately.

I have, and I set out to learn this cool technology.

Before proceeding further, let me share a little bit about Apache Kafka (excerpt taken from its official website):
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.

In summary, this is what I did:
(1) Installed Hortonworks HDP 2.2 (with Ambari)
(2) Installed Apache Kafka (and all required components) through Ambari
(3) Configured 3 Kafka brokers.
(4) Downloaded a large sample data set (about 10GB).
NOTE: If you would like to download some sample data sets, you can refer here.
(5) Wrote a simple producer (see below).
(6) Executed the producer to load data into Kafka.
(7) Executed the console consumer to check on the data.
Eg. kafka-console-consumer.sh --topic datatest --zookeeper hdp1:2181 --from-beginning

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class KafkaProducer {
    private final static String TOPIC = "datatest";
    private final static String DELIMITER = "~~";

    public static void main(String[] argv) {
        Properties properties = new Properties();
        // Broker list assumed from the 3-broker setup above (HDP's default
        // Kafka port is 6667); adjust the hostnames to match your cluster.
        properties.put("metadata.broker.list", "hdp1:6667,hdp2:6667,hdp3:6667");
        properties.put("serializer.class", "kafka.serializer.StringEncoder");

        ProducerConfig producerConfig = new ProducerConfig(properties);
        Producer<String, String> producer =
                new Producer<String, String>(producerConfig);

        try {
            FileReader fr = new FileReader(new File("/opt/data/movies.txt"));
            BufferedReader br = new BufferedReader(fr);
            String s = null;
            String msg = null;
            long ctr = 0;

            // Records in the data set span multiple lines separated by blank
            // lines: accumulate the lines of one record (joined by DELIMITER)
            // and send the record when a blank line is reached.
            while ((s = br.readLine()) != null) {
                if (s.length() != 0) {
                    if (msg == null) {
                        msg = s;
                    } else {
                        msg = msg + DELIMITER + s;
                    }
                } else if (msg != null) {
                    producer.send(new KeyedMessage<String, String>(TOPIC, msg));
                    ctr++;
                    msg = null;
                }
            }

            // Flush the last record if the file does not end with a blank line.
            if (msg != null) {
                producer.send(new KeyedMessage<String, String>(TOPIC, msg));
                ctr++;
            }

            System.out.println("Records sent: " + ctr);
            br.close();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}