Tuesday, January 26, 2016

Docker: How to start a container on a specific node in a Swarm cluster?

If you are using a Docker Swarm cluster and are wondering how you can start a container on a specific node, this post is for you!

The short answer is to add a custom label to the Docker daemon on each node:

[root@hdp1 ~]# cat /etc/sysconfig/docker 
OPTIONS="-H tcp://0.0.0.0:2375 --label nodename=n1.manfrix.com"
[root@hdp1 ~]# 
NOTE: If you want to know more about the "label" option of docker daemon, you can refer here.

Make sure you restart the docker service after the changes:

systemctl restart docker

Verify that the daemon is now started with a label:

[root@hdp1 ~]# ps -ef | grep docker
root      50961      1  0 20:38 ?        00:00:02 /usr/bin/docker daemon -H fd:// -H tcp://0.0.0.0:2375 --label nodename=n1.manfrix.com
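If you only want to see the labels rather than the whole process line, a small filter can pull them out of the "ps" output. This is just a minimal sketch: the function name "daemon_labels" is made up for this example, and it assumes each label is passed as "--label key=value" separated by single spaces, as in the output above.

```shell
# Hypothetical helper: read a daemon command line on stdin and print
# the value that follows each "--label" flag, one per line.
daemon_labels() {
  tr ' ' '\n' | awk 'prev == "--label" { print } { prev = $0 }'
}

# Example usage (the [d] keeps grep from matching its own process):
# ps -ef | grep '[d]ocker daemon' | daemon_labels
```

On the node shown above this should print the single label that was configured in /etc/sysconfig/docker.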

Remember to re-join the nodes to the Swarm cluster after you have restarted Docker:

docker run -d swarm join --addr=[IP of the node]:2375 etcd://[IP of etcd host]:[port]/[optional path prefix]

After you have labeled all the nodes (daemons), you can proceed to test:

[root@hdp1 ~]# docker -H tcp://localhost:9999 run -d --name centos-1 -p 80 -e constraint:nodename==n3.manfrix.com centos /bin/bash
82d42f3052da181ebb876d79e2aeeb68787c17045c625367cced067107f3cb08
[root@hdp1 ~]# docker -H tcp://localhost:9999 run -d --name nginx-1 -p 80 -e constraint:nodename==n2.manfrix.com nginx
68664b5046b1dc031b015c9241a2f16f1e663f0b384d395d810d36b46f317839


For more information about Swarm node constraints, you can refer here.
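The two test commands above differ only in the node name and the container arguments, so they can be wrapped in a small helper. This is a sketch under assumptions: the function name "run_on_node" and the SWARM_HOST and DRY_RUN variables are invented for illustration, and the default manager address is simply the one used in the test above.

```shell
# Assumed Swarm manager address (the one used in the test above).
SWARM_HOST="${SWARM_HOST:-tcp://localhost:9999}"

# Hypothetical helper: start a container on the node whose daemon
# carries the label nodename=<node>. Remaining arguments are passed
# to "docker run" unchanged. With DRY_RUN set, only print the command.
run_on_node() {
  node="$1"; shift
  if [ -n "${DRY_RUN:-}" ]; then
    echo docker -H "$SWARM_HOST" run -d -e "constraint:nodename==${node}" "$@"
  else
    docker -H "$SWARM_HOST" run -d -e "constraint:nodename==${node}" "$@"
  fi
}

# Example: equivalent to the nginx-1 test above
# run_on_node n2.manfrix.com --name nginx-1 -p 80 nginx
```

Setting DRY_RUN=1 first is a cheap way to check the generated command before anything is scheduled.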


Friday, January 15, 2016

Docker: /etc/default/docker or /etc/sysconfig/docker does not work anymore under systemd?

If you are wondering why, all of a sudden, your OPTIONS under /etc/default/docker or /etc/sysconfig/docker no longer work, I can assure you that you are not alone!

To put it simply, it is due to the /usr/lib/systemd/system/docker.service unit file shipped with newer versions of Docker (most probably v1.7 and above).

The file looks like this now =>
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network.target docker.socket
Requires=docker.socket

[Service]
Type=notify
ExecStart=/usr/bin/docker daemon -H fd://
MountFlags=slave
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity

[Install]
WantedBy=multi-user.target

You can clearly see that it no longer references any environment file or variable that you could use to customize the daemon.

To fix that, we need to create a systemd drop-in file for docker.service.

(1) Create a directory "/etc/systemd/system/docker.service.d" on ALL servers.

mkdir /etc/systemd/system/docker.service.d

(2)  In the directory, create a file - "local.conf" - using your favorite text editor with the following lines:

[Service]
EnvironmentFile=-/etc/sysconfig/docker
EnvironmentFile=-/etc/sysconfig/docker-storage
EnvironmentFile=-/etc/sysconfig/docker-network
ExecStart=
ExecStart=/usr/bin/docker daemon -H fd:// $OPTIONS \
      $DOCKER_STORAGE_OPTIONS \
      $DOCKER_NETWORK_OPTIONS \
      $BLOCK_REGISTRY \
      $INSECURE_REGISTRY

(3) Create or edit /etc/sysconfig/docker, /etc/sysconfig/docker-storage, or /etc/sysconfig/docker-network to specify your customized options.

Eg.
[/etc/sysconfig/docker]
OPTIONS="-H tcp://0.0.0.0:2375 --label nodename=node1.mytestmac.com"


(4) Reload systemd.

systemctl daemon-reload

(5) Verify that docker.service is now aware of its environment files.

systemctl show docker | grep -i env

Eg.
[root@hdp1 ~]# systemctl show docker | grep -i env
EnvironmentFile=/etc/sysconfig/docker (ignore_errors=yes)
EnvironmentFile=/etc/sysconfig/docker-storage (ignore_errors=yes)
EnvironmentFile=/etc/sysconfig/docker-network (ignore_errors=yes)
[root@hdp1 ~]# 

(6) Restart docker.

systemctl restart docker

(7) Verify the changes.

ps -ef | grep docker
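Steps (1) and (2) can also be done in one go with a small script, which is handy when you have to repeat them on ALL servers. The following is a minimal sketch: the function name "write_dropin" and the DROPIN_DIR override are assumptions added for this example, so that you can try it against a scratch directory first. The file content is exactly the local.conf from step (2).

```shell
# Target directory for the drop-in; override DROPIN_DIR to test elsewhere.
DROPIN_DIR="${DROPIN_DIR:-/etc/systemd/system/docker.service.d}"

# Hypothetical helper: create the directory and write local.conf.
write_dropin() {
  mkdir -p "$DROPIN_DIR" || return 1
  # The quoted 'EOF' keeps $OPTIONS and friends literal, so that
  # systemd (not this shell) expands them from the environment files.
  cat > "$DROPIN_DIR/local.conf" <<'EOF'
[Service]
EnvironmentFile=-/etc/sysconfig/docker
EnvironmentFile=-/etc/sysconfig/docker-storage
EnvironmentFile=-/etc/sysconfig/docker-network
ExecStart=
ExecStart=/usr/bin/docker daemon -H fd:// $OPTIONS \
      $DOCKER_STORAGE_OPTIONS \
      $DOCKER_NETWORK_OPTIONS \
      $BLOCK_REGISTRY \
      $INSECURE_REGISTRY
EOF
}

# Example (as root), followed by steps (4) and (6):
# write_dropin && systemctl daemon-reload && systemctl restart docker
```

The empty "ExecStart=" line matters: it clears the ExecStart inherited from the shipped unit file before the new one is set.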

Hope that helps to resolve your headaches! :)

Saturday, November 7, 2015

Docker: How to setup Swarm with etcd?

Alright, let me start by sharing some information about my test environment:
(i) 3 nodes (hdp1, hdp2 and hdp3) running CentOS 7.1.1503
(ii) Docker v1.9.0
(iii) etcd 2.1.1 (running only on hdp1) listening for client connection at port 4001

Node "hdp1" will be Swarm Master.

(1) Firstly, let's reconfigure the Docker daemon running on ALL the nodes by adding "-H tcp://0.0.0.0:2375".

Eg.
vi /usr/lib/systemd/system/docker.service
[Amend the line] ExecStart=/usr/bin/docker daemon -H fd:// -H tcp://0.0.0.0:2375


(2) You have to reload systemd and restart Docker on ALL the nodes after the above changes.

systemctl daemon-reload
systemctl restart docker
[To verify] ps -ef | grep docker

(3)  Make sure "etcd" is running. If not, please start it and make sure it's listening on the intended port (4001 in this case).

systemctl start etcd
[To verify] netstat -an | grep [port] | grep LISTEN

(4) On the other nodes (non-swarm-master - in this case "hdp2" and "hdp3"), execute the following command to join them to the cluster:

docker run -d swarm join --addr=[IP of the node]:2375 etcd://[IP of etcd host]:[port]/[optional path prefix]

WHERE (in this example)
[IP of the node] = IP address of node "hdp2" and "hdp3"
[IP of etcd host] = IP address of node "hdp1" where the only etcd instance is running
[port] = Port that etcd uses to listen for incoming client connections (in this example = 4001)
[optional path prefix] = Path that etcd uses to store data about the registered Swarm nodes

The final commands (one per node):
docker run -d swarm join --addr=192.168.0.171:2375 etcd://192.168.0.170:4001/swarm
docker run -d swarm join --addr=192.168.0.172:2375 etcd://192.168.0.170:4001/swarm
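To make the placeholder substitution explicit, here is a small sketch that prints the join command for each node. The function name "swarm_join_cmd" is made up for this example, and the defaults (port 4001, path prefix "swarm") are simply the values of this test environment. It only prints the commands; each one still has to be run on its own node.

```shell
# Hypothetical helper: print the "swarm join" command for one node.
# $1 = IP of the node, $2 = IP of the etcd host,
# $3 = etcd client port (default 4001), $4 = path prefix (default swarm)
swarm_join_cmd() {
  node_ip="$1"; etcd_ip="$2"; port="${3:-4001}"; prefix="${4:-swarm}"
  echo "docker run -d swarm join --addr=${node_ip}:2375 etcd://${etcd_ip}:${port}/${prefix}"
}

# Print the commands for hdp2 and hdp3 (etcd runs on hdp1).
for ip in 192.168.0.171 192.168.0.172; do
  swarm_join_cmd "$ip" 192.168.0.170
done
```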


(5) You can verify that the nodes are registered with the following command:

etcdctl ls /swarm/docker/swarm/nodes


(6) If all nodes are registered successfully, you can now start up the Swarm Master (in this example, on node "hdp1").

docker run -p [host port]:2375 -d swarm manage -H tcp://0.0.0.0:2375 etcd://[IP of etcd host]:4001/swarm


WHERE [in this example]
[host port] = 9999 (or any other free network port you selected - you will use this port to communicate with Swarm Master)
[IP of etcd host] = IP address of node "hdp1" where the only etcd instance is running

Eg.

docker run -p 9999:2375 -d swarm manage -H tcp://0.0.0.0:2375 etcd://192.168.0.170:4001/swarm

(7) To verify that the Swarm cluster is now working properly, execute the following command:

docker -H [IP of the host where Swarm Master is running]:[port] info

WHERE (in this example)
[IP of the host where Swarm Master is running] = Node "hdp1" (192.168.0.170)
[port] = 9999 (refer to Step (6) above)

NOTE: You can use the usual Docker CLI commands against the Swarm cluster =>

docker -H [IP of the host where Swarm Master is running]:[port] ps -a
docker -H [IP of the host where Swarm Master is running]:[port] logs [container id]
docker -H [IP of the host where Swarm Master is running]:[port] inspect [container id]

(8) Let's spin up a container and see how your Swarm cluster handles it.

docker -H [IP of the host where Swarm Master is running]:[port] run -d --name nginx-1 -p 80 nginx


(9) Let's check where the container is running.

docker -H [IP of the host where Swarm Master is running]:[port] ps -a

(10) You can stop the running container by issuing:

docker -H [IP of the host where Swarm Master is running]:[port] stop [container id]



Tuesday, November 3, 2015

Mesos/Kubernetes: How to install and run Kubernetes on Mesos with your local cluster?

First of all, let me share with you my test environment:
(1) CentOS 7.1.1503 (nodes = hdp1, hdp2 and hdp3)
(2) HDP 2.3.2 (re-using the installed Zookeeper)
(3) Docker v1.8.3
(4) golang 1.4.2
(5) etcd 2.1.1

The official documentation for Kubernetes-Mesos integration can be found here. It uses Google Compute Engine (GCE), but this blog entry covers deploying the Kubernetes-Mesos integration on a local cluster.

Ok, let's begin...

Prerequisites

(1) A working local Mesos cluster
NOTE: To build one, please refer to this.

(2) Install Docker on ALL nodes.
(a) Make sure yum has access to the official Docker repository.
(b) Execute "yum install docker-engine"
(c) Enable docker.service with "systemctl enable docker.service"
(d) Start docker.service with "systemctl start docker.service"

(3) Install "golang" on the node where you wish to build and deploy the Kubernetes-Mesos integration.
(a) Execute "yum install golang"

(4) Install "etcd" on a selected node (preferably on the node that hosts the Kubernetes-Mesos integration for testing purposes).
(a) Execute "yum install etcd"
(b) Amend file "/usr/lib/systemd/system/etcd.service" (see below):
[FROM]
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd"
[TO]
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd --listen-client-urls http://0.0.0.0:4001 --advertise-client-urls http://[node_ip]:4001"
WHERE
[node_ip] = IP Address of the node (hostname -i)

(c) Reload the systemd daemon with "systemctl daemon-reload".
(d) Enable etcd.service with "systemctl enable etcd.service".
(e) Start etcd.service with "systemctl start etcd.service".

***

Build Kubernetes-Mesos

NOTE: Execute the following on the node selected to host the Kubernetes-Mesos integration.
cd [directory to install kubernetes-mesos]
git clone https://github.com/kubernetes/kubernetes
cd kubernetes
export KUBERNETES_CONTRIB=mesos
make

***

Export environment variables

(1) Export the following environment variables:

export KUBERNETES_MASTER_IP=$(hostname -i)
export KUBERNETES_MASTER=http://${KUBERNETES_MASTER_IP}:8888
export MESOS_MASTER=[zk://.../mesos]
export PATH="[directory to install kubernetes-mesos]/_output/local/go/bin:$PATH"

WHERE
[zk://.../mesos] = URL of the zookeeper nodes (Eg. zk://hdp1:2181,hdp2:2181,hdp3:2181/mesos)
[directory to install kubernetes-mesos] = Directory used to perform "git clone" (see "Build Kubernetes-Mesos" above).

(2) Amend .bash_profile to make the variables permanent.
(3) Remember to source the .bash_profile file after amendment (. ~/.bash_profile).
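For steps (2) and (3), one way to avoid duplicate lines when you re-run the setup is to append each export only if it is not already in the file. This is a sketch under assumptions: the helper name "append_once" and the PROFILE variable are invented for this example.

```shell
# Hypothetical helper: append a line to a file unless an identical
# line is already present (so re-running is harmless).
append_once() {
  line="$1"; file="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

PROFILE="${PROFILE:-$HOME/.bash_profile}"

# Single quotes keep $(hostname -i) and ${...} unexpanded, so they are
# evaluated at login time rather than when the line is written:
# append_once 'export KUBERNETES_MASTER_IP=$(hostname -i)' "$PROFILE"
# append_once 'export KUBERNETES_MASTER=http://${KUBERNETES_MASTER_IP}:8888' "$PROFILE"
# . "$PROFILE"
```

The MESOS_MASTER and PATH lines can be added the same way once you have filled in their placeholders.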

***

Configure and start Kubernetes-Mesos service

(1) Create a cloud config file mesos-cloud.conf in the current directory with the following contents:
$ cat <<EOF >mesos-cloud.conf
[mesos-cloud]
        mesos-master        = ${MESOS_MASTER}
EOF
NOTE:
If you have not set ${MESOS_MASTER}, it should be like (example) "zk://hdp1:2181,hdp2:2181,hdp3:2181/mesos".

(2) Create a script to start all the relevant components (API server, controller manager, and scheduler):

km apiserver \
  --address=${KUBERNETES_MASTER_IP} \
  --etcd-servers=http://${KUBERNETES_MASTER_IP}:4001 \
  --service-cluster-ip-range=10.10.10.0/24 \
  --port=8888 \
  --cloud-provider=mesos \
  --cloud-config=mesos-cloud.conf \
  --secure-port=0 \
  --v=1 >apiserver.log 2>&1 &

sleep 3

km controller-manager \
  --master=${KUBERNETES_MASTER_IP}:8888 \
  --cloud-provider=mesos \
  --cloud-config=./mesos-cloud.conf  \
  --v=1 >controller.log 2>&1 &

sleep 3

km scheduler \
  --address=${KUBERNETES_MASTER_IP} \
  --mesos-master=${MESOS_MASTER} \
  --etcd-servers=http://${KUBERNETES_MASTER_IP}:4001 \
  --mesos-user=root \
  --api-servers=${KUBERNETES_MASTER_IP}:8888 \
  --cluster-dns=10.10.10.10 \
  --cluster-domain=cluster.local \
  --contain-pod-resources=false \
  --v=2 >scheduler.log 2>&1 &

NOTE:
Since CentOS uses systemd, you will hit this issue. Hence, you need to add "--contain-pod-resources=false" to the scheduler options (as shown above).

(3) Give execute permission to the script (chmod 700 <script>).
(4) Execute the script.
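The script in step (2) uses a fixed "sleep 3" between components. If a component needs longer to come up on your machine, a small retry loop may be more forgiving. Below is a minimal sketch: the function name "wait_for" is made up for this example, and the commented usage assumes the API server answers on its /healthz endpoint at the port configured above.

```shell
# Hypothetical helper: run a command up to <attempts> times, one second
# apart, and return 0 as soon as it succeeds (1 if it never does).
wait_for() {
  attempts="$1"; shift
  while [ "$attempts" -gt 0 ]; do
    "$@" >/dev/null 2>&1 && return 0
    attempts=$((attempts - 1))
    [ "$attempts" -gt 0 ] && sleep 1
  done
  return 1
}

# Example: wait up to 30 seconds for the API server before starting
# the controller manager (replaces the first "sleep 3"):
# wait_for 30 curl -sf "${KUBERNETES_MASTER}/healthz" || echo "apiserver not ready"
```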

***

Validate Kubernetes-Mesos services
$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
# NOTE: Your service IPs will likely differ
$ kubectl get services
NAME             LABELS                                    SELECTOR   IP(S)          PORT(S)
k8sm-scheduler   component=scheduler,provider=k8sm         <none>     10.10.10.113   10251/TCP
kubernetes       component=apiserver,provider=kubernetes   <none>     10.10.10.1     443/TCP
(4) Lastly, look for Kubernetes in the Mesos web GUI by pointing your browser to http://[mesos-master-ip:port]. Go to the Frameworks tab, and look for an active framework named "Kubernetes".

Kubernetes framework is registered with Mesos
***

Let's spin up a pod

(1) Write a YAML pod description to a local file:
$ cat <<EOPOD >nginx.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
EOPOD
(2) Send the pod description to Kubernetes using the "kubectl" CLI:
$ kubectl create -f ./nginx.yaml
pods/nginx
Submitted pod through kubectl
(3) Wait a minute or two while Docker downloads the image layers from the internet. We can use the kubectl interface to monitor the status of our pod:
$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
nginx     1/1       Running   0          14s
(4) Verify that the pod task is running in the Mesos web GUI. Click on the Kubernetes framework. The next screen should show the running Mesos task that started the Kubernetes pod.
Mesos WebGUI shows active Kubernetes task

Mesos WebGUI shows that the Kubernetes task is RUNNING
Click through "Sandbox" link of the task to get to the "executor.log"

An example of "executor.log"

Connected to the node where the container is running
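Instead of re-running "kubectl get pods" by hand in step (3), you can pull the STATUS column out of its output and poll on it. This is only a sketch: the function name "pod_status" is made up for this example, and it assumes the column layout shown in the output above (NAME first, STATUS third).

```shell
# Hypothetical helper: read "kubectl get pods" output on stdin and
# print the STATUS column for the named pod (header line skipped).
pod_status() {
  awk -v pod="$1" 'NR > 1 && $1 == pod { print $3 }'
}

# Example: poll every 5 seconds until the nginx pod is Running.
# until [ "$(kubectl get pods | pod_status nginx)" = "Running" ]; do sleep 5; done
```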

Getting Kubernetes to work on Mesos can be rather challenging at this point in time.

However, it is possible and hopefully, over time, Kubernetes-Mesos integration can work seamlessly.

Have fun!


Saturday, October 10, 2015

Docker: All Containers Get Automatically Updated /etc/hosts (!?!?!?)

While I have always wanted such a feature (an automatically updated /etc/hosts for all running containers), I understand that Docker does not provide it natively (just yet - or at least AFAIK). I also understand the security issues that might come with such a feature (not every running container wants other containers to connect to it).

Anyway, about 2 weeks ago, while I was dockerizing a system automation application that requires at least 2 running nodes (containers), I found that the feature was silently available*.

* I went through the Release Notes of almost all recent releases and could not find any mention of such a feature. If I got it wrong, please point me to the proper Release Notes. Thanks!

Before I forget, let me share with you the reason I have always wanted such a feature. My reason is simple - I need all my running containers to know of the "existence" of other related containers and have a way to communicate with them (in this case, through /etc/hosts).

My test environment was on CentOS 7.1 and Docker 1.8.2.

(1) Firstly, I started a container without an assigned hostname or container name. You can see that the /etc/hosts file was updated with:
(a) the container ID as the hostname
(b) the container name as the hostname



(2) Next, I started another container with an assigned hostname of "node1". You can see that now the /etc/hosts was updated with:
(a) the assigned hostname
(b) the container name as the hostname


(3) To spice it up a little bit more, I started another container with an assigned hostname and gave the container a name. You can see that the /etc/hosts was updated with:
(a) the assigned hostname
(b) the given container name


(4) The next test was to start up 3 containers with different hostnames and names, leaving all of them running. You can see that "node1" was started and its /etc/hosts was updated accordingly.


(5) Next, I started "node2" and its /etc/hosts was updated with details of "node1" too.


(6) What happened to the /etc/hosts of "node1" at this moment? Surprise, surprise...you can see that its /etc/hosts was updated with details of "node2" too.


(7) Just to make sure it was not déjà vu, I started the third container ("node1" and "node2" were still running). This time I wasn't surprised to see that the /etc/hosts of "node3" was updated with details of "node1" and "node2".


(8) Lastly, let's check the /etc/hosts of "node1" and "node2". Voilà, they were updated too!
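If you want to repeat the checks above without eyeballing each file, a small lookup function can tell you whether a given name made it into a hosts file. This is a minimal sketch: the function name "hosts_lookup" is made up for this example, and the commented usage assumes running containers named "node1" and "node2" as in the steps above.

```shell
# Hypothetical helper: print the IP address that a hosts-format file
# maps to the given name (comment lines are skipped).
# $1 = name to look up, $2 = file (default /etc/hosts)
hosts_lookup() {
  name="$1"; file="${2:-/etc/hosts}"
  awk -v n="$name" '
    /^[ \t]*#/ { next }
    { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
  ' "$file"
}

# Example: does node1 know about node2?
# docker exec node1 cat /etc/hosts > /tmp/node1.hosts
# hosts_lookup node2 /tmp/node1.hosts
```

An empty result means the name is not in that container's /etc/hosts (yet).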


Seriously, I am not sure whether this feature has been available for some time or whether it is an experimental feature. Anyway, I like it...for the things I do! So, I am not speaking for you :)

Sunday, October 4, 2015

Docker: Sharing Kernel Module

This is more like a request-for-help entry :)

I was working on a small project to dockerize a system automation application that supports clustering. Everything was working fine until the very last minute - during system startup!!!

The database was working fine in a primary-standby setup. The system automation application was running fine on both of the nodes in the cluster. Configuration went fine too.

However, when I started the automation monitoring, KABOOOOMMM, a problem occurred!

What happened was that the first node came up fine, but the second node was reporting a weird status. A quick read of the log file told me that the second node was unable to load the "softdog" kernel module and would not be able to initialize. The most likely reason is that the first node had already taken hold of the kernel module.

Ok, I have to admit that this is a silly mistake I made and an oversight before I started the project.

Anyway, I would like to share what I learned from this experience:
(1) It is possible to access a kernel module from a container by:
(a) mounting the /lib/modules directory read-only into the container (-v /lib/modules:/lib/modules:ro).
(b) making sure the kernel version of the host machine is the same as the kernel version of the image/container.
(c) running the container with the --privileged option.

For those of you who are trying to dockerize applications that require a kernel module, please remember that all containers on the same host share the same underlying (host) kernel. Hence, you have to make sure that the application still works under such conditions.
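Because the kernel is shared, a module loaded by one container shows up as already loaded for every other container on that host. A quick way to check before starting a second instance is to look the module up in /proc/modules. This is only a sketch: the function name "module_loaded" is made up for this example, and the commented "docker run" line just combines the options listed in (a) and (c) above with placeholder values.

```shell
# Hypothetical helper: return 0 if the named module is listed in a
# /proc/modules-format file (module name is the first field).
# $1 = module name, $2 = file (default /proc/modules)
module_loaded() {
  name="$1"; file="${2:-/proc/modules}"
  awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}

# Example: warn if softdog is already loaded on this host.
# module_loaded softdog && echo "softdog is already loaded on this host"

# Options (a) and (c) combined; image and command are placeholders:
# docker run --privileged -v /lib/modules:/lib/modules:ro [image] [command]
```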

I am not a kernel expert and have not ventured into this area in Docker previously, so any help is appreciated - SOS!!!!!

Saturday, September 12, 2015

Running Apache Spark on Mesos

Running Apache Spark on Mesos is easier than I thought!

I have 3 nodes at home (hdp1, hdp2 and hdp3) which are running Hortonworks Data Platform (HDP). I have HDFS, Zookeeper and Spark installed on the cluster.

To test running Spark on Mesos, I decided to reuse the cluster for simplicity's sake.

All I did was:
(1) install Mesos master on node hdp1.
(2) install Mesos slave on nodes hdp2 and hdp3.
(3) configure the master and slaves accordingly.

NOTE:
(i) For more information about the installation and configuration of Mesos, you can refer to this blog entry.
(ii) I reused the Zookeeper that was installed with HDP.

Once Mesos was up and running, I decided to carry out a very simple test, as follows:
(1) Use "spark-shell" as the Spark client to connect to the Mesos cluster.
(2) Load a sample text file into HDFS.
(3) Use spark-shell to perform a line count on the loaded text file.

First, let's see which command you can use to connect spark-shell to a running Mesos cluster.

TIP: All you need to do is to use the "--master" option to point to the Mesos cluster at "mesos://<IP-or-hostname of Mesos master>:5050".




Then, let's load a sample text file into HDFS using the "hadoop fs -put" command.




Once the sample text file is loaded, let's create a Spark RDD using it.

TIP: You can use the "sc.textFile" function and point it to "hdfs://<namenode>:8020/<path to file>".




After the RDD is created, let's run a count on it using the "count()" function.

You can see from the screenshots above and below that Spark (through Mesos) has submitted 2 tasks that are executed on nodes hdp2 and hdp3 respectively.

You can also see from the screenshot above that the end result is returned as "4" (which means 4 lines in the file - which is correct!).
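For reference, the three steps can be written down roughly as follows. This is a sketch under assumptions: the helper names "mesos_master_url" and "count_snippet" are made up for this example, the sample file path is hypothetical, and using "hdp1" as the namenode host is a guess about this cluster. Piping the Scala line into spark-shell is a non-interactive stand-in for typing it at the prompt.

```shell
# Build the URL for the "--master" option (port 5050 as in the tip above).
mesos_master_url() {
  echo "mesos://$1:5050"
}

# Build the one-line Scala snippet that counts the lines of an HDFS file.
# $1 = namenode host, $2 = absolute path of the file in HDFS
count_snippet() {
  printf 'sc.textFile("hdfs://%s:8020%s").count()\n' "$1" "$2"
}

# Hypothetical end-to-end run (file name and namenode host are assumptions):
# hadoop fs -put sample.txt /tmp/sample.txt
# count_snippet hdp1 /tmp/sample.txt | spark-shell --master "$(mesos_master_url hdp1)"
```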






So, that is how easy it is to run Spark on Mesos.

Hope that you are going to try it out!