Anything Linux, Big Data, Containerization, Graph Database and Cloud: integration


Tuesday, November 3, 2015

Mesos/Kubernetes: How to install and run Kubernetes on Mesos with your local cluster?

First of all, let me share with you my test environment:
(1) CentOS 7.1.1503 (nodes = hdp1, hdp2 and hdp3)
(2) HDP 2.3.2 (re-using the installed Zookeeper)
(3) Docker v1.8.3
(4) golang 1.4.2
(5) etcd 2.1.1

The official documentation for the Kubernetes-Mesos integration can be found here. It uses Google Compute Engine (GCE), but this blog entry covers deploying the Kubernetes-Mesos integration on a local cluster.

Ok, let's begin...

Prerequisites

(1) A working local Mesos cluster
NOTE: To build one, please refer to this.

(2) Install Docker on ALL nodes.
(a) Make sure yum has access to the official Docker repository.
(b) Execute "yum install docker-engine"
(c) Enable docker.service with "systemctl enable docker.service"
(d) Start docker.service with "systemctl start docker.service"

(3) Install "golang" on the node where you wish to build and deploy the Kubernetes-Mesos integration.
(a) Execute "yum install golang"

(4) Install "etcd" on a selected node (preferably the node that hosts the Kubernetes-Mesos integration, for testing purposes).
(a) Execute "yum install etcd"
(b) Amend the file "/usr/lib/systemd/system/etcd.service" (see below):


[FROM]
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd"
[TO]
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd --listen-client-urls http://0.0.0.0:4001 --advertise-client-urls http://[node_ip]:4001"
WHERE
[node_ip] = IP Address of the node (hostname -i)

(c) Reload systemctl daemon with "systemctl daemon-reload".
(d) Enable etcd.service with "systemctl enable etcd.service".
(e) Start etcd.service with "systemctl start etcd.service".
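Editing the packaged unit under /usr/lib/systemd/system works, but a later package update of etcd may overwrite it. An alternative (not the method used above) is a systemd drop-in override that leaves the packaged file alone. The sketch below is a hypothetical helper; the drop-in file name "listen.conf" is an arbitrary choice, and the node IP is passed in rather than looked up with "hostname -i".

```shell
# Sketch: override etcd's ExecStart with a systemd drop-in instead of editing
# the packaged unit file. "listen.conf" is an arbitrary drop-in name.
# Usage: write_etcd_dropin <node_ip> [drop-in directory]
write_etcd_dropin() {
  local node_ip=$1
  local dir=${2:-/etc/systemd/system/etcd.service.d}
  if [ -z "$node_ip" ]; then
    echo "usage: write_etcd_dropin <node_ip> [dir]" >&2
    return 1
  fi
  mkdir -p "$dir" || return 1
  # \$(nproc) is escaped so it is written literally and evaluated by bash
  # when systemd starts etcd, not when this file is generated.
  cat >"$dir/listen.conf" <<EOF
[Service]
# An empty ExecStart= clears the packaged command before redefining it
ExecStart=
ExecStart=/bin/bash -c "GOMAXPROCS=\$(nproc) /usr/bin/etcd --listen-client-urls http://0.0.0.0:4001 --advertise-client-urls http://${node_ip}:4001"
EOF
}
```

After writing the drop-in, run "systemctl daemon-reload" and then enable and start etcd.service as in steps (d) and (e). A quick check once it is up is "curl http://[node_ip]:4001/version", which should answer with the etcd version as JSON.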

***

Build Kubernetes-Mesos

NOTE: Execute the following on the node selected to host the Kubernetes-Mesos integration.

cd [directory to install kubernetes-mesos]
git clone https://github.com/kubernetes/kubernetes
cd kubernetes
export KUBERNETES_CONTRIB=mesos
make
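The build above assumes git, go and make are on PATH and, judging from the PATH exported in the next section, that it leaves the "km" binary under _output/local/go/bin of the checkout. A hedged sketch that checks both before and after running make; require_cmds and build_k8sm are hypothetical helpers, not part of the Kubernetes tree.

```shell
# Fail (and list what is missing) if any of the given commands is not on PATH.
require_cmds() {
  local c missing=0
  for c in "$@"; do
    if ! command -v "$c" >/dev/null 2>&1; then
      echo "missing: $c" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Build the Mesos contrib in a kubernetes checkout and confirm km was produced.
# Usage: build_k8sm [directory to install kubernetes-mesos]/kubernetes
build_k8sm() {
  local src=$1
  require_cmds git go make || return 1
  # Subshell so the caller's working directory is left unchanged
  ( cd "$src" && KUBERNETES_CONTRIB=mesos make ) || return 1
  if [ ! -x "$src/_output/local/go/bin/km" ]; then
    echo "km not found under $src/_output/local/go/bin" >&2
    return 1
  fi
}
```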

***

Export environment variables

(1) Export the following environment variables:

export KUBERNETES_MASTER_IP=$(hostname -i)
export KUBERNETES_MASTER=http://${KUBERNETES_MASTER_IP}:8888
export MESOS_MASTER=[zk://.../mesos]
export PATH="[directory to install kubernetes-mesos]/kubernetes/_output/local/go/bin:$PATH"

WHERE
[zk://.../mesos] = URL of the zookeeper nodes (Eg. zk://hdp1:2181,hdp2:2181,hdp3:2181/mesos)
[directory to install kubernetes-mesos] = Directory used to perform "git clone" (see "Build Kubernetes-Mesos" above).

(2) Amend .bash_profile to make the variables permanent.
(3) Remember to source the .bash_profile file after amendment (. ~/.bash_profile).
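Steps (2) and (3) can also be scripted. A minimal sketch, assuming bash and that the exports should be re-evaluated at every login (so "$(hostname -i)" is written literally instead of being expanded now). The marker comment used to avoid duplicate entries and the example paths are hypothetical.

```shell
# Append the Kubernetes-Mesos exports to a profile file, only once.
# Usage: persist_k8sm_env <profile> <zk_url> <kubernetes checkout dir>
persist_k8sm_env() {
  local profile=$1 zk_url=$2 k8s_dir=$3
  local marker='# kubernetes-mesos environment'
  # Do nothing if an earlier run already added the block
  if [ -f "$profile" ] && grep -qxF "$marker" "$profile"; then
    return 0
  fi
  # Escaped \$ keeps the hostname lookup and PATH expansion for login time;
  # zk_url and k8s_dir are substituted now.
  cat >>"$profile" <<EOF
$marker
export KUBERNETES_MASTER_IP=\$(hostname -i)
export KUBERNETES_MASTER=http://\${KUBERNETES_MASTER_IP}:8888
export MESOS_MASTER=${zk_url}
export PATH="${k8s_dir}/_output/local/go/bin:\$PATH"
EOF
}
```

For example, "persist_k8sm_env ~/.bash_profile zk://hdp1:2181,hdp2:2181,hdp3:2181/mesos /opt/k8sm/kubernetes" (the /opt/k8sm path is only an illustration), followed by ". ~/.bash_profile".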

***

Configure and start Kubernetes-Mesos service

(1) Create a cloud config file mesos-cloud.conf in the current directory with the following contents:

$ cat <<EOF >mesos-cloud.conf
[mesos-cloud]
        mesos-master        = ${MESOS_MASTER}
EOF

NOTE:
If you have not set ${MESOS_MASTER}, replace it with the ZooKeeper URL of your Mesos cluster, for example "zk://hdp1:2181,hdp2:2181,hdp3:2181/mesos".

(2) Create a script to start all the relevant components (API server, controller manager, and scheduler):

#!/bin/bash
# Start the API server, controller manager and scheduler in the background
km apiserver \
  --address=${KUBERNETES_MASTER_IP} \
  --etcd-servers=http://${KUBERNETES_MASTER_IP}:4001 \
  --service-cluster-ip-range=10.10.10.0/24 \
  --port=8888 \
  --cloud-provider=mesos \
  --cloud-config=mesos-cloud.conf \
  --secure-port=0 \
  --v=1 >apiserver.log 2>&1 &

sleep 3

km controller-manager \
  --master=${KUBERNETES_MASTER_IP}:8888 \
  --cloud-provider=mesos \
  --cloud-config=./mesos-cloud.conf  \
  --v=1 >controller.log 2>&1 &

sleep 3

km scheduler \
  --address=${KUBERNETES_MASTER_IP} \
  --mesos-master=${MESOS_MASTER} \
  --etcd-servers=http://${KUBERNETES_MASTER_IP}:4001 \
  --mesos-user=root \
  --api-servers=${KUBERNETES_MASTER_IP}:8888 \
  --cluster-dns=10.10.10.10 \
  --cluster-domain=cluster.local \
  --contain-pod-resources=false \
  --v=2 >scheduler.log 2>&1 &

NOTE:
Since CentOS uses systemd, you will hit this issue. Hence, you need to add the "--contain-pod-resources=false" flag to the scheduler command (see the script above).

(3) Give execute permission to the script (chmod 700 <script>).
(4) Execute the script.
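The fixed "sleep 3" pauses in the script may be too short on a slow node. A hedged alternative is to poll until the API server actually answers before starting the next component. This sketch assumes the API server serves /healthz on its insecure port (8888 here), as upstream kube-apiserver does; wait_until is a hypothetical helper.

```shell
# Retry a command once per second until it succeeds or <timeout> seconds pass.
# Usage: wait_until <timeout> <command> [args...]
wait_until() {
  local timeout=$1 i
  shift
  for ((i = 0; i < timeout; i++)); do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
  done
  echo "timed out after ${timeout}s: $*" >&2
  return 1
}

# Example replacement for the first "sleep 3" in the start script:
# wait_until 30 curl -sf "http://${KUBERNETES_MASTER_IP}:8888/healthz" || exit 1
```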

***

Validate Kubernetes-Mesos services

$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE

# NOTE: Your service IPs will likely differ
$ kubectl get services
NAME             LABELS                                    SELECTOR   IP(S)          PORT(S)
k8sm-scheduler   component=scheduler,provider=k8sm         <none>     10.10.10.113   10251/TCP
kubernetes       component=apiserver,provider=kubernetes   <none>     10.10.10.1     443/TCP
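To check this from a script rather than by eye, the NAME column of "kubectl get services" can be compared against the two services listed above. A small sketch; expect_services is a hypothetical helper and relies on the column layout shown above (a header line first, the service name in the first column).

```shell
# Succeed only if every service name passed as an argument appears in the
# NAME column of "kubectl get services" output read from stdin.
expect_services() {
  local names svc rc=0
  names=$(awk 'NR > 1 { print $1 }')
  for svc in "$@"; do
    if ! grep -qxF -- "$svc" <<<"$names"; then
      echo "missing service: $svc" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage: kubectl get services | expect_services kubernetes k8sm-scheduler
```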

Lastly, look for Kubernetes in the Mesos web GUI by pointing your browser to http://[mesos-master-ip:port]. Go to the Frameworks tab and look for an active framework named "Kubernetes".

Kubernetes framework is registered with Mesos

***

Let's spin up a pod

(1) Write a YAML pod description to a local file:

$ cat <<EOPOD >nginx.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
EOPOD

(2) Send the pod description to Kubernetes using the "kubectl" CLI:

$ kubectl create -f ./nginx.yaml
pods/nginx

Submitted pod through kubectl

(3) Wait a minute or two while Docker downloads the image layers from the internet. We can use the kubectl interface to monitor the status of our pod:

$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
nginx     1/1       Running   0          14s

(4) Verify that the pod task is running in the Mesos web GUI. Click on the Kubernetes framework. The next screen should show the running Mesos task that started the Kubernetes pod.

Mesos WebGUI shows active Kubernetes task

Mesos WebGUI shows that the Kubernetes task is RUNNING

Click through the "Sandbox" link of the task to get to the "executor.log".

An example of "executor.log"

Connected to the node where the container is running
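Rather than re-running "kubectl get pods" by hand as in step (3), the STATUS column can be polled until the pod is up. A minimal sketch, assuming the column order shown in step (3) (NAME, READY, STATUS, ...); pod_status and wait_for_pod are hypothetical helpers.

```shell
# Print the STATUS column for the named pod from "kubectl get pods" output on stdin.
pod_status() {
  awk -v pod="$1" 'NR > 1 && $1 == pod { print $3 }'
}

# Poll every 5 seconds until the pod reports Running, for at most <timeout> seconds.
# Usage: wait_for_pod nginx 300
wait_for_pod() {
  local pod=$1 timeout=${2:-300} waited=0
  until [ "$(kubectl get pods | pod_status "$pod")" = "Running" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "pod $pod not Running after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}
```

The generous default timeout leaves room for Docker to pull the nginx image layers on first use.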

Getting Kubernetes to work on Mesos can be rather challenging at this point in time.

However, it is possible, and hopefully, over time, the Kubernetes-Mesos integration will work seamlessly.

Have fun!

Saturday, September 12, 2015

Running Apache Spark on Mesos

Running Apache Spark on Mesos is easier than I thought!

I have 3 nodes at home (hdp1, hdp2 and hdp3) which are running Hortonworks Data Platform (HDP). I have HDFS, Zookeeper and Spark installed on the cluster.

To test running Spark on Mesos, I decided to reuse the cluster for simplicity's sake.

All I did was:
(1) install Mesos master on node hdp1.
(2) install Mesos slave on node hdp2 and hdp3.
(3) configure the master and slaves accordingly.

NOTE:
(i) For more information about the installation and configuration of Mesos, you can refer to this blog entry.
(ii) I reuse Zookeeper that was installed with HDP.

Once Mesos was up and running, I carried out a very simple test, described as follows:
(1) Use "spark-shell" as the Spark client to connect to the Mesos cluster.
(2) Load a sample text file into HDFS.
(3) Use spark-shell to perform a line count on the loaded text file.

First, let's look at the command you can use to connect spark-shell to a running Mesos cluster.

TIPS: All you need to do is to use the "--master" option to point to the Mesos cluster at "mesos://<IP-or-hostname of Mesos master>:5050".

Then, let's load a sample text file into HDFS using the "hadoop fs -put" command.

Once the sample text file is loaded, let's create a Spark RDD from it.

TIPS: You can use the "sc.textFile" function and point it to "hdfs://<namenode>:8020/<path to file>".

After the RDD is created, let's run a count on it using the "count()" function.

You can see from the screenshots above and below that Spark (through Mesos) submitted 2 tasks, which were executed on nodes hdp2 and hdp3 respectively.

You can also see from the screenshot above that the end result is "4" (the file has 4 lines, which is correct!).
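The same test can be run non-interactively by piping a Scala statement into spark-shell. A sketch under these assumptions: hadoop and spark-shell are on PATH, the NameNode listens on port 8020 as in the tip above, and the HDFS directory is an absolute path. The Spark documentation also asks for MESOS_NATIVE_JAVA_LIBRARY to point at libmesos.so; the /usr/local/lib default below is an assumption to adjust for your Mesos packages if it is not already set (for example in spark-env.sh). count_snippet and spark_count_on_mesos are hypothetical helpers.

```shell
# Print a Scala statement that counts the lines of an HDFS file.
count_snippet() {
  local namenode=$1 path=$2
  printf 'println(sc.textFile("hdfs://%s:8020%s").count())\n' "$namenode" "$path"
}

# Put a local file into HDFS, then count its lines with spark-shell on Mesos.
# Usage: spark_count_on_mesos <local file> <absolute hdfs dir> <namenode> <mesos master host>
spark_count_on_mesos() {
  local file=$1 hdfs_dir=$2 namenode=$3 mesos_host=$4
  hadoop fs -put "$file" "$hdfs_dir/" || return 1
  # spark-shell reads the statement from stdin and exits at end of input
  count_snippet "$namenode" "$hdfs_dir/$(basename "$file")" |
    MESOS_NATIVE_JAVA_LIBRARY=${MESOS_NATIVE_JAVA_LIBRARY:-/usr/local/lib/libmesos.so} \
    spark-shell --master "mesos://${mesos_host}:5050"
}
```

Note that "hadoop fs -put" refuses to overwrite an existing file, so rerunning the helper with the same file fails at the first step.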

So, that is how easy it is to run Spark on Mesos.

I hope you will try it out!

Monday, April 6, 2015

Apache Storm: HBase Bolt

Starting with version 0.9.2-incubating, Storm includes support for HBase as a bolt. You no longer need to look for third-party packages to do that, which is great news!

To refer to the APIs and documentation, you can go here.

I have coded a simple Storm application that uses Kafka as its spout and HBase as its bolt. In other words, the application gets its data from Kafka and then writes it to HBase.

One interesting item within Apache Storm is stream grouping, which defines how the tuples produced by spouts (or other bolts) are distributed among the tasks of the consuming bolts. To read more about stream grouping, you can refer here.

Lastly, if you want to try it out yourself, you can download the sample code here.

Storm UI that shows Kafka spout and HBase bolt processing

Wednesday, February 11, 2015

Apache Storm: Integration with Kafka using Kafka Spout

If you have been playing with either Apache Kafka or Apache Storm, you have probably read many articles about integrating the two. From my experience, reading too much can be a bad thing sometimes (pun intended :). In this case, there were multiple efforts that tried to offer such integration, which might cause confusion about which is the best or standard way to do it.

It is good to know that starting from version 0.9.2-incubating, Apache Storm has decided to include such support officially. Read more here.

Anyway, how does such integration work?

In this blog entry, I am only going to share information about using Kafka as a Storm spout. Yes, starting from Storm version 0.9.3, you can use Kafka as a bolt too. If you want to know more about Topology, Spout and Bolt, read this.

Basically, the classes you need for the Storm-Kafka integration are available under the storm.kafka package.

If you want to get up to speed quickly, try out the sandbox offered by Hortonworks here. After you have downloaded the sandbox (or if you are gutsy enough to install the system through Ambari), it is advisable to try out the tutorial too. If you want to jump straight to the tutorial related to the Storm-Kafka integration, you can go here. Please note that the tutorial contains the source code too, so make sure you check it out!

Once you get the hang of it, you can move over to this website to learn more about Storm Kafka.

If you do not want to compile the Storm Kafka package yourself, you can download it from the Hortonworks maven repository.

The information offered here should get you going for a while, and I will share some tips and traps regarding the integration in future entries.

Happy hacking!