<h1>Running Kubernetes on a Raspberry PI</h1>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3gWPx9re5vB20Dw-DHS5hVwma4D9EbW_lNOVDV4rR1NN5raAVl2wLnThgplblrcLipyowaCARSG1LuOEyGmgKQrTENu0eL9ltW2MRPZmGVWh0UHYxL8MvHkO3YOhjys2M5iN51w/s1600/dockbercover.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3gWPx9re5vB20Dw-DHS5hVwma4D9EbW_lNOVDV4rR1NN5raAVl2wLnThgplblrcLipyowaCARSG1LuOEyGmgKQrTENu0eL9ltW2MRPZmGVWh0UHYxL8MvHkO3YOhjys2M5iN51w/s320/dockbercover.png" /></a></div>
Running the Docker engine on a Raspberry Pi is a breeze thanks to the Docker <em>pirates</em> from <a href="http://blog.hypriot.com/">Hypriot</a>: just download the image, flash it on your Pi, and you are off to the races. I am not going to cover the installation process; it is well documented on the Hypriot <a href="http://blog.hypriot.com/getting-started-with-docker-on-your-arm-device/">website</a> and I also wrote a recipe in the <a href="http://shop.oreilly.com/product/0636920036791.do">Docker cookbook</a>. Roughly, download the .img file, dd it to your SD card, then boot your Pi.<br />
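For reference, the flashing step from a Linux machine looks roughly like this (a sketch: the image file name and the SD card device are placeholders, double check the device before you overwrite anything):<br />
<pre><code>$ sudo dd if=hypriot-rpi.img of=/dev/sdX bs=4M
$ sync</code></pre>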
Having Docker on the Raspberry Pi offers tons of possibilities for hobbyists and home devices. It also triggered my interest because <a href="http://kubernetes.io/">Kubernetes</a>, one of the Docker orchestrators, can be run standalone on a single node using <a href="https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/docker.md">Docker containers</a>. I wrote a <a href="http://sebgoa.blogspot.ch/2015/04/1-command-to-kubernetes-with-docker.html">post</a> several months ago about doing it with docker-compose. So I decided to give it a try last weekend: running Kubernetes on a Pi using the Hypriot image that ships the Docker engine.<br />
<h2 id="getting-etcd-to-run">
Getting <code>etcd</code> to run</h2>
The first issue is that Kubernetes currently uses <a href="https://github.com/coreos/etcd">etcd</a>, which you need to run on ARM. I decided to get the <code>etcd</code> source directly on the Pi and update the Dockerfile to build it there. Etcd uses a Golang ONBUILD image and it was causing me grief. So I copied the content of the ONBUILD image and created a new Dockerfile based on <code>hypriot/rpi-golang</code> to build it directly. You can see the <a href="https://github.com/how2kube/k8s4pi/blob/master/Dockerfile.etcd">Dockerfile</a>. With that you have a Docker container running etcd on ARM.<br />
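Building it on the Pi and giving it a quick spin looks something like this (a sketch, assuming the Dockerfile name from the repository and that the image's entrypoint is the etcd binary; the flags match the ones used elsewhere in this series):<br />
<pre><code>$ docker build -t etcd -f Dockerfile.etcd .
$ docker run -d --net=host etcd --addr=127.0.0.1:4001 --data-dir=/var/etcd/data</code></pre>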
<h2 id="getting-the-hyperkube-to-run-on-arm">
Getting the Hyperkube to run on ARM</h2>
Now, I needed the hyperkube binary to run on ARM. Hyperkube is a single binary that allows you to start all the Kubernetes components. Thankfully there are some <a href="https://github.com/andrewpsuedonym/Kubernetes-Arm-Binaries.git">binaries already available for ARM</a>. That was handy, because I struggled to compile Kubernetes directly on the Pi.<br />
With that hyperkube binary on hand, I built an image based on the <code>resin/rpi-raspbian:wheezy</code> image. Quite straightforward:<br />
<pre><code>FROM resin/rpi-raspbian:wheezy
RUN apt-get update
RUN apt-get -y -q install iptables ca-certificates
COPY hyperkube /hyperkube</code></pre>
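Build and tag it directly on the Pi (the tag is my choice here; it just needs to match what the kubelet manifest described below references):<br />
<pre><code>$ docker build -t hyperkube .</code></pre>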
<h2 id="the-kubelet-systemd-unit">
The Kubelet systemd unit</h2>
The Kubernetes agent running on all nodes of a cluster is called the Kubelet. The Kubelet is in charge of making sure that all the containers supposed to be running on the node actually do run. It can also be used with a manifest to start specific containers at startup. There is a <a href="https://coreos.com/blog/introducing-the-kubelet-in-coreos/">good post</a> from Kelsey Hightower about it. Since the Hypriot image uses systemd, I took the systemd unit that creates a Kubelet service directly from Kelsey's post:<br />
<pre><code>[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
[Service]
ExecStart=/usr/bin/kubelet \
--api-servers=http://127.0.0.1:8080 \
--allow-privileged=true \
--config=/etc/kubernetes/manifests \
--v=2
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target</code></pre>
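Drop the unit file in place and enable the service like any other systemd unit (assuming you saved it as kubelet.service and that the kubelet binary sits in /usr/bin as referenced by the unit):<br />
<pre><code>$ sudo cp kubelet.service /etc/systemd/system/kubelet.service
$ sudo systemctl daemon-reload
$ sudo systemctl enable kubelet
$ sudo systemctl start kubelet</code></pre>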
The <em>kubelet</em> binary is downloaded directly from the same <a href="https://github.com/andrewpsuedonym/Kubernetes-Arm-Binaries.git">location</a> as hyperkube. The manifest is a Kubernetes Pod definition that starts all the containers to get a Kubernetes controller running. It starts <code>etcd</code>, the API server, the scheduler, the controller and the service proxy, all using the hyperkube image built above.<br />
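To give you an idea, the skeleton of such a manifest looks roughly like this (a heavily trimmed sketch, not the exact file from the repository; the image names match the local builds above, and the scheduler, controller manager and service proxy follow the same pattern as the API server):<br />
<pre><code>apiVersion: v1
kind: Pod
metadata:
  name: kube-controller
spec:
  hostNetwork: true
  containers:
  - name: etcd
    image: etcd
    command: ["/etcd", "--addr=127.0.0.1:4001", "--data-dir=/var/etcd/data"]
  - name: apiserver
    image: hyperkube
    command: ["/hyperkube", "apiserver", "--etcd_servers=http://127.0.0.1:4001", "--service-cluster-ip-range=10.0.0.0/16", "--v=2"]
  # scheduler, controller-manager and proxy follow the same pattern</code></pre>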
<h2 id="now-the-dirty-hack">
Now the dirty hack</h2>
Kubernetes does something interesting. All containers in a Pod actually use the same IP address. This is done by running a <em>fake</em> container that just does nothing. The other containers in the Pod simply share the same network namespace as this <em>fake</em> container. This is called the <em>pause</em> container. I did not find a way to specify a different image for the <em>pause</em> container in Kubernetes; it seems hard-coded to <code>gcr.io/google_containers/pause:0.8.0</code>, which of course is built to run on x86_64.<br />
So the dirty trick consisted of taking the <em>pause</em> Golang code from the Kubernetes source, compiling it on the Pi using the <code>hypriot/rpi-golang</code> image, sticking the binary in a <code>scratch</code> image and tagging it locally to appear as <code>gcr.io/google_containers/pause:0.8.0</code>, thereby avoiding the download of the real image that runs on x86_64. Yeah...right...I told you dirty, but that was the quickest way I could think of.<br />
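Concretely, the hack boils down to something like this (a sketch; the pause source compiles to a single static binary, which is the only thing in the image):<br />
<pre><code>$ cat > Dockerfile.pause <<EOF
FROM scratch
COPY pause /pause
ENTRYPOINT ["/pause"]
EOF
$ docker build -t gcr.io/google_containers/pause:0.8.0 -f Dockerfile.pause .</code></pre>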
<h2 id="putting-it-all-together">
Putting it all together</h2>
Now that you have all the images ready directly on the Pi, plus a Kubelet service, you can start it. The containers will be created and you will have a single node Kubernetes cluster on the Pi. All that is left is to use the <em>kubectl</em> CLI against it. You can download an ARM <a href="https://storage.googleapis.com/kubernetes-release/release/v1.0.3/bin/linux/arm/kubectl">version</a> of kubectl from the official Kubernetes release.<br />
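For example:<br />
<pre><code>$ curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.0.3/bin/linux/arm/kubectl
$ chmod +x kubectl</code></pre>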
<pre><code>HypriotOS: root@black-pearl in ~
$ docker images
REPOSITORY TAG
hyperkube latest
gcr.io/google_containers/pause 0.8.0
etcd latest
resin/rpi-raspbian wheezy
hypriot/rpi-golang latest
HypriotOS: root@black-pearl in ~
$ ./kubectl get pods
NAME READY STATUS RESTARTS AGE
kube-controller-black-pearl 5/5 Running 5 5m
HypriotOS: root@black-pearl in ~
$ ./kubectl get nodes
NAME LABELS STATUS
black-pearl kubernetes.io/hostname=black-pearl Ready</code></pre>
<h2 id="get-it">
Get it</h2>
Everything is on <a href="https://github.com/skippbox/k8s4pi" target="_blank">GitHub</a> at <code>https://github.com/skippbox/k8s4pi</code>, including a horrible bash script that does the entire build :)
<h1>Introducing Kmachine, a Docker machine fork for Kubernetes</h1>
<p><a href="https://github.com/docker/machine">Docker machine</a> is a great tool to easily start a Docker host on most public Cloud providers out there. Very handy as a replacement for Vagrant if all you want is a Docker host in the Cloud.</p>
<p>It automatically installs the Docker daemon and sets up the TLS authentication so that you can communicate with it using your local Docker client. It also has some early features to start a Swarm cluster (i.e. multiple Docker hosts).</p>
<p>Since I have been playing with <a href="http://kubernetes.io">Kubernetes</a> lately, and there is a single node install <a href="https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/docker.md">available</a>, all based on Docker images...</p>
<p>I thought: let's hack Docker machine a bit so that in addition to installing the Docker daemon it also pulls a few images and starts a few containers on boot.</p>
<div class="separator" style="clear: both; text-align: center;"><a href="http://shop.oreilly.com/product/0636920036791.do" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhziwkWBl0HWITMH0bzjKoMxVs36oTs1kci_XJBN-CufLARVnTqC4_VCFq6NL0_nsn_D0ZL11ZmnA_dNrOmuQi3PCEB_ru0Ww6Fz50NF_yMU8ZUwSwhVwoYfPIpHXi1N_uAMXyJLQ/s400/rc_cat.gif" /></a></div>
<p>The result is <a href="https://github.com/runseb/machine">kmachine</a>. The usage is exactly the same as Docker machine. It goes something like this on <a href="http://exoscale.ch">exoscale</a> (I did not take the time to open port 6443 on all providers...PR welcome :)):</p>
<pre><code>$ kmachine create -d exoscale foobar
$ kmachine env foobar
kubectl config set-cluster kmachine --server=https://185.19.29.23:6443 --insecure-skip-tls-verify=true
kubectl config set-credentials kuser --token=abcdefghijkl
kubectl config set-context kmachine --user=kuser --cluster=kmachine
kubectl config use-context kmachine
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://185.19.29.23:2376"
export DOCKER_CERT_PATH="/Users/sebastiengoasguen/.docker/machine/machines/foobar5"
export DOCKER_MACHINE_NAME="foobar5"
# Run this command to configure your shell:
# eval "$(kmachine_darwin-amd64 env foobar5)"</code></pre>
<p>You see that I used <em>kubectl</em>, the Kubernetes <a href="https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/aws/kubectl.md">client</a>, to automatically set up the endpoint created by machine. The only gotcha right now is that I hard-coded the token...easily fixed by a friendly PR. We could also set up proper certificates and TLS authentication, but I opted for the easy route for now. If you set up your environment, you will have access to Kubernetes, and Docker of course; the original docker-machine functionality is not broken.</p>
<pre><code>$ eval "$(kmachine env foobar)"
$ kubectl get pods
POD IP CONTAINER(S) IMAGE(S)
kubernetes-127.0.0.1 controller-manager gcr.io/google_containers/hyperkube:v0.17.0
apiserver gcr.io/google_containers/hyperkube:v0.17.0
scheduler gcr.io/google_containers/hyperkube:v0.17.0
$ kubectl get nodes
NAME LABELS STATUS
127.0.0.1 Schedulable <none> Ready</code></pre>
<p>Since all Kubernetes components are started as containers, you will see all of them running from the start: <em>etcd</em>, the <em>kubelet</em>, the <em>controller</em>, the <em>proxy</em>, etc.</p>
<pre><code>$ docker ps
CONTAINER ID IMAGE COMMAND
7e5d356d31d7 gcr.io/google_containers/hyperkube:v0.17.0 "/hyperkube controll
9cc05adf2b27 gcr.io/google_containers/hyperkube:v0.17.0 "/hyperkube schedule
7a0e490a44e1 gcr.io/google_containers/hyperkube:v0.17.0 "/hyperkube apiserve
6d2d743172c6 gcr.io/google_containers/pause:0.8.0 "/pause"
7950a0d14608 gcr.io/google_containers/hyperkube:v0.17.0 "/hyperkube proxy --
55fc22c508a9 gcr.io/google_containers/hyperkube:v0.17.0 "/hyperkube kubelet
c67496a47bf3 kubernetes/etcd:2.0.5.1 "/usr/local/bin/etcd </code></pre>
<p>Have fun ! I think it is very handy to get started with Kubernetes and still have the Docker machine setup working. You get the benefit of both: easy provisioning of a Docker host in the Cloud and a fully working Kubernetes setup to experiment with. If we could couple it with Weave or Flannel, we could set up a full Kubernetes cluster in the Cloud, just like Swarm.</p>
<h1>Building an S3 object store with Docker, Cassandra and Kubernetes</h1>
<p>Docker makes building distributed applications relatively painless. At the very least, deploying existing distributed systems/frameworks is made easier since you only need to launch containers. <a href="https://hub.docker.com">Docker hub</a> is full of MongoDB, Elasticsearch, Cassandra images, etc. Assuming that you like what is inside those images, you can just grab them, run a container and you are done.</p>
<p>With a cluster manager/container orchestration system like Kubernetes, running clustered versions of these systems, where you need to operate multiple containers across multiple nodes, is also made dead simple. Swear to God, it is !</p>
<p>Just check the list of <a href="https://github.com/GoogleCloudPlatform/kubernetes/tree/master/examples">examples</a> and you will find everything that is needed to run a Redis, a Spark, a Storm, a Hazelcast, even a Glusterfs cluster. Discovery of all the nodes can be a challenge, but with things like Etcd, Consul and registrator, service discovery has never been easier.</p>
<p>What caught my eye in the list of Kubernetes examples is the ability to run an Apache <a href="https://github.com/GoogleCloudPlatform/kubernetes/tree/master/examples/cassandra">Cassandra cluster</a>. Yes, a Cassandra cluster based on Docker containers. It caught my eye especially because my buddies at <a href="https://exoscale.ch">exoscale</a> have written an S3 compatible object store that uses Cassandra for storage. It is called <a href="https://github.com/exoscale/pithos">Pithos</a> and, for those interested, is written in Clojure.</p>
<p>So I wondered: let's run Cassandra in Kubernetes, then let's create a Docker image for Pithos and run it in Kubernetes as well. That should give me an S3 compatible object store, built using Docker containers.</p>
<p>To start we need a Kubernetes cluster. The easiest is to use Google Container <a href="https://cloud.google.com/container-engine/">Engine</a>. But keep an eye on <a href="https://github.com/kelseyhightower/kubestack">Kubestack</a>, a Terraform plan to create one; it could easily be adapted for different cloud providers. If you are new to Kubernetes check my previous <a href="http://sebgoa.blogspot.ch/2015/04/1-command-to-kubernetes-with-docker.html">post</a>, or get the <a href="http://shop.oreilly.com/product/0636920036791.do">Docker cookbook</a> in early release; I just pushed a chapter on Kubernetes. Whatever technique you use, before proceeding you should be able to use the <em>kubectl</em> client and list the nodes in your cluster. For example:</p>
<pre><code>$ ./kubectl get nodes
NAME LABELS STATUS
k8s-cookbook-935a6530-node-hsdb kubernetes.io/hostname=k8s-cookbook-935a6530-node-hsdb Ready
k8s-cookbook-935a6530-node-mukh kubernetes.io/hostname=k8s-cookbook-935a6530-node-mukh Ready
k8s-cookbook-935a6530-node-t9p8 kubernetes.io/hostname=k8s-cookbook-935a6530-node-t9p8 Ready
k8s-cookbook-935a6530-node-ugp4 kubernetes.io/hostname=k8s-cookbook-935a6530-node-ugp4 Ready</code></pre>
<h2 id="running-cassandra-in-kubernetes">Running Cassandra in Kubernetes</h2>
<p>You can use the Kubernetes example straight up or clone my own repo, you can explore all the pods, replication controllers and service definition there:</p>
<pre><code>$ git clone https://github.com/how2dock/dockbook.git
$ cd ch05/examples</code></pre>
<p>Then launch the Cassandra replication controller, increase the number of replicas and launch the service:</p>
<pre><code>$ kubectl create -f ./cassandra/cassandra-controller.yaml
$ kubectl scale --replicas=4 rc cassandra
$ kubectl create -f ./cassandra/cassandra-service.yaml</code></pre>
<p>Once the image is downloaded you will have your Kubernetes pods in running state. Note that the image currently used comes from the Google registry. That is because this image contains a discovery class specified in the Cassandra configuration. You could use the Cassandra image from Docker hub, but you would have to put that Java class in there to allow all Cassandra nodes to discover each other. As I said, almost painless !</p>
<pre><code>$ kubectl get pods --selector="name=cassandra"</code></pre>
<p>Once Cassandra discovers all nodes and rebalances the database storage you will get something like:</p>
<pre><code>$ ./kubectl exec cassandra-5f709 -c cassandra nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.16.2.4 84.32 KB 256 46.0% 8a0c8663-074f-4987-b5db-8b5ff10d9774 rack1
UN 10.16.1.3 67.81 KB 256 53.7% 784c8f4d-7722-4d16-9fc4-3fee0569ec29 rack1
UN 10.16.0.3 51.37 KB 256 49.7% 2f551b3e-9314-4f12-affc-673409e0d434 rack1
UN 10.16.3.3 65.67 KB 256 50.6% a746b8b3-984f-4b1e-91e0-cc0ea917773b rack1</code></pre>
<p>Note that you can also access the logs of a container in a pod with <em>kubectl logs</em>, very handy.</p>
<h2 id="launching-pithos-s3-object-store">Launching Pithos S3 object store</h2>
<p><a href="http://pithos.io">Pithos</a> is a daemon which "provides an S3 compatible frontend to a cassandra cluster". So if we run Pithos in our Kubernetes cluster and point it to our running Cassandra cluster we can expose an S3 compatible interface.</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBxkrwVvEu5XwMqmGvek0CsWedst6eZg2mFjzbA6YdSSYo_ZRfNnYGKkRaujtwOX3irsXr_XFwHG211BMef_n3lemZcwnzDZR2DHOjcnBsyDg9JQqaIPvCXdeLBRG3cyo6c8INsw/s1600/s3pithos.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBxkrwVvEu5XwMqmGvek0CsWedst6eZg2mFjzbA6YdSSYo_ZRfNnYGKkRaujtwOX3irsXr_XFwHG211BMef_n3lemZcwnzDZR2DHOjcnBsyDg9JQqaIPvCXdeLBRG3cyo6c8INsw/s400/s3pithos.png" /></a></div>
<p>To that end I created a Docker image for Pithos, <em>runseb/pithos</em>, on Docker hub. It is an automated build, so you can check out the Dockerfile there. The image contains the default configuration file. You will want to change it to edit your access keys and bucket store definitions. I launch Pithos as a Kubernetes replication controller and expose a service with an external load balancer created on Google Compute Engine. The Cassandra service that we launched earlier allows Pithos to find Cassandra using DNS resolution. To bootstrap Pithos we need to run a non-restarting Pod which installs the Pithos schema in Cassandra. Let's do it:</p>
<pre><code>$ kubectl create -f ./pithos/pithos-bootstrap.yaml</code></pre>
<p>Wait for the bootstrap to happen, i.e. for the Pod to reach the <em>succeeded</em> state. Then launch the replication controller. For now we will launch only one replica. Using an rc makes it easy to attach a service and expose it via a public IP address.</p>
<pre><code>$ kubectl create -f ./pithos/pithos-rc.yaml
$ kubectl create -f ./pithos/spithos.yaml
$ ./kubectl get services --selector="name=pithos"
NAME LABELS SELECTOR IP(S) PORT(S)
pithos name=pithos name=pithos 10.19.251.29 8080/TCP
104.197.27.250 </code></pre>
<p>Since Pithos will serve on port 8080 by default, make sure that you open the firewall for the public IP of the load-balancer.</p>
<h2 id="use-an-s3-client">Use an S3 client</h2>
<p>You are now ready to use your S3 object store, offered by Pithos, backed by Cassandra, running on Kubernetes using Docker. Wow...a mouthful !!!</p>
<p>Install <a href="http://s3tools.org/s3cmd">s3cmd</a> and create a configuration file like so:</p>
<pre><code>$ cat ~/.s3cfg
[default]
access_key = AKIAIOSFODNN7EXAMPLE
secret_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
check_ssl_certificate = False
enable_multipart = True
encoding = UTF-8
encrypt = False
host_base = s3.example.com
host_bucket = %(bucket)s.s3.example.com
proxy_host = 104.197.27.250
proxy_port = 8080
server_side_encryption = True
signature_v2 = True
use_https = False
verbosity = WARNING</code></pre>
<p>Note that we use an unencrypted proxy (the load-balancer IP created by the Pithos Kubernetes service; don't forget to change it). The access and secret keys are the defaults stored in the <a href="https://github.com/runseb/pithos">Dockerfile</a>.</p>
<p>With this configuration in place, you are ready to use <em>s3cmd</em>:</p>
<pre><code>$ s3cmd mb s3://foobar
Bucket 's3://foobar/' created
$ s3cmd ls
2015-06-09 11:19 s3://foobar</code></pre>
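<p>From there, the usual S3 operations work as you would expect. For example, upload a file and list the bucket content:</p>
<pre><code>$ s3cmd put /etc/hosts s3://foobar
$ s3cmd ls s3://foobar</code></pre>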
<p>If you wanted to use Boto, this would work as well:</p>
<pre><code>#!/usr/bin/env python
from boto.s3.key import Key
from boto.s3.connection import S3Connection
from boto.s3.connection import OrdinaryCallingFormat

apikey = 'AKIAIOSFODNN7EXAMPLE'
secretkey = 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'

cf = OrdinaryCallingFormat()
conn = S3Connection(aws_access_key_id=apikey,
                    aws_secret_access_key=secretkey,
                    is_secure=False,
                    host='104.197.27.250',
                    port=8080,
                    calling_format=cf)
conn.create_bucket('foobar')</code></pre>
<p>And that's it. All of these steps may sound like a lot, but honestly, it has never been this easy to run an S3 object store. Docker and Kubernetes truly make running distributed applications a breeze.</p>
<h1>Running VMs in Docker Containers via Kubernetes</h1>
<p>A couple of weeks ago Google finally published a technical <a href="https://research.google.com/pubs/pub43438.html">paper</a> describing Borg, the cluster management system that they built over the last ten years or more and that runs all Google services.</p>
<p>There are several interesting concepts in the paper, one of them of course being that they run everything in containers. Whether they use Docker or not is unknown. Some parts of their workloads probably still use <a href="https://github.com/google/lmctfy">LMCTFY</a> - Let Me Contain That For You. What struck me is that they say they do not use full virtualization. It makes sense in terms of timeline, considering that Borg started before the advent of hardware virtualization. However, their Google Compute Engine offers VMs as a Service, so it is fair to wonder how they run their VMs. This reminded me of John Wilkes' <a href="https://www.youtube.com/watch?v=VQAAkO5B5Hg">talk</a> at MesosCon 2014. He discussed scheduling in Borg (without mentioning it by name) and, at 23 minutes into his talk, mentions that they run VMs in containers.</p>
<p>Running VMs in containers does make sense when you think in terms of a cluster management system that deals with multiple types of workloads. You treat your IaaS (e.g. GCE) as a workload, and contain it so that you can pack all your servers and maximize utilization. It also allows you to run some workloads on bare metal for performance.</p>
<p>Therefore let's assume that GCE is just another workload for Google and that it runs through Borg.</p>
<p>Since Borg laid out the principles for <a href="https://github.com/GoogleCloudPlatform/kubernetes">Kubernetes</a>, the cluster management system designed for containerized workloads and open sourced by Google in June 2014, you are left asking:</p>
<p>"How can we run VMs in Kubernetes ?"</p>
<p>This is where Rancher comes to our aid to help us prototype a little some-some. Two weeks ago, Rancher announced <a href="http://rancher.com/introducing-ranchervm-package-and-run-virtual-machines-as-docker-containers/">RancherVM</a>, basically a startup script that creates KVM VMs inside Docker containers (not really doing it justice calling it a script...). It is available on <a href="https://github.com/rancherio/vm">GitHub</a> and is super easy to try. I will spare you the details and tell you to go to GitHub instead. The result is that you can build a Docker image that contains a KVM qcow image, and running the container starts the VM with the proper networking.</p>
<h2 id="privilege-gotcha">Privilege gotcha</h2>
<p>With a Docker image now handy to run a KVM instance in it, using Kubernetes to start this container is straightforward: create a Pod that launches this container. The only caveat is that the Docker host(s) that you use, and that form your Kubernetes cluster, need to have KVM installed, and your containers will need some level of privilege to access the KVM devices. While this can be tweaked with Docker run parameters like --device and --cap-add, you can brute force it in a very insecure manner with --privileged. However, Kubernetes does not accept privileged containers by default (rightfully so). Therefore you need to start your Kubernetes cluster (i.e. the API server and the Kubelet) with the --allow_privileged=true option.</p>
<p>If you are new to Kubernetes, check out my previous <a href="http://sebgoa.blogspot.ch/2015/04/1-command-to-kubernetes-with-docker.html">post</a> where I show you how to start a one node Kubernetes "cluster" with Docker compose. The only modifications from that post are that I am running this on a Docker host that also has KVM installed, that the compose manifest specifies --allow_privileged=true in the kubelet startup command, and that I modified /etc/kubernetes/manifests/master.json to specify a volume. This allows me not to tamper with the images from Google.</p>
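<p>For reference, the kubelet startup command in the compose manifest ends up looking roughly like this (a sketch based on the compose file from my previous post, with the extra flag appended):</p>
<pre><code>/hyperkube kubelet --api_servers=http://localhost:8080 --v=2 --address=0.0.0.0 --enable_server --hostname_override=127.0.0.1 --config=/etc/kubernetes/manifests --allow_privileged=true</code></pre>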
<h2 id="lets-try-it-out">Let's try it out</h2>
<p>Build your RancherVM images:</p>
<pre><code>$ git clone https://github.com/rancherio/vm.git
$ cd vm
$ make all</code></pre>
<p>You will now have several RancherVM images:</p>
<pre><code>$ sudo docker images
REPOSITORY TAG ...
rancher/vm-android 4.4 ...
rancher/vm-android latest ...
rancher/ranchervm 0.0.1 ...
rancher/ranchervm latest ...
rancher/vm-centos 7.1 ...
rancher/vm-centos latest ...
rancher/vm-ubuntu 14.04 ...
rancher/vm-ubuntu latest ...
rancher/vm-rancheros 0.3.0 ...
rancher/vm-rancheros latest ...
rancher/vm-base 0.0.1 ...
rancher/vm-base latest ...</code></pre>
<p>Starting one of those will give you access to a KVM instance running in the container.</p>
<p>I will skip the startup of the Kubernetes components. Check my previous post. Once you have Kubernetes running you can list the pods (i.e group of containers/volumes). You will see that the Kubernetes master itself is running as a Pod.</p>
<pre><code>$ ./kubectl get pods
POD IP CONTAINER(S) IMAGE(S) ...
nginx-127 controller-manager gcr.io/google_containers/hyperkube:v0.14.1 ...
apiserver gcr.io/google_containers/hyperkube:v0.14.1
scheduler gcr.io/google_containers/hyperkube:v0.14.1</code></pre>
<p>Now let's define a RancherVM as a Kubernetes Pod. We do this in a YAML file:</p>
<pre><code>apiVersion: v1beta2
kind: Pod
id: ranchervm
labels:
  name: vm
desiredState:
  manifest:
    version: v1beta2
    containers:
    - name: master
      image: rancher/vm-rancheros
      privileged: true
      volumeMounts:
      - name: ranchervm
        mountPath: /ranchervm
      env:
      - name: RANCHER_VM
        value: "true"
    volumes:
    - name: ranchervm
      source:
        hostDir:
          path: /tmp/ranchervm</code></pre>
<p>To create the Pod use the kubectl CLI:</p>
<pre><code>$ ./kubectl create -f vm.yaml
pods/ranchervm
$ ./kubectl get pods
POD IP CONTAINER(S) IMAGE(S) ....
nginx-127 controller-manager gcr.io/google_containers/hyperkube:v0.14.1 ....
apiserver gcr.io/google_containers/hyperkube:v0.14.1
scheduler gcr.io/google_containers/hyperkube:v0.14.1
ranchervm 172.17.0.10 master rancher/vm-rancheros ....</code></pre>
<p>The RancherVM image specified contains RancherOS. The container will start automatically, but of course the actual VM will take a couple more seconds to start. Once it's up, you can ping it and you can ssh to the VM instance.</p>
<pre><code>$ ping -c 1 172.17.0.10
PING 172.17.0.10 (172.17.0.10) 56(84) bytes of data.
64 bytes from 172.17.0.10: icmp_seq=1 ttl=64 time=0.725 ms
$ ssh rancher@172.17.0.10
...
[rancher@ranchervm ~]$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
[rancher@ranchervm ~]$ sudo system-docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
229a22962a4d console:latest "/usr/sbin/entry.sh 2 minutes ago Up 2 minutes console
cfd06aa73192 userdocker:latest "/usr/sbin/entry.sh 2 minutes ago Up 2 minutes userdocker
448e03b18f93 udev:latest "/usr/sbin/entry.sh 2 minutes ago Up 2 minutes udev
ff929cddeda9 syslog:latest "/usr/sbin/entry.sh 2 minutes ago Up 2 minutes syslog </code></pre>
<p>Amazing ! I can feel that you are just wondering what the heck is going on :)</p>
<p>You want to kill the VM ? Just kill the pod:</p>
<pre><code>$ ./kubectl delete pod ranchervm</code></pre>
<p>Remember that a Pod is not a single container; it can contain several containers as well as volumes.</p>
<p>Let's go a step further, and scale the number of VMs by using a replication controller.</p>
<h2 id="using-a-replication-controller-to-scale-the-vm">Using a Replication Controller to scale the VM</h2>
<p>Kubernetes is quite nice. It builds on years of experience with fault-tolerance at Google and provides mechanisms for keeping your services up, scaling them and rolling out new versions. The replication controller is a primitive for managing the scale of your services.</p>
<p>So say you would like to automatically increase or decrease the number of VMs running in your datacenter: start them with a replication controller. This is defined in a YAML manifest like so:</p>
<pre><code>id: ranchervm
kind: ReplicationController
apiVersion: v1beta2
desiredState:
  replicas: 1
  replicaSelector:
    name: ranchervm
  podTemplate:
    desiredState:
      manifest:
        version: v1beta2
        id: vm
        containers:
        - name: vm
          image: rancher/vm-rancheros
          privileged: true
          volumeMounts:
          - name: ranchervm
            mountPath: /ranchervm
          env:
          - name: RANCHER_VM
            value: "true"
        volumes:
        - name: ranchervm
          source:
            hostDir:
              path: /tmp/ranchervm
    labels:
      name: ranchervm</code></pre>
<p>This manifest defines a Pod template (the one that we created earlier) and sets the number of replicas. Here we start with one. To launch it, use the kubectl binary again:</p>
<pre><code>$ ./kubectl create -f vmrc.yaml
replicationControllers/ranchervm
$ ./kubectl get rc
CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS
ranchervm vm rancher/vm-rancheros name=ranchervm 1</code></pre>
<p>If you list the pods, you will see that your container is running and hence your VM will start shortly.</p>
<pre><code>$ ./kubectl get pods
POD IP CONTAINER(S) IMAGE(S) ...
nginx-127 controller-manager gcr.io/google_containers/hyperkube:v0.14.1 ...
apiserver gcr.io/google_containers/hyperkube:v0.14.1
scheduler gcr.io/google_containers/hyperkube:v0.14.1
ranchervm-16ncs 172.17.0.11 vm rancher/vm-rancheros ...</code></pre>
<p>Why is this awesome ? Because you can scale easily:</p>
<pre><code>$ ./kubectl resize --replicas=2 rc ranchervm
resized</code></pre>
<p>And Boom, two VMs:</p>
<pre><code>$ ./kubectl get pods -l name=ranchervm
POD IP CONTAINER(S) IMAGE(S) ...
ranchervm-16ncs 172.17.0.11 vm rancher/vm-rancheros ...
ranchervm-279fu 172.17.0.12 vm rancher/vm-rancheros ...</code></pre>
<p>Now of course, this little test was done on one node. But if you had a real Kubernetes cluster, it would schedule these pods on available nodes. From a networking standpoint, RancherVM can provide DHCP service or not. That means that you could let Kubernetes assign the IP to the Pod, and the VMs would be networked over the overlay in place.</p>
<p>Now imagine that we had security groups via an OVS switch on all nodes in the cluster...we could have multi-tenancy with network isolation and full VM isolation, while still being able to run workloads in "traditional" containers. This has some significant impact on the current IaaS space, and even on Mesos itself.</p>
<p>Your Cloud as a containerized distributed workload, anyone ???</p>
<p>For more recipes like these, checkout the Docker <a href="http://shop.oreilly.com/product/0636920036791.do">cookbook</a>.</p>
<h1>1 command to Kubernetes with Docker compose</h1>
<p>After <a href="http://sebgoa.blogspot.ch/2015/03/1-command-to-mesos-with-docker-compose.html">1 command to Mesos</a>, here is 1 command to Kubernetes.</p>
<p>I had not looked at <a href="http://kubernetes.io">Kubernetes</a> in over a month. It is a fast paced project so it is hard to keep up. If you have not looked at Kubernetes, it is <em>roughly</em> a cluster manager for containers: it takes a set of Docker hosts under management and schedules groups of containers on them. Kubernetes was open sourced by Google around June last year to bring all the Google knowledge of working with containers to us, a.k.a The people :) There are a lot of container schedulers, or orchestrators if you wish, out there: Citadel, Docker Swarm, Mesos with the Marathon framework, Cloud Foundry Lattice, etc. The Docker ecosystem is booming and our heads are spinning.</p>
<p>What I find very interesting with Kubernetes is the concept of <i>replication controllers</i>. Not only can you schedule groups of colocated containers together in a cluster, but you can also define replica sets. Say you have a container you want to scale up or down: you can define a replication controller and use it to resize the number of containers running. It is great for scaling when the load dictates it, but it is also great when you want to replace a container with a new image. Kubernetes also exposes a concept of <em>services</em>, basically a way to expose a container application to all the hosts in your cluster as if it were running locally. Think the ambassador pattern of the early Docker days, but on steroids.</p>
<p>All that said, you want to try Kubernetes. I know you do. So here is 1 command to try it out. We are going to use <a href="https://docs.docker.com/compose/">docker-compose</a> like we did with Mesos and thanks to this <a href="https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/docker.md">how-to</a> which seems to have landed 3 days ago, we are going to run Kubernetes on a single host with containers. That means that all the Kubernetes components (the "agent", the "master" and various controllers) will run in containers.</p>
<p>Install compose on your Docker host, if you do not have it yet:</p>
<pre><code>curl -L https://github.com/docker/compose/releases/download/1.2.0/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose</code></pre>
<p>Then create this YAML file, call it say k8s.yml:</p>
<pre><code>etcd:
  image: kubernetes/etcd:2.0.5.1
  net: "host"
  command: /usr/local/bin/etcd --addr=127.0.0.1:4001 --bind-addr=0.0.0.0:4001 --data-dir=/var/etcd/data
master:
  image: gcr.io/google_containers/hyperkube:v0.17.0
  net: "host"
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
  command: /hyperkube kubelet --api_servers=http://localhost:8080 --v=2 --address=0.0.0.0 --enable_server --hostname_override=127.0.0.1 --config=/etc/kubernetes/manifests
proxy:
  image: gcr.io/google_containers/hyperkube:v0.17.0
  net: "host"
  privileged: true
  command: /hyperkube proxy --master=http://127.0.0.1:8080 --v=2</code></pre>
<p>And now, 1 command:</p>
<pre><code>$ docker-compose -f k8s.yml up -d</code></pre>
<p>Quickly there after, you will see a bunch of containers pop-up:</p>
<pre><code>$ docker ps
CONTAINER ID IMAGE
a17cac87965b kubernetes/pause:go
659917e61d3e gcr.io/google_containers/hyperkube:v0.17.0
caf22057dbad gcr.io/google_containers/hyperkube:v0.17.0
288fcb4408c7 gcr.io/google_containers/hyperkube:v0.17.0
820cc546b352 kubernetes/pause:go
0bfac38bdd10 kubernetes/etcd:2.0.5.1
81f58059ca8d gcr.io/google_containers/hyperkube:v0.17.0
ca1590c1d5c4 gcr.io/google_containers/hyperkube:v0.17.0</code></pre>
<p>In the YAML file above, you can see in the commands that it used a single binary, hyperkube, that allows you to start all the Kubernetes components: the API server, the replication controller, etc. One of the components it started is the kubelet, which is normally used to monitor containers on one of the hosts in your cluster and make sure they stay up. Here, by passing the /etc/kubernetes/manifests directory, it helped us start the other components of Kubernetes defined in that manifest. Clever ! Note also that the containers were started with host networking. These containers have the network stack of the host; you will not see an interface on the docker bridge.</p>
<p>With all those up, grab the <a href="http://storage.googleapis.com/kubernetes-release/release/v0.14.1/bin/linux/amd64/kubectl">kubectl binary</a>, that is your kubernetes client that you will use to interact with the system. The first thing you can do is list the nodes:</p>
<pre><code>$ ./kubectl get nodes
NAME LABELS STATUS
127.0.0.1 <none> Ready</code></pre>
<p>Now start your first container:</p>
<pre><code>./kubectl run-container nginx --image=nginx --port=80</code></pre>
<p>That's a simple example where you can actually start a single container. Normally you will want to group containers that need to be colocated, write a POD description in YAML or JSON, then pass that to kubectl. But it looks like they extended kubectl to take single container startups. That's handy for testing.</p>
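<p>For the record, an equivalent pod description for that single nginx container would look roughly like this (a sketch in the v1beta2 format of that release, which you would pass to <em>kubectl create -f</em>):</p>
<pre><code>id: nginx
kind: Pod
apiVersion: v1beta2
desiredState:
  manifest:
    version: v1beta2
    id: nginx
    containers:
    - name: nginx
      image: nginx
      ports:
      - containerPort: 80</code></pre>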
<p>Now list your pods:</p>
<pre><code>$ ./kubectl get pods
POD IP CONTAINER(S) IMAGE(S)
k8s-master-127.0.0.1 controller-manager gcr.io/google_containers/hyperkube:v0.14.1
apiserver gcr.io/google_containers/hyperkube:v0.14.1
scheduler gcr.io/google_containers/hyperkube:v0.14.1
nginx-p2sq7 172.17.0.4 nginx nginx </code></pre>
<p>You see that there are actually two pods running: the nginx one that you just started, and one pod made of three containers. That's the pod that was started by your kubelet to get Kubernetes up. Kubernetes managed by Kubernetes...</p>
<p>It automatically created a replication controller (rc):</p>
<pre><code>$ ./kubectl get rc
CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS
nginx nginx nginx run-container=nginx 1</code></pre>
<p>You can have some fun with the resize capability right away and see a new container pop-up.</p>
<pre><code>$ ./kubectl resize --replicas=2 rc nginx
resized</code></pre>
<p>Now that is fine and dandy, but there is no port exposed on the host, so you cannot access your application from the outside. That's where you want to define a service. Technically it is used to expose a service to all nodes in a cluster, but of course you can bind that service proxy to a publicly routed interface:</p>
<pre><code>$ ./kubectl expose rc nginx --port=80 --public-ip=192.168.33.10</code></pre>
<p>Now take your browser and open it at http://192.168.33.10 (if that's the IP of your host of course) and enjoy a replicated nginx managed by Kubernetes deployed in 1 command.</p>
<p>You will get more of that good stuff in my <a href="http://shop.oreilly.com/product/0636920036791.do">book</a>, if I manage to finish it. Wish me luck.</p>
<h1>Running the CloudStack Simulator in Docker</h1>
<p>CloudStack comes with a simulator. It is very handy for testing purposes; we use it to run our smoke tests on <a href="https://travis-ci.org/apache/cloudstack">TravisCI</a> for each commit to the code base. However, if you want to run the simulator you need to compile from source using some special Maven profiles, which requires you to check out the code and set up your working environment with the dependencies for a successful CloudStack build.</p>
<p>With Docker you can skip all of that and simply download the <code>cloudstack/simulator</code> image from the Docker Hub. Start a container from that image and expose port 8080 where the dashboard is being served. Once the container is running, you can use <code>docker exec</code> to configure a simulated data center. This will allow you to start fake virtual machines, create security groups and so on. You can do all of this through the dashboard or using the CloudStack API.</p>
<p>So you want to give CloudStack a try ? Use Docker :)</p>
<pre><code>$ docker pull cloudstack/simulator</code></pre>
<p>The image is a bit big and we need to work on slimming it down, but once the image is pulled, starting the container will be almost instant. If you feel like sending a little PR, just check the <a href="https://github.com/apache/cloudstack/blob/master/Dockerfile">Dockerfile</a>; there might be a few obvious things to slim down the image.</p>
<pre><code>$ docker run -d -p 8080:8080 --name cloudstack cloudstack/simulator</code></pre>
<p>The application needs a few minutes to start, however, something that I have not had time to investigate; probably we need to give more memory to the container. Once you can access the dashboard at <code>http://localhost:8080/client</code> you can configure the simulated data-center. You can choose between a basic network, which gives you L3 network isolation, or an advanced zone, which gives you VLAN-based isolation:</p>
<pre><code>$ docker exec -ti cloudstack python /root/tools/marvin/marvin/deployDataCenter.py -i /root/setup/dev/basic.cfg</code></pre>
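<p>For an advanced zone, the same command with the advanced configuration file should do the trick (assuming the advanced.cfg file shipped alongside basic.cfg in the source tree):</p>
<pre><code>$ docker exec -ti cloudstack python /root/tools/marvin/marvin/deployDataCenter.py -i /root/setup/dev/advanced.cfg</code></pre>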
<p>Once the configuration completes, head over to the dashboard <code>http://localhost:8080/client</code> and check your simulated infrastructure</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjY5Jgcrkf5K3JY_fZ1H43gCwnI-YgYBYEJ15HlVpLAppVRzjdHOdcrgq_qah_HeEM4m04UX3S74hAXE9Xn-r_iIGEpoqIzv5xQ0sBWSgGIjCKWJ62fo2AqaspXwwaY-_dsLMeCXQ/s1600/acsdocker.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjY5Jgcrkf5K3JY_fZ1H43gCwnI-YgYBYEJ15HlVpLAppVRzjdHOdcrgq_qah_HeEM4m04UX3S74hAXE9Xn-r_iIGEpoqIzv5xQ0sBWSgGIjCKWJ62fo2AqaspXwwaY-_dsLMeCXQ/s320/acsdocker.png" /></a></div>
<p>Enjoy the CloudStack simulator brought to you by Docker.</p>
<h1>1 Command to Mesos with Docker Compose</h1>
If you have not tried Docker, you should. The sheer power it puts in your hands and the simplicity of the user experience will just wow you. In this post, I will show you how to start a one node Mesos setup with Docker compose.<br />
<br />
Docker announced <a href="http://blog.docker.com/2015/02/announcing-docker-compose/" target="_blank">compose</a> on February 26th. Compose allows you to describe a multi-container setup and manage it with one binary, <code>docker-compose</code>. The container and volume combinations managed by Compose are defined in a YAML file, super easy to read and super easy to write. The UX is very similar to the Docker CLI.<br />
<br />
When compose was released, I tried it and was a bit underwhelmed, as it is basically a rebranding of <a href="http://www.fig.sh/" target="_blank">Fig</a>. This is not unexpected, as Docker Inc. acquired Orchard, the makers of Fig. But I was expecting more added functionality and an even tighter integration with the Docker client (something a <a href="https://github.com/docker/docker/issues/9459" target="_blank">dev</a> branch actually prototyped), or even a common release instead of a separate binary. I am sure this will come.<br />
<br />
As I am writing the Docker <a href="http://shop.oreilly.com/product/0636920036791.do" target="_blank">cookbook</a>, I have deployed Wordpress 20 different ways, and it's getting a bit boring ! Looking for more information on Mesos and its support for Docker, I re-read a terrific <a href="https://medium.com/@gargar454/deploy-a-mesos-cluster-with-7-commands-using-docker-57951e020586" target="_blank">blog</a> post that showed how to start a Mesos setup (zookeeper, master, slave, marathon framework) in 7 commands. Can't beat that.<br />
<br />
When I re-read this post, I automatically thought this was an exciting use case for <code>docker-compose</code>. One YAML file to start Mesos/Zookeeper/Marathon and experiment with it. Of course I am not talking about a production multi-node setup. I am just looking at it for an easy Mesos experiment.<br />
I will spare you the details of installing compose (just a curl away). The Docker docs are <a href="http://docs.docker.com/compose/" target="_blank">great</a>.<br />
<br />
So here is the YAML file describing our Mesos setup:<br />
<pre><code>zookeeper:
  image: garland/zookeeper
  ports:
    - "2181:2181"
    - "2888:2888"
    - "3888:3888"
mesosmaster:
  image: garland/mesosphere-docker-mesos-master
  ports:
    - "5050:5050"
  links:
    - zookeeper:zk
  environment:
    - MESOS_ZK=zk://zk:2181/mesos
    - MESOS_LOG_DIR=/var/log/mesos
    - MESOS_QUORUM=1
    - MESOS_REGISTRY=in_memory
    - MESOS_WORK_DIR=/var/lib/mesos
marathon:
  image: garland/mesosphere-docker-marathon
  links:
    - zookeeper:zk
    - mesosmaster:master
  command: --master zk://zk:2181/mesos --zk zk://zk:2181/marathon
  ports:
    - "8080:8080"
mesosslave:
  image: garland/mesosphere-docker-mesos-master:latest
  ports:
    - "5051:5051"
  links:
    - zookeeper:zk
    - mesosmaster:master
  entrypoint: mesos-slave
  environment:
    - MESOS_HOSTNAME=192.168.33.10
    - MESOS_MASTER=zk://zk:2181/mesos
    - MESOS_LOG_DIR=/var/log/mesos
    - MESOS_LOGGING_LEVEL=INFO</code></pre>
Four containers, images pulled from Docker hub, some ports exposed on the host, some container linking, and some environment variables used to configure the Mesos slave and master. One small hiccup in the slave definition: you will see that I set MESOS_HOSTNAME to the IP of the host. This allows me to browse the stdout and stderr of a Marathon task; otherwise we cannot reach them easily (small improvement to be done there).<br />
<br />
Launch this with <code>docker-compose</code>:<br />
<pre><code>$ ./docker-compose up -d
Recreating vagrant_zookeeper_1...
Recreating vagrant_mesosmaster_1...
Recreating vagrant_marathon_1...
Recreating vagrant_mesosslave_1...</code></pre>
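You can verify that the four containers are up with compose itself:<br />
<pre><code>$ ./docker-compose ps</code></pre>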
<br />
And open your browser at <i>http://IP_HOST:5050</i> (the IP of your host) then follow the rest of the <a href="https://medium.com/@gargar454/deploy-a-mesos-cluster-with-7-commands-using-docker-57951e020586" target="_blank">blog</a> to start a task in Marathon.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiOGLd-KyhiDX2jgCOfg8wwpao08EppWbcaHzZPYn-TbQm9BL1hbLJXs5WjDOObuHmotXB8rJPc895kQhRtNMdyQ7uMpMf2Y9HgWp-Xdl4sntfX1BGKDCXeU77_0KqHO2hrJ4asYQ/s1600/mesos.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiOGLd-KyhiDX2jgCOfg8wwpao08EppWbcaHzZPYn-TbQm9BL1hbLJXs5WjDOObuHmotXB8rJPc895kQhRtNMdyQ7uMpMf2Y9HgWp-Xdl4sntfX1BGKDCXeU77_0KqHO2hrJ4asYQ/s1600/mesos.png" height="180" width="320" /></a></div>
<br />
Bottom line, I went from '7 commands to Mesos' to '1 command to Mesos' thanks to Docker-compose and a fairly simple YAML file. Got to love it. When compose can do this across Docker hosts in a Docker <a href="http://blog.docker.com/2015/02/orchestrating-docker-with-machine-swarm-and-compose/" target="_blank">Swarm started by Machine</a>, the real fun will begin !
<h1>Rancher on RancherOS</h1>
Someone at <a href="http://rancher.com/">Rancher</a> must have some <a href="https://github.com/rancherio/cattle">cattle</a> in the middle of Arizona or in the backcountry of California. Or one of their VCs might be in Montana, sitting in a big ranch while <a href="http://docker.com/">Docker</a> is eating the IT world. In any case, this post is short and sweet, like veggies and not like cattle (TFW), and is about <a href="https://github.com/rancherio/rancher">Rancher</a> and the newly announced <a href="https://github.com/rancherio/os">RancherOS</a>. Check out the RancherOS <a href="http://rancher.com/announcing-rancher-os/">announcement</a>.<br />
<br />
Let's keep this short, shall we ? Docker is great, but it is a daemon running on a single host. Since you want to scale :) and operate multiple servers, you need something to manage your Docker containers across multiple hosts. Several solutions are emerging: of course Docker <a href="https://github.com/docker/swarm/">Swarm</a>, but also <a href="https://github.com/GoogleCloudPlatform/kubernetes">Kubernetes</a>, <a href="https://github.com/cloudfoundry-incubator/lattice">Lattice</a> from Cloud Foundry, and even Apache <a href="http://mesos.apache.org/documentation/latest/docker-containerizer/">Mesos</a>. <a href="http://rancher.com/">Rancher</a> is one of these cluster management solutions for Docker. It does some nice things like cross-host container linking through a custom-built network overlay (think Flannel, Weave, Socketplane).<br />
<br />
You can use Rancher with any set of Docker hosts. However, a new type of operating system has started to appear: container-optimized OSes, or Just Enough Operating System for Docker. CoreOS, Project Atomic from Red Hat and Ubuntu Snappy fit in that space. They aim to provide rolling atomic upgrades to the OS and run everything in it as a container. No more package manager, magic happens and you are always up to date. Package all your apps in containers, and use Rancher to run them in your cluster. End of story. Wait, enter RancherOS.<br />
<h2 id="rancheros">
RancherOS</h2>
A couple of lines of bash do all the talking:<br />
<br />
<pre><code>$ git clone https://github.com/rancherio/os-vagrant.git
$ cd os-vagrant
$ vagrant up
$ vagrant ssh
[rancher@rancher ~]$ docker version
Client version: 1.5.0
…</code></pre>
RancherOS is a super minimalistic OS exclusively for Docker. It goes further and also runs system services as containers themselves. And I will let @ibuildthecloud talk about systemd and Docker as PID 1.<br />
<br />
<pre><code>[rancher@rancher ~]$ sudo system-docker ps
CONTAINER ID IMAGE COMMAND ... NAMES
32607470eb78 console:latest "/usr/sbin/console.s ... console
d0420165c1c0 userdocker:latest "/docker.sh" ... userdocker
375a8de12183 syslog:latest "/syslog.sh" ... syslog
d284afd7f628 ntp:latest "/ntp.sh" ... ntp </code></pre>
The next logical question is of course...drum roll... Can I run Rancher on RancherOS? RinR, not R&R ? And the answer is a resounding yes. I expect Rancher to come out in the next weeks, maybe months, with a solid product based on the two.<br />
<h2 id="rancher">
Rancher</h2>
If you are interested in trying out RinR, then check out the Ansible <a href="https://github.com/runseb/ansible-rancher">playbook</a> I just made. You can use it to deploy a cluster of RancherOS instances in AWS, and use one of them as a master and the others as workers. The master runs in a container:<br />
<br />
<pre><code>$ docker run -d -p 8080:8080 rancher/server </code></pre>
<br />
And the workers can register with their agent:<br />
<pre><code>$ sudo docker run --rm -it --privileged -v /var/run/docker.sock:/var/run/docker.sock rancher/agent http://<master_ip>:8080</code></pre>
<br />
Once all the workers have registered you can use the UI or the API to start containers.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://raw.githubusercontent.com/runseb/ansible-rancher/master/images/ui.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="173" src="https://raw.githubusercontent.com/runseb/ansible-rancher/master/images/ui.png" width="320" /></a></div>
<br />
As you can see I tested this at web scale with two nodes :)<br />
<h2 id="notes">
Notes</h2>
In this very early, super bleeding-edge testing phase (as you can tell in my good spirits today), I did find a few things that were a bit strange. Considering RancherOS was announced just last week, I am sure things will get fixed. Cloud-init support is minimal, I was not able to add a second network interface, and support for both keypair and userdata at the same time seems off. The UI was a bit slow to start and building the overlay was also a bit slow. It is also possible that I did something wrong.<br />
<br />
Overall though, Rancher is quite nice. It builds on years of experience in the team with developing CloudStack and operating clouds at scale, and applies it to the Docker world. It does seem that they want to integrate with and provide the native Docker API. This would mean that users would be able to use Docker machine to add hosts to a Rancher cluster, or even Docker Swarm, and that launching a container would also be a docker command away. How that differentiates from Swarm itself is not yet clear, but I would bet we will see additional networking and integration services in Rancher. Blurring the lines with Kubernetes ? Time will tell.
<h1>O'Reilly Docker cookbook</h1>
The last two months have been busy as I am writing the O'Reilly <a href="http://shop.oreilly.com/product/0636920036791.do">Docker cookbook</a> at night and on weekends. CloudStack during the day, Docker at night :) You can read the very "drafty" preface on <a href="https://www.safaribooksonline.com/library/view/docker-cookbook/9781491919705/pr01.html#_why_i_wrote_this_book">Safari</a> and you will get a sense of why I started writing the book.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhaH5x3CvbDfsQIXhQ5hR6KA93ZXz5KljWKSy0WXMEUTPDPADay0sbrbCxkgJXU2QoBcMza18Fx4k8PB_iaSzebJbJk-Kqwvp3TlbVYRLmlBPyfWV0xKw8A_QoBPDZr2tR2fnhcAQ/s1600/dockbercover.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhaH5x3CvbDfsQIXhQ5hR6KA93ZXz5KljWKSy0WXMEUTPDPADay0sbrbCxkgJXU2QoBcMza18Fx4k8PB_iaSzebJbJk-Kqwvp3TlbVYRLmlBPyfWV0xKw8A_QoBPDZr2tR2fnhcAQ/s1600/dockbercover.png" height="320" width="246" /></a></div>
<br />
Docker is amazing. It brings a terrific user experience to packaging applications and deploying them easily. It is also a piece of software that is moving very fast, with over 5,500 pull requests closed so far. The community is huge and folks are very excited about it: just check those 18,000+ stars on <a href="https://github.com/docker/docker">Github</a>.<br />
<br />
Writing a book on Docker means reading all the documentation, reading countless blogs that are flying through Twitter, and then, because it's a cookbook, getting your hands dirty and actually trying everything, testing everything, over and over again. A cookbook is made of recipes in a very set format: Problem, Solution, Discussion. It is meant to be picked up at any time, opened at any page, and read as a recipe that is independent of all the others. The book is now in pre-release: you can buy it and get the very drafty version of the book as I write it, mistakes, typos and bad grammar included. As I keep writing you get the updates, and once I am done you of course get the final proof-read, corrected and reviewed version.<br />
<br />
As I started writing, I thought I would share some of the snippets of code I am writing for the recipes. The code is available on GitHub at the <a href="https://github.com/how2dock/docbook">how2dock</a> account. How2dock should become a small company for Docker training and consulting as soon as I find spare time :)<br />
<br />
What you will find <a href="https://github.com/how2dock/docbook">there</a> is not really code, but rather a repository of scripts and Vagrantfiles that I use in the book to showcase a particular feature or command of Docker. The repository is organized the same way as the book: you can pick a chapter, then a particular recipe, and go through the README.<br />
<br />
For instance if you are curious about Docker swarm:<br />
<pre><code>$ git clone https://github.com/how2dock/docbook.git
$ cd ch07/swarm
$ vagrant up</code></pre>
<br />
This will bring up four virtual machines via <a href="http://vagrantup.com/">Vagrant</a> and do the necessary bootstrapping to get the cluster set up with <a href="https://github.com/how2dock/docbook/tree/master/ch07/swarm">Swarm</a>.<br />
<br />
If you want to run a WordPress blog with a MySQL database, check out the fig recipe:<br />
<pre><code>$ cd ch07/fig
$ vagrant up
$ vagrant ssh
$ cd /vagrant
$ fig up -d</code></pre>
<br />
And enjoy WordPress :)<br />
<br />
I put a lot more in there. You will find an example of using the <a href="https://github.com/how2dock/docbook/tree/master/ch07/ansible">Ansible</a> Docker module, a <a href="https://github.com/how2dock/docbook/tree/master/ch06/snappy-cloud">libcloud</a> script to start an Ubuntu Snappy instance on EC2, a <a href="https://github.com/how2dock/docbook/tree/master/ch04/tls">Dockerfile</a> to help you create TLS certificates (really a convenience container for testing TLS in Docker), a Docker <a href="https://github.com/how2dock/docbook/tree/master/ch01/dockermachine">Machine</a> setup, and a recipe on using <a href="https://github.com/how2dock/docbook/tree/master/ch01/supervisor">Supervisor</a>.<br />
<br />
As I keep writing, I will keep putting all the snippets in this <a href="https://github.com/how2dock/docbook">How2dock</a> repo. Expect frequent changes, typos, errors...and corrections :)<br />
<br />
And FWIW, it is much scarier to put a book out in pre-release unedited than to put some scripts up on GitHub.<br />
<br />
Suggestions, comments, reviews all welcome! Happy Docking!Sebastien Goasguenhttp://www.blogger.com/profile/05197845236510805309noreply@blogger.com0tag:blogger.com,1999:blog-19879822.post-21197577288558584962014-10-02T03:07:00.000-07:002014-10-02T03:07:37.444-07:00CloudStack simulator on Docker<p><a href="http://www.docker.com">Docker</a> is a lot of fun; one of its strengths is the portability of applications. This gave me the idea to package the <a href="http://cloudstack.apache.org">CloudStack</a> management server as a Docker image.</p>
<p>CloudStack has a simulator that can fake a data center infrastructure. It can be used to test some of the basic functionalities. We use it to run our integration tests, like the smoke tests on <a href="https://travis-ci.org/apache/cloudstack/builds">TravisCI</a>. The simulator allows us to configure an advanced or basic networking zone with fake hypervisors.</p>
<p>So I bootstrapped the CloudStack management server, configured the MySQL database with an advanced zone and created a Docker image with <a href="http://www.packer.io/intro">Packer</a>. The resulting image is on <a href="https://registry.hub.docker.com/u/runseb/cloudstack/">DockerHub</a>, and I realized after the fact that four other great minds had already done something <a href="https://registry.hub.docker.com/search?q=cloudstack&searchfield=">similar</a> :)</p>
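<p>If you have never used Packer, a template is a small JSON file describing how to build and tag the image. Here is a minimal sketch of the idea; the provisioning script and tag are placeholders, not the actual template I used:</p>
<pre><code>{
  "builders": [
    { "type": "docker", "image": "ubuntu:14.04", "commit": true }
  ],
  "provisioners": [
    { "type": "shell", "script": "install-cloudstack-simulator.sh" }
  ],
  "post-processors": [
    [{ "type": "docker-tag", "repository": "runseb/cloudstack" }]
  ]
}</code></pre>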
<p>On a machine running Docker:</p>
<pre><code>docker pull runseb/cloudstack
docker run -t -i -p 8080:8080 runseb/cloudstack:0.1.4 /bin/bash
# service mysql restart
# cd /opt/cloudstack
# mvn -pl client jetty:run -Dsimulator</code></pre>
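<p>The management server takes a few minutes to come up. Here is a minimal sketch to wait for it from the Docker host, assuming the container's port 8080 is published on localhost as in the <code>docker run</code> command above:</p>
<pre><code>import time
import requests

# Poll the management server UI until Jetty has finished starting.
while True:
    try:
        if requests.get('http://localhost:8080/client').status_code == 200:
            print('management server is up')
            break
    except requests.ConnectionError:
        pass
    time.sleep(5)</code></pre>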
<p>Then open your browser on http://<IP_of_docker_host>:8080/client and enjoy!</p>Sebastien Goasguenhttp://www.blogger.com/profile/05197845236510805309noreply@blogger.com5tag:blogger.com,1999:blog-19879822.post-90612918022295337142014-09-30T02:18:00.001-07:002014-09-30T02:18:34.977-07:00On Docker and Kubernetes on CloudStack<h1 id="on-docker-and-kubernetes-on-cloudstack">On Docker and Kubernetes on CloudStack</h1>
<p><a href="http://www.docker.io">Docker</a> has pushed containers to a new level, making it extremely easy to package and deploy applications within containers. Containers are not new, with <a href="http://en.wikipedia.org/wiki/Solaris_Containers">Solaris containers</a> and <a href="http://en.wikipedia.org/wiki/OpenVZ">OpenVZ</a> among several container technologies going back to 2005. But Docker has caught on <a href="http://thenewstack.io/why-did-docker-catch-on-quickly-and-why-is-it-so-interesting/">quickly</a>, as mentioned by @adrianco. The startup speed is not surprising for containers, and the portability is reminiscent of the Java goal to "write once run anywhere". What is truly interesting with Docker is the availability of Docker registries (e.g <a href="http://hub.docker.com">Docker Hub</a>) to share containers and the potential to change application deployment workflows.</p>
<p>Rightly so, we should soon see IT move to Docker based application deployment, where developers package their applications and make them available to Ops, very much like we have been using WAR files. Embracing a DevOps mindset/culture should be easier with Docker. Where it becomes truly interesting is when we start thinking about an infrastructure whose sole purpose is to run containers. We can envision a bare operating system with a single goal: to manage Docker based services. This should make sysadmin life easier.</p>
<h2 id="the-role-of-the-cloud-with-docker">The role of the Cloud with Docker</h2>
<p>While the buzz around Docker has been truly amazing and a community has grown overnight, some may think that this signals the end of the cloud. I think that is far from the truth, as Docker may indeed become the killer app of the cloud.</p>
<p><strong>An IaaS layer is what it is: an infrastructure orchestration layer, while Docker and its ecosystem will become the application orchestration layer.</strong></p>
<p>The question then becomes: How do I run Docker in the cloud? And there is a straightforward answer: just install Docker in your cloud templates. Whether on AWS, GCE, Azure or your private cloud, you can prepare Linux based templates that provide Docker support. If you are aiming for the bare operating system whose sole purpose is to run Docker, then the new <a href="https://coreos.com">CoreOS</a> Linux distribution might be your best pick. CoreOS provides rolling upgrades of the kernel, systemd based services, a distributed key value store (i.e <a href="https://github.com/coreos/etcd">etcd</a>) and a distributed service scheduling system (i.e <a href="https://coreos.com/using-coreos/clustering/">fleet</a>).</p>
<p><strong><a href="http://exoscale.ch">exoscale</a>, an Apache CloudStack based public cloud, recently <a href="https://www.exoscale.ch/syslog/2014/09/17/coreos-available-on-exoscale-swiss-cloud/">announced</a> the availability of CoreOS templates.</strong></p>
<p>Like exoscale, any cloud provider, be it public or private, can make CoreOS templates available, instantly providing Docker within the cloud.</p>
<h2 id="docker-application-orchestration-here-comes-kubernetes">Docker application orchestration, here comes <a href="https://github.com/GoogleCloudPlatform/kubernetes">Kubernetes</a></h2>
<p>Running one container is easy, but running multiple coordinated containers across a cluster of machines is not yet a solved problem. If you think of an application as a set of containers, starting these on multiple hosts, replicating some of them, accessing distributed databases, providing proxy services and fault tolerance is the true challenge.</p>
<p>However, Google came to the rescue and announced <a href="http://www.datacenterknowledge.com/archives/2014/07/10/googles-docker-container-management-project-kubernetes-gets-big-league-support/">Kubernetes</a>, a system that solves these issues and makes managing scalable, fault-tolerant container based apps doable :)</p>
<p>Kubernetes is of course available on Google's public cloud GCE, but also on Rackspace, Digital Ocean and Azure. It can also be deployed on CoreOS easily.</p>
<h2 id="kubernetes-on-cloudstack">Kubernetes on CloudStack</h2>
<p>Kubernetes is under heavy development; you can test it with <a href="https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/vagrant.md">Vagrant</a>. Under the hood, aside from the Go code :), most of the deployment solutions use <a href="http://www.saltstack.com">SaltStack</a> recipes, but if you are a Chef, Puppet or Ansible user I am sure we will see recipes for those configuration management solutions soon.</p>
<p>But you surely got the same idea that I had :) Since Kubernetes can be deployed on <a href="https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/coreos.md">CoreOS</a> and CoreOS is available on exoscale, we are just a breath away from running Kubernetes on CloudStack.</p>
<p><strong>It took a little more than a breath, but you can clone <a href="https://github.com/runseb/kubernetes-exoscale">kubernetes-exoscale</a> and get running in 10 minutes, with a 3-node etcd cluster and a 5-node Kubernetes cluster running a <a href="https://coreos.com/blog/introducing-rudder/">Flannel</a> overlay.</strong></p>
<p>CloudStack supports EC2-like user data, and the CoreOS templates on exoscale support cloud-init, hence passing cloud-config files at instance deployment was straightforward. I used libcloud to provision all the nodes, created the proper security groups to let the Kubernetes nodes access the etcd cluster and talk to each other, and in particular opened a UDP port to build the networking overlay with Flannel. Fleet is used to launch all the Kubernetes services. Try it out.</p>
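<p>For the curious, the provisioning step with libcloud looks roughly like the sketch below. The keys, offering name, template name and cloud-config file are placeholders, and the security group creation is omitted:</p>
<pre><code>from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

# Placeholder credentials, pointing the CloudStack driver at exoscale
cls = get_driver(Provider.CLOUDSTACK)
driver = cls(key='your-api-key', secret='your-secret-key',
             host='api.exoscale.ch', path='/compute', secure=True)

# cloud-config passed as userdata, rendered by cloud-init on the CoreOS instance
userdata = open('node.yml').read()

size = [s for s in driver.list_sizes() if s.name == 'Medium'][0]
image = [i for i in driver.list_images() if 'CoreOS' in i.name][0]

node = driver.create_node(name='k8s-node-1', size=size, image=image,
                          ex_userdata=userdata,
                          ex_security_groups=['k8s'])
print(node.public_ips)</code></pre>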
<h2 id="conclusions.">Conclusions.</h2>
<p>Docker is extremely easy to use, and by taking advantage of a cloud you can get started quickly. CoreOS will put your Docker work on steroids with the ability to start apps as systemd services over a distributed cluster. Kubernetes will up that by giving you replication of your containers and proxy services for free (time).</p>
<p>We might see pure Docker based public clouds (e.g think a Mesos cluster with a Kubernetes framework). These will look much more like PaaS, especially if they integrate a Docker registry and a way to automatically build Docker images (e.g think continuous deployment pipeline).</p>
<p>But a "true" IaaS is actually very complementary, providing multi-tenancy, higher <a href="http://www.linux.com/news/enterprise/cloud-computing/785769-containers-vs-hypervisors-the-battle-has-just-begun">security</a> as well as multiple OS templates. So treating Docker as a standard cloud workload is not a bad idea. With three dominant public clouds in AWS, GCE and Azure, and a multitude of "regional" ones like exoscale, it appears that building a virtualization based cloud is a solved problem (at least with Apache CloudStack :)).</p>
<p><strong>So instead of talking about cloudifying your application, maybe you should start thinking about Dockerizing your applications and letting them loose on CloudStack.</strong></p>
Sebastien Goasguenhttp://www.blogger.com/profile/05197845236510805309noreply@blogger.com4tag:blogger.com,1999:blog-19879822.post-78799459328861171442014-07-11T15:42:00.001-07:002014-07-11T15:42:53.796-07:00GCE Interface to CloudStack<h1 id="gstack-a-gce-compatible-interface-to-cloudstack"><em>Gstack</em>, a GCE compatible interface to CloudStack</h1>
<p><a href="https://cloud.google.com/products/compute-engine/">Google Compute Engine (GCE)</a> is the Google public cloud. In December 2013, Google announced the <a href="http://yourstory.com/2013/12/google-compute-engine-gce/">General Availability (GA)</a> of GCE. With AWS and Microsoft Azure, it is one of the three leading public clouds in the market. <a href="http://cloudstack.apache.org">Apache CloudStack</a> now has a brand new GCE compatible interface (Gstack) that lets users use the GCE clients (i.e gcloud and gcutil) to access their CloudStack cloud. This was made possible through the Google Summer of Code program.</p>
<p>Last summer Ian Duffy, a student from Dublin City University, participated in GSoC through the Apache Software Foundation (<a href="http://community.apache.org/gsoc.html">ASF</a>) and worked on an LDAP plugin for CloudStack. He did such a great job that he finished early and was made an Apache CloudStack committer. Since he was done with his original GSoC project, I encouraged him to take on a new one :), and he brought in a friend for the ride: Darren Brogan.</p>
<p>They remained engaged with the CloudStack community, and as a third-year project worked on an Amazon EC2 <a href="https://github.com/BroganD1993/ec2stack">interface</a> to CloudStack using what they had learned from the GCE interface. They got an A :). Since they loved it so much, Darren applied to the GSoC program and proposed to go back to Gstack, improve it, extend the unit tests and make it compatible with the GCE v1 API.</p>
<p>Technically, Gstack is a Python Flask application that provides a REST API compatible with the GCE API and forwards the requests to the corresponding CloudStack API. The source is available on <a href="https://github.com/NOPping/gstack">GitHub</a> and the binary is downloadable via <a href="https://pypi.python.org/pypi/gstack">PyPI</a>. Let's show you how to use it.</p>
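<p>But first, to make the forwarding idea concrete, here is a minimal sketch of such a translation for listing zones. This is an illustration only, not Gstack's actual code; it uses a placeholder endpoint and skips CloudStack request signing entirely:</p>
<pre><code>from flask import Flask, jsonify
import requests

app = Flask(__name__)
CLOUDSTACK_API = 'https://api.exoscale.ch/compute'  # placeholder endpoint

@app.route('/compute/v1/projects/<project>/zones')
def list_zones(project):
    # Forward to CloudStack (real code must sign the request with the user's keys)
    r = requests.get(CLOUDSTACK_API,
                     params={'command': 'listZones', 'response': 'json'})
    zones = r.json().get('listzonesresponse', {}).get('zone', [])
    # Reshape the CloudStack answer into a GCE style zone list
    return jsonify({'kind': 'compute#zoneList',
                    'items': [{'name': z['name'], 'status': 'UP'} for z in zones]})

if __name__ == '__main__':
    app.run(port=5000)</code></pre>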
<h2 id="installation-and-configuration-of-gstack">Installation and Configuration of Gstack</h2>
<p>You can grab the <em>Gstack</em> binary package from PyPI using <em>pip</em> in a single command.</p>
<pre><code>pip install gstack</code></pre>
<p>Or if you plan to explore the source and work on it, you can clone the repository and install it by hand. Pull requests are of course welcome.</p>
<pre><code>git clone https://github.com/NOPping/gstack.git
sudo python ./setup.py install</code></pre>
<p>Both of these installation methods will install a <em>gstack</em> and a <em>gstack-configure</em> binary in your path. Before running <em>Gstack</em> you must configure it. To do so run:</p>
<pre><code>gstack-configure</code></pre>
<p>And enter your configuration information when prompted. You will need to specify the host and port you want <code>gstack</code> to run on, as well as the CloudStack endpoint that you want <code>gstack</code> to forward the requests to. In the example below we use the <a href="http://exoscale.ch">exoscale</a> cloud:</p>
<pre><code>$ gstack-configure
gstack bind address [0.0.0.0]: localhost
gstack bind port [5000]:
Cloudstack host [localhost]: api.exoscale.ch
Cloudstack port [8080]: 443
Cloudstack protocol [http]: https
Cloudstack path [/client/api]: /compute</code></pre>
<p>The information will be stored in a configuration file available at <code>~/.gstack/gstack.conf</code>:</p>
<pre><code>$ cat ~/.gstack/gstack.conf
PATH = 'compute/v1/projects/'
GSTACK_BIND_ADDRESS = 'localhost'
GSTACK_PORT = '5000'
CLOUDSTACK_HOST = 'api.exoscale.ch'
CLOUDSTACK_PORT = '443'
CLOUDSTACK_PROTOCOL = 'https'
CLOUDSTACK_PATH = '/compute'</code></pre>
<p>You are now ready to start <em>Gstack</em> in the foreground with:</p>
<pre><code>gstack</code></pre>
<p>That's all there is to running <em>Gstack</em>. To be able to use it as if you were talking to GCE, however, you need to use <em>gcutil</em> and configure it a bit.</p>
<h2 id="using-gcutil-with-gstack">Using <code>gcutil</code> with Gstack</h2>
<p>The current version of <em>Gstack</em> only works with the stand-alone version of <a href="https://developers.google.com/compute/docs/gcutil/">gcutil</a>. Do not use the version of <em>gcutil</em> bundled in the Google Cloud SDK; instead install the 0.14.2 version of <em>gcutil</em>. <em>Gstack</em> comes with a self-signed certificate for the local endpoint, <code>gstack/data/server.crt</code>; copy the certificate to the gcutil certificates file <code>gcutil/lib/httplib2/httplib2/cacerts.txt</code>. A bit dirty, I know, but that's a work in progress.</p>
<p>At this stage, your CloudStack API key and secret key need to be entered in the gcutil auth_helper.py file at <code>gcutil/lib/google_compute_engine/gcutil/auth_helper.py</code>.</p>
<p>Again, not ideal, but hopefully <em>gcutil</em> or the Cloud SDK will soon be able to configure those without touching the source. Darren and Ian opened a feature request with Google to pass the <code>client_id</code> and <code>client_secret</code> as options to gcutil; hopefully future releases of <em>gcutil</em> will allow us to do so.</p>
<p>Now, create a cached parameters file for <em>gcutil</em>. Assuming you are running <em>gstack</em> on your local machine with the defaults suggested during the configuration phase, modify <code>~/.gcutil_params</code> with the following:</p>
<pre><code>--auth_local_webserver
--auth_host_port=9999
--dump_request_response
--authorization_uri_base=https://localhost:5000/oauth2
--ssh_user=root
--fetch_discovery
--auth_host_name=localhost
--api_host=https://localhost:5000/</code></pre>
<p><strong>Warning</strong>: Make sure to set the --auth_host_name variable to the same value as GSTACK_BIND_ADDRESS in your <code>~/.gstack/gstack.conf</code> file. Otherwise you will see certificate errors.</p>
<p>With this setup complete, <em>gcutil</em> will issue requests to the local Flask application, get an OAuth token, issue requests to your CloudStack endpoint and return the response in a <em>GCE</em> compatible format.</p>
<h2 id="example-with-exoscale.">Example with <a href="http://exoscale.ch">exoscale</a>.</h2>
<p>You can now start issuing standard gcutil commands. For illustration purposes we use <a href="http://exoscale.ch">Exoscale</a>. Since there are several <em>semantic</em> differences, you will notice that for the project we use the account information from CloudStack; hence we pass our email address as the project value.</p>
<p>Let's start by listing the availability zones:</p>
<pre><code>$ gcutil --cached_flags_file=~/.gcutil_params --project=runseb@gmail.com listzones
+----------+--------+------------------+
| name | status | next-maintenance |
+----------+--------+------------------+
| ch-gva-2 | UP | None scheduled |
+----------+--------+------------------+</code></pre>
<p>Let's list the machine types (in CloudStack terminology, the compute service offerings) and the available images.</p>
<pre><code>$ ./gcutil --cached_flags_file=~/.gcutil_params --project=runseb@gmail.com listimages
$ gcutil --cached_flags_file=~/.gcutil_params --project=runseb@gmail.com listmachinetypes
+-------------+----------+------+-----------+-------------+
| name | zone | cpus | memory-mb | deprecation |
+-------------+----------+------+-----------+-------------+
| Micro | ch-gva-2 | 1 | 512 | |
+-------------+----------+------+-----------+-------------+
| Tiny | ch-gva-2 | 1 | 1024 | |
+-------------+----------+------+-----------+-------------+
| Small | ch-gva-2 | 2 | 2048 | |
+-------------+----------+------+-----------+-------------+
| Medium | ch-gva-2 | 2 | 4096 | |
+-------------+----------+------+-----------+-------------+
| Large | ch-gva-2 | 4 | 8182 | |
+-------------+----------+------+-----------+-------------+
| Extra-large | ch-gva-2 | 4 | 16384 | |
+-------------+----------+------+-----------+-------------+
| Huge | ch-gva-2 | 8 | 32184 | |
+-------------+----------+------+-----------+-------------+</code></pre>
<p>You can also list firewalls, which <em>gstack</em> maps to CloudStack security groups. To create a security group, use the firewall commands:</p>
<pre><code>$ ./gcutil --cached_flags_file=~/.gcutil_params --project=runseb@gmail.com addfirewall ssh --allowed=tcp:22
$ ./gcutil --cached_flags_file=~/.gcutil_params --project=runseb@gmail.com getfirewall ssh</code></pre>
<p>To start an instance you can follow the interactive prompt given by <em>gcutil</em>. You will need to pass the --permit_root_ssh flag, another one of those semantic and access configurations that needs to be ironed out. The interactive prompt will let you choose the machine type and the image that you want; it will then start the instance.</p>
<pre><code>$ ./gcutil --cached_flags_file=~/.gcutil_params --project=runseb@gmail.com addinstance foobar
Selecting the only available zone: CH-GV2
1: Extra-large Extra-large 16384mb 4cpu
2: Huge Huge 32184mb 8cpu
3: Large Large 8192mb 4cpu
4: Medium Medium 4096mb 2cpu
5: Micro Micro 512mb 1cpu
6: Small Small 2048mb 2cpu
7: Tiny Tiny 1024mb 1cpu
7
1: CentOS 5.5(64-bit) no GUI (KVM)
2: Linux CentOS 6.4 64-bit
3: Linux CentOS 6.4 64-bit
4: Linux CentOS 6.4 64-bit
5: Linux CentOS 6.4 64-bit
6: Linux CentOS 6.4 64-bit
<...snip>
INFO: Waiting for insert of instance . Sleeping for 3s.
INFO: Waiting for insert of instance . Sleeping for 3s.
Table of resources:
+--------+--------------+--------------+----------+---------+
| name | network-ip | external-ip | zone | status |
+--------+--------------+--------------+----------+---------+
| foobar | 185.1.2.3 | 185.1.2.3 | ch-gva-2 | RUNNING |
+--------+--------------+--------------+----------+---------+
Table of operations:
+--------------+--------+--------------------------+----------------+
| name | status | insert-time | operation-type |
+--------------+--------+--------------------------+----------------+
| e4180d83-31d0| DONE | 2014-06-09T10:31:35+0200 | insert |
+--------------+--------+--------------------------+----------------+</code></pre>
<p>You can of course list (with <code>listinstances</code>) and delete instances:</p>
<pre><code>$ ./gcutil --cached_flags_file=~/.gcutil_params --project=runseb@gmail.com deleteinstance foobar
Delete instance foobar? [y/n]
y
WARNING: Consider passing '--zone=CH-GV2' to avoid the unnecessary zone lookup which requires extra API calls.
INFO: Waiting for delete of instance . Sleeping for 3s.
+--------------+--------+--------------------------+----------------+
| name | status | insert-time | operation-type |
+--------------+--------+--------------------------+----------------+
| d421168c-4acd| DONE | 2014-06-09T10:34:53+0200 | delete |
+--------------+--------+--------------------------+----------------+</code></pre>
<p><em>Gstack</em> is still a work in progress; it is now compatible with the GCE GA v1.0 <a href="https://developers.google.com/compute/docs/reference/latest/">API</a>. The few differences in API semantics need to be investigated further, and additional API calls need to be supported. However, it provides a solid base to start working on hybrid solutions between the GCE public cloud and a CloudStack based private cloud.</p>
<p>GSoC has been terrific for Ian and Darren; they both learned how to work with an open source community and ultimately became part of it through their work. They learned tools like JIRA, git and Review Board, and became less shy about working publicly on mailing lists. Their work on Gstack and EC2stack is certainly of high value to CloudStack and should become the base for interesting products that use hybrid clouds.</p>
Sebastien Goasguenhttp://www.blogger.com/profile/05197845236510805309noreply@blogger.com0tag:blogger.com,1999:blog-19879822.post-18773081791754746762014-06-03T05:31:00.002-07:002014-06-17T00:20:14.220-07:00Eutester with CloudStack<h1 id="eutester">Eutester</h1>
<p>An interesting tool based on Boto is <a href="https://github.com/eucalyptus/eutester">Eutester</a>. It was created by the folks at Eucalyptus to provide a framework for creating functional tests for AWS zones and Eucalyptus based clouds. What is interesting with <code>eutester</code> is that it can be used to compare the AWS compatibility of multiple clouds. Therefore the question you are going to ask is: can we use Eutester with CloudStack? And the answer is yes. It could certainly use more work, but the basic functionality is there.</p>
<p>Install <code>eutester</code> with:</p>
<pre><code>pip install eutester</code></pre>
<p>Then start a Python/IPython interactive shell or write a script that will import <code>ec2ops</code> and create a connection object to your AWS EC2 compatible endpoint. For example, using <a href="https://github.com/BroganD1993/ec2stack.git">ec2stack</a>:</p>
<pre><code>#!/usr/bin/env python

from eucaops import ec2ops
from IPython.terminal.embed import InteractiveShellEmbed

accesskey = "my api key"
secretkey = "my secret key"

# Connect to the local ec2stack endpoint
conn = ec2ops.EC2ops(endpoint="localhost",
                     aws_access_key_id=accesskey,
                     aws_secret_access_key=secretkey,
                     is_secure=False,
                     port=5000,
                     path="/",
                     APIVersion="2014-02-01")

ipshell = InteractiveShellEmbed(banner1="Hello Cloud Shell !!")
ipshell()</code></pre>
<p>Eutester at the time of this writing has 145 methods. Only the methods available through the CloudStack AWS EC2 interface will be available. For example, <code>get_zones</code> and <code>get_instances</code> would return:</p>
<pre><code>In [3]: conn.get_zones()
Out[3]: [u'ch-gva-2']
In [4]: conn.get_instances()
[2014-05-21 05:39:45,094] [EUTESTER] [DEBUG]:
--->(ec2ops.py:3164)Starting method: get_instances(self, state=None,
idstring=None, reservation=None, rootdevtype=None, zone=None,
key=None, pubip=None, privip=None, ramdisk=None, kernel=None,
image_id=None, filters=None)
Out[4]:
[Instance:5a426582-3aa3-49e0-be3f-d2f9f1591f1f,
Instance:95ee8534-b171-4f79-9e23-be48bf1a5af6,
Instance:f18275f1-222b-455d-b352-3e7b2d3ffe9d,
Instance:0ea66049-9399-4763-8d2f-b96e9228e413,
Instance:7b2f63d6-66ce-4e1b-a481-e5f347f7e559,
Instance:46d01dfd-dc81-4459-a4a8-885f05a87d07,
Instance:7158726e-e76c-4cd4-8207-1ed50cc4d77a,
Instance:14a0ce40-0ec7-4cf0-b908-0434271369f6]</code></pre>
<p>This example shows that I am running eight instances at the moment in a zone called <em>ch-gva-2</em>, our familiar <a href="http://exoscale.ch">exoscale</a>. Selecting one of these instance objects gives you access to all the methods available for instances. You can also list, delete and create keypairs, as well as security groups, and so on.</p>
<p>Eutester is meant for building integration tests and easily creating test scenarios. If you are looking for a client to build an application with, use Boto.</p>
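<p>For completeness, here is what a minimal Boto connection to the same local ec2stack endpoint would look like (the keys are placeholders):</p>
<pre><code>import boto
from boto.ec2.regioninfo import RegionInfo

# Point Boto at the local EC2 compatible endpoint instead of AWS
region = RegionInfo(name='cloudstack', endpoint='localhost')
conn = boto.connect_ec2(aws_access_key_id='my api key',
                        aws_secret_access_key='my secret key',
                        is_secure=False,
                        region=region,
                        port=5000,
                        path='/',
                        api_version='2014-02-01')
print(conn.get_all_zones())</code></pre>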
<p>The <em>master</em> branch of eutester may still have problems listing images from a CloudStack cloud. I recently <a href="https://github.com/runseb/eutester/commit/081dcb1216f68e1989ae50d937a6bb1c4e5ddadc">patched</a> a fork of the <em>testing</em> branch and opened an <a href="https://github.com/eucalyptus/eutester/issues/287">issue</a> on their GitHub page. You might want to check its status if you want to use eutester heavily.</p>
Sebastien Goasguenhttp://www.blogger.com/profile/05197845236510805309noreply@blogger.com0tag:blogger.com,1999:blog-19879822.post-10492096516957867882014-03-19T08:26:00.002-07:002014-03-19T08:29:28.283-07:00Migrating from Publican to Sphinx and Read The Docs<h1 id="migration-from-publican-to-sphinx-and-read-the-docs">
Migration from Publican to Sphinx and Read The Docs</h1>
When we started with CloudStack we chose to use <a href="https://fedorahosted.org/publican/">publican</a> for our documentation. I don't actually know why, except that Red Hat <a href="https://access.redhat.com/site/documentation/en-US/">documentation</a> is entirely based on publican. Perhaps David Nalley's background with <a href="http://docs.fedoraproject.org/en-US/index.html">Fedora</a> influenced us :) In any case publican is a very nice documentation building system; it is based on the <a href="http://www.docbook.org/">docbook</a> format and has great support for localization. However, it can become difficult to read and organize lots of content, and builds may break for strange reasons. We also noticed that we were not getting many contributors to the documentation; in contrast, the translation effort via <a href="http://transifex.net/">transifex</a> has had over 80 contributors. As more features got added to CloudStack the quality of the content started to suffer, and we also faced issues with publishing the translated documents. We needed to do something, mainly to make it easier to contribute to our documentation. Enter <a href="http://docutils.sourceforge.net/rst.html">ReStructured Text (RST)</a> and <a href="https://readthedocs.org/">Read The Docs (RTD)</a>.<br />
<h2 id="choosing-a-new-format">
Choosing a new format</h2>
We started thinking about how to make our documentation easier to contribute to. Looking at Docbook, which is purely XML based, it is a powerful format but not very developer friendly. A lot of us are happy with a basic text editor, with some old farts like me mainly stuck with vi. <a href="http://daringfireball.net/projects/markdown/">Markdown</a> has certainly helped a lot of folks write documentation and READMEs; just look at GitHub projects. I started writing in Markdown and my production of documentation and tutorials skyrocketed; it is just a great way to write docs. <a href="http://docutils.sourceforge.net/rst.html">Restructured Text</a> is another alternative, not really Markdown but pretty close. I got familiar with RST in the <a href="http://libcloud.apache.org/">Apache libcloud</a> project and fell in love with it, or at least liked it way more than docbook. RST is basically text with only a few markups to learn, and you're off.<br />
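For instance, here is a small illustrative snippet; everything is plain text with a light markup:<br />
<pre><code>Why RST
=======

Write in *plain text*, mark up ``code`` with double backticks,
and link like this: `CloudStack <http://cloudstack.apache.org>`_</code></pre>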
<h2 id="publishing-platform">
Publishing Platform</h2>
A new format is one thing, but you then need to build the documentation in multiple formats: html, pdf, epub and potentially more. How do you go from .rst to these formats for your projects? Enter <a href="http://sphinx-doc.org/">Sphinx</a>, pretty much an equivalent of publican, originally aimed at Python documentation but now used for much more. Installing sphinx is easy, for instance on Debian/Ubuntu:<br />
<pre><code>apt-get install python-sphinx</code></pre>
You will then have the <code>sphinx-quickstart</code> command in your path; use it to create your sphinx project, add content in <code>index.rst</code> and build the docs with <code>make html</code>. Below is a basic example for a <code>ff</code> sample project.<br />
<div class="figure">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiztVbihaZ2CEHc8g-4gS1BcnYp0iYngnJM_TsB3ijz8PvfCEGCOtrb40I28G4T_jjKx3Lv1OgmfEnX8wnwxIJvWTmsDRJNLBrIonz88Dljn6D3YcAYHUoqu5TxcbHgXtSs_22yuA/s1600/exsphinx.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiztVbihaZ2CEHc8g-4gS1BcnYp0iYngnJM_TsB3ijz8PvfCEGCOtrb40I28G4T_jjKx3Lv1OgmfEnX8wnwxIJvWTmsDRJNLBrIonz88Dljn6D3YcAYHUoqu5TxcbHgXtSs_22yuA/s1600/exsphinx.png" height="177" width="640" /></a></div>
</div>
What really sold me on reStructuredText and Sphinx was <a href="http://docs.readthedocs.org/en/latest/">ReadTheDocs (RTD)</a>. It hosts documentation for open source projects. It automatically pulls your documentation from your revision control system and builds the docs. The killer feature for me was the integration with GitHub (not just git). Using hooks, RTD can trigger builds on every commit, and it also displays an <code>edit on github</code> icon on each documentation page. Click on this icon, and the docs repository will get forked automatically on your GitHub account. This means that people can edit the docs straight in the GitHub UI and submit pull requests as they read the docs and find issues.<br />
<h2 id="conversion">
Conversion</h2>
After [PROPOSAL] and [DISCUSS] threads on the cloudstack mailing list, we reached consensus and decided to make the move. This is still on-going, but we are getting close to going live with our new docs in RST and hosted by RTD. There were a couple of challenges:<br />
<ol style="list-style-type: decimal;">
<li>Converting the existing docbook based documentation to RST</li>
<li>Setting up new repos, CNAMEs and Read The Docs projects</li>
<li>Setting up the localization with transifex</li>
</ol>
The conversion was much easier than expected thanks to <a href="http://johnmacfarlane.net/pandoc/">pandoc</a>, one of those great command line utilities that save your life.<br />
<pre><code>pandoc -f docbook -t rst -o test.rst test.docbook</code></pre>
You get the gist of it: iterate through your docbook files, generate the RST files, then combine everything to reconstruct your chapters and books and re-organize as you wish. There are of course a couple of <code>gotchas</code>: the table formatting may not be perfect, the <code>note</code> and <code>warnings</code> may be a bit out of whack, and the heading levels should probably be checked. All of these are actually good things to check in a first pass through the docs to revamp the content and the way it is organized.<br />
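If you have a lot of docbook files, a small script can do the iteration for you. Here is a minimal sketch; the <code>en-US</code> source directory is a hypothetical layout:<br />
<pre><code>import glob
import subprocess

# Convert every docbook file in the source tree to RST
for src in glob.glob('en-US/*.xml'):
    dst = src.replace('.xml', '.rst')
    subprocess.check_call(['pandoc', '-f', 'docbook', '-t', 'rst', '-o', dst, src])</code></pre>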
One thing that we decided to do before talking about changing the format was to move our docs to a separate repository. What we wanted was to be able to release docs on a different time frame than the code release, as well as make any doc bug fixes go live as fast as possible and not wait for a code release (that's a long discussion...). With a documentation specific repo in place, we used Sphinx to create the proper directory structure and added the converted RST files. Then we created a project on Read The Docs and pointed it at the GitHub mirror of our Apache git repo. Pointing to the GitHub <a href="https://github.com/apache/cloudstack-docs">mirror</a> allowed us to enable the nice GitHub interaction that RTD provides. The new doc site looks like this.<br />
<div class="figure">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdEI8XTdcpnrmeeLhkpNqYskM80Cmo8l5GAjlqX4sCrXr42LfhzBbmNBYSwPzKTUxT4dx0mn-CToCsA3961pVqP0gxd7Sic-H6bl34-_9NG7DIeayEdTsFyEU8XiL71ZBqKTI5iw/s1600/acsrtd.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdEI8XTdcpnrmeeLhkpNqYskM80Cmo8l5GAjlqX4sCrXr42LfhzBbmNBYSwPzKTUxT4dx0mn-CToCsA3961pVqP0gxd7Sic-H6bl34-_9NG7DIeayEdTsFyEU8XiL71ZBqKTI5iw/s1600/acsrtd.png" height="343" width="400" /></a></div>
</div>
There is a bit more to it, as we actually created several repositories and used an RTD feature called <code>subprojects</code> to make all the docs live under the same CNAME, <code>docs.cloudstack.apache.org</code>. This is still work in progress but on track for the 4.3 code release. I hope to be able to announce the new documentation sites shortly after 4.3 is announced.<br />
The final hurdle is localization support. Sphinx provides utilities to generate POT files. These can then be uploaded to transifex, and translation strings can be pulled to construct the translated docs. The big challenge we are facing is to not lose the existing translations that were done from the docbook files. Strings may have changed. We are still investigating how to not lose all that work and get back on our feet to serve the translated docs. The Japanese translators have started to look at this.<br />
Overall the migration was easy. ReStructuredText is easy, Sphinx is also straightforward, and Read The Docs provides a great hosting platform well integrated with GitHub. Once we go live, we will see if our doc contributors increase significantly; we have already seen a few pull requests come in, which is very encouraging.<br />
I will be talking about all of this at the <a href="http://conf.writethedocs.org/eu/2014/">Write The Docs</a> conference in Budapest on March 31st and April 1st. If you are in the area stop by :)Sebastien Goasguenhttp://www.blogger.com/profile/05197845236510805309noreply@blogger.com1tag:blogger.com,1999:blog-19879822.post-40099426288292527912014-03-04T15:07:00.001-08:002014-06-17T00:15:11.030-07:00Why CloudStack is not a Citrix project
<p>I was at CloudExpo Europe in London last week for the Open Cloud <a href="http://www.cloudexpoeurope.com/Content/Open-Cloud-Forum/3_46/">Forum</a> to give a tutorial on CloudStack tools. A decent crowd showed up, all carrying phones. Kind of problematic for a tutorial where I wanted the audience to install Python packages and actually work :) Luckily I made it self-paced so you can follow at <a href="https://github.com/runseb/cloudstack-books/blob/master/en/ocftutorial.markdown">home</a>. Giles from <a href="http://shapeblue.com">Shapeblue</a> was there too, and he was part of a panel on Open Cloud. He was told once again: "But Apache CloudStack is a Citrix project!" This in itself is a paradox, and as @jzb told me on Twitter yesterday, "Citrix donated CloudStack to Apache, the end". Apache projects do not have any company affiliation.</p>
<p>I don't blame folks; with all the vendors seemingly supporting OpenStack, it does seem that CloudStack is a one-supporter project. The commit <a href="http://www.qyjohn.net/?p=3432">stats</a> are also pretty clear, with 39% of commits coming from Citrix. This number is probably even higher, since those stats report gmail and apache as domains contributing 20% and 15% respectively; let's say 60% is from Citrix. But nonetheless, this is ignoring and misunderstanding what Apache is, and looking at the glass half empty.</p>
<p>When Citrix donated CloudStack to the Apache Software Foundation (ASF) it relinquished control of the software and the brand. This actually put Citrix in a bind, not being able to easily promote the CloudStack project. Indeed, CloudStack is now a trademark of the ASF, and Citrix had to rename their own product CloudPlatform (powered by Apache CloudStack). Citrix cannot promote CloudStack directly; it needs to get approval for sponsorship donations and follow the ASF trademark guidelines. Every committer, and especially the PMC members of Apache CloudStack, is now supposed to work to protect the CloudStack brand as part of the ASF and make sure that any confusion is cleared. This is what I am doing here.</p>
<p>Of course when the software was donated, an initial set of committers was defined, all from Citrix and mostly from the former cloud.com startup. Part of the incubating process at the ASF is to make sure that we can add committers from other organizations and attract a community. "Community over Code" is the bread and butter of the ASF, and so this is what we have all been working on: expanding the community outside Citrix, welcoming anyone who thinks CloudStack is interesting enough to contribute a little bit of time and effort. Looking at the glass half empty is saying that CloudStack is a Citrix project: "Hey look, 60% of their commits are from Citrix". Looking at it half full, like I do, is saying: "Oh wow, in the year since graduation they have diversified the committer base; 40% are not from Citrix". Is 40% enough? Of course not. I wish it were the other way around; I wish Citrix were only a minority in the development of CloudStack.</p>
<p>A couple of other numbers: out of the 26 members of the project management committee (PMC), only seven are from Citrix, and looking at mailing list participation since the beginning of the year, 20% of the folks on the users mailing list and 25% on the developer list are from Citrix. We have diversified the community a great deal, but the "hand-over", that moment when new community members are actually writing more code than the folks who started it, has not happened yet. A community is not just about writing code, but I will give it to you that it is not good for a single company to "control" 60% of the development; this is not where we/I want to be.</p>
<p>This whole discussion actually goes against Apache's modus operandi, since one of the biggest tenets of the foundation is non-affiliation. When I participate on the list I am Sebastien, I am not a Citrix employee. Certainly this can put some folks in conflicting situations at times, but the bottom line is that we do not and should not take company affiliation into account when working and making decisions for the project. But if you really want some company name dropping, let's commit an ASF faux pas and look at a few features:</p>
<p>The Nicira/NSX and OpenDaylight SDN integrations were done by Schuberg Philis, the OpenContrail plugin was done by Juniper, and Midokura created its own plugin for Midonet, as did Stratosphere, giving us great SDN coverage. The LXC integration was done by Gilt, Klarna is contributing to the ecosystem with the Vagrant and Packer plugins, and CloudOps has been doing a terrific job with Chef recipes, the Palo Alto Networks integration and Netscaler support. A Google Summer of Code intern did a brand new LDAP plugin, and another GSoC intern did the GRE support for KVM. Red Hat contributed the Gluster plugin, PCextreme contributed the Ceph interface, and Basho of course contributed the S3 plugin for secondary storage as well as major design decisions on the storage refactor. The SolidFire plugin was done by, well, SolidFire, and NetApp has developed a plugin as well for their virtual storage console. NTT contributed the CloudFoundry interface via BOSH. On the user side, Shapeblue is the leading user support company. So no, it's not just Citrix.</p>
<p>Are all these companies members of the CloudStack project? No. There is no such thing as a company being a member of an ASF project. There is no company affiliation, there is no lock-in, just a bunch of folks trying to make good software and build a community. And yes, I work for Citrix, and my job here will be done when Citrix contributes only 49% of the commits. Citrix is paying me to make sure they lose control of the software, that a healthy ecosystem develops, and that CloudStack keeps on becoming a strong and vibrant Apache project. I hope one day folks will understand what CloudStack has become: an ASF project, like HTTP, Hadoop, Mesos, Ant, Maven, Lucene, Solr and 150 other projects. Come to Denver for #apachecon and you will see! The end.</p>
Sebastien Goasguenhttp://www.blogger.com/profile/05197845236510805309noreply@blogger.com2tag:blogger.com,1999:blog-19879822.post-19451485944026427002014-01-21T04:18:00.000-08:002014-01-21T04:18:44.124-08:00PaaS with CloudStack<h2 id="a-few-talks-from-ccc-in-amsterdam">
A few talks from CCC in Amsterdam</h2>
In November at the CloudStack Collaboration Conference I was pleased to see several talks on PaaS. We had <a href="http://www.youtube.com/watch?v=VOoPCs7Vq-k&list=UU9VU9uvQTOkw29LICWSRhyw">Uri Cohen</a> (@uri1803) from <a href="http://www.gigaspaces.com/">Gigaspaces</a>, <a href="http://www.youtube.com/watch?v=2taNUzHFxzU&list=UU9VU9uvQTOkw29LICWSRhyw">Marc-Elian Begin</a> (@lemeb) from <a href="http://sixsq.com/">Sixsq</a> and <a href="http://www.youtube.com/watch?v=nrXFwCewLDw&list=UU9VU9uvQTOkw29LICWSRhyw">Alex Heneveld</a> (@ahtweetin) from <a href="http://www.cloudsoftcorp.com/">CloudSoft</a>. We also had some related talks, depending on your definition of PaaS, about <a href="http://www.youtube.com/watch?v=qcMUs2CeqUQ&list=UU9VU9uvQTOkw29LICWSRhyw">Docker</a> and <a href="http://www.youtube.com/watch?v=saiZ3cE6MkE&list=UU9VU9uvQTOkw29LICWSRhyw">Vagrant</a>.<br />
<h2 id="paas-variations">
PaaS variations</h2>
The differences between PaaS solutions are best explained by this picture from the <a href="http://aws.amazon.com/application-management/">AWS FAQ</a> about application management. <br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzZjENWP2B5wcCx7dVwiJCUDr5mgjeV0fHGhantx9jgDrWZDBU9_Ell_gJBBLb74T8sKBpHcVPc0c_s_cwqHTdOKbKJIN_vweRBXmdtyC3CkuLqNzfn-YputWOjgeaiASG4UCUXw/s1600/app-svcs-comparison-graphic.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzZjENWP2B5wcCx7dVwiJCUDr5mgjeV0fHGhantx9jgDrWZDBU9_Ell_gJBBLb74T8sKBpHcVPc0c_s_cwqHTdOKbKJIN_vweRBXmdtyC3CkuLqNzfn-YputWOjgeaiASG4UCUXw/s320/app-svcs-comparison-graphic.png" height="156" width="400" /></a></div>
There is clearly a spectrum that goes from operational control to pure application deployment. We could argue that a true PaaS abstracts the operational details and that management of the underlying infrastructure should be totally hidden; that said, automation of virtual infrastructure deployment has reached such a sophisticated state that it blurs the line between IaaS and PaaS. Not surprisingly, AWS offers services that cover the entire spectrum.<br />
Since I am more on the operations side, I tend to see a PaaS as an infrastructure automation framework. For instance, I look for tools to deploy a MongoDB cluster or a RiakCS cluster. I am not looking for an abstract platform that has MongoDB pre-installed and where I can turn a knob to increase the size of the cluster or manage my shards. An application person will prefer to look at something like Google App Engine and its open source version <a href="http://www.appscale.com/">AppScale</a>. I will get back to all these differences in a future post on PaaS, but this <a href="http://www.infoworld.com/d/cloud-computing/paas-isnt-dying-its-becoming-part-of-iaas-234351?utm_source=twitterfeed&utm_medium=twitter">article</a> by @DavidLinthicum that just came out is a good read.<br />
<h2 id="support-for-cloudstack">
Support for CloudStack</h2>
What is interesting for the CloudStack community is to look at the support for CloudStack in all these different solutions, wherever they sit on the application management spectrum.<br />
<ul>
<li>Cloudify from Gigaspaces was all over Twitter about their support for OpenStack, and I was getting slightly bothered by the lack of CloudStack support. That's why it was great to see Uri Cohen in Amsterdam. He delivered a great talk and gave me a demo of Cloudify. I was very impressed of course by the slick UI, but overall by the ability to provision complete application/infrastructure definitions on clouds. Underneath it uses Apache jclouds, so there was no reason it could not talk to CloudStack. Over Christmas Uri did a terrific job, and the CloudStack support is now tested and documented. It works not only with the commercial version from Citrix, <a href="http://www.cloudifysource.org/guide/2.7/setup/configuring_cloudstack">CloudPlatform</a>, but also with Apache <a href="http://www.cloudifysource.org/guide/2.7/setup/configuring_exoscale">CloudStack</a>. And of course it works with my neighbors' cloud <a href="http://exoscale.ch/">exoscale</a></li>
<li>Slipstream is not widely known but worth a look. At CCC @lemeb <a href="http://www.youtube.com/watch?v=2taNUzHFxzU&list=UU9VU9uvQTOkw29LICWSRhyw">demoed</a> a CloudStack driver. Since then, they have started offering a <a href="https://slipstream.sixsq.com/">hosted</a> version of their Slipstream cloud orchestration framework, which turns out to be hosted on exoscale's CloudStack cloud. Slipstream is more of a cloud broker than a PaaS, but it automates application deployment on multiple clouds, abstracting the various cloud APIs and offering application templates for deployments of virtual infrastructure. Check it out.</li>
<li>Cloudsoft's main application deployment engine is <a href="http://brooklyncentral.github.io/">Brooklyn</a>; it originated from Alex Heneveld's contribution to Apache Whirr that I wrote about a couple of <a href="http://sebgoa.blogspot.ch/2013/07/apache-whirr-and-cloudstack-for-big.html">times</a>. But it can use OpenShift for an additional level of PaaS. I will need to check with Alex how they are doing this, as I believe OpenShift uses LXC. Since CloudStack has LXC support, one ought to be able to use Brooklyn to deploy an LXC cluster on CloudStack and then use OpenShift to manage the deployed applications.</li>
<li>A quick note on OpenShift. As far as I understand, it actually uses a static cluster; the scalability comes from the use of containers on the nodes. So technically you could create an OpenShift cluster in CloudStack, but I don't think we will see OpenShift talking directly to the CloudStack API to add nodes. OpenShift bypasses the IaaS APIs. Of course I have not looked at it in a while and I may be wrong :)</li>
<li>Talking about PaaS for <a href="http://vagrantup.com/">Vagrant</a> is probably a bit far-fetched, but it fits the infrastructure deployment criteria and can be compared with <a href="http://aws.amazon.com/opsworks/getting-started/">AWS OpsWorks</a>. Vagrant helps define reproducible machines so that devs and ops can actually work on the same base servers. But Vagrant, with its plugins, can also help deployment on public clouds and can handle multiple server definitions. So one can look at a Vagrantfile as a template definition for a virtual infrastructure deployment. As a matter of fact, there are many Vagrant boxes out there to deploy things like Apache Mesos clusters, MongoDB, RiakCS clusters etc. It's not meant to manage that stack in production, but at a minimum it can help develop it. Vagrant has a CloudStack plugin, demoed by <a href="http://www.youtube.com/watch?v=saiZ3cE6MkE&list=UU9VU9uvQTOkw29LICWSRhyw">Hugo Correia</a> from Klarna at CCC. Exoscale took the bait and created a set of <a href="https://www.exoscale.ch/syslog/2013/12/18/exoscale-vagrant/">exoboxes</a>: real gold for developers deploying on exoscale, and any CloudStack provider should follow suit.</li>
<li>Which brings me to <a href="http://docker.io/">Docker</a>: there is currently no support for Docker in CloudStack. We do have LXC <a href="https://cwiki.apache.org/confluence/display/CLOUDSTACK/LXC+Support+in+Cloudstack">support</a>, therefore it would not be too hard to have a 'docker' cluster in CloudStack. You could even install Docker within an image and deploy that on KVM or Xen. Of course some would argue that using containers within VMs defeats the purpose. In any case, with the Docker remote API you could then manage your containers (see the sketch after this list). OpenStack already has a Docker <a href="https://wiki.openstack.org/wiki/Docker">integration</a>; we will dig deeper into Docker functionality to see how best to integrate it in CloudStack.</li>
<li>AWS, as I mentioned, has several PaaS-like layers with OpsWorks, CloudFormation and Beanstalk. CloudStack has an EC2 interface but also a third party solution to enable CloudFormation. This is still under development but pretty close to full functionality; check out <a href="https://github.com/stackmate/stackmate">stackmate</a> and its web interface <a href="https://github.com/stackmate/stacktician">stacktician</a>. With a CF interface to CloudStack we could see an OpsWorks and a Beanstalk interface coming in the future.</li>
<li>Finally, not present at CCC, but the leader of PaaS for the enterprise is <a href="http://www.cloudfoundry.com/">CloudFoundry</a>. I am going to see Andy Piper (@andypiper) in <a href="http://lanyrd.com/2014/lcs2014/">London</a> next week and will make sure to talk to him about the recent CloudStack support that was merged in the cloudfoundry community <a href="https://github.com/cloudfoundry-community/bosh-cloudstack-cpi">repo</a>. It came from folks in Japan and I have not had time to test it. Certainly we as a community should look at this very closely to make sure there is outstanding support for CloudFoundry in ACS.</li>
</ul>
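As mentioned in the Docker item above, here is a minimal sketch of driving containers through the Docker remote API; it assumes a daemon listening on tcp://localhost:2375, which you should only do for testing:<br />
<pre><code>import requests

# List the running containers via the Docker remote API
for c in requests.get('http://localhost:2375/containers/json').json():
    print(c['Id'][:12], c['Image'], c['Status'])</code></pre>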
It is not clear where the frontier between PaaS and IaaS lies; it is highly dependent on the context, who you are and what you are trying to achieve. But CloudStack offers several interfaces to PaaS, or shall I say, PaaS solutions offer several connectors to CloudStack :)Sebastien Goasguenhttp://www.blogger.com/profile/05197845236510805309noreply@blogger.com0tag:blogger.com,1999:blog-19879822.post-8201413563197397602013-12-19T00:15:00.001-08:002013-12-19T00:21:50.862-08:00Clojure with CloudStack<h1 id="clostack">CloStack</h1>
<p>CloStack is a Clojure client for Apache CloudStack. <a href="http://clojure.org">Clojure</a> is a dynamic programming language for the Java Virtual Machine (JVM). It is compiled directly to JVM bytecode but offers the dynamic and interactive nature of an interpreted language like Python. Clojure is a dialect of LISP and as such is mostly a functional programming language.</p>
<p>You can try Clojure in your <a href="http://tryclj.com">browser</a> and get familiar with its read-eval-print loop (REPL). To get started, you can follow the <a href="http://moxleystratton.com/clojure/clojure-tutorial-for-the-non-lisp-programmer">tutorial</a> for non-LISP programmers through this web based REPL.</p>
<p>To give you a taste of it, here is how you would add <code>2</code> and <code>2</code>:</p>
<pre><code>user=> (+ 2 2)
4</code></pre>
<p>And how you would define a function:</p>
<pre><code>user=> (defn f [x y]
#_=> (+ x y))
#'user/f
user=> (f 2 3)
5</code></pre>
<p>This should give you a taste of functional programming :)</p>
<h2 id="install-leinigen">Install Leiningen</h2>
<p><a href="https://github.com/technomancy/leiningen">Leiningen</a> is a tool for managing Clojure projects easily. With <code>lein</code> you can create the skeleton of a Clojure project as well as start a read-eval-print loop (REPL) to test your code.</p>
<p>Installing the latest version of Leiningen is easy: get the <a href="https://raw.github.com/technomancy/leiningen/stable/bin/lein">script</a>, set it in your path, make it executable and you are done.</p>
<p>The first time you run <code>lein repl</code> it will bootstrap itself:</p>
<pre><code>$ lein repl
Downloading Leiningen to /Users/sebgoa/.lein/self-installs/leiningen-2.3.4-standalone.jar now...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 13.0M 100 13.0M 0 0 1574k 0 0:00:08 0:00:08 --:--:-- 2266k
nREPL server started on port 58633 on host 127.0.0.1
REPL-y 0.3.0
Clojure 1.5.1
Docs: (doc function-name-here)
(find-doc "part-of-name-here")
Source: (source function-name-here)
Javadoc: (javadoc java-object-or-class-here)
Exit: Control+D or (exit) or (quit)
Results: Stored in vars *1, *2, *3, an exception in *e
user=> exit
Bye for now!</code></pre>
<h2 id="download-clostack">Download CloStack</h2>
<p>To install CloStack, clone the GitHub <a href="https://github.com/pyr/clostack.git">repository</a> and start <code>lein repl</code>:</p>
<pre><code>git clone https://github.com/pyr/clostack.git
$ lein repl
Retrieving codox/codox/0.6.4/codox-0.6.4.pom from clojars
Retrieving codox/codox.leiningen/0.6.4/codox.leiningen-0.6.4.pom from clojars
Retrieving leinjacker/leinjacker/0.4.1/leinjacker-0.4.1.pom from clojars
Retrieving org/clojure/core.contracts/0.0.1/core.contracts-0.0.1.pom from central
Retrieving org/clojure/core.unify/0.5.3/core.unify-0.5.3.pom from central
Retrieving org/clojure/clojure/1.4.0/clojure-1.4.0.pom from central
Retrieving org/clojure/core.contracts/0.0.1/core.contracts-0.0.1.jar from central
Retrieving org/clojure/core.unify/0.5.3/core.unify-0.5.3.jar from central
Retrieving org/clojure/clojure/1.4.0/clojure-1.4.0.jar from central
Retrieving codox/codox/0.6.4/codox-0.6.4.jar from clojars
Retrieving codox/codox.leiningen/0.6.4/codox.leiningen-0.6.4.jar from clojars
Retrieving leinjacker/leinjacker/0.4.1/leinjacker-0.4.1.jar from clojars
Retrieving org/clojure/clojure/1.3.0/clojure-1.3.0.pom from central
Retrieving org/clojure/data.json/0.2.2/data.json-0.2.2.pom from central
Retrieving http/async/client/http.async.client/0.5.2/http.async.client-0.5.2.pom from clojars
Retrieving com/ning/async-http-client/1.7.10/async-http-client-1.7.10.pom from central
Retrieving io/netty/netty/3.4.4.Final/netty-3.4.4.Final.pom from central
Retrieving org/clojure/data.json/0.2.2/data.json-0.2.2.jar from central
Retrieving com/ning/async-http-client/1.7.10/async-http-client-1.7.10.jar from central
Retrieving io/netty/netty/3.4.4.Final/netty-3.4.4.Final.jar from central
Retrieving http/async/client/http.async.client/0.5.2/http.async.client-0.5.2.jar from clojars
nREPL server started on port 58655 on host 127.0.0.1
REPL-y 0.3.0
Clojure 1.5.1
Docs: (doc function-name-here)
(find-doc "part-of-name-here")
Source: (source function-name-here)
Javadoc: (javadoc java-object-or-class-here)
Exit: Control+D or (exit) or (quit)
Results: Stored in vars *1, *2, *3, an exception in *e
user=> exit</code></pre>
<p>The first time that you start the REPL, lein will download all the dependencies of <code>clostack</code>.</p>
<h2 id="prepare-environment-variables-and-make-your-first-clostack-call">Prepare environment variables and make your first <code>clostack</code> call</h2>
<p>Export a few environment variables to define the cloud you will be using, namely:</p>
<pre><code>export CLOUDSTACK_ENDPOINT=http://localhost:8080/client/api
export CLOUDSTACK_API_KEY=HGWEFHWERH8978yg98ysdfghsdfgsagf
export CLOUDSTACK_API_SECRET=fhdsfhdf869guh3guwghseruig</code></pre>
<p>Then relaunch the REPL:</p>
<pre><code>$lein repl
nREPL server started on port 59890 on host 127.0.0.1
REPL-y 0.3.0
Clojure 1.5.1
Docs: (doc function-name-here)
(find-doc "part-of-name-here")
Source: (source function-name-here)
Javadoc: (javadoc java-object-or-class-here)
Exit: Control+D or (exit) or (quit)
Results: Stored in vars *1, *2, *3, an exception in *e
user=> (use 'clostack.client)
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
nil</code></pre>
<p>You can safely ignore this warning: it simply means that no logging backend is configured, since <code>clostack</code> is meant to be used as a library inside a Clojure project, which would provide one.<br />Define a client to your CloudStack endpoint:</p>
<pre><code>user=> (def cs (http-client))
#'user/cs</code></pre>
<p>And call an API like so:</p>
<pre><code>user=> (list-zones cs)
{:listzonesresponse {:count 1, :zone [{:id "1128bd56-b4d9-4ac6-a7b9-c715b187ce11", :name "CH-GV2", :networktype "Basic", :securitygroupsenabled true, :allocationstate "Enabled", :zonetoken "ccb0a60c-79c8-3230-ab8b-8bdbe8c45bb7", :dhcpprovider "VirtualRouter", :localstorageenabled true}]}}</code></pre>
<p>To explore the API calls that you can make, the REPL features tab completion. Enter <code>list</code> or <code>de</code> and press the tab key; you should see:</p>
<pre><code>user=> (list
list list* list-accounts list-async-jobs
list-capabilities list-disk-offerings list-event-types list-events
list-firewall-rules list-hypervisors list-instance-groups list-ip-forwarding-rules
list-iso-permissions list-isos list-lb-stickiness-policies list-load-balancer-rule-instances
list-load-balancer-rules list-network-ac-ls list-network-offerings list-networks
list-os-categories list-os-types list-port-forwarding-rules list-private-gateways
list-project-accounts list-project-invitations list-projects list-public-ip-addresses
list-remote-access-vpns list-resource-limits list-security-groups list-service-offerings
list-snapshot-policies list-snapshots list-ssh-key-pairs list-static-routes
list-tags list-template-permissions list-templates list-virtual-machines
list-volumes list-vp-cs list-vpc-offerings list-vpn-connections
list-vpn-customer-gateways list-vpn-gateways list-vpn-users list-zones
list?
user=> (de
dec dec' decimal? declare def
default-data-readers definline definterface defmacro defmethod
defmulti defn defn- defonce defprotocol
defrecord defreq defstruct deftype delay
delay? delete-account-from-project delete-firewall-rule delete-instance-group delete-ip-forwarding-rule
delete-iso delete-lb-stickiness-policy delete-load-balancer-rule delete-network delete-network-acl
delete-port-forwarding-rule delete-project delete-project-invitation delete-remote-access-vpn delete-security-group
delete-snapshot delete-snapshot-policies delete-ssh-key-pair delete-static-route delete-tags
delete-template delete-volume delete-vpc delete-vpn-connection delete-vpn-customer-gateway
delete-vpn-gateway deliver denominator deploy-virtual-machine deref
derive descendants destroy-virtual-machine destructure detach-iso
detach-volume</code></pre>
<p>To pass arguments to a call, follow this syntax:</p>
<pre><code>user=> (list-templates cs :templatefilter "executable")</code></pre>
<h2 id="start-a-virtual-machine">Start a virtual machine</h2>
<p>To deploy a virtual machine you need the <code>serviceofferingid</code> (the instance type), the <code>templateid</code> (also known as the image id) and the <code>zoneid</code>. The call is then very similar to CloudMonkey's and returns a <code>jobid</code>:</p>
<pre><code>user=> (deploy-virtual-machine cs :serviceofferingid "71004023-bb72-4a97-b1e9-bc66dfce9470" :templateid "1d961c82-7c8c-4b84-b61b-601876dab8d0" :zoneid "1128bd56-b4d9-4ac6-a7b9-c715b187ce11")
{:deployvirtualmachineresponse {:id "d0a887d2-e20b-4b25-98b3-c2995e4e428a", :jobid "21d20b5c-ea6e-4881-b0b2-0c2f9f1fb6be"}}</code></pre>
<p>You can pass additional parameters to the <code>deploy-virtual-machine</code> call, such as the <code>keypair</code> and the <code>securitygroupname</code>:</p>
<pre><code>user=> (deploy-virtual-machine cs :serviceofferingid "71004023-bb72-4a97-b1e9-bc66dfce9470" :templateid "1d961c82-7c8c-4b84-b61b-601876dab8d0" :zoneid "1128bd56-b4d9-4ac6-a7b9-c715b187ce11" :keypair "exoscale")
{:deployvirtualmachineresponse {:id "b5fdc41f-e151-43e7-a036-4d87b8536408", :jobid "418026fc-1009-4e7a-9721-7c9ad47b49e4"}}</code></pre>
<p>To query the asynchronous job, you can use the <code>query-async-job-result</code> API call:</p>
<pre><code>user=> (query-async-job-result cs :jobid "418026fc-1009-4e7a-9721-7c9ad47b49e4")
{:queryasyncjobresultresponse {:jobid "418026fc-1009-4e7a-9721-7c9ad47b49e4", :jobprocstatus 0, :jobinstancetype "VirtualMachine", :accountid "b8c0baab-18a1-44c0-ab67-e24049212925", :jobinstanceid "b5fdc41f-e151-43e7-a036-4d87b8536408", :created "2013-12-16T12:25:21+0100", :jobstatus 0, :jobresultcode 0, :cmd "com.cloud.api.commands.DeployVMCmd", :userid "968f6b4e-b382-4802-afea-dd731d4cf9b9"}}</code></pre>
<p>And finally to destroy the virtual machine you would pass the <code>id</code> of the VM to the <code>destroy-virtual-machine</code> call like so:</p>
<pre><code>user=> (destroy-virtual-machine cs :id "d0a887d2-e20b-4b25-98b3-c2995e4e428a")
{:destroyvirtualmachineresponse {:jobid "8fc8a8cf-9b54-435c-945d-e3ea2f183935"}}</code></pre>
<p>With these simple basics you can keep on exploring <code>clostack</code> and the CloudStack API.</p>
<h2 id="use-clostack-within-your-own-clojure-project">Use <code>CloStack</code> within your own clojure project</h2>
<h3>Hello World in Clojure</h3>
<p>To write your own Clojure project that makes use of <code>clostack</code>, use <code>leiningen</code> to create a project skeleton:</p>
<pre><code>lein new toto</code></pre>
<p><code>Lein</code> will automatically create a <code>src/toto/core.clj</code> file; edit it to replace the function <code>foo</code> with <code>-main</code>. This dummy function prints <code>Hello, World!</code>. Let's try to execute it. First we will need to define the <code>main</code> namespace in the <code>project.clj</code> file. Edit it like so:</p>
<pre><code>(defproject toto "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :main toto.core
  :dependencies [[org.clojure/clojure "1.5.1"]])</code></pre>
<p>Note the <code>:main toto.core</code></p>
<p>You can now execute the code with <code>lein run john</code>. Indeed, if you check the <code>-main</code> function in <code>src/toto/core.clj</code> you will see that it takes an argument. You should see the following output, with the argument printed first:</p>
<pre><code>$ lein run john
john Hello, World!</code></pre>
<p>Let's now add the CloStack dependency and modify the <code>-main</code> function to return the zones of the CloudStack cloud.</p>
<h3>Adding the CloStack dependency</h3>
<p>Edit the <code>project.clj</code> to add a dependency on <code>clostack</code> and a few logging packages:</p>
<pre><code>:dependencies [[org.clojure/clojure "1.5.1"]
[clostack "0.1.3"]
[org.clojure/tools.logging "0.2.6"]
[org.slf4j/slf4j-log4j12 "1.6.4"]
[log4j/apache-log4j-extras "1.0"]
[log4j/log4j "1.2.16"
:exclusions [javax.mail/mail
javax.jms/jms
com.sun.jdmk/jmxtools
com.sun.jmx/jmxri]]])
</code></pre>
<p><code>lein</code> should have created a <code>resources</code> directory. In it, create a <code>log4j.properties</code> file like so:</p>
<pre><code>$ more log4j.properties
# Root logger option
log4j.rootLogger=INFO, stdout
# Direct log messages to stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n</code></pre>
<p>A discussion on logging is beyond the scope of this recipe; we merely add it to the configuration for a complete example.</p>
<p>Now you can edit the code in <code>src/toto/core.clj</code> with some basic calls.</p>
<pre><code>(ns toto.core
(:require [clostack.client :refer [http-client list-zones]]))
(defn foo
"I don't do a whole lot."
[x]
(println x "Hello, World!"))
(def cs (http-client))
(defn -main [args]
(println (list-zones cs))
(println args "Hey Wassup")
(foo args)
)</code></pre>
<p>Simply run this Clojure code with <code>lein run joe</code> from the root of your project. And that's it, you have successfully discovered the very basics of Clojure and used the CloudStack client <code>clostack</code> to write your first Clojure code. Now for something more significant, look at <a href="http://palletops.com">Pallet</a>.</p>
Sebastien Goasguenhttp://www.blogger.com/profile/05197845236510805309noreply@blogger.com0tag:blogger.com,1999:blog-19879822.post-57830479464043265212013-12-18T12:32:00.000-08:002013-12-18T12:32:54.875-08:002014 Cloud Predictions<p><b>Warning:</b> this is written with a glass of wine in one hand, two days before vacation ...:)</p>
<p>1. CloudStack will abandon semantic versioning and adopt superhero names for its releases; this will make upgrade paths more understandable.</p>
<p>2. Someone will take the Euca API server and stick the CloudStack backend beneath it; adding OpenNebula packaging will make this the best cloud distro of all.</p>
<p>3. I will finally make sense of <a href="http://netflix.github.io/#repo">NetflixOSS</a>'s plethora of software and reach nirvana by integrating CloudStack in Asgard.</p>
<p>4. AWS will open-source its software, killing OpenStack, and we will realize that in fact they use CloudStack with Euca in front.</p>
<p>5. I will understand what NFV, VNF and SDN really mean and come up with a new acronym that will set twitter on fire.</p>
<p>6. We will actually see some code in <a href="https://github.com/stackforge/solum">Solum</a>.</p>
<p>7. bitcoin will crash and come back up at least five times.</p>
<p>8. Citrix stock will jump 100% on acquisition by IBM.</p>
<p>9. My boss will stop asking me for statistics.</p>
<p>10. Facebook will die on a Snowden revelation. </p>
<p>I will stop at 10; otherwise this could go on all night :)</p>
<p>Happy Holidays everyone </p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1hazfleGPAFaFUbg_nXKdHcfUfa_gC_Renl4knDQna0bXogcaJatBWIehqdXeaR0G2RInqQ4pUVk-OU-69_8rKsPD_LFWJhWGECcs2lqItyncEN6LLcY4oFsZnlTBX-h0AyRllA/s1600/santa.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1hazfleGPAFaFUbg_nXKdHcfUfa_gC_Renl4knDQna0bXogcaJatBWIehqdXeaR0G2RInqQ4pUVk-OU-69_8rKsPD_LFWJhWGECcs2lqItyncEN6LLcY4oFsZnlTBX-h0AyRllA/s320/santa.jpg" /></a></div>
Sebastien Goasguenhttp://www.blogger.com/profile/05197845236510805309noreply@blogger.com3tag:blogger.com,1999:blog-19879822.post-67899476785257025812013-12-06T04:58:00.000-08:002013-12-06T05:23:39.865-08:00Veewee, Vagrant and CloudStack<p>Coming back from the CloudStack <a href="http://cloudstackcollab.org">conference</a>, the feeling that this is not about building clouds got stronger. This is really about what to do with them and how they bring you agility, faster time to market, and the freedom to focus on innovation in your core business. A large component of this is culture and a change in how we do IT. The DevOps movement is the embodiment of this change. Over in Amsterdam I was stoked to meet folks that I had seen at other locations throughout Europe in the last 18 months: folks from PaddyPower, SchubergPhilis and Inuits who all embrace DevOps. I also met new folks, including Hugo Correia from <a href="https://klarna.com">Klarna</a> (CloudStack users) who came by to talk about the vagrant-cloudstack plugin. His talk and a demo by Roland Kuipers from Schuberg were enough to kick my butt and get me to finally check out Vagrant. I sprinkled in a bit of Veewee and of course some CloudStack on top of it all. Have fun reading.</p>
<p>Automation is key to a reproducible, failure-tolerant infrastructure. Cloud administrators should aim to automate all steps of building their infrastructure and be able to re-provision everything with a single click. This is possible through a combination of configuration management, monitoring and provisioning tools. To get started creating appliances that will be automatically configured and provisioned, two tools stand out in the arsenal: Veewee and Vagrant.</p>
<p><b>Veewee</b>: <a href="https://github.com/jedi4ever/veewee">Veewee</a> is a tool to easily create appliances for different hypervisors. It fetches the .iso of the distribution you want and builds the machine with a kickstart file. It integrates with providers like VirtualBox so that you can build these appliances on your local machine. It supports most commonly used OS templates. Coupled with VirtualBox it allows admins and devs to create reproducible base appliances. Getting started with veewee is a 10 minute exercise. The README is great and there is also a very nice <a href="http://cbednarski.com/articles/veewee/">post</a> that guides you through your first box building.</p>
<p>Most folks will have no issues cloning Veewee from github and building it; you will need ruby 1.9.2 or above, which you can get via `rvm` or your favorite ruby version manager.</p>
<pre>
git clone https://github.com/jedi4ever/veewee
cd veewee
gem install bundler
bundle install
</pre>
<p>Setting up an alias is handy at this point: `alias veewee="bundle exec veewee"`. You will need a virtual machine provider (e.g. VirtualBox, VMware Fusion, Parallels, KVM). I personally use VirtualBox, but pick one and install it if you don't have it already. You will then be able to start using `veewee` on your local machine. Check the sub-commands available (for virtualbox):</p>
<pre>
$ veewee vbox
Commands:
veewee vbox build [BOX_NAME] # Build box
veewee vbox copy [BOX_NAME] [SRC] [DST] # Copy a file to the VM
veewee vbox define [BOX_NAME] [TEMPLATE] # Define a new basebox starting from a template
veewee vbox destroy [BOX_NAME] # Destroys the virtualmachine that was built
veewee vbox export [BOX_NAME] # Exports the basebox to the vagrant format
veewee vbox halt [BOX_NAME] # Activates a shutdown the virtualmachine
veewee vbox help [COMMAND] # Describe subcommands or one specific subcommand
veewee vbox list # Lists all defined boxes
veewee vbox ostypes # List the available Operating System types
veewee vbox screenshot [BOX_NAME] [PNGFILENAME] # Takes a screenshot of the box
veewee vbox sendkeys [BOX_NAME] [SEQUENCE] # Sends the key sequence (comma separated) to the box. E.g for testing the :boot_cmd_sequence
veewee vbox ssh [BOX_NAME] [COMMAND] # SSH to box
veewee vbox templates # List the currently available templates
veewee vbox undefine [BOX_NAME] # Removes the definition of a basebox
veewee vbox up [BOX_NAME] # Starts a Box
veewee vbox validate [BOX_NAME] # Validates a box against vagrant compliancy rules
veewee vbox winrm [BOX_NAME] [COMMAND] # Execute command via winrm
Options:
[--debug] # enable debugging
-w, --workdir, [--cwd=CWD] # Change the working directory. (The folder containing the definitions folder).
# Default: /Users/sebgoa/Documents/gitforks/veewee
</pre>
<p>Pick a template from the `templates` directory and `define` your first box:</p>
<pre>
veewee vbox define myfirstbox CentOS-6.5-x86_64-minimal
</pre>
<p>
You should see that a `definitions/` directory has been created; browse to it and inspect the `definition.rb` file. You might want to comment out some lines, like removing `chef` or `puppet`. If you build the box without changing anything, you will then be able to `validate` it with `veewee vbox validate myfirstbox`. To build the box simply do:
</p>
<pre>
veewee vbox build myfirstbox
</pre>
<p>
Everything should be successful, and you should see a running VM in your VirtualBox UI. To export it for use with `Vagrant`, `veewee` provides an export mechanism (really a VBoxManage command): `veewee vbox export myfirstbox`. At the end of the export, a .box file should be present in your directory.
</p>
<p><b>Vagrant:</b> Picking up from where we left off with `veewee`, we can now add the box to <a href="https://github.com/jedi4ever/veewee/blob/master/doc/vagrant.md">Vagrant</a> and customize it with shell scripts or, much better, with Puppet recipes or Chef cookbooks. First let's add the box file to Vagrant:</p>
<pre>
vagrant box add 'myfirstbox' '/path/to/box/myfirstbox.box'
</pre>
<p>Then in a directory of your choice, create the Vagrant "project":</p>
<pre>
vagrant init 'myfirstbox'
</pre>
<p>This will create a `Vagrantfile` that we will later edit to customize the box. You can boot the machine with `vagrant up` and once it's up, you can SSH to it with `vagrant ssh`.</p>
<p>While `veewee` is used to create a base box with almost no <a href="https://github.com/jedi4ever/veewee/blob/master/doc/customize.md">customization</a> (except potentially a chef and/or puppet client), `vagrant` is used to customize the box using the Vagrantfile. For example, to customize the `myfirstbox` that we just built, set the memory to 2 GB, add a host-only interface with IP 192.168.56.10, use the apache2 Chef cookbook and finally run a `bootstrap.sh` script, we will have the following `Vagrantfile`:</p>
<pre>
VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
# Every Vagrant virtual environment requires a box to build off of.
config.vm.box = "myfirstbox"
config.vm.provider "virtualbox" do |vb|
vb.customize ["modifyvm", :id, "--memory", 2048]
end
#host-only network setup
config.vm.network "private_network", ip: "192.168.56.10"
# Chef solo provisioning
config.vm.provision "chef_solo" do |chef|
chef.add_recipe "apache2"
end
#Test script to install CloudStack
#config.vm.provision :shell, :path => "bootstrap.sh"
end
</pre>
<p>The cookbook will be in a `cookbooks` directory and the bootstrap script will be in the root directory of this vagrant definition. For more information, check the Vagrant <a href="http://www.vagrantup.com">website</a> and experiment.</p>
<p><b>Vagrant and CloudStack</b>: What is very interesting with Vagrant is that you can use various plugins to deploy machines on public clouds. There is a `vagrant-aws` plugin and of course a `vagrant-cloudstack` plugin. You can get the latest CloudStack plugin from <a href="https://github.com/klarna/vagrant-cloudstack">github</a>. You can install it directly with the `vagrant` command line:</p>
<pre>
vagrant plugin install vagrant-cloudstack
</pre>
<p>Or if you are building it from source, clone the git repository, build the gem and install it in `vagrant`</p>
<pre>
git clone https://github.com/klarna/vagrant-cloudstack.git
gem build vagrant-cloudstack.gemspec
gem install vagrant-cloudstack-0.1.0.gem
vagrant plugin install /Users/sebgoa/Documents/gitforks/vagrant-cloudstack/vagrant-cloudstack-0.1.0.gem
</pre>
<p>The only drawback that I see is that one would want to upload his local box (created from the previous section) and use it. Instead one has to create `dummy boxes` that use existing templates available on the public cloud. This is easy to do, but creates a gap between local testing and production deployments. To build a dummy box simply create a `Vagrantfile` file and a `metadata.json` file like so:</p>
<pre>
$ cat metadata.json
{
"provider": "cloudstack"
}
$ cat Vagrantfile
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
config.vm.provider :cloudstack do |cs|
cs.template_id = "a17b40d6-83e4-4f2a-9ef0-dce6af575789"
end
end
</pre>
<p>Where `cs.template_id` is the uuid of a CloudStack template in your cloud. CloudStack users will know how to easily get those uuids with `CloudMonkey`. Then create a `box` file with `tar cvzf cloudstack.box ./metadata.json ./Vagrantfile`. Note that you can add additional CloudStack parameters in this box definition, like the host, path, etc. (something to think about :) ). Then simply add the box to `Vagrant` with:</p>
<pre>
vagrant box add ./cloudstack.box
</pre>
<p>You can now create a new `Vagrant` project:</p>
<pre>
mkdir cloudtest
cd cloudtest
vagrant init
</pre>
<p>And edit the newly created `Vagrantfile` to use the `cloudstack` box. Add additional parameters, like the `ssh` configuration if the box does not use Vagrant's defaults, the `service_offering_id`, etc. Remember to use your own API and secret keys and change the name of the box to what you created. For example on <a href="http://www.exoscale.ch">exoscale</a>:</p>
<pre>
# -*- mode: ruby -*-
# vi: set ft=ruby :
# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
# Every Vagrant virtual environment requires a box to build off of.
config.vm.box = "cloudstack"
config.vm.provider :cloudstack do |cs, override|
cs.host = "api.exoscale.ch"
cs.path = "/compute"
cs.scheme = "https"
cs.api_key = "PQogHs2sk_3..."
cs.secret_key = "...NNRC5NR5cUjEg"
cs.network_type = "Basic"
cs.keypair = "exoscale"
cs.service_offering_id = "71004023-bb72-4a97-b1e9-bc66dfce9470"
cs.zone_id = "1128bd56-b4d9-4ac6-a7b9-c715b187ce11"
override.ssh.username = "root"
override.ssh.private_key_path = "/path/to/private/key/id_rsa_example"
end
# Test bootstrap script
config.vm.provision :shell, :path => "bootstrap.sh"
end
</pre>
<p>The machine is brought up with:</p>
<pre>
vagrant up --provider=cloudstack
</pre>
<p>Output like the following will appear:</p>
<pre>
$ vagrant up --provider=cloudstack
Bringing machine 'default' up with 'cloudstack' provider...
[default] Warning! The Cloudstack provider doesn't support any of the Vagrant
high-level network configurations (`config.vm.network`). They
will be silently ignored.
[default] Launching an instance with the following settings...
[default] -- Service offering UUID: 71004023-bb72-4a97-b1e9-bc66dfce9470
[default] -- Template UUID: a17b40d6-83e4-4f2a-9ef0-dce6af575789
[default] -- Zone UUID: 1128bd56-b4d9-4ac6-a7b9-c715b187ce11
[default] -- Keypair: exoscale
[default] Waiting for instance to become "ready"...
[default] Waiting for SSH to become available...
[default] Machine is booted and ready for use!
[default] Rsyncing folder: /Users/sebgoa/Documents/exovagrant/ => /vagrant
[default] Running provisioner: shell...
[default] Running: /var/folders/76/sx82k6cd6cxbp7_djngd17f80000gn/T/vagrant-shell20131203-21441-1ipxq9e
Tue Dec 3 14:25:49 CET 2013
This works
</pre>
<p>Which is a perfect execution of my amazing bootstrap script:</p>
<pre>
#!/usr/bin/env bash
/bin/date
echo "This works"
</pre>
<p>You can now start playing with Chef cookbooks, Puppet recipes or SaltStack formulas and automate the configuration of your cloud instances, thanks to Veewee, <a href="http://vagrantup.com">Vagrant</a> and <a href="http://cloudstack.apache.org">CloudStack</a>.</p>Sebastien Goasguenhttp://www.blogger.com/profile/05197845236510805309noreply@blogger.com0tag:blogger.com,1999:blog-19879822.post-48736971626267151492013-11-05T05:25:00.000-08:002013-11-05T05:35:34.480-08:00Fluentd plugin to CloudStack<p>When it rains, it pours... Here is a quick write-up on using Fluentd to log CloudStack events and usage. <a href="http://fluentd.org">Fluentd</a> is open source software to collect events and logs in JSON format. It has hundreds of plugins that allow you to store the logs/events in your favorite data store like AWS S3, MongoDB and even elasticsearch. It is the equivalent of <a href="http://logstash.net">logstash</a>. The source is available on <a href="https://github.com/fluent/fluentd">Github</a> but it can also be installed via your favorite package manager (e.g. brew, yum, apt, gem). A CloudStack <a href="https://github.com/u-ichi/fluent-plugin-cloudstack">plugin</a> has been written to listen to CloudStack events and store these events in a chosen storage backend. In this blog I will show you how to store CloudStack logs in MongoDB using Fluentd. Note that the same thing can be done with logstash, just ask @pyr. The <a href="http://docs.fluentd.org/articles/quickstart">documentation</a> is quite straightforward, but here are the basic steps.</p>
<p>You will need a working `fluentd` installed on your machine. Pick your package manager of choice and install `fluentd`, for instance with `gem` we would do:</p>
<pre>
sudo gem install fluentd
</pre>
<p>`fluentd` will now be in your path; you need to create a configuration file and start `fluentd` using this config. For additional options with `fluentd` just enter `fluentd -h`. The `-s` option will create a sample configuration file in the working directory. The `-c` option will start `fluentd` using the specified configuration file. You can then send a test log/event message to the running process with `fluent-cat`.</p>
<pre>
$ fluentd -s conf
$ fluentd -c conf/fluent.conf &
$ echo '{"json":"message"}' | fluent-cat debug.test
</pre>
<p><b>The CloudStack plugin:</b><br />CloudStack has a `listEvents` API which does what it says :) it lists events happening within a CloudStack deployment: events such as the start and stop of a virtual machine, creation of security groups, life cycle events of storage elements, snapshots etc. The `listEvents` API is well <a href="http://cloudstack.apache.org/docs/api/apidocs-4.2/root_admin/listEvents.html">documented</a>. Based mostly on this API and the <a href="http://fog.io">fog</a> ruby library, a CloudStack plugin for `fluentd` was written by <a href="https://github.com/u-ichi">Yuichi UEMURA</a>. It is slightly different from using `logstash`: with `logstash` you can format the log4j logs of the CloudStack management server and collect those directly, whereas here we rely mostly on the `listEvents` API.</p>
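<p>To make the mechanics concrete, here is a minimal Python sketch (not part of the plugin; the endpoint and keys are placeholders) of what a raw `listEvents` call looks like. CloudStack signs requests by URL-encoding the parameters, sorting them, lowercasing the whole query string, and computing a base64-encoded HMAC-SHA1 with the secret key:</p>
<pre>
import base64, hashlib, hmac, urllib

endpoint = 'http://localhost:8080/client/api'  # placeholder dev setup
apikey = 'your-api-key'                        # placeholder credentials
secret = 'your-secret-key'

def signed_url(params):
    params['apikey'] = apikey
    # sort the URL-encoded parameters, lowercase, then HMAC-SHA1 + base64
    query = '&'.join('%s=%s' % (k, urllib.quote(str(v), safe=''))
                     for k, v in sorted(params.items()))
    sig = base64.b64encode(hmac.new(secret, query.lower(), hashlib.sha1).digest())
    return endpoint + '?' + query + '&signature=' + urllib.quote(sig, safe='')

print urllib.urlopen(signed_url({'command': 'listEvents', 'response': 'json'})).read()
</pre>
<p>The plugin does this kind of polling for you on the `interval` defined in its configuration and emits each event as a `fluentd` record, so you never have to write the above yourself.</p>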
<p>You can install it from source:</p>
<pre>
git clone https://github.com/u-ichi/fluent-plugin-cloudstack
</pre>
<p>Then build your own gem and install it with `sudo gem build fluent-plugin-cloudstack.gemspec` and `sudo gem install fluent-plugin-cloudstack-0.0.8.gem`</p>
<p>Or you can install the gem directly:</p>
<pre>
sudo gem install fluent-plugin-cloudstack
</pre>
<p>Generate a configuration file with `fluentd -s conf` (you can specify the path to your configuration file). Edit the configuration to define a `source` as being your CloudStack host. For instance, if you are running a development environment locally:</p>
<pre>
<source>
type cloudstack
host localhost
apikey $cloudstack_apikey
secretkey $cloustack_secretkey
# optional
protocol http # https or http, default https
path /client/api # default /client/api
port 8080 # default 443
#interval 300 # min 300, default 300
ssl false # true or false, default true
domain_id $cloudstack_domain_id
tag cloudstack
</source>
</pre>
<p>There is currently a small bug in the `interval` definition, so I commented it out. You also want to define the tag explicitly as being `cloudstack`. You can then create a `match` section in the configuration file. To keep it simple at first, we will simply echo the events to `stdout`; therefore just add:</p>
<pre>
<match cloudstack.**>
type stdout
</match>
</pre>
<p>
Run `fluentd` with `fluentd -c conf/fluent.conf &`, then browse the CloudStack UI, create a VM, create a service offering; just do a few things to generate some events. Once the interval has passed you will see the events being written to `stdout`:</p>
<pre>
$ 2013-11-05 12:19:26 +0100 [info]: starting fluentd-0.10.39
2013-11-05 12:19:26 +0100 [info]: reading config file path="conf/fluent.conf"
2013-11-05 12:19:26 +0100 [info]: using configuration file: <ROOT>
<source>
type forward
</source>
<source>
type cloudstack
host localhost
apikey 6QN8jOzEfhR7Fua69vk5ocDo_tfg8qqkT7-2w7nnTNsSRyPXyvRRAy23683qcrflgliHed0zA3m0SO4W9kh2LQ
secretkey HZiu9vhPAxA8xi8jpGWMWb9q9f5OL1ojW43Fd7zzQIjrcrMLoYekeP1zT9d-1B3DDMMpScHSR9gAnnG45ewwUQ
protocol http
path /client/api
port 8080
interval 3
ssl false
domain_id a9e4b8f0-3fd5-11e3-9df7-78ca8b5a2197
tag cloudstack
</source>
<match debug.**>
type stdout
</match>
<match cloudstack.**>
type stdout
</match>
</ROOT>
2013-11-05 12:19:26 +0100 [info]: adding source type="forward"
2013-11-05 12:19:26 +0100 [info]: adding source type="cloudstack"
2013-11-05 12:19:27 +0100 [info]: adding match pattern="debug.**" type="stdout"
2013-11-05 12:19:27 +0100 [info]: adding match pattern="cloudstack.**" type="stdout"
2013-11-05 12:19:27 +0100 [info]: listening fluent socket on 0.0.0.0:24224
2013-11-05 12:19:27 +0100 [info]: listening cloudstack api on localhost
2013-11-05 12:19:30 +0100 cloudstack.usages: {"events_flow":0}
2013-11-05 12:19:30 +0100 cloudstack.usages: {"vm_sum":1,"memory_sum":536870912,"cpu_sum":1,"root_volume_sum":1400,"data_volume_sum":0,"Small Instance":1}
2013-11-05 12:19:33 +0100 cloudstack.usages: {"events_flow":0}
2013-11-05 12:19:33 +0100 cloudstack.usages: {"vm_sum":1,"memory_sum":536870912,"cpu_sum":1,"root_volume_sum":1400,"data_volume_sum":0,"Small Instance":1}
2013-11-05 12:19:36 +0100 cloudstack.usages: {"events_flow":0}
2013-11-05 12:19:36 +0100 cloudstack.usages: {"vm_sum":1,"memory_sum":536870912,"cpu_sum":1,"root_volume_sum":1400,"data_volume_sum":0,"Small Instance":1}
2013-11-05 12:19:39 +0100 cloudstack.usages: {"events_flow":0}
...
2013-11-05 12:19:53 +0100 cloudstack.event: {"id":"b5051963-33e5-4f44-83bc-7b78763dcd24","username":"admin","type":"VM.DESTROY","level":"INFO","description":"Successfully completed destroying Vm. Vm Id: 17","account":"admin","domainid":"a9e4b8f0-3fd5-11e3-9df7-78ca8b5a2197","domain":"ROOT","created":"2013-11-05T12:19:53+0100","state":"Completed","parentid":"d0d47009-050e-4d94-97d9-a3ade1c80ee3"}
2013-11-05 12:19:53 +0100 cloudstack.event: {"id":"39f8ff37-515c-49dd-88d3-eeb77d556223","username":"admin","type":"VM.DESTROY","level":"INFO","description":"destroying Vm. Vm Id: 17","account":"admin","domainid":"a9e4b8f0-3fd5-11e3-9df7-78ca8b5a2197","domain":"ROOT","created":"2013-11-05T12:19:53+0100","state":"Started","parentid":"d0d47009-050e-4d94-97d9-a3ade1c80ee3"}
2013-11-05 12:19:53 +0100 cloudstack.event: {"id":"d0d47009-050e-4d94-97d9-a3ade1c80ee3","username":"admin","type":"VM.DESTROY","level":"INFO","description":"destroying vm: 17","account":"admin","domainid":"a9e4b8f0-3fd5-11e3-9df7-78ca8b5a2197","domain":"ROOT","created":"2013-11-05T12:19:53+0100","state":"Scheduled"}
2013-11-05 12:19:55 +0100 cloudstack.usages: {"events_flow":3}
2013-11-05 12:19:55 +0100 cloudstack.usages: {"vm_sum":1,"memory_sum":536870912,"cpu_sum":1,"root_volume_sum":1400,"data_volume_sum":0,"Small Instance":1}
...
2013-11-05 12:20:18 +0100 cloudstack.event: {"id":"11136a76-1de0-4907-b31d-2557bc093802","username":"admin","type":"SERVICE.OFFERING.CREATE","level":"INFO","description":"Successfully completed creating service offering. Service offering id=13","account":"system","domainid":"a9e4b8f0-3fd5-11e3-9df7-78ca8b5a2197","domain":"ROOT","created":"2013-11-05T12:20:18+0100","state":"Completed"}
2013-11-05 12:20:19 +0100 cloudstack.usages: {"events_flow":1}
2013-11-05 12:20:19 +0100 cloudstack.usages: {"vm_sum":1,"memory_sum":536870912,"cpu_sum":1,"root_volume_sum":1400,"data_volume_sum":0,"Small Instance":1}
</pre>
<p>
I cut some of the output for brevity. Note that I have the interval listed as `3` because I did not want to wait out the default interval of 300; to do this I installed the plugin from source and patched it, and it should be fixed upstream soon. You might have a different endpoint and of course different keys. And don't worry about me sharing that `secret_key`: I am using a simulator and that key is already gone.</p>
<p>Getting the events and usage information on stdout is interesting, but the kicker comes from storing the data in a database or a search index. In this section we get closer to reality and use <a href="http://www.mongodb.org">MongoDB</a> to store the data. MongoDB is an open source document database which is schemaless and stores documents in JSON format (BSON actually). Installation and query syntax of MongoDB are beyond the scope of this post. MongoDB clusters can be set up with replication and sharding; in this section we use MongoDB on a single host with no sharding or replication. To use MongoDB as a storage backend for the events, we first need to install `mongodb`. On a single OS X node this is as simple as `sudo port install mongodb`. For other OSes use the appropriate package manager. You can then start mongodb with `sudo mongod --dbpath=/path/to/your/databases`. Create a `fluentd` database and a `fluentd` user with read/write access to it. In the mongo shell do:</p>
<pre>
$ sudo mongo
> use fluentd
> db.addUser({user: "fluentd", pwd: "foobar", roles: ["readWrite", "dbAdmin"]})
</pre>
<p>We then need to install `fluent-plugin-mongo`. Still using `gem`, this is done like so:</p>
<pre>
$ sudo gem install fluent-plugin-mongo
</pre>
<p>The complete <a href="http://docs.fluentd.org/articles/out_mongo">documentation</a> also explains how to modify the configuration of `fluentd` to use this backend. Previously we used `stdout` as the output backend; to use `mongodb` we just need to write a different `<match>` section like so:</p>
<pre>
# Single MongoDB
<match cloudstack.**>
type mongo
host fluentd
port 27017
database fluentd
collection test
# for capped collection
capped
capped_size 1024m
# authentication
user fluentd
password foobar
# flush
flush_interval 10s
</match>
</pre>
<p>Note that you cannot have multiple `match` sections for the same tag pattern.</p>
<p>To view the events/usages in Mongo, simply start a mongo shell with `mongo -u fluentd -p foobar fluentd` and list the collections. You will see the `test` collection:</p>
<pre>
$ mongo -u fluentd -p foobar fluentd
MongoDB shell version: 2.4.7
connecting to: fluentd
Server has startup warnings:
Fri Nov 1 13:11:44.855 [initandlisten]
Fri Nov 1 13:11:44.855 [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
> show collections
system.indexes
system.users
test
</pre>
<p>A couple of MongoDB commands will get you rolling: `db.getCollection`, `count()` and `findOne()`:</p>
<pre>
> coll=db.getCollection('test')
fluentd.test
> coll.count()
181
> coll.findOne()
{
"_id" : ObjectId("5278d9822675c98317000001"),
"events_flow" : 0,
"time" : ISODate("2013-11-05T11:41:47Z")
}
</pre>
<p>The `find()` call returns all entries in the collection. </p>
<pre>
> coll.find()
{ "_id" : ObjectId("5278d9822675c98317000001"), "events_flow" : 0, "time" : ISODate("2013-11-05T11:41:47Z") }
{ "_id" : ObjectId("5278d9822675c98317000002"), "vm_sum" : 0, "memory_sum" : 0, "cpu_sum" : 0, "root_volume_sum" : 1500, "data_volume_sum" : 0, "Small Instance" : 1, "time" : ISODate("2013-11-05T11:41:47Z") }
{ "_id" : ObjectId("5278d98d2675c98317000009"), "events_flow" : 0, "time" : ISODate("2013-11-05T11:41:59Z") }
{ "_id" : ObjectId("5278d98d2675c9831700000a"), "vm_sum" : 0, "memory_sum" : 0, "cpu_sum" : 0, "root_volume_sum" : 1500, "data_volume_sum" : 0, "Small Instance" : 1, "time" : ISODate("2013-11-05T11:41:59Z") }
{ "_id" : ObjectId("5278d98d2675c9831700000b"), "id" : "1452c56a-a1e4-43d2-8916-f83a77155a2f", "username" : "admin", "type" : "VM.CREATE", "level" : "INFO", "description" : "Successfully completed starting Vm. Vm Id: 19", "account" : "admin", "domainid" : "a9e4b8f0-3fd5-11e3-9df7-78ca8b5a2197", "domain" : "ROOT", "created" : "2013-11-05T12:42:01+0100", "state" : "Completed", "parentid" : "df68486e-c6a8-4007-9996-d5c9a4522649", "time" : ISODate("2013-11-05T11:42:01Z") }
{ "_id" : ObjectId("5278d98d2675c9831700000c"), "id" : "901f9408-ae05-424f-92cd-5693733de7d6", "username" : "admin", "type" : "VM.CREATE", "level" : "INFO", "description" : "starting Vm. Vm Id: 19", "account" : "admin", "domainid" : "a9e4b8f0-3fd5-11e3-9df7-78ca8b5a2197", "domain" : "ROOT", "created" : "2013-11-05T12:42:00+0100", "state" : "Scheduled", "parentid" : "df68486e-c6a8-4007-9996-d5c9a4522649", "time" : ISODate("2013-11-05T11:42:00Z") }
{ "_id" : ObjectId("5278d98d2675c9831700000d"), "id" : "df68486e-c6a8-4007-9996-d5c9a4522649", "username" : "admin", "type" : "VM.CREATE", "level" : "INFO", "description" : "Successfully created entity for deploying Vm. Vm Id: 19", "account" : "admin", "domainid" : "a9e4b8f0-3fd5-11e3-9df7-78ca8b5a2197", "domain" : "ROOT", "created" : "2013-11-05T12:42:00+0100", "state" : "Created", "time" : ISODate("2013-11-05T11:42:00Z") }
{ "_id" : ObjectId("5278d98d2675c9831700000e"), "id" : "924ba8b9-a9f2-4274-8bbd-c27947d2c246", "username" : "admin", "type" : "VM.CREATE", "level" : "INFO", "description" : "starting Vm. Vm Id: 19", "account" : "admin", "domainid" : "a9e4b8f0-3fd5-11e3-9df7-78ca8b5a2197", "domain" : "ROOT", "created" : "2013-11-05T12:42:00+0100", "state" : "Started", "parentid" : "df68486e-c6a8-4007-9996-d5c9a4522649", "time" : ISODate("2013-11-05T11:42:00Z") }
{ "_id" : ObjectId("5278d98d2675c9831700000f"), "events_flow" : 4, "time" : ISODate("2013-11-05T11:42:02Z") }
{ "_id" : ObjectId("5278d98d2675c98317000010"), "vm_sum" : 1, "memory_sum" : 536870912, "cpu_sum" : 1, "root_volume_sum" : 1600, "data_volume_sum" : 0, "Small Instance" : 1, "time" : ISODate("2013-11-05T11:42:02Z") }
Type "it" for more
</pre>
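<p>If you would rather stay in Python than in the mongo shell, the same data is a few `pymongo` calls away. Here is a hypothetical sketch (same database, user and `test` collection as above) that uses the aggregation framework to count events per CloudStack event type:</p>
<pre>
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client.fluentd
db.authenticate('fluentd', 'foobar')   # the user we created earlier

# group event documents by CloudStack event type and count them
pipeline = [{'$match': {'type': {'$exists': True}}},
            {'$group': {'_id': '$type', 'count': {'$sum': 1}}},
            {'$sort': {'count': -1}}]
for doc in db.test.aggregate(pipeline)['result']:  # pymongo 2.x returns a dict
    print doc['_id'], doc['count']
</pre>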
<p>There is a lot more to learn of the MongoDB query syntax and the great aggregation framework; have fun. Of course you can get the data into elasticsearch as well :) </p>Sebastien Goasguenhttp://www.blogger.com/profile/05197845236510805309noreply@blogger.com0tag:blogger.com,1999:blog-19879822.post-38897188215850331822013-11-05T01:46:00.000-08:002013-11-05T01:46:19.566-08:00OCCI interface to CloudStack<p>CloudStack has its own API. Cloud wrappers like libcloud and jclouds work well with this native API, but CloudStack does not expose any standard API like <a href="http://occi-wg.org">OCCI</a> and <a href="http://dmtf.org/standards/cloud">CIMI</a>. We (Isaac Chiang really, I just tested and pointed him in the right direction) started working on a CloudStack backend for <a href="https://github.com/gwdg/rOCCI">rOCCI</a> using our CloudStack <a href="https://github.com/chipchilders/cloudstack_ruby_client">ruby gem</a>. The choice of rOCCI was made due to the existing OpenNebula backend and the adoption of OCCI in the European Grid Initiative Federated cloud <a href="https://wiki.egi.eu/wiki/Fedcloud-tf:Testbed">testbed</a>.</p>
<p>Let's get started with installing the <a href="https://github.com/gwdg/rOCCI-server">rOCCI server</a>. This work has not yet been merged upstream, so you will need to work from Isaac Chiang's fork:</p>
<pre>
git clone https://github.com/isaacchiang/rOCCI-server.git
cd rOCCI-server
bundle install
cd etc/backend
cp cloudstack/cloudstack.json default.json
</pre>
<p>Edit the defautl.json file to contain the information about your CloudStack cloud (e.g apikey, secretkey, endpoint).
Start the rOCCI server:</p>
<pre>
bundle exec passenger start
</pre>
<p>The server should now be running on http://0.0.0.0:3000. Run the tests:</p>
<pre>
bundle exec rspec
</pre>
<p>This was tested with the CloudStack simulator and a basic zone configuration; help us test it against production clouds.</p>
<p>You can also try an OCCI client. Install the <a href="https://github.com/gwdg/rOCCI-cli">rOCCI client</a> from Github:</p>
<pre>
git clone https://github.com/gwdg/rOCCI-cli.git
cd rOCCI-cli
gem install bundler
bundle install
bundle exec rake test
rake install
</pre>
<p>You will then be able to use the OCCI client:</p>
<pre>
occi --help
</pre>
<p>Test it against the server that you started previously. You will need a running CloudStack cloud, either a production one or a dev instance using DevCloud. The credentials and the endpoint of this cloud will have been entered in the `default.json` file that you created in the previous section. Try a couple of OCCI client commands:</p>
<pre>
$ occi --endpoint http://0.0.0.0:3000/ --action list --resource os_tpl
Os_tpl locations:
os_tpl#6673855d-ce9b-4997-8613-6830de037a8f
$ occi --endpoint http://0.0.0.0:3000/ --action list --resource resource_tpl
Resource_tpl locations:
resource_tpl##08ba0343-bd39-4bf0-9aab-4953694ae2b4
resource_tpl##f78769bd-95ea-4139-ad9b-9dfc1c5cb673
resource_tpl##0fd364a9-7e33-4375-9e10-bb861f7c6ee7
</pre>
<p>You will recognize the `uuid`s of the templates and service offerings that you have created in CloudStack. To start an instance:</p>
<pre>
$ occi --endpoint http://0.0.0.0:3000/ --action create --resource compute \
  --mixin os_tpl#6673855d-ce9b-4997-8613-6830de037a8f \
  --mixin resource_tpl#08ba0343-bd39-4bf0-9aab-4953694ae2b4 \
  --resource-title foobar
</pre>
<p>A handle on the created resource will be returned. That's it!</p>
<p>We will keep on improving this driver to provide a production-quality OCCI interface to users who want to use a standard. In all fairness we will also work on a CIMI implementation. Hopefully some of the clouds in the EGI federated cloud will pick CloudStack and help us improve this OCCI interface. In CloudStack we aim to provide the interfaces that the users want and keep them up to date and of production quality so that users can depend on them.</p>
Sebastien Goasguenhttp://www.blogger.com/profile/05197845236510805309noreply@blogger.com0tag:blogger.com,1999:blog-19879822.post-37491905954096821422013-10-17T03:09:00.001-07:002013-10-17T03:18:25.355-07:00Why I will go to CCC13 in Amsterdam ?<p>Aside from the fact that I work full-time on Apache CloudStack, that I am on the organizing committee and that my boss would kill me if I did not go to the <a href="http://cloudstackcollab.org">CloudStack Collaboration</a> conference, there are many great reasons why I want to go as an open source enthusiast. Here they are:</p>
<p>It's <a href="http://www.iamsterdam.com/">Amsterdam</a> and we are going to have a blast (the city of Amsterdam is even sponsoring the event). The venue -<a href="http://www.beursvanberlage.nl/">Beurs Van Berlage</a>- is terrific; this is the same venue where the <a href="http://hadoopsummit.org/">Hadoop summit</a> is held and where the <a href="http://aws.amazon.com/aws-summit-benelux-2013/amsterdam/">AWS Benelux Summit</a> was held a couple of weeks ago. We are going to have a 24/7 Developer room (thanks to <a href="http://www.cloudsoftcorp.com/">CloudSoft</a>) where we can meet to hack on CloudStack and its ecosystem, three parallel tracks in other rooms and great evening events. The event is made possible by the amazing local support from the team at <a href="http://www.schubergphilis.com/">Schuberg Philis</a>, a company that has devops in its veins and organized <a href="http://www.devopsdays.org/events/2013-amsterdam/">DevOps days Amsterdam</a>. I am not being very subtle in acknowledging our sponsors here, but hey, without them this would not be possible.</p>
<p>On the first day (November 20th) is the <a href="http://cloudstackcollab.org/schedule/wednesday">Hackathon</a> sponsored by <a href="http://exoscale.ch">exoscale</a>. In parallel to the hackathon, new users of CloudStack will be able to attend a full-day bootcamp run by the super competent guys from <a href="http://www.shapeblue.com/">Shapeblue</a>; they also play guitar and drink beers, so make sure to hang out with them :). Just as cool, the CloudStack community recognizes that building a cloud takes many components, so we will have a <a href="http://jenkins-ci.org/">jenkins</a> workshop and an <a href="http://www.elasticsearch.org/">elasticsearch</a> workshop. I am a big fan of elasticsearch, not only for keeping your infrastructure logs but also for other types of data; I actually store all CloudStack emails in an elasticsearch cluster. Jenkins of course is at the heart of everyone's continuous integration systems these days. Seeing those two workshops, it will be no surprise to see a DevOps track the next two days.</p>
<p>Kicking off the second day -<a href="http://cloudstackcollab.org/schedule/thursday">first day of talks</a>- we will have a keynote by <a href="http://www.jedi.be/blog/">Patrick Debois</a>, the Jedi master of DevOps. We will then break up into a user track, a developer track, a commercial track and, for this day only, a devops track with a 'culture' flavor. The hard work will begin: choosing which talk to attend. I am not going to go through every talk; we received a lot of great submissions and choosing was hard. New CloudStack users or people looking into using CloudStack will gain a lot from the case studies being presented in the user track, while the developers will get a deep dive into the advanced networking features of CloudStack, including SDN support -right off the bat-. In the afternoon, the case studies will continue in the user track, including a talk from NTT about how they built an AWS-compatible cloud. I will have to head to the developer track for a session on 'interfaces', with a talk on jclouds, a new GCE interface that I worked on, and my own talk on Apache libcloud, for which I worked a lot on the CloudStack driver. The DevOps track will have an entertaining talk by Michael Ducy from <a href="http://www.opscode.com/">Opscode</a>, some real-world experiences by John Turner and Noel King from Paddy Power, and an interactive session led by the VP of engineering for Citrix CloudPlatform on how to best work with the open source community of Apache CloudStack.</p>
<p>After recovering from the night's events, we will head into the <a href="http://cloudstackcollab.org/schedule/friday">second day</a> with another entertaining keynote by <a href="https://twitter.com/botchagalupe">John Willis</a>. Here the choice will be hard between the storage session in the commercial track and the 'Future of CloudStack' session in the developer track. With talks from NetApp and SolidFire, who have each developed a plugin in CloudStack, plus our own Wido Den Hollander (PMC member), who wrote the Ceph integration, the storage session will rock; but the 'Future of CloudStack' session will be key for developers, talking about frameworks, integration testing, system VMs... After lunch the user track will feature several intro-to-networking talks. Networking is the most difficult concept to grasp in clouds (IMHO). The storage session will continue with a talk by <a href="http://basho.com/">Basho</a> on RiakCS (also integrated in CloudStack) and a panel. The dev track will be dedicated to discussions on PaaS, not to be missed if you ask me, as PaaS is the next step in clouds. To wrap things up, I will have to decide between a session on metering/billing, a discussion on hypervisor choice and support, and a presentation on the CloudStack community in Japan, after <a href="https://twitter.com/rUv">Ruv Cohen</a> talks about trading cloud commodities.</p>
<p>The agenda is loaded and ready to fire. It will be tough to decide which sessions to attend, but you will come out refreshed and energized, with lots of new ideas to evolve your IT infrastructure. So, one word: <a href="http://cloudstackcollab.org/event/registration">Register</a>.</p>
<p>And of course many thanks to our <a href="http://www.cloudstackcollab.org/sponsors">sponsors</a>: Citrix, Schuberg Philis, Juniper, Sungard, Shapeblue, NetApp, cloudSoft, Nexenta, iKoula, leaseweb, solidfire, greenqloud, atom86, apalia, elasticsearch, 2source4, iamsterdam, cloudbees and 42on.</p>Sebastien Goasguenhttp://www.blogger.com/profile/05197845236510805309noreply@blogger.com0tag:blogger.com,1999:blog-19879822.post-86663161798485455442013-10-01T05:48:00.000-07:002013-10-01T05:48:30.166-07:00A look at RIAK-CS from BASHO<p><b>Playing with Basho Riak CS Object Store</b></p>
<p>CloudStack deals with the compute side of an IaaS; the storage side, which for most of us these days consists of a scalable, fault-tolerant object store, is left to other software. <a href="http://ceph.com/ceph-storage/">Ceph</a>, led by <a href="http://www.inktank.com/">inktank</a>, and <a href="http://basho.com/riak-cloud-storage/">RiakCS</a> from <a href="http://basho.com">Basho</a> are the two most talked-about object stores these days. In this post we look at RiakCS and take it for a quick whirl. CloudStack integrates with RiakCS for secondary storage, and together they can offer an EC2 and a true S3 interface, backed by a scalable object store. So here it is.</p>
<p>While <a href="http://basho.com/riak-cloud-storage/">RiakCS</a> (Cloud Storage) can be seen as an S3 backend implementation, it is based on Riak. Riak is a highly available distributed nosql <a href="http://docs.basho.com/riak/latest/theory/why-riak/">database</a>.
The use of a consistent hashing algorithm allows riak to re-balance the data when nodes disappear (e.g. fail) and when nodes appear (e.g. increased capacity); it also allows it to manage the replication of data with the eventual-consistency principle typical of large-scale distributed storage systems, which favor availability over consistency.</p>
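<p>To see why consistent hashing is such a good fit, here is a toy illustration in Python; it is not Riak's actual implementation (Riak divides the hash space into a fixed number of partitions claimed by vnodes), but it shows the key property: nodes and keys are hashed onto the same ring, each key is owned by the first node clockwise from it, and adding or removing a node only remaps the keys in its neighborhood instead of reshuffling everything:</p>
<pre>
import bisect, hashlib

def position(value):
    # place a node or a key on the ring
    return int(hashlib.sha1(value).hexdigest(), 16)

class Ring(object):
    """Toy consistent-hash ring with virtual nodes."""
    def __init__(self, nodes, vnodes=64):
        self.vnodes, self._keys, self._nodes = vnodes, [], {}
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):        # each node claims many points
            pos = position('%s:%d' % (node, i))
            bisect.insort(self._keys, pos)
            self._nodes[pos] = node

    def node_for(self, key):
        # the first vnode clockwise from the key's position owns it
        i = bisect.bisect(self._keys, position(key)) % len(self._keys)
        return self._nodes[self._keys[i]]

ring = Ring(['dev1', 'dev2', 'dev3', 'dev4', 'dev5'])
print ring.node_for('images/1.jpg')
</pre>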
<p>To get a functioning RiakCS storage we need Riak, RiakCS and Stanchion. Stanchion is an interface that serializes HTTP requests made to RiakCS.</p>
<p><b>A taste of Riak</b></p>
<p>To get started, let's play with Riak and build a cluster on our local machine. Basho has some great <a href="http://docs.basho.com/riak/latest/quickstart/">documentation</a>; the toughest thing will be to install Erlang (and by tough I mean a 2-minute deal), but again the <a href="http://docs.basho.com/riak/latest/ops/building/installing/erlang/">docs</a> are very helpful and give step-by-step instructions for almost all OSes.</p>
<p>There is no need for me to re-create step-by-step instructions since the docs are so great, but the gist is that with the quickstart guide we can create a Riak cluster on `localhost`. We are going to start five Riak nodes (we could start more) and join them into a cluster; for each additional node this is as simple as:</p>
<pre>
bin/riak start
bin/riak-admin cluster join dev1@127.0.0.1
</pre>
<p>Where `dev1` was the first riak node started. Creating this cluster will re-balance the ring:</p>
<pre>
================================= Membership ==================================
Status Ring Pending Node
-------------------------------------------------------------------------------
valid 100.0% 20.3% 'dev1@127.0.0.1'
valid 0.0% 20.3% 'dev2@127.0.0.1'
valid 0.0% 20.3% 'dev3@127.0.0.1'
valid 0.0% 20.3% 'dev4@127.0.0.1'
valid 0.0% 18.8% 'dev5@127.0.0.1'
</pre>
<p>The `riak-admin` command is a nice CLI to <a href="http://docs.basho.com/riak/latest/ops/running/tools/riak-admin/">manage</a> the cluster. We can check the membership of the cluster we just created; after some time the ring will have re-balanced to the expected state.</p>
<pre>
dev1/bin/riak-admin member-status
================================= Membership ==================================
Status Ring Pending Node
-------------------------------------------------------------------------------
valid 62.5% 20.3% 'dev1@127.0.0.1'
valid 9.4% 20.3% 'dev2@127.0.0.1'
valid 9.4% 20.3% 'dev3@127.0.0.1'
valid 9.4% 20.3% 'dev4@127.0.0.1'
valid 9.4% 18.8% 'dev5@127.0.0.1'
-------------------------------------------------------------------------------
Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
dev1/bin/riak-admin member-status
================================= Membership ==================================
Status Ring Pending Node
-------------------------------------------------------------------------------
valid 20.3% -- 'dev1@127.0.0.1'
valid 20.3% -- 'dev2@127.0.0.1'
valid 20.3% -- 'dev3@127.0.0.1'
valid 20.3% -- 'dev4@127.0.0.1'
valid 18.8% -- 'dev5@127.0.0.1'
-------------------------------------------------------------------------------
</pre>
<p>You can then test your cluster by putting an image in it, as explained in the docs, and retrieving it in a browser (e.g. via an HTTP GET):</p>
<pre>
curl -XPUT http://127.0.0.1:10018/riak/images/1.jpg \
  -H "Content-type: image/jpeg" \
  --data-binary @image_name_.jpg
</pre>
<p>Open the browser to `http://127.0.0.1:10018/riak/images/1.jpg`
As easy as 1..2..3
</p>
<p><b>Installing everything on Ubuntu 12.04</b></p>
<p>To move forward and build a complete S3-compatible object store, let's set up everything on an Ubuntu 12.04 machine. Back to installing `riak`: get the repo keys and set up a `basho.list` repository:</p>
<pre>
curl http://apt.basho.com/gpg/basho.apt.key | sudo apt-key add -
bash -c "echo deb http://apt.basho.com $(lsb_release -sc) main > /etc/apt/sources.list.d/basho.list"
apt-get update
</pre>
<p>And grab `riak`, `riak-cs` and `stanchion`. I am not sure why but their great docs make you download the .debs separately and use `dpkg`.</p>
<pre>
apt-get install riak riak-cs stanchion
</pre>
Check that the binaries are in your path with `which riak`, `which riak-cs` and `which stanchion`; you should find everything in `/usr/sbin`. All configuration will be in `/etc/riak`, `/etc/riak-cs` and `/etc/stanchion`; inspect especially the `app.config` files, which we are going to modify before starting everything.
Note that all binaries have a nice usage description; it includes a console, a ping method and a restart, among others:
<pre>
Usage: riak {start | stop| restart | reboot | ping | console | attach |
attach-direct | ertspath | chkconfig | escript | version |
getpid | top [-interval N] [-sort reductions|memory|msg_q] [-lines N] }
</pre>
<p><b>Configuration</b></p>
Before starting anything we are going to configure every component, which means editing the `app.config` file in each respective directory. For `riak-cs` I only made sure to set `{anonymous_user_creation, true}`. I did nothing to configure `stanchion`, as I used the default ports and ran everything on `localhost` without `ssl`. Just make sure that you are not running any other application on port `8080`, as `riak-cs` will use this port by default. For configuring `riak` see the <a href="http://docs.basho.com/riakcs/latest/cookbooks/configuration/Configuring-Riak/">documentation</a>; it sets a different backend than the one we used in the `tasting` phase :) With all this configuration done you should be able to start all three components:
<pre>
riak start
riak-cs start
stanchion start
</pre>
You can `ping` every component and check the console with `riak ping`, `riak-cs ping` and `stanchion ping`; I let you figure out the console access.
Create an admin user for `riak-cs`:
<pre>
curl -H 'Content-Type: application/json' -X POST http://localhost:8080/riak-cs/user \
--data '{"email":"foobar@example.com", "name":"admin user"}'
</pre>
If this returns successfully, it is a good indication that your setup is working properly. In the response we recognize the API and secret keys:
<pre>
{"email":"foobar@example.com",
"display_name":"foobar",
"name":"admin user",
"key_id":"KVTTBDQSQ1-DY83YQYID",
"key_secret":"2mNGCBRoqjab1guiI3rtQmV3j2NNVFyXdUAR3A==",
"id":"1f8c3a88c1b58d4b4369c1bd155c9cb895589d24a5674be789f02d3b94b22e7c",
"status":"enabled"}
</pre>
Let's take those and put them in our `riak-cs` configuration file; there are `admin_key` and `admin_secret` variables to set. Then restart with `riak-cs restart`. Don't forget to also add those in the `stanchion` configuration file `/etc/stanchion/app.config` and restart it with `stanchion restart`.
<p><b>Using our new Cloud Storage with Boto</b></p>
Since Riak-CS is an S3-compatible cloud storage solution, we should be able to use an S3 client like Python <a href="http://boto.s3.amazonaws.com/s3_tut.html">boto</a> to create buckets and store data. Let's try. You will need boto of course (`apt-get install python-boto`); then open an interactive shell with `python`.
Import the modules and create a connection to `riak-cs`:
<pre>
>>> from boto.s3.key import Key
>>> from boto.s3.connection import S3Connection
>>> from boto.s3.connection import OrdinaryCallingFormat
>>> apikey='KVTTBDQSQ1-DY83YQYID'
>>> secretkey='2mNGCBRoqjab1guiI3rtQmV3j2NNVFyXdUAR3A=='
>>> cf=OrdinaryCallingFormat()
>>> conn=S3Connection(aws_access_key_id=apikey,aws_secret_access_key=secretkey,
is_secure=False,host='localhost',port=8080,calling_format=cf)
</pre>
Now you can list the buckets (the result will be empty at first), then create a bucket and store content in it under various keys:
<pre>
>>> conn.get_all_buckets()
[]
>>> bucket=conn.create_bucket('riakbucket')
>>> k=Key(bucket)
>>> k.key='firstkey'
>>> k.set_contents_from_string('Object from first key')
>>> k.key='secondkey'
>>> k.set_contents_from_string('Object from second key')
>>> b=conn.get_all_buckets()[0]
>>> k=Key(b)
>>> k.key='secondkey'
>>> k.get_contents_as_string()
'Object from second key'
>>> k.key='firstkey'
>>> k.get_contents_as_string()
'Object from first key'
</pre>
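<p>Two more boto calls worth knowing, continuing with the same `conn` and `bucket` objects from the session above (a sketch, using the bucket and key we just created): enumerate the keys in a bucket, and generate a time-limited presigned URL that you can hand out without sharing your credentials:</p>
<pre>
# list every key in the bucket
for key in bucket.list():
    print key.name

# presigned GET URL for one object, valid for one hour (3600 seconds)
print conn.generate_url(3600, 'GET', bucket='riakbucket', key='firstkey')
</pre>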
<p>And that's it: an S3-compatible object store backed by a distributed NoSQL database that uses consistent hashing, all of it in Erlang. Automate all of it with a Chef <a href="http://docs.basho.com/riakcs/latest/cookbooks/installing/Riak-CS-Using-Chef/">recipe</a>. Hook it up to your CloudStack EC2-compatible cloud, use it as secondary storage to hold templates, or make it a public-facing offering and you have the second leg of the cloud: storage. Sweet... Next post I will show you how to use it with CloudStack.</p>Sebastien Goasguenhttp://www.blogger.com/profile/05197845236510805309noreply@blogger.com0tag:blogger.com,1999:blog-19879822.post-48827843800946995612013-09-03T02:21:00.001-07:002013-09-03T02:21:03.720-07:00CloudStack Google Summer of Code projects<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh04oFr2I7o0a8nHr4taH4H2X0IZr3zKo51bdTlgC7GOaENmVAPqjpiKvTCOL4WjIItJzY1zfkgzz3wgdSfI6SPzfy1xwfKr4VyrPd-q7o-L-GmAOd2-aOuZUT-QR6X1X4lYzmhrg/s1600/924x156xbanner-gsoc2013.png.pagespeed.ic.-hgGiilbTK.webp" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh04oFr2I7o0a8nHr4taH4H2X0IZr3zKo51bdTlgC7GOaENmVAPqjpiKvTCOL4WjIItJzY1zfkgzz3wgdSfI6SPzfy1xwfKr4VyrPd-q7o-L-GmAOd2-aOuZUT-QR6X1X4lYzmhrg/s320/924x156xbanner-gsoc2013.png.pagespeed.ic.-hgGiilbTK.webp" /></a></div>
<p><a href="http://www.google-melange.com/gsoc/homepage/google/gsoc2013">Google Summer of Code</a> is entering the final stretch, with pencils down on Sept 16th and final evaluations on Sept 27th. Of the five projects CloudStack had this summer, one failed at mid-term and one led to committer status a couple of weeks ago. That's 20% failure and 20% outstanding results, on par with GSoC-wide statistics I believe. </p>
<p>The LDAP integration has been the most productive project. Ian Duffy, a 20-year-old from Dublin, did an outstanding job, developing his new feature in a feature branch, building a <a href="http://jenkins-ci.org/">jenkins</a> pipeline to test everything, and submitting a merge request to master a couple of weeks ago. With 90% unit-test coverage, static code analysis with <a href="http://www.sonarqube.org/">Sonar</a> in his jenkins pipeline, and automatic publishing of rpms to a local yum repo, Ian exceeded expectations. His code has even already been backported to the 4.1.1 release in the <a href="http://www.cloudsand.com/content/">CloudSand</a> distro of CloudStack. </p>
<p>The SDN extension project was about taking the native GRE controller in CloudStack and extending it to support XCP and KVM. Nguyen from Vietnam has done an excellent job, quickly adding support for <a href="http://www.xenproject.org/developers/teams/xapi.html">XCP</a> thanks to his expertise with Xen. He is now putting the final touches on KVM support and building L3 services with <a href="http://www.opendaylight.org/">OpenDaylight</a>. The entire GRE controller was re-factored as a plugin, similar to the Nicira NVP, Midonet and BigSwitch BVS plugins. Being native to CloudStack, this controller brings yet another SDN solution to the project. I expect to see his merge request before pencils down, for what will be an extremely valuable project.</p>
<p>While the CloudStack UI is great, it was actually written as a demonstration of how the CloudStack API could be used to build a user-facing portal. With the "new UI" project, Shiva Teja from India used <a href="http://getbootstrap.com/">bootstrap</a> and <a href="http://angularjs.org/">Angular</a> to create a new UI. Originally the project suggested using <a href="http://backbonejs.org/">backbone</a>, but after feedback from the community Shiva switched to Angular. Shiva's efforts are to be commended, as he truly worked on his own, with inconsistent network connectivity and no local mentoring. Shiva is a bachelor student and had to learn bootstrap, Angular and also Flask on his own. It must be paying off, since he is interviewing with Amazon and Google for internships next summer. His code, being independent of the CloudStack code base, has been committed to master in our tools directory. This creates a solid framework for others to build on and create their own CloudStack UI.</p>
<p>Perhaps the most research-oriented project has been the one from Meng Han from Florida. This was no standard coding project, as it required not only learning new technologies (aside from CloudStack) but also investigating the <a href="http://aws.amazon.com/elasticmapreduce/">Amazon EMR API</a>. Meng had to implement EMR in CloudStack using Apache Whirr. Whirr is a Java library for provisioning virtual machines on cloud providers; it uses Apache jclouds and can interact with most cloud providers out there. Meng developed a new set of CloudStack APIs to launch hadoop clusters on-demand, as sketched below. At the start she had to learn CloudStack and install it, then learn the Whirr library, and subsequently create a new API in CloudStack which would use Whirr to coordinate multi-node deployments. Meng's code is working but still falls a bit short of our goal of having an AWS EMR interface. This is partly my fault, as this project could have used more mentoring. In any case, the work will go on and I expect to see an EMR implementation in CloudStack in the coming months.</p>
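To give an idea of what Whirr does under the hood, here is a minimal sketch of launching a hadoop cluster with the stock Whirr CLI. The property values (cluster name, endpoint, keys, instance templates) are made-up placeholders, and using `whirr.provider=cloudstack` assumes the jclouds CloudStack provider is available in your Whirr build:
<pre>
# hadoop.properties (sketch, hypothetical values)
whirr.cluster-name=myhadoopcluster
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2 hadoop-datanode+hadoop-tasktracker
whirr.provider=cloudstack
whirr.endpoint=http://your-cloudstack-host:8080/client/api
whirr.identity=YOUR_API_KEY
whirr.credential=YOUR_SECRET_KEY

# launch the cluster, and tear it down when done
$ whirr launch-cluster --config hadoop.properties
$ whirr destroy-cluster --config hadoop.properties
</pre>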
<p>All students faced the same challenge: not a code-writing challenge, but the OSS challenge, and specifically learning the Apache Way. Apache is about consensus and public discussions on the mailing list. With several hundred participants every month and very active discussions, the sheer amount of email traffic can be intimidating. Sharing issues and asking for help on a public mailing list is still a bit frightening. IRC, intense emailing, JIRA and git are basic tools used in all Apache projects, but seldom used in academic settings. Learning these development tools and participating in a project with over a million lines of code was the toughest challenge for the students, and the goal of GSoC. I am glad we got five students to join CloudStack this summer and tackle these challenges; if anything, it is a terrific experience that will benefit their own academic endeavors and later their entire careers. Great job Ian, Meng, Nguyen, Shiva and Dharmesh, we are not done yet but I wish you all the best.</p>Sebastien Goasguenhttp://www.blogger.com/profile/05197845236510805309noreply@blogger.com0tag:blogger.com,1999:blog-19879822.post-44444957888164507092013-08-27T01:27:00.001-07:002013-09-03T01:33:52.127-07:00About those Cloud APIs....<p>There have been a lot of discussions lately within the OpenStack community on the need for an AWS API interface to OpenStack nova. I followed the discussion from afar via a few tweets, but I am of the opinion that any IaaS solution does need to expose an AWS interface. AWS has been the leader in Cloud since 2006 (yes, that's seven years). Users are accustomed to it and the AWS API is the de-facto standard.</p>
<p>When Eucalyptus started, its main goal was to become an AWS clone, and in 2012 it signed an <a href="http://www.eucalyptus.com/aws-compatibility/aws-agreement">agreement</a> with Amazon to offer seamless AWS support in Eucalyptus. Opennebula has almost always offered an AWS bridge, and CloudStack has too, even though in full disclosure the interface was broken in the Apache CloudStack 4.1 release. Thankfully the AWS interface is now fixed in the 4.1.2 release and will also be in the upcoming 4.2 release. To avoid breaking this interface again, we are developing a jenkins pipeline which will test it using the Eucalyptus testing <a href="https://github.com/eucalyptus/eutester/">suite</a>.</p>
<p>Opennebula recently ran a <a href="http://blog.opennebula.org/?p=4716">survey</a> to determine where to best put its efforts in API development. The results were clear, with 47% of respondents asking for better AWS compatibility. There are of course official standards being developed by standards organizations, most notably <a href="http://occi-wg.org">OCCI</a> from OGF and <a href="http://dmtf.org/standards/cloud">CIMI</a> from DMTF. The opennebula survey seems to indicate a stronger demand for OCCI than CIMI, but IMHO this is due to historical reasons: Opennebula's early efforts in being the first OCCI implementation, and its user base, especially within projects like <a href="http://helix-nebula.eu">HelixNebula</a>.</p>
<p>CIMI was promising and probably still is, but it will most likely face an uphill battle since RedHat announced it is scaling back its support of <a href="http://mail-archives.apache.org/mod_mbox/deltacloud-dev/201305.mbox/%3C518D0F79.4000901@redhat.com%3E">Apache DeltaCloud</a>. I recently heard about a new CIMI implementation project for <a href="https://github.com/StratusLab/cimi">Stratuslab</a> from some of my friends at <a href="http://sixsq.com">Sixsq</a>; it is interesting and fun because it is written in Clojure, and I hope to see it used with <a href="https://github.com/pyr/clostack">Clostack</a> to provide a CIMI interface to CloudStack. We may be a couple of weeks out :)</p>
<p>While AWS is the de-facto standard, I want to make sure that CloudStack offers choices to its users. If someone wants to use OCCI, CIMI, AWS or the native CloudStack API, they should be able to. I will be at the <a href="http://www.cloudplugfest.org">CloudPlugfest Interoperability</a> week in Madrid Sept 18-20, and I hope to demonstrate a brand new OCCI interface to CloudStack using <a href="https://github.com/gwdg/rOCCI-server">rOCCI</a> and the CloudStack <a href="https://github.com/chipchilders/cloudstack_ruby_client">ruby gem</a>. A CloudStack contributor from Taiwan has been working on it.</p>
<p>The main issue with all these "standard" interfaces is that they will never give you the complete API of a given IaaS implementation. They by nature provide the lowest common denominator. That roughly means that the user-facing APIs could be standardized, but the administrator API will always remain hidden and non-standard. In CloudStack for instance, there are over 300 API calls. While we can expose a compatible interface, it will always cover only a subset of the overall API. It also raises the question of all the other AWS services: EMR, Beanstalk, CloudFormation... Standardizing on those will be extremely difficult, if not impossible.</p>
<p>So yes, we should expose an AWS-compatible interface, but we should also have OCCI, CIMI and of course our native API. Making those bridges is not hard; what's hard is the implementation behind them. </p>
<p>All of this would leave us with Google Compute Engine (GCE), and I should be able to bring back some good news by the end of September. Stay tuned !!!</p>Sebastien Goasguenhttp://www.blogger.com/profile/05197845236510805309noreply@blogger.com0