Friday, November 30, 2012

Testing CloudStack 4.1 with DevCloud

DevCloud is the sandbox for CloudStack. It can be used in several modes:

  1. Full sandbox
  2. Development environment
  3. Cloud testbed

The community is hard at work providing a working DevCloud and automating the building of your own using veewee, vagrant and puppet. Rohit Yadav, an Apache committer, recently built a new version that solved a few issues we were having; check out his recent blog.

The full sandbox runs the CloudStack management server and acts as a host, using nested virtualization to start instances within DevCloud. This is great for testing and training.

The development environment is used when developers want to develop CloudStack, modify the source, build locally and deploy on a working setup. In this use case, they push their current development to DevCloud.

In the testbed version, you run the management server locally and use DevCloud as a host and NFS server. Of course, multiple variations are possible: adding more hosts, using different storage backends (possibly), or adding other physical machines.

In the screencast below I demo the testbed setup, using a host-only interface, running the management server on OS X (a MacBook Air) and starting tinylinux instances within DevCloud.
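For those who would rather drive such a setup from a script than from the GUI, here is a minimal sketch of what starting an instance could look like. It assumes the management server's unauthenticated integration API port (8096) is enabled and reachable on localhost, and it simply picks the first zone, service offering and featured template it finds; none of this is taken from the screencast itself.

```python
# Minimal sketch: deploy an instance through the CloudStack API of a local
# management server, assuming the unauthenticated integration port (8096).
import json
import requests

API = 'http://localhost:8096/client/api'

def call(command, **params):
    """Issue a CloudStack API call and return the parsed JSON response."""
    params.update({'command': command, 'response': 'json'})
    return json.loads(requests.get(API, params=params).text)

# Pick the first zone, service offering and featured template available.
zone = call('listZones')['listzonesresponse']['zone'][0]
offering = call('listServiceOfferings')['listserviceofferingsresponse']['serviceoffering'][0]
template = call('listTemplates', templatefilter='featured')['listtemplatesresponse']['template'][0]

# Ask the management server to start an instance on the DevCloud host.
vm = call('deployVirtualMachine',
          zoneid=zone['id'],
          serviceofferingid=offering['id'],
          templateid=template['id'])
print(vm)
```

The same calls work against the authenticated API on port 8080, provided you sign the requests with your API and secret keys.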

Below is the screencast, which will say more than I can write in this blog post.

Testing CloudStack 4.1 with DevCloud from sebastien goasguen on Vimeo.

If you are interested in the first release of CloudStack, 4.0-incubating, watch the screencast below, which shows you the testing procedure we followed to vote on the release.

CloudStack 4.0 testing procedure from sebastien goasguen on Vimeo.

Wednesday, November 28, 2012

Translating Apache CloudStack docs with Transifex

One of the big changes leading to the release of Apache CloudStack 4.0 (ACS) has been converting the existing documentation to docbook XML format. The resulting books are built in various output formats using publican. The entire ACS 4.0 documentation is now in docbook XML format.

All the documentation is located in the /docs directory in the ACS 4.0 source release. Folks who are not Apache committers yet can write documentation in docbook and submit patches. A good way to contribute is to start working on existing docs bugs (I should follow my own advice sometimes :) ).

One of the benefits of moving to docbook XML and publican is the ability to produce documentation in multiple languages. Publican generates Portable Object (PO) files that provide a framework for translating the original documentation resources. To translate them, ACS uses Transifex. While it is possible to provide translations from the command line using the Transifex client, the easiest way to get started is to go through the online interface of Transifex. The following screenshots walk you through this process (click on them to enlarge).

First, create an account on Transifex and log in.

You will be presented with your brand new Transifex dashboard; search for CloudStack projects.

You will see several CloudStack related projects; pick the one that interests you the most. Most likely this will be the core documentation project, but you can also contribute to the runbook or the UI.

Once you are on the project page, you will see the various languages that are being worked on, as well as the percentage of completion of the translations. There is work to do :). By clicking on resources you will access all the resources that are available for translation. If you have checked out the source code, you will recognize the names of the docbook XML files in /docs/en-US.

Pick a resource that you would like to translate and you will be presented with that resource's page. An Add New Translation icon is present on the right; click on it. You will then be able to select the language you want to translate to. Proceed by selecting the translate online button.

A form to enter your translation will be displayed. It is broken up into the strings that make up that resource. Enter your translation for each string, then save and exit.

You will then be returned to the resource home page and you should see that your translation has been added (i.e. your language should be listed). Once you return to the project dashboard, if the language you translated to was new, you should see it there as well.

Once a significant portion of the translation is completed, one of the committers (I volunteer David Nalley :) because he has nothing else to do) will pull your translations using the Transifex client and build the new book with publican. Let's get translating...

Monday, November 26, 2012

How CERNVM uses CloudStack

Friday, while turkeys were being eaten on the other side of the Atlantic, I met with Predrag Buncic from the CERNVM project. Predrag and I have known each other since 2009, when I worked on LXCLOUD at CERN. Meeting with Predrag is always extremely informative, as he is at the forefront of R&D to support the experiments of the Large Hadron Collider (LHC). This time was no different.

CERNVM inception: CERNVM started in 2007 and entered development in 2008; it has been 4 years in the making and the project is now wrapping up. The main concept was to create a virtual machine appliance that scientists could use on their desktop. The appliance would have the latest analysis software needed to analyze the data coming out of the LHC. Building appliances for LHC has now become routine, and CERNVM comes in various flavors: VirtualBox, VMware, Xen, KVM and Hyper-V. What strikes you when you download CERNVM is its small size (~100 MB). This is made possible through the use of the CERNVM file system, or CVMFS, an HTTP-based read-only file system optimized to deliver software applications to the appliances. CVMFS is now used widely throughout the LHC community. This file system is really a side artifact of the project, but a very valuable one. I heard that a micro CERNVM is under development; it would be ~6 MB in size and the entire software stack needed would be streamed via CVMFS.

Contextualization: With a very mature procedure to build VM appliances and a highly performant file system to help distribute software just in time, what CERNVM was lacking was a way to provision hundreds or thousands of instances on large clusters. Around 2009/2010, the CERNVM team developed a batch edition that could be used for batch processing in clusters used for analysis of LHC data. This appliance was tested successfully on LXCLOUD. The biggest challenge for the appliances was the configuration, or what is known in this community as contextualization (a term often attributed to the Nimbus project). Basically, it amounts to configuring the instance based on what it is supposed to do. The team developed a very advanced contextualization system. To business data center people, this is, for example, a way to tell an instance which Puppet profile it is supposed to use, where the Puppet master is, and what other instances it should be aware of. In the case of EC2, the way to contextualize an image is to pass variables through the EC2_USER_DATA entry. With OpenNebula this is done using scripts placed in an ISO attached to the instance.
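To make the EC2 case concrete, here is a small sketch (not CERNVM's actual tooling) of passing a contextualization script as user data when launching an instance with the boto Python library; the region, AMI ID, keypair name and Puppet master address are placeholders.

```python
# Sketch of EC2-style contextualization: the instance receives a user-data
# script at boot and configures itself accordingly. All identifiers below
# (region, AMI, keypair, puppet master) are hypothetical placeholders.
import boto.ec2

user_data = """#!/bin/bash
# Tell the instance which Puppet master to talk to (hypothetical example)
echo "puppet_master=puppet.example.org" >> /etc/context.conf
"""

conn = boto.ec2.connect_to_region('us-east-1')  # credentials taken from the environment
reservation = conn.run_instances('ami-12345678',
                                 instance_type='m1.small',
                                 key_name='my-keypair',
                                 user_data=user_data)
print(reservation.instances[0].id)
```

CloudStack's native deployVirtualMachine call accepts a similar userdata parameter (base64 encoded), so the same idea carries over.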

CERNVM Cloud: I had not talked to Predrag about the latest developments in a while, and I was impressed by how far they had gone. They totally automated the contextualization process, creating a web frontend, CERNVM On-line, that users can use to fill in the contextualization parameters and create profiles for instances, including specifying service offerings that tie to CloudStack service offerings. The kicker is that they tied it to a gateway, the CERNVM Gateway (they got their branding right!), that allows users to start instances and define clusters of instances using specific profiles. While enterprise people think of VM profiles as database or webserver profiles, here the profiles tend to be Condor clusters for batch processing, MPI clusters for parallel computing, and PROOF clusters for LHC analysis. The combination makes up their Cloud. What I really like is that they moved up the stack, from building VM images to providing an end-to-end service for users: a one-stop shop to write definitions of infrastructure and instantiate them. I think of it as a marketplace and a cloud management system all at once.

Internals: What technologists will love is how they combined an IaaS with their Gateway. They built on XMPP, the former chat protocol that is now a more general messaging protocol. Predrag and I had talked about XMPP some time back, and a former student had developed an entire framework (Kestrel) for batch processing in virtual machines. A couple of the many interesting aspects of XMPP are its scalability, the ability to federate servers, and the ability to communicate with agents that are behind NAT. Of course we can argue about XMPP vs AMQP, but that would have to be another post. What CERNVM ended up doing is creating a complete XMPP based framework to do cloud federation and communicate with cloud IaaS APIs. An XMPP agent sits within the premises of a cloud provider and is responsible for starting instances using the cloud platform's API. The instances then contextualize themselves using a pairing mechanism that ties them back to the CERNVM Cloud. Their cloud can be made of public clouds, private clouds and even desktops. Brilliant!
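To illustrate the pattern (this is not the CERNVM Gateway code, just a rough sketch), such an agent could look like the following: it connects outward to an XMPP server from inside the provider's network and, on request, starts an instance through the local CloudStack API. SleekXMPP is used here purely as an example Python XMPP library, and the JID, password and CloudStack identifiers are placeholders.

```python
# Rough sketch of a provider-side XMPP agent: it listens for requests over
# XMPP and starts instances through the local CloudStack API. The JID,
# password and CloudStack IDs are hypothetical placeholders.
import json
import requests
import sleekxmpp

CLOUDSTACK_API = 'http://localhost:8096/client/api'  # unauthenticated integration port

class ProviderAgent(sleekxmpp.ClientXMPP):
    def __init__(self, jid, password):
        super(ProviderAgent, self).__init__(jid, password)
        self.add_event_handler('session_start', self.session_start)
        self.add_event_handler('message', self.message)

    def session_start(self, event):
        self.send_presence()
        self.get_roster()

    def message(self, msg):
        # Expect a JSON payload such as {"action": "start", "zoneid": "...", ...}
        request = json.loads(msg['body'])
        if request.get('action') == 'start':
            params = {'command': 'deployVirtualMachine',
                      'response': 'json',
                      'zoneid': request['zoneid'],
                      'templateid': request['templateid'],
                      'serviceofferingid': request['serviceofferingid']}
            reply = requests.get(CLOUDSTACK_API, params=params).text
            msg.reply(reply).send()

if __name__ == '__main__':
    agent = ProviderAgent('agent@xmpp.example.org', 'secret')
    if agent.connect():
        agent.process(block=True)
```

Because the agent opens the connection outward to the XMPP server, it keeps working even when the provider sits behind NAT, which is exactly the property mentioned above.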

And CloudStack in all of this ?:

What of course made my day is that the cloud platform they used to test their end-to-end system was Apache CloudStack 4.0. In the two screenshots, you can see that they defined a CERNVM zone within CloudStack. Being a test system, the scale is still small, with 48 cores running a mix of CentOS 6.3 and Scientific Linux (CERN version) 6.3. They do have plans to scale up, and considering their expertise and CloudStack's proven scalability, this should not be a problem. Their setup is a basic zone with a straightforward NFS storage backend. What Ioannis Charalampidis (one of the developers that I met) told me is that CloudStack was easy to install, almost a drop-in action. He made it sound like a non-event. Ioannis did not mention any bugs or installation issues; instead he asked for better EC2 support and the ability to define IP/MAC pairs for guest networks, a request I knew about from my LXCLOUD days. This is mostly a security feature to provide network traceability. I proceeded to point him to the Apache CloudStack JIRA instance and showed him how to submit a feature request. I look forward to seeing his feature requests in the coming days.

Final thoughts: I came back from meeting with the CERNVM team thinking it was worth skipping the turkey. It gave me a few ideas and showed me again the power of CloudStack and open source software:

  • Installing a private or public cloud should be a non-event. Ioannis's feedback really showed me that this was the case with CloudStack and that the hard work put forth by the community really paid off. From the source build with maven to the hosting of convenience binaries, installing CloudStack is a straightforward process (at least in a basic zone configuration).
  • An ecosystem is developing around CloudStack. We already knew this from the various partners that are participating in the Apache community and contributing, but we are also starting to see end-to-end scenarios like the CERNVM Cloud. I encouraged them to open source their XMPP framework and enhance it to generalize the profile definition. They could easily make it a marketplace for Puppet or Chef recipes and use non-CERNVM images that would pair with these recipes for automatic configuration; that would be even more exciting.
  • One other aspect that struck me is that they developed a very enterprise-looking system. Leveraging the Bootstrap javascript framework and Django, their UI is extremely intuitive, interactive and pleasing. Something rather rare in R&D projects focused on batch processing.
  • Finally, I proposed to identify CloudStack based providers that would be willing to install their Gateway agent and share their resources with them. Participating in a cloud federation to find the God particle is a worthwhile endeavor!

Thursday, November 15, 2012

Why I love CloudStack DevCloud?

Apache CloudStack 4.0-incubating was released a couple of weeks ago now. The testing procedure used by the community to vote on the release candidate included using the new CloudStack sandbox, DevCloud.

It's not too late to go through the testing procedure; just follow the steps defined on the wiki page. If you want a shortcut, just watch my screencast and enjoy the French accent.

You will see that one of the first things to do is to install DevCloud: a VirtualBox appliance based on Ubuntu 12.04 and running a Xen kernel. The CloudStack management server is pre-installed, with a basic toy data center already configured. Thanks to nested virtualization, this allows users to start virtual machine instances within the DevCloud sandbox.

The key ingredient is nested virtualization. It is really nice for testing things, but most likely less so if you are concerned with performance, even though I have not seen benchmarks on nested virtualization.

DevCloud was not created solely for the release candidate testing procedure. It was developed by Citrix developer Edison Su to act as a development environment, giving developers the ability to deploy their own cloud testbed on their own machine. Of course, this does not allow for testing of all CloudStack features, especially advanced networking features like VPC and other SDN solutions, but it enables anyone to do some quick smoke tests...and learn CloudStack.

The most compelling fact in favor of DevCloud is that it lowers the barrier of entry for folks to give CloudStack a go: access a working GUI, start instances, take snapshots, access the system VMs, play with an XCP host, and learn the API. I used it in a tutorial at LinuxCon EU in Barcelona last week. I handed out several USB sticks to attendees; they loaded the appliance in VirtualBox and were on the go with CloudStack. The FOSS event coming up in India at the end of the month will also feature a CloudStack tutorial using DevCloud.

I love DevCloud because it's a great teaching and training tool. It helps you discover CloudStack with no investment; you can get going on your laptop. Hey, I run it on my MacBook Air and it runs super fast. It can also expand into a research tool if you want to get adventurous with networking. A few of the folks in the CloudStack community are now using it in a host-only mode, running the CloudStack management server on the localhost, together with the MySQL database needed, and using DevCloud as a host on which to run the VMs. This leverages the host-only interface of VirtualBox, which means that additional VirtualBox instances will be able to communicate with DevCloud.

This also means that something like the Virtual Distributed Ethernet (VDE) switch could be used as well. This would open the door to making use of the VLAN features in CloudStack and linking to other hosts.

VirtualBox is great, but DevCloud images for other hypervisors would be nice as well, assuming they support nested virtualization. The ACIS research group at the University of Florida is working on creating a KVM appliance for DevCloud; this would open yet more doors...

To wrap up, DevCloud is a terrific new tool for CloudStack, in my view it has three basic modes:

  1. Fully contained sandbox, to give CloudStack a try
  2. Dev environment, compiling the CloudStack source outside DevCloud and pushing the build inside of it for testing.
  3. A Dev/Research environment where you can link DevCloud to other resources in your computing infrastructure via advanced networking features.

DevCloud = Awesome!

Wednesday, November 07, 2012

High Performance Computing and CloudStack

I was asked the other day what was the connection between High Performance Computing (HPC) and Clouds, so I thought I would try to post an answer here. Let's first talk a little bit about HPC.

High Performance Computing is about finding every single flop and every single iop on the largest machine you can get your hands on, in order to run your code as fast as possible. It is about batch processing on as many cores as you can get, so you can solve the largest problem you are facing. For a while, supercomputers were large shared memory machines, but in the late nineties distributed memory systems appeared; they were cheaper and you could assemble lots of nodes to get hundreds of CPUs. Today the Top500 supercomputers are ranked every 6 months, and this ranking is the theater of a great technological battle between countries, vendors, research labs and programmers. In the latest ranking, Sequoia, the supercomputer from Lawrence Livermore National Laboratory, topped the list at 16.32 PetaFlop/s and 1,572,864 cores. Weather modeling, atomic weapons simulation, molecular dynamics, genomics and high energy physics are among the fields that benefit from HPC.

There is a big difference, however, within HPC itself. It is the difference between applications that rely heavily on inter-process communication and need a low latency network for message passing, and applications where each process runs an independent task, the so-called embarrassingly parallel applications (e.g., MapReduce is an example of how to express an embarrassingly parallel problem). High Throughput Computing (HTC) defines the type of application where access to a large number of cores over a specific amount of time is needed. Protein folding, popularized by the Folding@home project running on PS3s as well as desktops, is a good example. Financial simulations such as stock price forecasting and portfolio analysis also tend to fall in that category due to their statistical nature. Graphics rendering for animated movies also falls under HTC. HTC cares less about performance (as measured by FLOPS) and more about productivity (as in processing lots of jobs).
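Here is a toy sketch of what "embarrassingly parallel" means in practice: each task below is completely independent, so the work can be spread over any number of cores or machines without any inter-process communication (the HTC case), unlike tightly coupled message-passing codes.

```python
# Toy illustration of an embarrassingly parallel (HTC-style) workload:
# every task is independent, so no inter-process communication is needed.
import random
from multiprocessing import Pool

def simulate(seed):
    """Stand-in for one independent job, e.g. a single Monte Carlo run."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(100000))

if __name__ == '__main__':
    with Pool() as pool:                          # one worker per available core
        results = pool.map(simulate, range(64))   # 64 independent tasks
    print(sum(results) / len(results))
```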

The HPC quest for performance seems totally antagonistic to the IaaS layer of clouds, at least when one thinks of a true HPC workload that consumes every flop. Virtualization, the key enabler of IaaS, introduces overhead, both in CPU and network latency, and has thus been deemed "evil" for true HPC. Despite directed I/O, pass-throughs, VM pinning and other tuning possibilities to reduce the overhead of virtualization, you might think that would be it: no connection between HPC and clouds. However, according to a recent academic study of hypervisor performance from a group at Indiana University, this may not be entirely true, and it would also mean forgetting the users and their specific workloads.

In November 2010 a new player in the Top500 arrived: Amazon EC2. Amazon submitted a benchmark result which placed an EC2 cluster 233rd on the Top500 list. By June 2011, this cluster was down to rank 451. Yet it proved a point: a cloud based cluster could do High Performance Computing, racking up 82.5 TFlops peak using VM instances and a 10 GigE network. In November 2011, Amazon followed with a new EC2 cluster ranked 42nd, with 17,023 cores and 354 TFlops peak. This cluster is made of "Cluster Compute Eight Extra Large" instances with 16 cores, 60 GB of RAM and a 10 GigE interconnect, and is now ranked 72nd. For $1000 per hour, this allows users to get an on-demand HPC cluster that itself ranks in the Top500, a personal cluster provisioned on demand.

A leading HTC software company, CycleComputing, also demonstrated last April the ability to provision a 50,000-core cluster across all availability zones of AWS EC2. In such a setup, the user is operating in an HTC mode with little or no need for low latency networks and inter-process communication. Cloud resources seem to be able to fulfill both the traditional HPC need and the HTC need.

The high-end HPC users are few; they are the ones fighting for every bit of performance. But you also have the more "mundane" HPC user, the one who has a code that runs in parallel but who only needs a couple hundred cores, and the one who can tolerate a ~10% performance hit, especially if that means he can run on hundreds of different machines across the world and thus reach a new scale in the number of tasks he can tackle. This normal HPC user tends to have an application that is embarrassingly parallel, expressed in a master-worker paradigm where all the workers are independent. The workload may not be extremely optimized, it may wait for I/O quite a bit, and it may be written in scripting languages; it is not a traditional HPC workload, but it needs an HPC resource. This user wants on-demand, elastic HPC if his application and pricing permit, and he needs to run his own operating system, not the one imposed by the supercomputer operator. HPC as a Service, if you wish. AWS has recognized this need and offered a service. The ease of use balances out the performance hit. For those users, the cloud can help a lot.

What you do need is a well-optimized hypervisor, potentially operated without multi-tenancy for dedicated access to network cards or GPUs, or a quick way to re-image part of a larger cluster with bare-metal provisioning. You also need a data center orchestrator that can scale to tens of thousands of hosts and manage parts of the data center in a hybrid fashion. All these features are present in CloudStack, which leads me to believe that it's only a matter of time before we see our first CloudStack based virtual cluster in the Top500 list. That would be an exciting time for Apache CloudStack.

If you are already using CloudStack for HPC use cases, I would love to hear about it.

CloudStack vs. OpenNebula

Prior to joining Citrix's community team for Apache CloudStack, I had worked with OpenNebula (ONE) since 2009. Like CloudStack, ONE is an Apache-licensed IaaS solution, though they take different approaches.

To compare two systems like these, the best way would be to go through a deployment and then an evaluation period. We could also do a one by one comparison of features. (See the feature pages for CloudStack and OpenNebula, respectively.) You might also want to compare the CloudStack API and OpenNebula API.
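To give a feel for the stylistic difference between the two, here is a small sketch of what a "list my virtual machines" call looks like against each: CloudStack exposes an HTTP query API whose requests are signed with HMAC-SHA1, while OpenNebula exposes an XML-RPC interface driven by a session string. The endpoints, credentials and method parameters below are from memory and should be checked against each project's API documentation.

```python
# Sketch contrasting the two API styles; endpoints and credentials are placeholders.
import base64
import hashlib
import hmac
import urllib.parse
import urllib.request
import xmlrpc.client

# --- CloudStack: signed HTTP query API ---
def cloudstack_list_vms(endpoint, api_key, secret_key):
    params = {'command': 'listVirtualMachines', 'response': 'json', 'apikey': api_key}
    # Sort the parameters, lowercase the query string, HMAC-SHA1 it with the secret key.
    query = '&'.join('%s=%s' % (k, urllib.parse.quote(params[k], safe=''))
                     for k in sorted(params))
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    signature = urllib.parse.quote(base64.b64encode(digest))
    return urllib.request.urlopen('%s?%s&signature=%s' % (endpoint, query, signature)).read()

# --- OpenNebula: XML-RPC with a session string ---
def opennebula_list_vms(endpoint, user, password):
    server = xmlrpc.client.ServerProxy(endpoint)  # typically http://host:2633/RPC2
    session = '%s:%s' % (user, password)
    # one.vmpool.info(session, filter_flag, start_id, end_id, state)
    return server.one.vmpool.info(session, -2, -1, -1, -1)
```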

Both systems have notable commonalities: a rich GUI; support for multiple hypervisors and a philosophy of being hypervisor agnostic; an EC2-compatible interface as well as a native API; and support for multiple zones and virtual data centers with relatively fine-grained access control lists. CloudStack and OpenNebula also offer an image/template repository and storage backends supporting NFS, GlusterFS, iSCSI, and LVM. Finally, both systems have a vibrant community contributing to the projects.

From a software perspective, the big difference is that CloudStack is written in Java, while ONE has a C++ core with significant Ruby scripting as well as Bash scripts used for drivers. It does feel like CloudStack is more on the Dev side of DevOps while ONE is more on the Ops side, but this is very much a personal opinion.

ONE does boast a few interesting characteristics that I think CloudStack could benefit from: support for hybrid cloud (i.e., the ability to add an EC2-like site as a cloud-bursting capability), a virtual appliance marketplace (i.e., the ability for the community to share images between sites), as well as tools to test-drive the software without having to do a full install.

Some of this is underway already. The marketplace concept is being discussed on the CloudStack mailing list, while DevCloud was recently announced and is the perfect sandbox to give CloudStack a try. What could be interesting is to set up a CloudStack public cloud for users to access and test the GUI and API.

What CloudStack brings, however, is terrific scalability, which outperforms ONE at the moment. CloudStack also has the ability to do bare-metal provisioning in addition to "traditional" virtual machine provisioning, and amazing network management with the ability to configure hardware networking devices like the Juniper SRX and Citrix NetScaler, as well as new features like Nicira integration.

Though I've found some differences between CloudStack and OpenNebula, I think they're both great projects.