Saturday, February 23, 2013

Translating Apache CloudStack

We are coming down the stretch to submit translations for the Apache CloudStack Documentation. The 4.1 release should be cut on March 22nd, and we need the translations in before that. It is a huge task and we need your help. Here is how:

Tuesday, February 12, 2013

SDN in CloudStack

Software Defined Networking (SDN) has seen a lot of uptick in momentum since VMware acquired Nicira last summer, three months after Google announced that they were using OpenFlow to optimize their internal backbone. In this post we look at "SDN" support in CloudStack.

First, let's try to define SDN in a short paragraph. It is what it spells out to be (like the French adage: "c'est comme le Port-Salut c'est écrit dessus" :)): a way to configure the network using software. This means that with SDN the network definition (routing, switching) and optimization (load-balancing, firewall, etc.) become a software problem. If you have seen the Wikipedia page and read other articles, you most likely have read that SDN decouples the control plane and the data plane. This is short for saying that the forwarding tables used in switches/routers will be controlled by software applications that can be remote. Part of the SDN landscape is OpenFlow. OpenFlow is a standard defined by the ONF that allows you to implement an SDN solution. It defines the protocol used by the control plane to send forwarding information (a.k.a. flow rules) to network devices. However, while some early SDN companies embraced and led the development of OpenFlow (e.g. BigSwitch), an SDN solution may not use the OpenFlow protocol. This leads to a key point of today's SDN solutions: SDN != OpenFlow.
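To make the control plane / data plane split concrete, here is a minimal sketch in Python. It is a toy model, not OpenFlow itself: the `FlowRule`, `Switch` and `Controller` classes, the switch name and the MAC addresses are all made up for illustration. The switch only looks up rules in its table, while a separate (possibly remote) controller decides what those rules are.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FlowRule:
    """Toy flow rule (hypothetical): match a destination MAC, forward to a port."""
    dst_mac: str
    out_port: int
    priority: int = 0


@dataclass
class Switch:
    """Data plane: applies the rules it was given, never decides on its own."""
    name: str
    flow_table: List[FlowRule] = field(default_factory=list)

    def install(self, rule: FlowRule) -> None:
        self.flow_table.append(rule)
        # Keep the highest-priority rules first so they match first.
        self.flow_table.sort(key=lambda r: r.priority, reverse=True)

    def forward(self, dst_mac: str) -> Optional[int]:
        for rule in self.flow_table:
            if rule.dst_mac == dst_mac:
                return rule.out_port
        return None  # table miss: nothing installed for this destination


class Controller:
    """Control plane: software that pushes flow rules to registered switches."""

    def __init__(self) -> None:
        self.switches: Dict[str, Switch] = {}

    def register(self, switch: Switch) -> None:
        self.switches[switch.name] = switch

    def push_rule(self, switch_name: str, rule: FlowRule) -> None:
        self.switches[switch_name].install(rule)


controller = Controller()
sw = Switch("ovs-host1")  # hypothetical switch name
controller.register(sw)
controller.push_rule("ovs-host1", FlowRule("aa:bb:cc:00:00:01", out_port=3, priority=10))

known = sw.forward("aa:bb:cc:00:00:01")
unknown = sw.forward("aa:bb:cc:00:00:02")
print(known, unknown)  # 3 None
```

In a real deployment the `push_rule` call would be a protocol message (OpenFlow or something vendor-specific) rather than a method call, which is exactly the "SDN != OpenFlow" point.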

The academic in me can't help but point out the GENI initiative, which aims to re-design the internet and start from a clean slate. SDN research happens on GENI, but SDN is also seen there as a way to instrument the network, isolate experiments and dynamically reconfigure the network, something that was impossible (especially over wide area networks) before SDN. With FutureGrid and Grid5000 being testbeds for IaaS solutions, GENI is a real-life testbed for future networking solutions. I had a chance to work a little bit on GENI while at Clemson. We modified the NOX OpenFlow controller to integrate it with OpenNebula and provide Security Groups as well as Elastic IP functionality.

While virtualization was a key enabler of IaaS, virtual switches are key enablers of SDN-based networks. Open vSwitch (OVS) is the leading virtual switch. OVS is now used in most private and public clouds; it replaces the standard Linux bridge and is used to connect virtual machine network interfaces to the physical network. An OVS can be connected to an OpenFlow controller and receive flow rules from the controller. However, it does not need to be: an SDN solution could talk directly to OVS. The main issue is that a single OpenFlow controller may not be fast enough to process all "control" decisions on a large network. We would have to see a distributed OpenFlow controller to be able to reach extremely large scale (it might exist, I just have not found references for it). OVS can do many things, among them VLAN tagging, QoS, Generic Routing Encapsulation (GRE) and Stateless Transport Tunneling (STT) tunnels. To know more about the difference between GRE and STT, see this blog by Bruce Davie and Andrew Lambeth.

So where does SDN help IaaS? Anyone who has worked on networking of virtual machines (VMs) knows how complex this can get. VMs from multiple tenants need to be isolated from each other; they have private IP addresses but may need to be accessed from the public internet; VMs can be migrated within a single broadcast domain, but one may want to migrate across domains; VMs from multiple data centers may need to be in the same subnet and broadcast domain (Layer 2), etc. Networking is a really complex issue in IaaS, even more so because it is hard to understand conceptually. One really needs to think in terms of logical networks and forget about the physical network (at least at a high level). To enable all these things and reach large scale we need to be able to control the network devices from the application layer. This is where a significant shift is happening. The application developers are now going to describe the network they need and provision it on-demand. Yet again SDN is Cloud: on-demand and elasticity in the network thanks to SDN.

So where does Apache CloudStack stand with SDN? One of the main design decisions in CloudStack was to provide multi-tenancy and isolate guest networks. Pre-SDN, the way to achieve this was to use a VLAN per guest network, creating an isolated Layer 2 broadcast domain for each tenant (and even multiple VLANs per tenant if need be). Advanced networking in CloudStack was all about VLANs. VLAN IDs, however, are 12 bits long, which means that the grand maximum of VLANs is 4096. While that can seem big, you could very well run out of VLANs quickly. Here comes SDN. SDN allows you to build a new type of isolation for your tenants. The main tenet (pun intended) is to build a mesh of tunnels between all the virtual switches residing on all the hosts/hypervisors in your data center. OVS can do that. By creating those meshes you can create network overlays to build Layer 2 broadcast domains within zones, across zones and over WAN while ensuring isolation of tenants.
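The arithmetic behind these two approaches is easy to check. The sketch below computes the VLAN ID ceiling from the 12-bit field and enumerates the host pairs that a full mesh of tunnels would have to connect. The host names are hypothetical, and the code only counts pairs; it does not create any tunnels.

```python
from itertools import combinations

VLAN_ID_BITS = 12
max_vlan_ids = 2 ** VLAN_ID_BITS  # 4096 possible values of the 12-bit field


def tunnel_mesh(hosts):
    """Return every pair of hosts that needs a tunnel in a full mesh."""
    return list(combinations(hosts, 2))


hosts = ["host1", "host2", "host3", "host4"]  # hypothetical hypervisors
mesh = tunnel_mesh(hosts)

print(max_vlan_ids)  # 4096
print(len(mesh))     # 6, i.e. n * (n - 1) / 2 for n = 4
```

In practice slightly fewer than 4096 VLANs are usable, since 802.1Q reserves IDs 0 and 4095. On the tunnel side the number of pairs grows quadratically with the number of hosts, which is one reason to let a controller manage the mesh rather than configure it by hand.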

The previous snapshot on the left shows the CloudStack GUI when you are creating an advanced zone. You need to specify the type of isolation. Traditional isolation would be using VLANs, but you see two other types of isolation: GRE and STT. These are protocols used to create tunnels between OVS bridges (logical switches). The GRE isolation type will be used with what I call the "native SDN solution" in CloudStack. It is an SDN controller built into the CloudStack code that creates GRE tunnels using OVS (currently only supported with Xen, but KVM support should be in 4.1, if not 4.2 this summer). The wiki has an extensive functional specification titled OVS tunnel manager. The slides below are also a great presentation of this solution:

Choosing the STT isolation type will, you guessed it, use STT tunnels between all the virtual switches. This is currently only being used by the Nicira NVP plugin described in our documentation. Hugo Trippaers (@Spark404) from Schuberg Philis is the author of the plugin. He recently presented the integration at a Build a Cloud Day workshop and talked about the upcoming features in the CloudStack 4.1 release (KVM support and Layer 3 routing). See his slides below:

We are seeing two more SDN "solutions" being integrated in CloudStack. The first is Big Virtual Switch from BigSwitch. Development is happening right now, and the commits made the 4.1 code freeze, so expect to see it in the 4.1 release at the end of March. Expect to see open source OpenFlow controllers being used with CloudStack this summer. The second is Midonet from Midokura. While documentation has been posted on SlideShare (see below and skip the first page if you don't read Japanese), the commits have not yet been made. So look to the 4.2 release for Midonet support in CloudStack.

This is only the beginning. While these solutions are used to provide multi-tenant isolation, we can bet that SDN will be used to provide load-balancing, elastic IPs, security groups, migration support, dynamic leasing and optimization of the network. SDN brings network intelligence to your IaaS.

Friday, February 08, 2013

Build A Cloud Day, Ghent Feb 1st Summary

Last Friday we had a full day workshop (Build A Cloud Day) in Ghent, Belgium. It was co-located with Puppet Camp. All the logistics were planned by our friends from Inuits, led by Kris Buytaert (@KrisBuytaert). The BACD had approximately 50 people in attendance throughout the day, with things winding down by 4pm, when people started heading to Brussels for Europe's biggest Open Source event: FOSDEM. We had an exciting day with terrific speakers who showed the complete range of the CloudStack ecosystem. Here are all the slides, enjoy.

I started the day with an introduction talk about CloudStack, bringing some high level vision about Clouds and how CloudStack fits in. I also presented the "Apache Way", what you can expect in terms of releases, and I also highlighted the main components of CloudStack. I introduced the rest of the agenda with a theme of covering the entire "stack" of Cloud computing and seeing how CloudStack is the core backend of it.

Hugo Trippaers (@Spark404) from Schuberg Philis then presented the integration of the Nicira Private Gateway with CloudStack, which brings an SDN solution (STT tunnel meshes for isolation of tenants) to ACS and complements the native SDN controller that can build meshes of GRE tunnels. Hugo is a committer and PMC member of CloudStack and the lead engineer for operation of the CloudStack private cloud at Schuberg Philis.

Once we learned about SDN and advanced networking for multi-tenant isolation, we heard from Wido Den Hollander from PCExtreme. Wido is a committer and PMC member of CloudStack as well, and he is most notably the man behind the CloudStack/Ceph integration. Ceph has received a lot of attention for the last couple of years as it offers an alternative to traditional parallel distributed file systems and builds a highly scalable object store and efficient storage for virtual machines with the so-called RADOS Block Device (RBD). Wido had few slides but fielded many questions at the board.

Three talks in, the excitement rose as we learned about the latest features in networking and storage for CloudStack. After a short lunch we came back in to talk about the API exposed by CloudStack. When discussing APIs, everyone mentions standards. The CloudStack API is not a standard and I doubt it will ever be (for good reasons). But what we have seen in the field is a de-facto standard in AWS APIs (which is supported in CloudStack to some extent) and some emerging standards from OGF and DMTF. While limited in scope, they do provide some assurance against vendor lock-in. The talk by Red Hat's Oved Ourfali (@ovedou) was about CIMI, the DMTF standard. We were particularly interested to hear about CIMI to see how we could integrate it in CloudStack and potentially write a deltacloud driver.

Having learned about the backend networking and storage solutions, plus the API used to manage and access your cloud, it was now time to hear about an exciting use case: Spotify, the online music service. Noa Resare (@blippie) has been active on the CloudStack mailing list and is now helping out with packaging. He presented Spoticloud, a private cloud built for their engineers/developers to allow them to "be developers". He gave us some great feedback with pointy details, like adding a pod with the wrong id, and asked for features that we are working on, like removing the secondary storage VM. Also exciting was that Spotify is hiring a CloudStack engineer:

Next up was Brian Amedro (@brianamedro) from Activeeon. I really wanted Brian to talk because he represents a SaaS application making use of CloudStack. His company Activeeon has a very interesting application, ProActive, that offers a powerful IDE, a workflow engine and a resource manager to automate the parallelization of compute-intensive tasks. It is used in the automotive industry, pharmacy, finance and other fields that have a need for long-running computing analysis. Where CloudStack comes in is at the resource management layer: Brian integrated ProActive with CloudStack using the API, which allows ProActive to dynamically provision machines on cloud providers and run the workflows. Check the video as well.

To wrap up our day we had Charles Moulliard (@cmoulliard) from Red Hat, an Apache committer on several projects and very active on the CloudStack mailing list. Charles is not shy to bring up issues with the API, DevCloud or third party clients like jclouds. Charles introduced us to Karaf and Fuse Fabric. By deploying Karaf on multiple cloud providers you can create a coordinated PaaS. All the providers coordinate with a Zookeeper instance, creating a "Fuse Fabric" that eases deployment of software on all cloud nodes. It gave me an idea to use Fuse Fabric to deploy a Hadoop cluster in the cloud...

And that was it for BACD Ghent. If you were there you learned a ton :), if you missed it...well you can always come to the next one, but it may not come with Belgian beer :)

Monday, February 04, 2013

What a month !

What a month January 2013 was for the Apache CloudStack mailing lists. I have barely recovered from what was our most intense month since CloudStack was donated to the Apache Software Foundation. As some of you know, I have been keeping a close eye on our mailing lists, trying to extract valuable information for our community. This was triggered by Qingye Jiang's study about CloudStack, OpenStack, Eucalyptus and OpenNebula. Since all decisions happen on the mailing list at the ASF, it seemed like a very reasonable method to check the health of our community.

I posted some early results in a prior post. With the whirlwind of emails in January I wanted to get a closer look at this past month alone.

We had a total of 5144 emails on the developers list and a total of 775 emails on the users list. 2036 of the 5144 were from JIRA.

First, 5144 emails in a month is quite big, to put it mildly. Second, 2036 from JIRA is a lot. JIRA generates emails automatically, but what we have seen this month is an increase in conversation between developers happening within tickets. This is actually a mode of operation that is very familiar to the HDFS community, for instance. I suspect that as we go through graduation, we may find that working tickets directly in JIRA will be very valuable and that JIRA emails will represent valid communication emails.

The developers list saw 175 unique contributors and the users list saw 145 unique contributors. I still need to clean up a few duplicates, but I don't expect them to represent more than 5% of the total number of contributors. That seems to me like a pretty robust community for a single month. If we compare both sets of contributors we see that 76 are common to both lists. This means that 48% of the contributors on the users list are not participating in the developers list: they are "true" users.
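For readers who want to check the 48% figure or run the same comparison on their own lists, here is a small sketch. The totals are the ones quoted above; the `list_overlap` helper and the example addresses are hypothetical and only show how the comparison could be done with sets of sender addresses.

```python
def list_overlap(dev, users):
    """Return (number of common contributors, share of users-only contributors)."""
    common = dev & users
    users_only = users - dev
    return len(common), len(users_only) / len(users)


# Figures quoted in the post for January 2013.
DEV_TOTAL, USERS_TOTAL, COMMON = 175, 145, 76

users_only = USERS_TOTAL - COMMON                      # only on the users list
users_only_pct = round(100 * users_only / USERS_TOTAL)
distinct_people = DEV_TOTAL + USERS_TOTAL - COMMON     # before removing duplicates

print(users_only, users_only_pct, distinct_people)  # 69 48 244

# Hypothetical addresses, just to exercise the helper.
dev = {"a@example.org", "b@example.org", "c@example.org"}
users = {"b@example.org", "d@example.org"}
print(list_overlap(dev, users))  # (1, 0.5)
```

The 244 distinct contributors is an upper bound, since the duplicates mentioned above have not been cleaned up yet.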

While affiliation of individual contributors is not recognized within the ASF, I can't help but mention that 57 domains were represented in the developers list versus 63 domains in the users list. It would not be a stretch to associate an email domain with a company.

The graph below shows our daily number of contributors. The first thing to notice is that, thankfully, we take a break on weekends. The dips clearly show a lower number of contributors on Saturday and Sunday. The developers list peaks at approximately 60 contributors a day on Tuesday or Wednesday. The users list peaks at roughly 20 contributors a day mid-week.

In my previous post I also started a social network analysis. I did it again for this month alone. Below is the social graph of our developers list. To obtain this graph I group emails by thread and create a connection between two contributors if they exchange emails within a thread. Every time the same two contributors exchange an email within a thread, the "strength" of their connection increases. This is represented by the thickness of the line between the nodes. The size of the node/contributor also matters: the more central a node is within our community, the bigger the node. Centrality here measures how often a node lies on the shortest path between two other nodes in the network. The graph clearly shows that Chip Childers had the biggest centrality in our developers list in January. That means that, for two contributors to talk, the shortest path most often went through Chip (sorry Chip, looks like your inbox is going to get even bigger). Other notables are Chiradeep Vittal, Alex Huang, David Nalley and Animesh Chaturvedi. Nodes that had a small number of connections were filtered out (but their contribution is very much appreciated; I only applied the filter to get a clearer picture). My boss would like to see a dynamic version of this graph, where we can see the evolution of the social network over time...:) We will see, I am just not sure how to automate that...
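Here is a minimal sketch of how such a graph could be built and scored. It is not the tool used for the figures in this post. It assumes that the "strength" of a connection is the number of threads two people both posted in, and that centrality is betweenness centrality computed on the unweighted graph (Brandes' algorithm). The contributor names are made up.

```python
from collections import Counter, defaultdict, deque
from itertools import combinations


def build_graph(threads):
    """threads: one list of sender names per thread.
    Returns a Counter mapping frozenset({a, b}) to connection strength."""
    weights = Counter()
    for senders in threads:
        for a, b in combinations(sorted(set(senders)), 2):
            weights[frozenset((a, b))] += 1  # one more shared thread
    return weights


def betweenness(weights):
    """Unnormalized betweenness centrality, ignoring edge weights."""
    adj = defaultdict(set)
    for pair in weights:
        a, b = tuple(pair)
        adj[a].add(b)
        adj[b].add(a)
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = dict.fromkeys(adj, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Undirected graph: every pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical threads: each inner list holds the senders seen in one thread.
threads = [["ann", "bob", "cat"], ["ann", "dan"], ["ann", "bob"]]
weights = build_graph(threads)
scores = betweenness(weights)
print(weights[frozenset(("ann", "bob"))])    # 2
print(max(scores, key=scores.get), scores)   # ann has the highest score
```

In this toy example "dan" can only reach "bob" and "cat" through "ann", so "ann" gets a score of 2 and everyone else 0. A dynamic version could rerun the same two functions on a sliding window of threads, one graph per week for instance.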

And below is the social graph of our users list. We see that Geoff Higginbottom from Shapeblue is the most influential node on the users list, closely followed by Ahmad Emneina and Pranav Saxena from Citrix. A newcomer is also showing up this month: Geoff's protégé, Paul Angus from Shapeblue. It is great to see the leading CloudStack integrator taking leadership in helping users of Apache CloudStack. We also see a new user coming on board strongly: Nux, also known as Lucian, whom I had the pleasure to meet in London. Interestingly, we see that the connections between the nodes are thinner than on the developers list and that the network is larger. This just indicates that the most influential nodes answer a lot of questions from many different people but that they don't tend to exchange within a large number of threads. Also note that the traffic on the users list is much lower than on the dev list.

And that's it for January, our highest traffic so far on the mailing lists. We see some strong and increasing participation every day, a clear technical leadership has emerged and some dedicated folks are helping out users.