Wednesday, March 19, 2014

Migrating from Publican to Sphinx and Read The Docs

Migration from Publican to Sphinx and Read The Docs

When we started with Cloudstack we chose to use publican for our documentation. I don't actually know why, except that Red Hat documentation is entirely based on publican. Perhaps David Nalley's background with Fedora influenced us :) In any case publican is a very nice documentation building system, it is based on the docbook format and has great support for localization. However it can become difficult to read and organize lots of content, and builds may break for strange reasons. We also noticed that we were not getting many contributors to the documentation, in contrast, the translation efforts via transifex has had over 80 contributors. As more features got added to CloudStack the quality of the content also started to suffer and we also faced issues with publishing the translated documents. We needed to do something, mainly making it easier to contribute to our documentation. Enters ReStructured Text (RST) and Read The Docs (RTD).

Choosing a new format

We started thinking about how to make our documentation easier to contribute to. Looking at Docbook, purely xml based, it is a powerful format but not very developer friendly. A lot of us are happy with basic text editor, with some old farts like me mainly stuck with vi. Markdown has certainly helped a lot of folks in writing documentation and READMEs, just look at Github projects. I started writing in Markdown and my production in terms of documentation and tutorials skyrocketed, it is just a great way to write docs. Restructured Text is another alternative, not really markdown, but pretty close. I got familiar with RST in the Apache libcloud project and fell in love with it, or at least liked it way more than docbook. RST is basically text, only a few markups to learn and your off.

Publishing Platform

A new format is one thing but you then need to build documentation in multiple formats: html, pdf, epub potentially more. How do you move from .rst to these formats for your projects ? Comes in Sphinx, pretty much an equivalent to publican originally aimed at Python documentation but now aimed at much more. Installing sphinx is easy, for instance on Debian/Ubuntu:
apt-get install python-sphinx
You will then have the sphinx-quickstart command in your path, use it to create your sphinx project, add content in index.rst and build the docs with make html. Below is a basic example for a ff sample project.

What really got me sold on reStructuredText and Sphinx was ReadTheDocs (RTD). It hosts documentation for open source projects. It automatically pulls your documentation from your revision control system and builds the docs. The killer feature for me was the integration with github (not just git). Using hooks, RTD can trigger builds on every commit and it also displays an edit on github icon on each documentation page. Click on this icon, and the docs repository will get forked automatically on your github account. This means that people can edit the docs straight up in the github UI and submit pull requests as they read the docs and find issues.


After [PROPOSAL] and [DISCUSS] threads on the cloudstack mailing list, we reached consensus and decided to make the move. This is still on-going but we are getting close to going live with our new docs in RST and hosted by RTD. There were couple challenges:
  1. Converting the existing docbook based documentation to RST
  2. Setting up new repos, CNAMEs and Read The Docs projects
  3. Setting up the localization with transifex
The conversion was much easier than expected thanks to pandoc, one of those great command line utility that saves your life.
pandoc -f docbook -t rst -o test.rst test.docbook
You get the just of it, iterate through your docbook files and generate the RST files, combine everything to reconstruct your chapters and books and re-organize as you wish. They are off course couple gotchas, namely the table formatting may not be perfect, the note and warnings may be a bit out of whack and the heading levels should probably be checked. All of these are actually good to check as a first pass through the docs to revamp the content and the way it is organized.
One thing that we decided to do before talking about changing the format was to move our docs to a separate repository. What we wanted to do was to be able to release docs on a different time frame than the code release, as well as make any doc bug fixes go live as fast as possible and not wait for a code release (that's a long discussion...). With a documentation specific repo in place, we used Sphinx to create the proper directory structure and add the converted RST files. Then we created a project on Read The Docs and pointed to the github mirror of our Apache git repo. Pointing to the github mirror allowed us to enable the nice github interaction that RTD provides. The new doc site looks like this.

There is a bit more to it, as we actually created several repositories and used a RTD feature called subprojects to make all the docs live under the same CNAME This is still work in progress but in track for the 4.3 code release. I hope to be able to announce the new documentation sites shortly after 4.3 is announced.
The final hurdle is the localization support. Sphinx provides utilities to generate POT files. They can then be uploaded to transifex and translation strings can be pulled to construct the translated docs. The big challenge that we are facing is to not loose the existing translation that were done from the docbook files. Strings may have changed. We are still investigating how to not loose all that work and get back on our feet to serve the translated docs. The Japanese translators have started to look at this.
Overall the migration was easy, ReStructuredText is easy, Sphinx is also straigthfoward and Read The Docs provides a great hosting platform well integrated with Github. Once we go live, we will see if our doc contributors increase significantly, we have already seen a few pull requests come in, which is very encouraging.
I will be talking about all of this at the Write The Docs conference in Budapest on March 31st, april 1st. If you are in the area stop by :)

Tuesday, March 04, 2014

Why CloudStack is not a Citrix project

I was at CloudExpo Europe in London last week for the Open Cloud Forum to give a tutorial on CloudStack tools. A decent crowd showed up, all carrying phones. Kind of problematic for a tutorial where I wanted the audience to install python packages and actually work :) Luckily I made it self-paced so you can follow at home. Giles from Shapeblue was there too and he was part of a panel on Open Cloud. He was told once again "But Apache CloudStack is a Citrix project !" This in itself is a paradox and as @jzb told me on twitter yesterday "Citrix donated CloudStack to Apache, the end". Apache projects do not have any company affiliation.

I don't blame folks, with all the vendors seemingly supporting OpenStack, it does seem that CloudStack is a one supporter project. The commit stats are also pretty clear with 39% of commits coming from Citrix. This number is also probably higher since those stats are reporting gmail and apache as domain contributing 20 and 15% respectively, let's say 60% is from Citrix. But nonetheless, this is ignoring and mis-understanding what Apache is and looking at the glass half empty.

When Citrix donated CloudStack to the Apache Software Foundation (ASF) it relinquished control of the software and the brand. This actually put Citrix in a bind, not being able to easily promote the CloudStack project. Indeed, CloudStack is now a trademark of the ASF and Citrix had to rename their own product CloudPlatform (powered by Apache CloudStack). Citrix cannot promote CloudStack directly, it needs to get approval to donate sponsoring and follow the ASF trademark guidelines. Every committer and especially PMC members of Apache CloudStack are now supposed to work and protect the CloudStack brand as part of the ASF and make sure that any confusion is cleared. This is what I am doing here.

Of course when the software was donated, an initial set of committers was defined, all from Citrix and mostly from the former startup. Part of the incubating process at the ASF is to make sure that we can add committers from other organization and attract a community. "Community over Code" is the bread and butter of ASF and so this is what we have all been working on, expanding the community outside Citrix, welcoming anyone who thinks CloudStack is interesting enough to contribute a little bit of time and effort. Looking at the glass half empty is saying that CloudStack is a Citrix project "Hey look 60% of their commits is from Citrix", looking at it half full like I do is saying "Oh wow, in a year since graduation, they have diversified the committer based, 40% are not from Citrix". Is 40% enough ? of course not, I wish it were the other way around, I wish Citrix were only a minority in the development of CloudStack.

Couple other numbers: Out of the 26 members of the project management committee (PMC) only seven are from Citrix and looking at mailing lists participation since the beginning of the year, 20% of the folks on the users mailing list and 25% on the developer list are from Citrix. We have diversified the community a great deal but the "hand-over", that moment when new community members are actually writing more code than the folks who started it, has not happened yet. A community is not just about writing code, but I will give it to you that it is not good for a single company to "control" 60% of the development, this is not where we/I want to be.

This whole discussion is actually against Apache's modus operandi. Since one of the biggest tenant of the foundation is non-affiliation. When I participate on the list I am Sebastien, I am not a Citrix employee. Certainly this can put some folks in conflicting situations at times, but the bottom line is that we do not and should not take into account company affiliation when working and making decisions for the project. But if you really want some company name dropping, let's do an ASF faux-pas and let's look at a few features:

The Nicira/NSX and OpenDaylight SDN integration was done by Schuberg Phillis, the OpenContrail plugin was done by Juniper, Midokura created it's own plugin for Midonet and Stratosphere as well, giving us a great SDN coverage. The LXC integration was done by Gilt, Klarna is contributing in the ecosystem with the vagrant and packer plugins, CloudOps has been doing terrific job with Chef recipes, Palo-Alto networks integration and Netscaler support, a google summer of code intern did a brand new LDAP plugin and another GSoC did the GRE support for KVM. RedHat contributed the Gluster plugin and PCExtreme contributed the Ceph interface while Basho of course contributed the S3 plugin for secondary storage as well as major design decisions on the storage refactor. The Solidfire plugin was done by, well Solidfire and Netapp has developed a plugin as well for their virtual storage console. NTT contributed the CloudFoundry interface via BOSH. On the user side, Shapeblue is leading the user support company. So no it's not just Citrix.

Are all these companies members of the CloudStack project ? No. There is no such thing as a company being a member of an ASF project. There is no company affiliation, there is no lock in, just a bunch of guys trying to make good software and build a community. And yes, I work for Citrix and my job here will be done when Citrix only contributes 49% of the commits. Citrix is paying me to make sure they loose control of the software, that a healthy ecosystem develops and that CloudStack keeps on becoming a strong and vibrant Apache project. I hope one day folks will understand what CloudStack has become, an ASF project, like HTTP, Hadoop, Mesos, Ant, Maven, Lucene, Solr and 150 other projects. Come to Denver for #apachecon you will see ! The end.