Wednesday, March 19, 2014

Migrating from Publican to Sphinx and Read The Docs

Migration from Publican to Sphinx and Read The Docs

When we started with Cloudstack we chose to use publican for our documentation. I don't actually know why, except that Red Hat documentation is entirely based on publican. Perhaps David Nalley's background with Fedora influenced us :) In any case publican is a very nice documentation building system, it is based on the docbook format and has great support for localization. However it can become difficult to read and organize lots of content, and builds may break for strange reasons. We also noticed that we were not getting many contributors to the documentation, in contrast, the translation efforts via transifex has had over 80 contributors. As more features got added to CloudStack the quality of the content also started to suffer and we also faced issues with publishing the translated documents. We needed to do something, mainly making it easier to contribute to our documentation. Enters ReStructured Text (RST) and Read The Docs (RTD).

Choosing a new format

We started thinking about how to make our documentation easier to contribute to. Looking at Docbook, purely xml based, it is a powerful format but not very developer friendly. A lot of us are happy with basic text editor, with some old farts like me mainly stuck with vi. Markdown has certainly helped a lot of folks in writing documentation and READMEs, just look at Github projects. I started writing in Markdown and my production in terms of documentation and tutorials skyrocketed, it is just a great way to write docs. Restructured Text is another alternative, not really markdown, but pretty close. I got familiar with RST in the Apache libcloud project and fell in love with it, or at least liked it way more than docbook. RST is basically text, only a few markups to learn and your off.

Publishing Platform

A new format is one thing but you then need to build documentation in multiple formats: html, pdf, epub potentially more. How do you move from .rst to these formats for your projects ? Comes in Sphinx, pretty much an equivalent to publican originally aimed at Python documentation but now aimed at much more. Installing sphinx is easy, for instance on Debian/Ubuntu:
apt-get install python-sphinx
You will then have the sphinx-quickstart command in your path, use it to create your sphinx project, add content in index.rst and build the docs with make html. Below is a basic example for a ff sample project.

What really got me sold on reStructuredText and Sphinx was ReadTheDocs (RTD). It hosts documentation for open source projects. It automatically pulls your documentation from your revision control system and builds the docs. The killer feature for me was the integration with github (not just git). Using hooks, RTD can trigger builds on every commit and it also displays an edit on github icon on each documentation page. Click on this icon, and the docs repository will get forked automatically on your github account. This means that people can edit the docs straight up in the github UI and submit pull requests as they read the docs and find issues.


After [PROPOSAL] and [DISCUSS] threads on the cloudstack mailing list, we reached consensus and decided to make the move. This is still on-going but we are getting close to going live with our new docs in RST and hosted by RTD. There were couple challenges:
  1. Converting the existing docbook based documentation to RST
  2. Setting up new repos, CNAMEs and Read The Docs projects
  3. Setting up the localization with transifex
The conversion was much easier than expected thanks to pandoc, one of those great command line utility that saves your life.
pandoc -f docbook -t rst -o test.rst test.docbook
You get the just of it, iterate through your docbook files and generate the RST files, combine everything to reconstruct your chapters and books and re-organize as you wish. They are off course couple gotchas, namely the table formatting may not be perfect, the note and warnings may be a bit out of whack and the heading levels should probably be checked. All of these are actually good to check as a first pass through the docs to revamp the content and the way it is organized.
One thing that we decided to do before talking about changing the format was to move our docs to a separate repository. What we wanted to do was to be able to release docs on a different time frame than the code release, as well as make any doc bug fixes go live as fast as possible and not wait for a code release (that's a long discussion...). With a documentation specific repo in place, we used Sphinx to create the proper directory structure and add the converted RST files. Then we created a project on Read The Docs and pointed to the github mirror of our Apache git repo. Pointing to the github mirror allowed us to enable the nice github interaction that RTD provides. The new doc site looks like this.

There is a bit more to it, as we actually created several repositories and used a RTD feature called subprojects to make all the docs live under the same CNAME This is still work in progress but in track for the 4.3 code release. I hope to be able to announce the new documentation sites shortly after 4.3 is announced.
The final hurdle is the localization support. Sphinx provides utilities to generate POT files. They can then be uploaded to transifex and translation strings can be pulled to construct the translated docs. The big challenge that we are facing is to not loose the existing translation that were done from the docbook files. Strings may have changed. We are still investigating how to not loose all that work and get back on our feet to serve the translated docs. The Japanese translators have started to look at this.
Overall the migration was easy, ReStructuredText is easy, Sphinx is also straigthfoward and Read The Docs provides a great hosting platform well integrated with Github. Once we go live, we will see if our doc contributors increase significantly, we have already seen a few pull requests come in, which is very encouraging.
I will be talking about all of this at the Write The Docs conference in Budapest on March 31st, april 1st. If you are in the area stop by :)

1 comment:

  1. Anonymous11:08 AM

    Sphinx does not even support text colour. It's markup for tables sucks.

    It's latex output is substandard.

    Honestly... I'm thinking of switching to Publican (and docbook). It seems much more professional and does not have any of the annoying limitations of sphinx.