The Technology Behind the OpenStack Cloud Computing Project
OpenStack Technology Overview
The interesting aspect of Nova and Swift is that both projects are heavily Python-centric and both are targeted at running on Ubuntu Server. The Object Storage (Swift) documentation refers heavily to various Python libraries, but Swift appears to rely on the standard Linux rsync utility to synchronize data between nodes. Of the two, Swift seems to provide the more technologically focused and clearly defined stack.
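To make the rsync-based approach more concrete, the sketch below only assembles an rsync command line that pushes one directory from a storage node to its peers. It assumes each peer runs an rsync daemon that exports a module (called `object` here) in its rsyncd.conf. The function name `build_rsync_command`, the paths, the peer addresses and the choice of flags are all hypothetical illustrations. This is not Swift's actual replicator code.

```python
import shlex
from typing import List


def build_rsync_command(local_dir: str, peer_host: str, module: str,
                        remote_path: str, timeout_s: int = 30) -> List[str]:
    """Hypothetical helper: build (but do not run) an rsync push command.

    Uses rsync daemon syntax host::module/path, so `module` must be
    defined in the peer's rsyncd.conf (an assumption of this sketch).
    """
    return [
        "rsync",
        "-a",                      # archive mode: recursive, keep times/permissions/symlinks
        f"--timeout={timeout_s}",  # give up if no data moves for timeout_s seconds
        # Trailing slash: copy the directory's contents, not the directory itself.
        local_dir.rstrip("/") + "/",
        f"{peer_host}::{module}/{remote_path.strip('/')}",
    ]


if __name__ == "__main__":
    peers = ["10.0.0.2", "10.0.0.3"]  # hypothetical storage nodes
    for peer in peers:
        cmd = build_rsync_command("/srv/node/sdb1/objects/1234", peer,
                                  "object", "sdb1/objects/1234")
        print(shlex.join(cmd))
        # subprocess.run(cmd, check=True) would actually perform the transfer.
```

Returning the command as a list rather than a single string keeps it safe to hand to `subprocess.run` without going through a shell.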
Nova, on the other hand, is a mishmash of various technologies, some of which make me leery. The main networking/HTTP layer is built on Python and the Tornado asynchronous I/O Web server, which Facebook released to the open source community.
Because of Tornado's limitations with large files, Nova includes a second, separate Web server: the highly regarded nginx, which is written in C and is known for its event-driven, asynchronous I/O design and excellent performance. Nevertheless, running two completely separate Web servers seems to complicate Nova's design somewhat.
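Both Tornado and nginx are built around a non-blocking, event-driven model. In this model a single thread multiplexes many connections instead of dedicating a thread to each one. The sketch below illustrates the idea with Python's standard `asyncio` module. It does not use Tornado or nginx. The upper-casing echo protocol, the handler names and the client count are arbitrary choices made only for illustration.

```python
import asyncio
from typing import List


async def handle_client(reader: asyncio.StreamReader,
                        writer: asyncio.StreamWriter) -> None:
    # Each connection is a coroutine on the same thread; while one client
    # is waiting on the network, the event loop serves the others.
    data = await reader.readline()
    writer.write(data.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def fetch(port: int, message: str) -> str:
    # A tiny client: send one line, read one line back.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(message.encode() + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.decode().strip()


async def main(n_clients: int = 50) -> List[str]:
    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # All clients run concurrently against a single-threaded server.
        return await asyncio.gather(
            *(fetch(port, f"client {i}") for i in range(n_clients)))


if __name__ == "__main__":
    replies = asyncio.run(main())
    print(len(replies), replies[0])
```

No threads are created anywhere in this example. Concurrency comes entirely from coroutines yielding at each `await` while they wait on I/O.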
Nova also includes a messaging system based on RabbitMQ. Unlike the Web components, it is written in an entirely different language (Erlang), and, at least in the version we tested, every queued message had to fit in memory. In short, when RabbitMQ runs out of memory, it dies: very quickly and very hard. When we evaluated it internally at my organization about a year ago as a replacement for our creaking WebLogic JMS queues, I saturated RabbitMQ and it crashed so badly that we had to reboot the whole server to get it running again. Hopefully it has improved since then, but being bounded by memory rather than backed by disk is worrisome in such a critical component. On the positive side, RabbitMQ was recently acquired by none other than SpringSource (and thus, indirectly, VMware), so it is likely to receive a steady stream of maintenance and new features.
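The concern above is about where queued messages live. Messages held only in RAM are lost in a crash, and they limit queue size to the memory available. By contrast, here is a minimal sketch of a disk-backed FIFO queue built on Python's built-in `sqlite3` module. The `DurableQueue` class is hypothetical and says nothing about how RabbitMQ actually stores messages. It only shows what disk-based persistence buys you: once a message is committed to the file, it survives a process restart. The sketch also assumes a single consumer.

```python
import os
import sqlite3
import tempfile
from typing import Optional


class DurableQueue:
    """Hypothetical single-consumer FIFO queue stored in an SQLite file."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " body TEXT NOT NULL)")
        self.conn.commit()

    def put(self, body: str) -> None:
        # The connection's context manager commits on success, so the
        # message is on disk before put() returns.
        with self.conn:
            self.conn.execute("INSERT INTO messages (body) VALUES (?)", (body,))

    def get(self) -> Optional[str]:
        # Remove and return the oldest message, or None if the queue is empty.
        with self.conn:
            row = self.conn.execute(
                "SELECT id, body FROM messages ORDER BY id LIMIT 1").fetchone()
            if row is None:
                return None
            self.conn.execute("DELETE FROM messages WHERE id = ?", (row[0],))
            return row[1]

    def close(self) -> None:
        self.conn.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "queue.db")
    q = DurableQueue(path)
    q.put("order-1")
    q.put("order-2")
    q.close()                 # simulate the process going away
    q = DurableQueue(path)    # ...and coming back
    print(q.get(), q.get(), q.get())  # order-1 order-2 None
    q.close()
```

The trade-off is speed: every `put` waits for a commit to disk. That cost is exactly what a memory-only queue avoids, and also why it is fragile.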
Next, we have Redis, used for fast, persistent, replicated data sharing. Redis is often described as "memcached on steroids." However, now that Object Storage (Swift) has been integrated, Redis seems to be a redundant component, an artifact of two separate projects attempting to merge. I was not able to find any documentation clarifying Redis's future in the combined OpenStack architecture.
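Two features separate Redis from a pure cache such as memcached. Its in-memory data can be persisted to disk, and it can be replicated to other nodes. The toy `KeyValueNode` class below is a hypothetical illustration of those two ideas: a JSON snapshot written to disk, and writes forwarded from a primary to its replicas. It is not Redis's implementation, file format or protocol. Real Redis replication is asynchronous and runs over the network, not through in-process method calls.

```python
import json
import os
import tempfile
from typing import Dict, List, Optional


class KeyValueNode:
    """Hypothetical in-memory key-value node with snapshots and replication."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        # Assumes replicas form a tree (no cycles), or set() would recurse forever.
        self.replicas: List["KeyValueNode"] = []

    def set(self, key: str, value: str) -> None:
        # Write to this node's memory, then forward the write to each replica
        # (synchronously here, which is a simplification).
        self.data[key] = value
        for replica in self.replicas:
            replica.set(key, value)

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def save(self, path: str) -> None:
        # Point-in-time snapshot: write a temp file, then atomically rename it,
        # so a crash mid-write never leaves a half-written snapshot in place.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp, path)

    @classmethod
    def load(cls, path: str) -> "KeyValueNode":
        node = cls()
        with open(path) as f:
            node.data = json.load(f)
        return node


if __name__ == "__main__":
    primary, replica = KeyValueNode(), KeyValueNode()
    primary.replicas.append(replica)
    primary.set("instance:42:state", "running")
    print(replica.get("instance:42:state"))  # running
    snapshot = os.path.join(tempfile.mkdtemp(), "dump.json")
    primary.save(snapshot)
    print(KeyValueNode.load(snapshot).get("instance:42:state"))  # running
```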
And last, we have OpenLDAP, a highly regarded LDAP directory server written in C.
As you can see, the Compute (Nova) part of OpenStack is quite a mixture of different technologies (Python, C, Erlang). I presume this makes maintaining and updating such a system in production more complicated.
OpenStack vs. Ubuntu Cloud
After reading up on OpenStack, I quickly realized that it has a lot in common with Ubuntu's cloud offering, Ubuntu Enterprise Cloud, which is built on Eucalyptus. Both attempt to emulate the Amazon EC2/S3 APIs and both run on Ubuntu Server. The difference is that Ubuntu's cloud appears to be complete already, in production and available today. It will be interesting to see where the two stand a year from now.
Although cloud computing is still used by only a minority of development shops, it may well become the standard deployment model of tomorrow. Hence, it is good to see NASA and Rackspace attempting to provide an open standard that could help avoid vendor lock-in. This should be good news for companies such as Rackspace, though probably less so for those pushing a proprietary model (such as Google App Engine).
About the Author
Jacek Furmankiewicz is a Senior Java EE developer at Radialpoint. He has 15 years of professional IT experience and is a Sun Certified Java Programmer (SCJP).