By Simon Maple, Developer Advocate at ZeroTurnaround
Part One of this two-part series examined the trends and history of development in the cloud and offered practical advice and guidance for navigating public, private, and hybrid-cloud environments. This article offers basic architectural considerations for creating cloud-ready applications.
At the end of the day, it’s not really about the particular cloud infrastructure you use. Whether it is Amazon AWS or OpenStack, what you as a developer need to understand is that all of these systems have converged on a set of core application services and infrastructure. When you work with a public or a private cloud you’ll have access to a tool or an API that allows you to provision resources without the direct involvement of operations. If you need to create a load balancer or a new VM, as a developer you should look for solutions that allow you to do so directly. Here are the top ten considerations for realizing the potential of the cloud.
1. Don’t Depend on Individual Servers
When your application ran on physical servers, it was easy to think of that application as running on that machine permanently. With physical servers, you aren’t under the same pressure you face on the cloud to automate deployment from a blank OS to a working application in minutes. In the cloud, you have to be ready to move an application or redeploy an application on a new VM quickly, and you cannot assume stability for any single node. Cloud systems have unique failure conditions, especially those that are running on popular public clouds, such as Amazon AWS.
Your applications could be designed to use something like an ESB to locate a service. One application server should never rely on another server directly, and these servers should all be behind some service locator or a load balancer. You should assume and plan for your cloud-based application to be running on a collection of constantly shifting VMs.
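As a rough illustration of the service-locator idea, the sketch below keeps a registry of live instances and hands them out in round-robin order, so callers never hold on to a specific server. The class name, the service name, and the addresses are hypothetical, and the in-memory registry is only a stand-in; in practice this role would be played by a load balancer or a discovery service.

```python
import threading


class ServiceRegistry:
    """Hypothetical in-memory service locator (stand-in for a load balancer)."""

    def __init__(self):
        self._instances = {}  # service name -> list of addresses
        self._counters = {}   # service name -> next round-robin position
        self._lock = threading.Lock()

    def register(self, service, address):
        """Add an instance when a new VM comes up."""
        with self._lock:
            addresses = self._instances.setdefault(service, [])
            if address not in addresses:
                addresses.append(address)

    def deregister(self, service, address):
        """Remove an instance when its VM goes away."""
        with self._lock:
            if address in self._instances.get(service, []):
                self._instances[service].remove(address)

    def resolve(self, service):
        """Return the next live instance in round-robin order."""
        with self._lock:
            addresses = self._instances.get(service, [])
            if not addresses:
                raise LookupError(f"no live instances for {service!r}")
            index = self._counters.get(service, 0) % len(addresses)
            self._counters[service] = index + 1
            return addresses[index]
```

Callers ask for a service by name on every request, so a VM can be replaced without any caller needing to be reconfigured.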
2. Scale Horizontally from the Start (or Everything Must Be Redundant)
In the cloud, you should always strive for redundancy. This is especially true in a multi-tenant public cloud, because you never know what other VMs are going to be sharing the same physical nodes. Instead of scaling a system up by moving to large VMs, scale a system “out” by opting for a larger number of small VMs. If you are trying to build a system that can scale to meet demand, it’s far easier to scale if you think about horizontal scale earlier in a project.
In a cloud-based architecture, this means avoiding building that single Tomcat instance that runs 20 different Web applications or building that Spring application that tries to do everything at once. If you think about your overall system as a series of smaller components cooperating with each other, it will be easier to model this as a collection of smaller footprint services on a public cloud.
3. Avoid Storing State like the Plague
We all know state is bad, right? If you want your applications to be amenable to horizontal design, your systems need to store as little state as possible. Don’t use the session scope in a Web application, don’t assume that every request from a user will be routed to the same application server, and avoid using the file system whenever possible. Session affinity? Don’t use it at scale. It might make it easier to debug systems, but session affinity will slow you down in the cloud because you’ll have to worry about where a particular request is routed.
Instead of relying on the local file system, store files in a cloud-based storage solution like S3 and keep only a reference to each file, so that it can be located easily from any other node. Alternatively, you can use NoSQL storage to store state or you can use a cloud-managed database that is replicated across clouds.
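To make the idea concrete, here is a minimal sketch of a request handler that keeps no state of its own. The `ExternalSessionStore` class and the shopping-cart handler are hypothetical names, and the store is backed by a plain dictionary purely for illustration; a real deployment would put a NoSQL store or a managed database behind the same three methods.

```python
import json
import uuid


class ExternalSessionStore:
    """Stand-in for a shared store; a dict plays the role of the remote database."""

    def __init__(self, backend=None):
        self._backend = backend if backend is not None else {}

    def create(self, data):
        """Persist a new session and return its ID."""
        session_id = uuid.uuid4().hex
        self._backend[session_id] = json.dumps(data)
        return session_id

    def load(self, session_id):
        """Return the stored session, or None if it does not exist."""
        raw = self._backend.get(session_id)
        return json.loads(raw) if raw is not None else None

    def save(self, session_id, data):
        self._backend[session_id] = json.dumps(data)


def handle_add_to_cart(store, session_id, item):
    """Stateless handler: state is read from and written back to the shared store."""
    session = store.load(session_id) or {"cart": []}
    session["cart"].append(item)
    store.save(session_id, session)
    return session
```

Because nothing is kept in the memory of the application server, two consecutive requests for the same session can land on different nodes and still see the same cart.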
4. Use Multiple Data Centers
A very common mistake is for a company to move to the cloud and then put everything into a single data center. Using Amazon as an example, when you start using the tool it is tempting to put everything into us-east-1a and then to add a second zone, us-east-1b. This isn’t distributing an application across geographically separate data centers; it is simply using two availability zones in the same region.
If you want to remain available to your customers, ensure you run your systems across multiple data centers. For example, if you are running a cluster of Tomcat servers, put a few on the West Coast and put a few on the East Coast. If you are replicating a database, don’t just replicate it to another rack in the same data center; replicate your database to another geographic location. Failure is real in the cloud and redundancy is a necessity. Over the last two years, there have been several high-profile failures where it was clear that a company was making use of the public cloud, but only in one data center.
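A simple guard against this mistake is to check, as part of a deployment drill, how many distinct regions a cluster actually spans. The sketch below assumes AWS-style zone names, where a zone is the region name followed by a letter (for example, us-east-1a belongs to us-east-1); the function names and the threshold of two regions are illustrative choices, not a recommendation from any provider.

```python
from collections import Counter


def region_of(zone):
    """Derive the region from an AWS-style zone name, e.g. 'us-east-1a' -> 'us-east-1'."""
    return zone.rstrip("abcdefghijklmnopqrstuvwxyz")


def check_region_spread(instance_zones, min_regions=2):
    """Return (ok, instances per region) for a list of zone names."""
    regions = Counter(region_of(zone) for zone in instance_zones)
    return len(regions) >= min_regions, dict(regions)
```

For a cluster placed only in us-east-1a and us-east-1b the check fails, because both zones resolve to the same region.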
5. Automate Deployments
This cannot be emphasized enough. You are not really using the cloud until it is “hands-off.” If your operations team needs to spin up several new instances of your application, they should be able to do so without direct developer participation. If a system needs to scale automatically, it should not stop and require manual input or setup.
There is a whole host of tools available to automate deployments. Chef, Puppet, CFEngine, or Ansible may be the tool you use, or if you use Docker there are tools like Kubernetes and a range of services that interoperate with containers. It doesn’t matter what you use to automate; you need to automate everything. Then, you need to run drills and tests to make sure that your systems can be built, deployed, and recovered with as little manual intervention as possible.
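Whatever tool you choose, hands-off automation tends to rest on the same idea: describe the desired state of a node and let the tool work out the steps from whatever state the node is currently in. The sketch below illustrates that idea only; it does not reproduce the behavior of any particular tool, and the function names, package names, and versions are made up for the example.

```python
def plan_changes(desired, actual):
    """Return the actions that move `actual` to `desired` (package name -> version)."""
    actions = []
    for name, version in sorted(desired.items()):
        if name not in actual:
            actions.append(("install", name, version))
        elif actual[name] != version:
            actions.append(("upgrade", name, version))
    for name in sorted(set(actual) - set(desired)):
        actions.append(("remove", name, actual[name]))
    return actions


def apply_changes(actual, actions):
    """Apply planned actions to a copy of the current state and return the result."""
    new_state = dict(actual)
    for action, name, version in actions:
        if action == "remove":
            new_state.pop(name, None)
        else:
            new_state[name] = version
    return new_state
```

The useful property is idempotence: once a node has converged, planning again yields no actions, so the same automation can be run against a blank VM or a half-configured one without manual checks.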
6. Configure Automatic Scaling for Cloud-based Systems
This goes hand in hand with horizontal scale and with the automation of deployments. If you are developing software that can scale horizontally, you need to take the extra step and configure your cloud-based systems to automatically react to increases or decreases in load.
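One way to picture an automatic scaling policy is as a proportional rule: if the fleet is running hotter than a target utilization, grow it by the same ratio, and shrink it when load drops, within fixed bounds. The sketch below assumes average CPU utilization as the metric; the function name, the default bounds of 2 and 20 instances, and the choice of metric are illustrative assumptions rather than settings from any specific cloud provider.

```python
import math


def desired_instance_count(current, observed_cpu, target_cpu,
                           min_instances=2, max_instances=20):
    """Scale the instance count in proportion to observed vs. target CPU utilization."""
    if current < 1:
        raise ValueError("current must be at least 1")
    if target_cpu <= 0:
        raise ValueError("target_cpu must be positive")
    raw = math.ceil(current * observed_cpu / target_cpu)
    # Clamp so the fleet never shrinks below a redundant minimum or grows without limit.
    return max(min_instances, min(max_instances, raw))
```

With four instances averaging 90 percent CPU against a 60 percent target, the rule asks for six instances; the lower bound keeps at least two running even when load is negligible, which preserves redundancy.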
7. Backups, Backups, Backups
When you run your own servers, you also run your own backups. Maybe you work with a company like Iron Mountain to guarantee that your data is stored away in a vault somewhere, or maybe you have a machine that is constantly taking snapshots of your database.
In the cloud, your data is less tangible. You can’t just reach out and touch your backup tapes. For this reason, you will want to overinvest in backups for critical data. Although most cloud providers have an SLA for data storage and recovery, you shouldn’t rely on a single provider for your backups. If you do rely on cloud-based backup services, make sure that your backups are being stored in multiple data centers, preferably on different continents.
8. Understand Storage Options (and How They Perform)
When you access storage on a cloud, you have to understand how your application is going to perform based on the I/O characteristics of each storage option. On a cloud VM, you have access to local ephemeral storage; because it is ephemeral, it is discarded when the VM is no longer in use.
After local storage, you’ll have access to block storage; in OpenStack this is something like Cinder, and in Amazon AWS this is EBS. This storage can have a wide variation of performance characteristics. For example, you can provision high IOPS storage if you need performance. The key here is to understand that if your application is I/O intensive, you have different storage options available to you in the cloud.
9. Aggregate and Analyze Your Logs
Once you’ve architected your applications to be cloud-ready, you’ll then have an application that can be horizontally scaled and distributed across multiple data centers. When you are operating a highly distributed system, the next challenge is logging and monitoring.
The cloud offers many different solutions to this problem, for example, Kafka and Flume. Other people use message brokers to store and forward log messages to various systems, and one very popular option is Splunk. When you are running in the cloud, it’s important to put an emphasis on log aggregation so that you can gain insight into what your overall application is doing at any given time.
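Aggregation is much easier when every node emits logs in the same machine-readable shape and tags each line with the node it came from. The sketch below uses Python's standard logging module to write one JSON object per line; the field names and the `build_logger` helper are assumptions made for this example, and shipping the lines to an aggregator is left to whichever system you choose.

```python
import json
import logging
import socket


class JsonLineFormatter(logging.Formatter):
    """Format each record as a single JSON line tagged with the emitting node."""

    def __init__(self, node_id=None):
        super().__init__()
        # Fall back to the host name when no explicit node ID is supplied.
        self.node_id = node_id or socket.gethostname()

    def format(self, record):
        entry = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "node": self.node_id,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def build_logger(name, node_id=None, stream=None):
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter(node_id))
    logger.handlers = [handler]
    return logger
```

Because every line carries a node field, a central system can filter or group by node without parsing free-form text.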
10. Monitor Per-node Performance
Whether you have ten thousand application servers or just two, you need to have a window into single-node performance issues. You can use tools like XRebel to gain valuable insight into single-node performance in the cloud. Developers should use tools that provide them direct insight into the performance of code running on a single node.
If your system scales horizontally, it may be difficult to follow a single transaction across multiple levels of services running across thousands of servers. Build in the ability to quickly identify which nodes were involved in a specific transaction and instrument every node in your production network to collect performance information. When you need to diagnose a performance bottleneck or a slow set of transactions, it will be easier if you can rely on XRebel along with your own set of logs to provide insight into how code is operating at scale.
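One common way to identify the nodes involved in a transaction is to attach an ID to the first request and pass it along on every downstream call, with each node recording its own timings against that ID. The sketch below is a minimal, hypothetical version of that pattern: the header name, the `TraceCollector` class, and the in-memory storage are all assumptions made for illustration and are not part of XRebel or any other product.

```python
import uuid
from collections import defaultdict

TRACE_HEADER = "X-Transaction-Id"  # hypothetical header name


class TraceCollector:
    """Stand-in for a central store of per-node timing records."""

    def __init__(self):
        self._spans = defaultdict(list)

    def record(self, transaction_id, node, operation, millis):
        """Store how long one operation took on one node."""
        self._spans[transaction_id].append(
            {"node": node, "operation": operation, "millis": millis}
        )

    def nodes_for(self, transaction_id):
        """Return the nodes that handled a transaction, in first-seen order."""
        seen = []
        for span in self._spans.get(transaction_id, []):
            if span["node"] not in seen:
                seen.append(span["node"])
        return seen

    def slowest(self, transaction_id):
        """Return the slowest recorded operation, or None if nothing was recorded."""
        spans = self._spans.get(transaction_id, [])
        return max(spans, key=lambda span: span["millis"]) if spans else None


def ensure_transaction_id(headers):
    """Reuse the caller's ID if present; otherwise start a new transaction."""
    headers = dict(headers)
    headers.setdefault(TRACE_HEADER, uuid.uuid4().hex)
    return headers
```

With records like these, a slow transaction can be narrowed down to the node and operation that took the longest before anyone starts reading individual logs.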
Realizing the Potential of the Cloud
The cloud presents different challenges and opportunities than the more static deployment approaches it replaces. Learning to adapt to these differences can take time, and some developers initially regard concepts such as ephemeral storage and infrastructure automation with some trepidation. Adapting application architecture to cloud-based infrastructure might, at first, feel like a distraction, but after deploying several applications on the cloud most developers come to embrace these changes as they encourage architectures that are ready to both scale and remain stable in the face of failure.
For the trends and history of development in the cloud, and for guidance on navigating public, private, and hybrid-cloud environments, see Part One of this two-part series.
About the Author
Simon Maple is Head Developer Advocate for ZeroTurnaround, a Java Champion, JavaOne Rockstar, @virtualJUG founder, LJC co-org, and @RebelLabs author. Follow him on Twitter @sjmaple.