Amazon Web Services: A Developer Primer
I am an enthusiastic user of cloud computing platforms. As a developer, I'd much rather spend my time writing web applications and services and leave the scalability and deployment issues to someone (or something) else. One cloud deployment platform that I've found to be particularly powerful and flexible is Amazon Web Services (AWS).
I use AWS for several customer projects as well as for personal projects (Amazon also gives me grants so that I can freely experiment with AWS to support my writing endeavors). With Amazon, you have the best of both worlds: scalable services that you can use to architect your applications, and the flexibility to install and run any software on Amazon's Elastic Compute Cloud (EC2) instances (each a "server unit" implemented using virtual private server technology).
Amazon partitions its massive server farms across geographic regions, each of which contains several isolated locations called "availability zones." At the cost of increased network latency and having to pay non-local bandwidth fees, you can make your web applications and web services more robust by distributing them across more than one availability zone.
In this article, I draw on my professional and personal experiences with AWS to provide a primer for developers who are new to the platform.
Management Console Versus Command Line Tools?
When I first started using AWS, I relied on the Web Management Console for most tasks, including starting and stopping EC2 instances, backing up my file volumes with snapshots, and assigning Elastic IP addresses to EC2 instances. However, as I started to use AWS on more consulting jobs, clients almost always asked me to automate as many of the routine administration tasks as possible. To meet that requirement, you should install the AWS command line tools and learn to use them. You should also install the command line tools on your EC2 AMIs (Amazon Machine Images) so that they are available on your EC2 instances.
For tasks such as attaching and mounting EBS (Elastic Block Store) file systems automatically or assigning Elastic IP addresses to an EC2 instance automatically, you would write a script in Ruby (or bash, Python, or whatever your favorite scripting language is) and run it from /etc/rc.local (assuming Linux, not Windows). Learning the command line tools does require considerable time, however. Fortunately, Amazon's documentation is very good.
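As a rough illustration of such a boot-time script, here is a minimal Python sketch. It assumes the boto3 library rather than the command line tools; attach_volume and associate_address are boto3 EC2 client calls, while the instance ID, volume ID, device name, and IP address are made-up placeholders. RecordingClient is a hypothetical stand-in so that the sketch runs without AWS credentials.

```python
"""Boot-time sketch: attach an EBS volume and bind an Elastic IP."""


def configure_instance(ec2, instance_id, volume_id, device, public_ip):
    """Attach one EBS volume and associate one Elastic IP address.

    `ec2` is expected to behave like a boto3 EC2 client, for example
    boto3.client("ec2"); it is passed in so the function can be tried
    without credentials. Returns the raw responses for logging.
    """
    attach = ec2.attach_volume(
        VolumeId=volume_id, InstanceId=instance_id, Device=device
    )
    # PublicIp is the EC2-Classic form; VPC addresses use AllocationId.
    associate = ec2.associate_address(InstanceId=instance_id, PublicIp=public_ip)
    return attach, associate


class RecordingClient:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def attach_volume(self, **kwargs):
        self.calls.append(("attach_volume", kwargs))
        return {"State": "attaching"}

    def associate_address(self, **kwargs):
        self.calls.append(("associate_address", kwargs))
        return {"Return": True}


if __name__ == "__main__":
    client = RecordingClient()  # swap in boto3.client("ec2") on a real instance
    configure_instance(
        client, "i-0example", "vol-0example", "/dev/sdf", "203.0.113.10"
    )
    for name, kwargs in client.calls:
        print(name, kwargs)
```

On a real instance the script would still need to mount the file system once the volume has finished attaching; that step is left out here.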
EC2 Instances: AWS Building Blocks
Amazon bills EC2 instances by the hour and offers many options for the amount of memory and number of virtual CPU cores. On the low end, I bought a three-year EC2 reservation for a "small instance" (1 virtual CPU core with 1.7 GB of memory), which costs $0.03/hour (about $21/month, because I always leave it running). If you do not purchase a reserved instance, a small instance costs $0.085/hour (or about $61/month if you leave it running 24/7). EC2 instances are available with up to 68 GB of memory for large applications.
For occasional needs, I set up a larger EC2 instance using the new EBS boot disk option (more on this later). An EBS-backed instance can be stopped and restarted very quickly, and you pay for instance time (billed by the hour) only while it is running. For developers and companies on a budget (and who doesn't try to save money?), this is a "killer feature" of AWS because it enables them to do things such as inexpensively starting up several large servers for a few hours to test deployment strategies or quickly performing large computations.
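The monthly figures follow directly from the hourly rates. The short Python sketch below redoes the arithmetic, assuming a 30-day month of continuous use; it leaves out any up-front reservation fee, which is not part of the hourly rate.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month, running 24/7


def monthly_cost(hourly_rate, hours=HOURS_PER_MONTH):
    """Return the cost in dollars of running one instance for `hours` hours."""
    return round(hourly_rate * hours, 2)


reserved = monthly_cost(0.03)    # reserved small instance rate from the text
on_demand = monthly_cost(0.085)  # on-demand small instance rate from the text
print(f"reserved:   ${reserved:.2f}/month")
print(f"on-demand:  ${on_demand:.2f}/month")
print(f"difference: ${on_demand - reserved:.2f}/month")
```

The results come out near the approximate monthly amounts quoted above, with the reserved rate costing roughly a third of the on-demand rate over the same period.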
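One way to keep such a short-lived server from running (and billing) longer than intended is to tie its lifetime to the job itself. The following is a minimal Python sketch under the assumption that you use boto3, whose EC2 client provides start_instances and stop_instances; the instance ID is a placeholder and RecordingClient is a hypothetical stand-in for local runs.

```python
from contextlib import contextmanager


@contextmanager
def running_instance(ec2, instance_id):
    """Start an EBS-backed instance, yield, and always stop it afterwards."""
    ec2.start_instances(InstanceIds=[instance_id])
    try:
        yield instance_id
    finally:
        # Stop even if the work fails, so the hourly billing ends.
        ec2.stop_instances(InstanceIds=[instance_id])


class RecordingClient:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def start_instances(self, InstanceIds):
        self.calls.append(("start", list(InstanceIds)))

    def stop_instances(self, InstanceIds):
        self.calls.append(("stop", list(InstanceIds)))


if __name__ == "__main__":
    client = RecordingClient()  # swap in boto3.client("ec2") for real use
    with running_instance(client, "i-0example") as instance_id:
        print("running the large job on", instance_id)
    print(client.calls)
```

With a real client, start_instances returns before the instance has finished booting, so a production script would also wait for the instance to reach the running state before sending it work.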
The easiest way to use an EC2 instance is to find an existing public AMI with the open source infrastructure software that you need already installed. Last year while writing a book on AWS, I set up an AMI that had all of the software I was writing about installed and configured. It allowed me to provide examples for readers to try without much setup time. (I use only Linux on EC2, but Windows is also available if that is your preferred platform.) As a software developer, think of the opportunity for marketing your web applications as AMIs for delivery to customers.
While EC2 instances are the basic infrastructure "units" when using AWS, Amazon also provides other high-level and scalable services that I will discuss next.
Elastic Block Store
Elastic Block Store (EBS) provides block-level storage volumes that you can format using any type of file system that is appropriate for your application. For increased performance (especially for read operations), I use multiple EBS volumes in a RAID 0 configuration. Why RAID 0? EBS volumes are already quite robust because Amazon replicates each one across multiple servers within its availability zone. So, it does not make sense for my purposes to use a redundant RAID level; I just want better read performance.
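To see why striping helps reads, consider how RAID 0 lays data out. The Python sketch below is a simplified model that assumes a stripe unit of exactly one block (real arrays use a configurable chunk size), and the four-volume array is hypothetical.

```python
def locate_block(logical_block, num_volumes):
    """Map a logical block number to (volume index, block offset on that volume)."""
    if num_volumes < 1:
        raise ValueError("need at least one volume")
    return logical_block % num_volumes, logical_block // num_volumes


# Eight consecutive logical blocks spread over a hypothetical 4-volume array.
for block in range(8):
    volume, offset = locate_block(block, 4)
    print(f"logical block {block} -> volume {volume}, offset {offset}")
```

Because consecutive blocks land on different volumes, a large sequential read can be served by all of the volumes at once. The same layout is why RAID 0 adds no redundancy: losing any one volume loses part of every large file.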
While you may need RAID to get the level of performance your application requires, the easiest way to use EBS volumes is as single file systems. Even though EBS volumes are very durable (i.e., reliable), I still recommend periodically taking "snapshots" for additional backup.
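Periodic snapshots are easy to script. Here is a minimal Python sketch that assumes boto3, whose EC2 client offers create_snapshot and returns the new snapshot's ID under the "SnapshotId" key; the function name and the wording of the description are my own choices for illustration.

```python
from datetime import datetime, timezone


def snapshot_volumes(ec2, volume_ids, now=None):
    """Request a snapshot of each volume and return the snapshot IDs.

    `ec2` is expected to behave like a boto3 EC2 client, for example
    boto3.client("ec2"). `now` can be supplied to make the description
    reproducible; it defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%d %H:%M UTC")
    snapshot_ids = []
    for volume_id in volume_ids:
        response = ec2.create_snapshot(
            VolumeId=volume_id,
            Description=f"periodic backup of {volume_id} at {stamp}",
        )
        snapshot_ids.append(response["SnapshotId"])
    return snapshot_ids
```

A call such as snapshot_volumes(boto3.client("ec2"), ["vol-0example"]) could then be run from cron at whatever interval suits your application, where the volume ID is a placeholder.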
Simple Queue Service
Using robust asynchronous messaging systems is the "secret sauce" that makes building reliable distributed systems possible. On a large scale, I have used reliable messaging on a worldwide nuclear test monitoring system (1980s) and a large-scale telephone credit card fraud detection system (1990s). The issues are different when you build multiple-server systems that run in a single data center, but reliable messaging is still the secret to keeping the architecture of complex systems simple.
The basic idea of the Simple Queue Service (SQS) is that you can write structured data to what is effectively a globally accessible queue from which other processes can provisionally remove the data and process it. A worker process acknowledges success by deleting the queue item; if it fails to do so before the item's visibility timeout expires, the item becomes available again to other worker processes.
SQS is very robust. Amazon designed and implemented SQS as the backbone for its transactional processing for fulfilling orders, etc. Take advantage of this robustness in your own applications, even small applications that run on a single EC2 instance.
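The provisional removal described above can be sketched in a few lines of Python. This sketch assumes boto3, whose SQS client provides receive_message and delete_message; FakeQueue and its expire_visibility method are hypothetical stand-ins that imitate the timeout locally and are not part of any AWS library.

```python
def process_one(sqs, queue_url, handler):
    """Receive at most one message, handle it, and delete it on success.

    Returns True if a message was handled and deleted, False if the queue
    was empty. If `handler` raises, the message is not deleted and SQS
    makes it visible again once its visibility timeout expires.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10
    )
    messages = response.get("Messages", [])
    if not messages:
        return False
    message = messages[0]
    handler(message["Body"])  # may raise; then the delete below is skipped
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return True


class FakeQueue:
    """Hypothetical in-memory stand-in for an SQS client, for local runs only."""

    def __init__(self, bodies):
        self.visible = [(f"handle-{i}", body) for i, body in enumerate(bodies)]
        self.in_flight = {}

    def receive_message(self, QueueUrl, MaxNumberOfMessages=1, WaitTimeSeconds=0):
        if not self.visible:
            return {}
        handle, body = self.visible.pop(0)
        self.in_flight[handle] = body  # hidden from other workers for now
        return {"Messages": [{"Body": body, "ReceiptHandle": handle}]}

    def delete_message(self, QueueUrl, ReceiptHandle):
        del self.in_flight[ReceiptHandle]

    def expire_visibility(self):
        """Simulate the visibility timeout: unacknowledged messages return."""
        self.visible.extend(self.in_flight.items())
        self.in_flight.clear()


def crash(body):
    """Simulate a worker that fails before it can acknowledge the message."""
    raise RuntimeError(f"worker crashed while handling {body}")


if __name__ == "__main__":
    queue = FakeQueue(["order-1"])
    try:
        process_one(queue, "fake-queue-url", crash)
    except RuntimeError as error:
        print("first worker failed:", error)
    queue.expire_visibility()  # the unacknowledged message reappears
    process_one(queue, "fake-queue-url", lambda body: print("second worker handled", body))
```

Deleting only after the handler returns means a crashed worker costs you a delay, not a lost message. The trade-off is that a message can be delivered more than once, so handlers should tolerate repeats.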