Re-Engineering Legacy Software: Introduction to Vagrant

By Chris Birchall

Excerpt from the book Re-Engineering Legacy Software

Vagrant is a tool that allows you to programmatically build an isolated environment for your application and all of its dependencies.

The Vagrant environment is a virtual machine, so it enjoys complete isolation from both the host machine and any other Vagrant machines (VMs) you may be running. For the underlying virtual machine technology, Vagrant supports VirtualBox, VMware, or even a remote machine running on Amazon’s EC2 infrastructure.

The vagrant command lets you manage your VMs (starting them, stopping them, destroying unneeded VMs, etc.), and you can log in to a VM simply by typing vagrant ssh. You can also share directories (such as your software’s source code repository) between the host machine and the VM, and Vagrant can forward ports from the VM to the host machine, so you could reach a web server running on the VM by visiting http://localhost/ on your local machine.

The main benefits of using Vagrant are as follows.

  • It makes it easy to automate the setup of a development environment inside a VM, as we shall see shortly.
  • Each VM is isolated from the host machine and other VMs, so you don’t need to worry about version conflicts when you have many different projects set up on the same machine. If one project needs Python 2.6, Ruby 1.8.1 and PostgreSQL 9.1, while another needs Python 2.7, Ruby 2.0 and PostgreSQL 9.3, it can be tricky to set everything up on your development machine. But if each project lives in a separate VM, it can make life easier.
  • The VMs are usually Linux machines, so if you are using Linux in production then you can exactly recreate the production environment.

If you want to get really fancy, Vagrant even supports multi-VM setups, so you could build the entire stack for your application (including web servers, DB servers, cache servers, Elasticsearch clusters and what have you), exactly replicating the setup you have in production, but all running inside your development machine!

If you don’t have Vagrant installed, head over to the Vagrant website and follow the installation instructions. It’s pretty simple. Note that you will also need a virtual machine provider such as VirtualBox or VMware installed. I’ll be using VirtualBox.

Setting up Vagrant for the UAD project

To add Vagrant support to the User Activity Dashboard, first we need to create a Vagrantfile. This is a file in the root folder of the repository named, unsurprisingly, Vagrantfile. It’s a configuration file, written in a Ruby DSL, that tells Vagrant how to set up the VM for this project. You can create a new Vagrantfile by running vagrant init. A minimal Vagrantfile is shown below.

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"
end

Note that we need to specify what box to use for our VM. A box is a base image that Vagrant can use as a foundation for building a new virtual machine. I’ll be using 64-bit Ubuntu 14.04 (Trusty Tahr) as the OS for my virtual machine, so I set the box to ubuntu/trusty64. There are many other boxes available on the Vagrant website.

Now you’re ready to start your VM by typing vagrant up. Once it boots, you can log in by typing vagrant ssh and take a look around.

Not much to see yet, but one thing to notice is that the folder containing the Vagrantfile has been automatically shared, so it is available as /vagrant inside the VM. This is a two-way share, so any changes you make in the VM will be reflected in real time on your host machine, and vice versa.


You can see the complete code for this chapter in the GitHub repo.

So far Vagrant is not doing anything very useful, as we just have an empty Linux machine. The next step is to automate the installation and configuration of the UAD’s dependencies.

Automatic provisioning using Ansible

The installation and configuration of everything needed to run a piece of software is known as provisioning. Vagrant supports a number of different ways of provisioning, including Chef, Puppet, Docker, Ansible and even plain old shell scripts.

For simple tasks, a bunch of shell scripts is often good enough. But they are difficult to compose and reuse, so if you want to do more complex provisioning or reuse parts of the provisioning script across multiple projects or environments, it’s a good idea to use a more powerful tool. In this article I’ll be using Ansible, but you can achieve much the same thing using Docker, Chef, Puppet, Salt, or whatever tool you’re happiest with.

In this article, I’m going to write a few Ansible scripts to provision the UAD application.

Before we can provision with Ansible, we need to install it on the host machine. See the installation docs on the Ansible website for details. (I appreciate the irony of manually installing all this stuff so we can automate the installation of other stuff, but I promise this is the last thing we need to install manually. And after you’ve installed VirtualBox, Vagrant and Ansible once, you can use them for all your projects.)


Ansible on Windows

Unfortunately Ansible does not officially support running on Windows. But with a bit of work it is possible to get it running. See this excellent blog post for a step-by-step guide on getting Vagrant and Ansible working on Windows.


Unlike other provisioning tools such as Chef or Puppet, Ansible is agentless. This means that you don’t need to install any Ansible agent on your Vagrant VM. Instead, whenever you run Ansible it will execute commands on the VM remotely using SSH.

To tell Ansible what to install on our VM, we need to write a YAML file called a playbook. A minimal example is shown below.

---
- hosts: all
  tasks:
    - name: Print Hello world
      debug: msg="Hello world"

This tells Ansible two things. First, it should run the script on all hosts that it knows about. In our case, we only have a single VM, so this is fine for our purposes. Second, it should run a task that prints “Hello world”.

We also need to add a couple of lines to our Vagrantfile to tell Vagrant to use Ansible for provisioning. Our Vagrantfile now looks like this.


VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"

  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "provisioning/playbook.yml"
  end
end

Now if you run vagrant provision, you should see output something like the following.

PLAY [all] *********************************************

GATHERING FACTS ****************************************
ok: [default]

TASK: [Print Hello world] ******************************
ok: [default] => {
    "msg": "Hello world"
}

PLAY RECAP *********************************************
default    : ok=2   changed=0   unreachable=0   failed=0

Now that we’ve got Ansible hooked up to Vagrant, let’s use it to install the dependencies for the User Activity Dashboard. Recall that we need to:

  • install Java
  • install Apache Ant
  • install Redis
  • install Resin 3.x
  • install and configure Apache ActiveMQ
  • download a license file and copy it to the Resin installation folder

We’ll use the concept of Ansible roles, creating a separate role for each of these dependencies. This keeps each dependency cleanly separated, so we can later reuse them individually if we wish. Let’s start with Java, as we need that before we can do much else.

OpenJDK can be installed using the apt package manager in Ubuntu, so our Java role will be quite simple. It will have just one task that installs the openjdk-7-jdk package.

Let’s create a new file provisioning/roles/java/tasks/main.yml (by convention this is where Ansible will look for the Java role’s tasks) and write our task there:

---
- name: install OpenJDK 7 JDK
  apt: name=openjdk-7-jdk state=present

There are a couple of things to note, even in this very short file. First, apt is the name of a built-in Ansible module. There are loads of these, and it’s worth becoming familiar with them so you don’t accidentally re-invent the wheel when there is already a module that does what you want. You can see a list of them, with documentation and examples, on the Ansible website.

Second, we are not actually telling Ansible to “install Java,” but rather to “ensure the Java package is present.” Ansible is smart enough to check if the package is already installed before it tries to install it. This means that (well-written) Ansible playbooks are idempotent, so you can run them as many times as you like.
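As a rough illustration of the difference, compare the two tasks below. The second one is a hypothetical counter-example written for this comparison, not something from the UAD project.

```yaml
---
# Declarative: the apt module checks the package state first, so once the
# package is installed this task reports "ok" rather than "changed".
- name: ensure OpenJDK 7 JDK is present
  apt: name=openjdk-7-jdk state=present

# Imperative (hypothetical counter-example): the command module cannot tell
# whether the work has already been done, so Ansible runs it and reports
# "changed" on every provisioning run.
- name: install OpenJDK 7 JDK by hand
  command: apt-get install -y openjdk-7-jdk
```

The end state on the VM is the same either way, but only the first form lets Ansible report accurately what it changed. The second would need something like changed_when or creates added by hand to behave as well.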

We need to tell the playbook to use our new Java role, so let’s update the provisioning/playbook.yml file. It should now look like this.

---
- hosts: all
  sudo: yes
  roles:
    - java

Now if you run vagrant provision again, the output should look something like this.

PLAY [all] *********************************************

GATHERING FACTS ****************************************
ok: [default]

TASK: [java | install OpenJDK 7 JDK] *******************
changed: [default]

PLAY RECAP *********************************************
default   : ok=2   changed=1   unreachable=0   failed=0

If you want to check that it worked, SSH into the VM and run java -version:

vagrant@vagrant-ubuntu-trusty-64:~$ java -version
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)

Cool! We just installed our first dependency using Vagrant and Ansible.

Adding more roles

Let’s continue in the same vein, adding another role for each of our dependencies. Next in the list are Redis and Ant, but they’re pretty much the same as Java (just installing a package using apt), so I’ll gloss over them here. Remember, you can view the complete code in the GitHub repo.
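For orientation, here is a sketch of what those two roles might look like, following the same pattern as the Java role. The file paths and the package names redis-server and ant are assumptions here (they are the usual Ubuntu package names), so check the repo for the actual versions.

```yaml
# provisioning/roles/redis/tasks/main.yml (assumed path)
---
- name: install Redis
  apt: name=redis-server state=present

# provisioning/roles/ant/tasks/main.yml (assumed path)
---
- name: install Apache Ant
  apt: name=ant state=present
```

Each role lives in its own directory, so either one can be dropped into another project's playbook without bringing the other along.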

We’ll try Resin next. The Resin role’s tasks file is shown in Listing 1.

Listing 1: Ansible tasks to install Resin 3.x

---
- name: download Resin tarball
  get_url: >
    url=

- name: extract Resin tarball
  unarchive: >

- name: change owner of Resin files
  file: >
    state=directory

- name: create /usr/local/resin symlink
  file: >

- name: set RESIN_3_HOME env var
  lineinfile: >
    line='export RESIN_3_HOME=/usr/local/resin'

This file is much longer than the previous one, but if you look at each task in turn you’ll see that we’re not doing anything too complicated. The tasks, which will be run by Ansible in the order they are written, are doing the following.

  1. Download a tarball from the Resin website
  2. Extract it under /usr/local
  3. Change its owner from root to the vagrant user
  4. Create a convenient symlink at /usr/local/resin
  5. Set up the RESIN_3_HOME environment variable that the UAD application requires

If you add the new Resin role to the main playbook file and run vagrant provision again, you should end up with Resin installed and ready to run.
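With all four roles so far in place, the playbook might look something like the sketch below. The role names ant, redis and resin are assumptions: each one has to match the name of a directory under provisioning/roles.

```yaml
---
- hosts: all
  sudo: yes
  roles:
    # Java comes first because the later roles need a JVM to be useful.
    - java
    - ant
    - redis
    - resin
```

Ansible applies the roles in the order listed. The sudo: yes setting is the syntax used throughout this article; newer Ansible releases spell it become: yes.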

The tasks for the next role, ActiveMQ, are similar to those for installing Resin (download a tarball, extract it, and create a symlink). The only task of note is the final one, shown below.

- name: customize ActiveMQ configuration
  copy: >
    src=activemq-custom-config.xml

This task uses Ansible’s copy module, which copies a file from the host machine to the VM. We use this to overwrite ActiveMQ’s configuration file with a customized one after the tarball has been extracted. This is a common technique, whereby large files are downloaded from the Internet onto the VM but smaller files, such as configuration files, are stored in the repository and copied from the host machine.

The only remaining task is to download a license file for a proprietary XML parsing library from somewhere on the company’s internal network and store it in the Resin root directory. This task is quite specific to the UAD application and likely can’t be reused anywhere, so let’s create a role just for UAD-specific stuff and put it in there.

I’ll leave the task definition as an exercise for the reader, in case you want to practice writing Ansible scripts. (Just download any random text file from the Internet to represent the hypothetical license file.) A solution is available in the GitHub repo.

Removing the dependency on an external database

This is going great so far. We’ve managed to automate almost the entire setup of the UAD development environment with just a few short YAML files, which should make the process a lot less painful for the next person who has to set this project up on their machine.

But there is still one last issue that we haven’t tackled. As things stand, the software depends on a shared PostgreSQL database in the test environment, so all new starters need to ask the ops team to create a DB user for them. If we could set up a PostgreSQL DB inside the VM and tell the software to use that one instead, it would solve the problem. It would also mean that each developer has complete control over the content of their DB, without fear of somebody else tampering with their data. So let’s give it a try!

We’ll assume we have some credentials for the test environment, and use those credentials to connect to the DB and take a dump of the schema:

$ pg_dump --username chris --host=testdb --dbname=uad \
    --schema-only > schema.sql

Then we add some Ansible tasks to: install PostgreSQL; create a DB user; create an empty DB; and initialize it using the schema.sql file we just generated. This is shown in Listing 2.

Listing 2

- name: create DB user
  sudo_user: postgres
  postgresql_user: >
    name=vagrant
    password=abc
    role_attr_flags=LOGIN

- name: create the DB
  sudo_user: postgres
  postgresql_db: >
    name=uad

- name: count DB tables   # [1]
  sudo_user: postgres
  command: >
    psql uad -t -A
    -c "SELECT count(1) FROM
    WHERE schemaname='public'"
  register: table_count

- name: copy the DB schema file if it is needed
  copy: >
    src=schema.sql
    dest=/tmp/schema.sql
  when: table_count.stdout | int == 0   # [2]

- name: load the DB schema if it is not already loaded
  sudo_user: vagrant
  command: psql uad -f /tmp/schema.sql
  when: table_count.stdout | int == 0

[1] We use the number of DB tables to decide if we have already loaded the schema

[2] This task will not run if the DB schema already contains tables

Note that this is a little more complicated than the Ansible tasks we’ve written so far, because we need to do a bit of trickery to achieve idempotency. We’re doing some conditional processing so that we only load the DB schema if the number of tables in the DB is zero, meaning that we haven’t already loaded it.
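Listing 2 does not show the task that installs PostgreSQL, and the table name in the count query is missing. The sketch below is one way to fill in both, under two assumptions: that the Ubuntu packages are postgresql and python-psycopg2, and that the query counts rows in PostgreSQL's pg_tables catalog view. Neither detail is confirmed by the listing itself.

```yaml
---
# Assumed package names for Ubuntu 14.04. python-psycopg2 is the Python
# driver that Ansible's postgresql_user and postgresql_db modules need
# on the VM.
- name: install PostgreSQL
  apt: name={{ item }} state=present
  with_items:
    - postgresql
    - python-psycopg2

# Assumes the table being counted is the pg_tables catalog view, which has
# one row per table and a schemaname column.
- name: count DB tables
  sudo_user: postgres
  command: >
    psql uad -t -A
    -c "SELECT count(1) FROM pg_tables WHERE schemaname='public'"
  register: table_count
```

The -t and -A flags make psql print just the bare number, which is what lets the later when conditions convert table_count.stdout to an integer and compare it with zero.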

We’ve now automated the creation of a local PostgreSQL DB, so we’ve managed to fill in the final piece of the automation puzzle. In the next section we’ll take a look at how our automation effort pays off.

First day on the job – Take Two

Congratulations! And welcome to your new job at Fzzle, Inc. Once the HR guy has given you the tour of the office, he takes you to meet your new boss. Anna is the Tech Lead of the User Services team. She shows you your desk and fills you in on the details of the job.

Your first task is to add a new feature to an application called the User Activity Dashboard. You clone the git repository and take a look at the README file to see how to get it running locally.

The README explains that you can set up a development environment using Vagrant and Ansible. As standard elements of the company’s recommended toolchain, these tools are pre-installed on your development machine. You kick off the vagrant up command, which will build and provision your virtual machine. It’ll take a few minutes to complete, so you wander off to try and work out how the coffee machine works…

By the time you get back, the provisioning is complete and you get the application running with little fuss. By lunchtime you start work on implementing the new feature. And by the end of the day you’ve completed the implementation and made your first pull request, and you’ve also made a note of a couple of places you’d like to refactor tomorrow. Not a bad first day on the job!


