Build It Up, Or Tear It Down: Is Immutable Infrastructure Right For Your Team?

12th March, 2018

Immutable infrastructure is a DevOps pattern for managing software deployments. The basic concept is that you never change a running system. Once a service or application has been instantiated, it is never modified or updated in place; when it does need to change, you build a new version and replace the running one. This approach has a few interesting advantages over traditional upgrades, including reduced configuration drift, increased confidence in deployments and reduced costs.
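As a conceptual sketch, the difference between mutating and replacing can be expressed in a few lines. The `Fleet` class and its fields are illustrative stand-ins, not a real deployment tool:

```python
# A minimal sketch of the immutable pattern: each release builds a fresh,
# fully specified artifact, and the "live" pointer is swapped wholesale.
# Nothing here is a real tool; the dicts stand in for server images.

def build_image(version):
    """Build a brand-new artifact; nothing is inherited from the old one."""
    return {"version": version, "packages": ["nginx-1.24"], "hand_patched": False}

class Fleet:
    def __init__(self):
        self.live = None

    def deploy(self, version):
        new = build_image(version)   # build from scratch...
        self.live = new              # ...then swap; never patch in place

fleet = Fleet()
fleet.deploy("v1")
fleet.deploy("v2")                   # v1 is discarded, not modified
print(fleet.live["version"])
```

The key property is that every running system is the direct output of a build, never the accumulation of in-place edits.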

Martin Fowler coined the term “snowflake server” to describe the fragile state that results from traditional deployments layering change after change onto an existing server. As Martin explains, upgrading one bit of software causes unpredictable knock-on effects: you’re not sure which parts of the configuration are important, and which are just the way things came out of the box many years ago. This fragility leads to long, stressful bouts of debugging.

In contrast, immutable infrastructure creates “phoenix servers” that rise powerfully from the ashes of the servers before them. Regularly burning down servers ensures reliable configuration and continually tests recovery time. Netflix even runs a tool called Chaos Monkey that terminates servers at random to test the robustness of its architecture.

Immutable infrastructure has been gaining popularity with the advent of containers and automated configuration tools. Once you have a container image, it’s simple to provision it in an isolated environment and run tests to ensure it’s healthy before promoting it to production. Containers are not necessary for immutable infrastructure – you could instead build and distribute Amazon Machine Images (AMIs) – but containers definitely provide advantages here.

To examine the full benefits, it’s useful to analyze some potential issues during a traditional deployment and then see how immutable infrastructure could help.

Traditional Deployment Workflows and Configuration Drift

A traditional workflow for upgrading web servers might look something like this: perform a test upgrade on a production-like system to ensure everything will go smoothly, schedule a maintenance window, take the service offline and then, finally, upgrade the web server. After checking that everything works as expected, you bring the service back online.

Obviously, that’s a very simple deployment workflow. If your DevOps practice is more mature, you might use a configuration management tool like Chef to automate deploys or perform rolling deployments using a load balancer. With configuration management and a cluster of servers behind a load balancer, traditional deployments are pretty slick. You have complete control, you can run tests on upgrades before they go live and everything is automated.
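A traditional rolling deployment can be sketched like this. The server records and health check are stand-ins for real infrastructure; note that each server is mutated in place, which is exactly what immutable infrastructure later avoids:

```python
# Sketch of a rolling deployment: pull each server out of the
# load-balancer pool, upgrade it in place, health-check it, put it back.

def upgrade_in_place(server):
    server["version"] = "2.0"    # the running server is mutated directly

def healthy(server):
    return server["version"] == "2.0"

pool = [{"name": f"web{i}", "version": "1.0"} for i in range(3)]

for server in pool:
    # in a real setup: deregister from the load balancer first
    upgrade_in_place(server)
    assert healthy(server), f"{server['name']} failed its health check"
    # ...then re-register so the server takes traffic again

print(sorted({s["version"] for s in pool}))
```

Because servers leave rotation one at a time, the service stays up throughout – but each box carries forward whatever state it had before the upgrade.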


So why would anyone want to change to a system that mandates rebuilding and redeploying servers every time an update is required?

Imagine that instead of working to upgrade a server, you’re working to upgrade a broken vehicle. Every time something breaks, you replace it with a new part or use duct tape to hold it all together. A few dozen fixes later, the car doesn’t look anything like the vehicle you purchased – but it still runs!

Traditional deployment is similar to performing patchwork on a car. The basic server stays the same, even as upgrades and deployments occur. In the past, spinning up a new server was so time-consuming and expensive that keeping the existing server running was the only practical option.

There are a few issues with this process. One of the biggest is configuration drift: the slow creep of a server’s actual configuration away from its originally deployed state. Configuration drift means that the server you think you’re deploying to might be in a totally different state than it was when it was set up. Maybe the production servers have an extra patch that was applied during a late-night firefighting session and never made it back into configuration management. Maybe the staging environment has half of an upgrade already applied from an earlier experiment. Unfortunately, testing doesn’t entirely remove this issue.
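Configuration drift can be pictured as a diff between what configuration management declares and what is actually on the box. The package names and versions below are invented for illustration:

```python
# Illustrative sketch of configuration drift: the declared state no
# longer matches what is actually running on the server.

declared = {"nginx": "1.24", "openssl": "3.0"}

# A late-night hotfix patched openssl directly on the box and installed a
# one-off tool that never made it back into configuration management.
actual = {"nginx": "1.24", "openssl": "3.0.8-hotfix", "netcat": "1.10"}

drift = {
    pkg: (declared.get(pkg), actual.get(pkg))
    for pkg in declared.keys() | actual.keys()
    if declared.get(pkg) != actual.get(pkg)
}
print(drift)  # everything that differs between "should be" and "is"
```

The next deploy is tested against `declared`, but it lands on `actual` – which is where the surprises come from.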

As anyone who has owned a fixer-upper car can attest, patches mean a lot of time in the garage. Every time something needs fixing, the car has to come off the road. That’s not an option with servers, so any deployment means you need multiple servers running at all times – which might be prohibitively expensive for a small or nonessential service like a shipping calculator.

Finally, we all fall victim to unforeseen issues from time to time, and when that happens you can be in for extended downtime. Suppose you upgrade TLS from v1.0 to v1.2. The install works well on your staging environment, the test suite runs and everything is green. You add the update to Chef and watch your production stack upgrade smoothly. Job well done! But then the errors start: a critical third-party integration only supports TLS up to version 1.1. How do you roll back? In operations, changes are not necessarily reversible; just because a package installs correctly doesn’t mean it will uninstall correctly. Sorting through the changes, identifying the issues and fixing them are all time-consuming tasks. A fast and reliable rollback strategy can be invaluable for quickly restoring an application.

To eliminate these potential problems, teams are turning to immutable infrastructure. Let’s take a look at how these phoenix servers can help prevent many of the issues seen with traditional deployment processes.


How Immutable Infrastructure Can Help

Let’s walk through a sample deployment using immutable infrastructure principles. In this example, we’ll assume the service to be upgraded is packaged as a container for simplicity.

First, update the container’s build script to install the desired web server, then run it to create a new container image. You can instantiate this container in a test environment to run your test suite or health checks. Assuming everything works as expected, provision a brand-new virtual server in the production environment and deploy the container. Once validated, the new server can be registered with the load balancer and begin accepting requests. Drain connections from the old server before shutting it off, and the upgrade is complete.
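The steps above can be sketched as follows. The helper functions are hypothetical stand-ins for your build tool, test harness and load-balancer API:

```python
# Sketch of an immutable deployment; each step operates on a brand-new
# artifact, and the live system is never modified.

def build_container(version):
    """Step 1: run the build script to produce a fresh image."""
    return {"version": version}

def run_health_checks(container):
    """Step 2: instantiate in a test environment and verify it's healthy."""
    return True  # stand-in for a real test suite

def register(container, lb):
    """Step 3: put the validated server behind the load balancer."""
    lb["backends"].append(container)

def drain_and_retire(container, lb):
    """Step 4: drain connections from the old server, then shut it off."""
    lb["backends"].remove(container)

lb = {"backends": []}
old = build_container("v1")
register(old, lb)                # the currently running release

new = build_container("v2")      # upgrade: build a brand-new container
assert run_health_checks(new)    # never promoted unless it passes
register(new, lb)
drain_and_retire(old, lb)

print([c["version"] for c in lb["backends"]])
```

During the overlap between `register(new, ...)` and `drain_and_retire(old, ...)`, both releases serve traffic, which is what makes the cutover downtime-free.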

No downtime, no touching live systems, and you have a guarantee that your new server is exactly what the build script specified – nothing more and nothing less!

Eliminate Configuration Drift – Instead of deploying changes to an existing server, new deploys are made to a clean slate, which removes many of the issues of traditional deployments. Dependencies can be tested before deployment, and there are no surprises when interacting with a live server.

Simple Rollbacks – What happens when things go wrong? Even with a great test suite, things can still come unstuck. With immutable infrastructure, the old system still exists, even if it isn’t running. You can reverse the deployment process and be confident that the previous service is back in place, giving you valuable time to fix the issues.

Reduced Server Overhead – Rolling deployments require multiple servers running at all times to ensure availability during deployments. Immutable infrastructure removes that requirement: you only need to run extra servers during the deployment window itself. This reduces both cost and the risk of downtime.
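The simple-rollbacks point above can be sketched in a few lines: because the previous release still exists as a complete artifact, rolling back is a pointer swap rather than an attempt to reverse package operations. The release versions and TLS settings are illustrative, echoing the earlier TLS scenario:

```python
# Rollback as a pointer swap: the old release is a complete, untouched
# artifact, so restoring it requires no uninstalls and no guesswork.

releases = {
    "v1": {"tls": "1.0"},   # the old, known-good release
    "v2": {"tls": "1.2"},   # the new release that broke an integration
}
live = "v2"

# The third-party integration fails under TLS 1.2 -- roll back by
# re-pointing traffic at the previous artifact.
live = "v1"
print(releases[live]["tls"])
```

Contrast this with the traditional case, where rollback means reversing package installs and config edits on a live machine, in the right order, under pressure.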

When is Immutable Infrastructure Inappropriate?

There are many advantages to immutable infrastructure, but it isn’t for everyone. While the paradigm works great for stateless apps and microservices, throwing away a running system may be impractical in your architecture. For example, if your web application stores session data in memory or in local files, destroying the server would log out all current users. It also assumes that the build time for a new server isn’t prohibitively long. Because a new system is built for each change, Infrastructure as Code (IaC) is a prerequisite for immutable infrastructure – so if you’re not already using IaC, the move to immutable infrastructure will be a big one.
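The session-data problem has a standard remedy: move state out of the server so the server itself becomes disposable. The in-memory dict below is a stand-in for a real external store such as Redis or a database:

```python
# Externalized session state makes a server disposable: sessions live
# outside any one server, so servers can be destroyed and replaced freely.

session_store = {}                # stand-in for Redis, a database, etc.

class WebServer:
    """Stateless: reads and writes sessions only via the external store."""
    def login(self, user):
        session_store[user] = {"logged_in": True}

server_a = WebServer()
server_a.login("alice")
del server_a                      # tear the server down entirely...

server_b = WebServer()            # ...a replacement comes up cold
print(session_store["alice"])     # the session survived the replacement
```

Once no user-visible state lives on the server itself, burning it down and replacing it is safe.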

Immutable infrastructure can be done anywhere, but it works best in a cloud or virtualized environment where creating new servers is a relatively cheap operation.

Finally, some applications, such as PostgreSQL or MySQL, don’t take well to being moved around. There are many talks and articles (for example, Sysadmin4life) detailing how and why, but these sorts of stateful applications simply aren’t designed to be thrown away.

Should I Consider Immutable Infrastructure?

If your infrastructure is cloud-based and your application meets the requirements for a 12-factor app, then immutable infrastructure might be the perfect paradigm for you.

Practicing immutable infrastructure doesn’t mandate a microservice architecture – it works equally well for monolithic applications. It’s a great way to ensure you’re deploying robust, highly available services onto a platform that is easy to reason about, quick to roll back and testable.