What is Infrastructure as Code (IaC)?

What if you could automate the process of building environments, pushing out software changes, and making changes to infrastructure? What if on top of automation, it was easier to codify and view an incremental history of any adjustments to your environment?

Infrastructure as Code (IaC) is your answer. Learn what IaC really means, the benefits it offers, how it works, and what kinds of tools are available to assist in deploying IaC.

How Infrastructure as Code is changing the speed of business

Infrastructure as Code (IaC) is a way to automate provisioning software, networks, and virtual machines.

Before IaC, teams built, updated, and deployed environments manually, which was a time-consuming and error-prone process. IaC replaces scripts and hand-written notes with an abstract model of the environment. It then manages and automates different infrastructure configurations using full-featured programming languages. This is more than writing code, however: IaC involves standard development practices like version control, incremental deployments, and design patterns.

And IaC makes DevOps — an IT strategy aiming to deliver projects quickly and accurately — possible by bringing the infrastructure team onboard with the development team. This is necessary to aid programmers in continuously testing and deploying software updates in real time. With IaC, the project team is able to release project code in carefully managed iterations, such as agile sprints.

Tools like SaltStack, Puppet, Chef, and Ansible let administrators define what software needs to be installed where. They can then push those changes to multiple servers. These tools let the administrator and architect model the entire system. Hence the name: Infrastructure as Code is an abstract representation of the physical and logical environment, written as code.


Benefits of Infrastructure as Code

It is fair to say that Infrastructure as Code has substantially improved accuracy, speed, and productivity for new applications and technologies, such as cloud software development. Here are some tangible benefits of Infrastructure as Code:

1. Added logic to installation scripts

The word script usually means a sequence of basic Linux commands, such as copy (cp). Those commands have limited or no feedback loops, so it's not always possible to know whether a command worked or failed.

Further, without extensive coding there is no way to create dependencies between objects. That matters because you would not want to install two instances of a database on a machine if the previous step, adding a network interface, had failed. You could code that in a bash script, but the result would be long code that is difficult to understand and maintain.
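
The dependency logic described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular IaC tool implements it. The `Step` class and the `run_with_dependencies` function are hypothetical. Steps are assumed to be listed in dependency order, and each action simply returns True or False to report success.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    """A hypothetical provisioning step."""
    name: str
    action: Callable[[], bool]  # returns True on success, False on failure
    requires: List[str] = field(default_factory=list)  # names of prerequisite steps


def run_with_dependencies(steps: List[Step]) -> Dict[str, str]:
    """Run steps in order, skipping any step whose prerequisites did not succeed.

    Assumes prerequisites appear earlier in `steps` than the steps that need them.
    Returns a status for each step: "ok", "failed", or "skipped".
    """
    status: Dict[str, str] = {}
    for step in steps:
        # A prerequisite that failed, was skipped, or never ran blocks this step.
        if any(status.get(dep) != "ok" for dep in step.requires):
            status[step.name] = "skipped"
            continue
        status[step.name] = "ok" if step.action() else "failed"
    return status


# Example: adding the network interface fails, so the database install is skipped,
# while an unrelated step still runs.
result = run_with_dependencies([
    Step("add_network_interface", action=lambda: False),
    Step("install_database", action=lambda: True, requires=["add_network_interface"]),
    Step("install_monitoring_agent", action=lambda: True),
])
print(result)
# → {'add_network_interface': 'failed', 'install_database': 'skipped', 'install_monitoring_agent': 'ok'}
```

A skipped step does not count as "ok", so anything that depends on it is skipped too. A single failure therefore stops the whole dependent chain.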

2. Alignment with the development team

Infrastructure must be rolled out in an organized fashion to meet both the schedule and the requirements of the development project. Consider, for example, a company that is writing an order entry system: its infrastructure tasks have to be planned into the software development and configuration work. With a large team of testers, managers, developers, and system administrators, IaC allows for complete transparency and version control among the team.

 

3. Automated testing and building test environments using IaC

Testing takes two forms: manual and automated. Thanks to advances in testing code, many formerly manual testing tasks have now been automated as well. Still, depending on the nature of the test, testing may have to come after coding. Then there is testing that takes place outside of coding, like stress testing.

All of this requires that a test environment be set up. For someone using the AWS cloud, for example, that means building out a target environment separate from production. The software is pushed out there, meaning it is installed and configured, and storage and networks are set up. AWS charges for all of those resources. At the end of the test, it is therefore necessary to shut everything down and delete the storage too, to avoid ongoing monthly charges. IaC developers can automate this whole setup and teardown.
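
The key requirement here is that teardown happens even when provisioning or a test fails; otherwise resources keep accruing charges. A minimal Python sketch of that pattern follows. `FakeCloud` is a hypothetical in-memory stand-in for a provider such as AWS and makes no real API calls. `ephemeral_environment` is an illustrative helper, not part of any IaC tool.

```python
from contextlib import contextmanager
from typing import Iterator, List


class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud provider's billable resources."""

    def __init__(self) -> None:
        self.resources: List[str] = []

    def create(self, resource: str) -> None:
        self.resources.append(resource)

    def delete(self, resource: str) -> None:
        self.resources.remove(resource)


@contextmanager
def ephemeral_environment(cloud: FakeCloud, resources: List[str]) -> Iterator[FakeCloud]:
    """Create the given resources, yield while tests run, then always delete them."""
    created: List[str] = []
    try:
        for resource in resources:
            cloud.create(resource)
            created.append(resource)
        yield cloud
    finally:
        # Runs whether the tests pass, fail, or raise, so nothing is left billing.
        # Delete in reverse order of creation (e.g. the VM before its network).
        for resource in reversed(created):
            cloud.delete(resource)


cloud = FakeCloud()
try:
    with ephemeral_environment(cloud, ["network", "storage", "vm"]) as env:
        during = list(env.resources)  # the environment is up while tests run
        raise RuntimeError("a test failed")
except RuntimeError:
    pass
print(during)           # → ['network', 'storage', 'vm']
print(cloud.resources)  # → []
```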


4. Leveraging open source packages

Bash and C shell scripts are rarely shared as reusable open source packages, because they tend to be limited in function and tied to a particular system. By contrast, there are plenty of public, open source code examples and tools developed by users of SaltStack, Ansible, and other IaC tools.

For example, Ansible has modules that can be used inside playbooks, and many of them are very specific. Some modules work with Amazon objects (like S3 storage and SSH keys) and others with Microsoft Azure objects (like load balancers). Then there are modules for more generic tasks. One example is apache2_module, which enables or disables modules of the Apache web server (for example, mod_ssl for SSL support).

5. Making DevOps possible and practical

DevOps is a term that has come into wide use over the past decade. It, too, is an outgrowth of the agile software development methodology. DevOps means integrating the infrastructure and system administration team into the development team. Infrastructure as Code makes this easier, because system administrators can take over the infrastructure-related part of the development process. They can take ownership of support tickets that would otherwise go to developers and write code to fix the problem. They then check that code into the same tools the rest of the development team uses.

And, as noted earlier, the modules and packages of code they write fit into the overall system used by the project team.

For example, one DevOps task might be to increase the swap space on a server and then reboot it. All of that can be done from the central mechanism. Unlike plain shell scripts, modern IaC tools can report details on their status. That status can guide the next steps in the process or trigger a complete rollback of the changes. The feedback can also roll up into a large dashboard that the whole team can see in the project room.
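
The report-or-roll-back behavior can be illustrated with a small Python sketch. It assumes each change comes with its own rollback function. The `Change` class, `apply_changes`, and the simulated swap and reboot steps are hypothetical illustrations. Real tools differ in how they undo completed steps, and whether they do so at all.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Change:
    """A hypothetical change with a way to undo it."""
    name: str
    apply: Callable[[], bool]     # returns True on success
    rollback: Callable[[], None]  # undoes a successful apply


def apply_changes(changes: List[Change]) -> Tuple[bool, List[str]]:
    """Apply changes in order. On the first failure, roll back the changes that
    already succeeded, newest first. Returns overall success plus a status log
    that could feed a team dashboard."""
    log: List[str] = []
    applied: List[Change] = []
    for change in changes:
        if change.apply():
            log.append(f"{change.name}: ok")
            applied.append(change)
            continue
        log.append(f"{change.name}: failed")
        for done in reversed(applied):
            done.rollback()
            log.append(f"{done.name}: rolled back")
        return False, log
    return True, log


# Simulated server state for the swap-space example.
server = {"swap_gb": 2}


def increase_swap() -> bool:
    server["swap_gb"] = 4
    return True


def restore_swap() -> None:
    server["swap_gb"] = 2


ok, log = apply_changes([
    Change("increase_swap", increase_swap, restore_swap),
    Change("reboot", lambda: False, lambda: None),  # simulate a failed reboot
])
print(ok)      # → False
print(log)     # → ['increase_swap: ok', 'reboot: failed', 'increase_swap: rolled back']
print(server)  # → {'swap_gb': 2}
```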

How IaC works

The main products include:

  • SaltStack
  • Puppet
  • Chef
  • Ansible

These all operate on the same basic principles:

  • The system pulls configuration files and code from a central repository.
  • Some kind of map, such as a directory structure combined with tags, shows which servers are targeted for which purpose (such as web server or load balancer) or environment (e.g., test). This lays out a definition of the infrastructure, i.e., its abstraction into code.
  • The system authenticates with the servers, for example using SSH keys or certificates, and then sends commands to the targeted servers. Configuration files are either pushed or pulled to the target environment.
  • The system can be run on demand or on a schedule. Alternatively, an agent on each node can poll the central repository at a predefined frequency.
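
The targeting map in the second bullet can be pictured as an inventory of tagged servers. In this Python sketch, the host names, the `role` and `env` tags, and `select_targets` are all hypothetical and chosen for illustration. Each real tool has its own targeting syntax, such as Salt grains and globs or Ansible inventory groups.

```python
from typing import Dict, List

# Hypothetical inventory: each server carries tags for its role and environment.
INVENTORY: Dict[str, Dict[str, str]] = {
    "web-01": {"role": "webserver", "env": "test"},
    "web-02": {"role": "webserver", "env": "production"},
    "lb-01": {"role": "loadbalancer", "env": "test"},
}


def select_targets(inventory: Dict[str, Dict[str, str]], **wanted: str) -> List[str]:
    """Return, sorted, the servers whose tags match every requested key/value pair."""
    return sorted(
        host
        for host, tags in inventory.items()
        if all(tags.get(key) == value for key, value in wanted.items())
    )


all_in_test = select_targets(INVENTORY, env="test")
web_in_test = select_targets(INVENTORY, role="webserver", env="test")
print(all_in_test)  # → ['lb-01', 'web-01']
print(web_in_test)  # → ['web-01']
```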

SaltStack

SaltStack uses YAML files for configuration, but it also lets the programmer add logic using the Jinja templating language. The result mixes static and dynamic statements. The YAML statements (static) describe the targeted end state (declarative). Jinja (dynamic) inserts logic and variables into the flow, supplying run-time information that varies from machine to machine.

To illustrate, look at the example below, taken from the SaltStack documentation. The Jinja statements are delimited by {% and %}, which lets the user mix dynamic and static statements.

The first block says to install the tcsh software package if the operating system (grains['os']) is not FreeBSD. The statements after that tell SaltStack which file to manage on each targeted OS: /etc/motd on FreeBSD and /etc/motd.tail on Debian. In both cases the contents come from salt://motd. Salt calls machine attributes such as the OS grains.

# Install tcsh on every OS except FreeBSD
{% if grains['os'] != 'FreeBSD' %}
tcsh:
  pkg:
    - installed
{% endif %}

# Manage the message of the day; the target path depends on the OS
motd:
  file.managed:
    {% if grains['os'] == 'FreeBSD' %}
    - name: /etc/motd
    {% elif grains['os'] == 'Debian' %}
    - name: /etc/motd.tail
    {% endif %}
    - source: salt://motd
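
Salt renders the Jinja first, separately for each minion, and only then parses the resulting YAML. As a rough illustration, the hypothetical Python function below builds the same data structure by hand that the template would produce for a given OS grain. It does not call Salt or Jinja. It only mirrors the branching so the per-machine result is easy to see.

```python
from typing import Any, Dict, List


def render_motd_states(os_grain: str) -> Dict[str, Any]:
    """Hand-build the data the Jinja/YAML example yields for one minion.

    Illustration only: Salt actually renders the template with Jinja and then
    parses the resulting YAML; this function just mirrors the branching.
    """
    states: Dict[str, Any] = {}

    # {% if grains['os'] != 'FreeBSD' %} ... tcsh: pkg: - installed
    if os_grain != "FreeBSD":
        states["tcsh"] = {"pkg": ["installed"]}

    # motd: file.managed takes a list of arguments; only `name` depends on the OS.
    file_args: List[Dict[str, str]] = []
    if os_grain == "FreeBSD":
        file_args.append({"name": "/etc/motd"})
    elif os_grain == "Debian":
        file_args.append({"name": "/etc/motd.tail"})
    file_args.append({"source": "salt://motd"})
    states["motd"] = {"file.managed": file_args}
    return states


for os_name in ("FreeBSD", "Debian"):
    print(os_name, render_motd_states(os_name))
```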

Puppet

Puppet uses its own declarative language for configuration, and Ruby for extending it (for example, with custom facts and functions). Administrators with no programming experience may worry that they will have to learn Ruby. In practice, everyday manifests are written in the Puppet language.

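Here is just a snippet. It declares the /etc/passwd file and sets its ownership and permissions. The ensure attribute states that /etc/passwd must exist as a regular file. The owner, group, and mode attributes declare the values Puppet should enforce, and Puppet changes the file if they differ.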

# A resource declaration: manage /etc/passwd as a regular file
file { '/etc/passwd':
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0644',  # world-readable, as /etc/passwd normally is
}
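
Puppet's model is declarative. You describe the desired state, and the agent changes only what differs, so repeated runs make no further changes. The Python sketch below imitates that idea on a temporary file for two of the attributes, ensure and mode. It assumes a POSIX system. `ensure_file` is a hypothetical helper, not Puppet code. It skips owner and group because changing those normally requires root.

```python
import os
import stat
import tempfile
from typing import List


def ensure_file(path: str, mode: int) -> List[str]:
    """Converge `path` to a regular file with permission bits `mode`.

    Returns the changes made; an empty list means the file was already in the
    desired state, so running it again is harmless (idempotent).
    """
    changes: List[str] = []
    if not os.path.isfile(path):
        # ensure => file: create an empty regular file if none exists.
        with open(path, "a"):
            pass
        changes.append("created")
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current != mode:
        # mode => '0644': correct the permissions only when they differ.
        os.chmod(path, mode)
        changes.append(f"mode {oct(current)} -> {oct(mode)}")
    return changes


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "example.conf")
    first_run = ensure_file(target, 0o644)   # creates the file (and fixes mode if needed)
    second_run = ensure_file(target, 0o644)  # nothing left to change
    final_mode = stat.S_IMODE(os.stat(target).st_mode)

print(first_run)
print(second_run)       # → []
print(oct(final_mode))  # → 0o644
```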

Infrastructure as Code tools

Here we look briefly at some of the main products.

SaltStack

SaltStack runs the salt-master service on the master and salt-minion on minions, or targets.

Pros

  • Operates in agent or agentless mode.
  • The YAML-based configuration language is not complicated.
  • Has a Python API, which is useful since many programmers know Python (SaltStack itself is written in Python).
  • Like the other tools described below, it lets you package common or complex tasks in modules for later use.
  • Has a command line interface, which makes certain tasks easier than writing a whole program.

Cons

  • The documentation used to contain many “To-Do” placeholders that had yet to be filled in. It now seems largely complete, except for certain specialized functions.

Puppet

Configuration files are called manifests, and many reusable modules containing them are publicly available on Puppet Forge. Puppet's building blocks include resource types, classes, and modules. This shows that Puppet, too, is a full-blown programming tool that can take advantage of modern development practices.

Pros

  • Uses very little memory
  • Puppet has a CLI (command line interface)

Cons

  • Requires installing an agent on each server
  • You need to give the Puppet agent root access on each server. This is closely linked to the agent issue above. You either need another automation tool to set that up on each server in the cloud, or you must do it manually, or include it in the template used to configure the virtual machine.

Chef

Chef configuration is written in Ruby and JSON. JSON provides static configuration. Ruby gives the user dynamic objects and methods to add logic to the configuration. Configuration is organized into recipes, which are grouped into cookbooks. Open source Chef Community Cookbooks are available online.

The system is broken into the Chef Development Kit (desktop workstation), Chef Server (central hub), and Chef Client (target machines).
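
The hub-and-client split means target machines pull their configuration from the central server rather than having it pushed to them. The Python sketch below models that check-in loop in memory. `CentralServer`, `Client`, and the integer policy versions are hypothetical simplifications, not Chef's actual data model or API. A real client would typically run its check-in on a schedule and would download and run recipes where the comment indicates.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CentralServer:
    """Hypothetical hub: the latest configuration version published for each role."""
    policies: Dict[str, int] = field(default_factory=dict)


@dataclass
class Client:
    """Hypothetical agent on a target machine that pulls from the hub."""
    role: str
    applied_version: int = 0
    history: List[int] = field(default_factory=list)

    def check_in(self, server: CentralServer) -> bool:
        """Apply the hub's version for this role if it is newer; report whether anything ran."""
        latest = server.policies.get(self.role, 0)
        if latest <= self.applied_version:
            return False  # already up to date
        # A real client would download and run the recipes here.
        self.applied_version = latest
        self.history.append(latest)
        return True


hub = CentralServer(policies={"webserver": 1})
node = Client(role="webserver")
runs = [node.check_in(hub), node.check_in(hub)]  # first run applies v1, second is a no-op
hub.policies["webserver"] = 2                    # e.g. a workstation uploads a new version
runs.append(node.check_in(hub))
print(runs)          # → [True, False, True]
print(node.history)  # → [1, 2]
```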

Pros

  • Like the other packages, it integrates nicely with GitHub, a widely used platform for hosting source code.
  • Has a desktop client, for those who prefer working with one.
  • An open source quick start automation kit is available to speed installation on Amazon AWS.

Cons

  • The system needs 8 GB of memory. That is a lot when you consider that the AWS t2.large instance type has only 8 GB of memory, and larger instances are considerably more expensive. (Cloud virtual machines often have a small amount of memory compared to the average laptop, and prices rise quickly as memory increases.)

Visualization tools

Many programmers prefer command line tools to a dashboard, as that is how they are used to working. But dashboards are helpful for giving a graphical, system-wide view.

For example, here is one from SaltStack:

[Figure: SaltStack dashboard. Graphic source: SaltStack]

And the Chef Management Console:

[Figure: Chef Management Console. Graphic source: Chef]

The cloud and IaC

The move from on-premises servers to the cloud has created logistical issues that Infrastructure as Code is well suited to handle. So has the migration away from the decades-old waterfall approach to software development, with its long lead times for milestones. Its replacements are continuous integration, continuous delivery, and scheduled delivery of software with tools such as Ansible.

These practices only work when automation is used at each step of the project. Testing is more automated today than in the past, so it's natural to add infrastructure to that automated set of processes. What was lacking in the past were languages and tools to do that; with the cloud, those have become available.

Cloud adoption has increased the need for IaC. The rise of Software as a Service, however, could reduce that need, as organizations move infrastructure out of Amazon, their own data center, or elsewhere onto already-configured platforms. For example, many vendors, like MongoDB and Elasticsearch, now offer cloud versions of their products. That means responsibility for maintaining these systems falls to the vendor. In many cases, end users will find that they no longer have access to the operating system.

IaC challenges

There are certain technical and organizational challenges with IaC:

  • The technology is constantly changing, and new products have a steep learning curve. For example, containers are replacing virtual machines, and some come with their own orchestration tools, like Docker Swarm or Kubernetes. The IaC programmer needs to learn those tools. They can then either use them alongside SaltStack and similar tools, or replace those tools with the container orchestration tool altogether.
  • Not all people in the organization are able to change quickly. Once they have learned one approach to solving a task, they might be reluctant to learn another, especially if they are expected to teach themselves.

Getting started with Infrastructure as Code

Here we have laid out why organizations have adopted Infrastructure as Code. We have also explained how IaC replaces old-fashioned and incomplete scripts with modern languages written specifically for infrastructure management. These languages combine static configuration and dynamic commands, using objects and methods to speed the configuration and management of machines and containers. They do this without requiring a complete, full-blown program.

You could set up a data lab to work with one or more of these products and set your analysts to work looking at each. Then they can make an informed decision as to which might work best for your organization. But remember that each of these tools has something of a learning curve, so build time into your schedule for acclimating.

When you’re ready to take the next step, make sure your data is ready too. Talend Data Fabric is a suite of apps that integrates and manages all of your data, easily and securely. Get more information and learn how Talend’s training and consulting services can get your data lab set up to empower the use of IaC in your team.

 

Last Updated: May 30th, 2019