Infrastructure as code

This week at work I got to define the infrastructure as code for a new greenfield project. I've taken over as principal / lead on infrastructure matters as I seem to be the most comfortable, but I'd also like to get others involved.

Our older infrastructure as code was hard to work with and reason about. It had some interesting design choices, but if I'm honest, none that I'd be comfortable using outside the environment where they already exist.

Table manners

As with any API, you should not start pushing your own preferences into an existing IaC repository without team buy-in and wider understanding. I like simplicity, but I was new to several technologies, including Terraform and Concourse.

A single lead engineer and another senior who is not interested in backend or IaC are the only two people who have shown any interest in simplifying code. I don't want to create things that the team cannot work with, but I also don't want to split their brains within a single repo.

Ideally, your organisation should move towards a single coding style. To be fair, my team does. Most others want to drape a thing in as much syntactic sugar as possible, and throw a link to an article over the wall so you have to waste just as much time as they did.

Seizing Opportunity

I knew I was owning the epic, and that would mean I could make some moves. The team has shrunk since I onboarded, but we have some new junior, perhaps mid-level, engineers who can own frontend and backend code. If I could introduce them to the infrastructure too, it would spread skills and ensure that others can own details.

People not knowing enough about what a thing is, or where it is, is another problem. I don't think it was intentional, but sometimes I struggled to understand why things were the way they were when I arrived. I didn't get more than 3 months, during which I was balancing probing questions with building team ties.

I pride myself on being time-focused, for both my time and my teammates' time. If something is going to take more of their time, they had better be improving on a thing, or learning some computer science. At the end of the day, none of us has unlimited time.

Defining goals

Besides books, videos and other people's code on the internet, I didn't have any Terraform experience prior to taking this role. What I did have after a year of working in that codebase, amongst others, was knowledge of the things the team wouldn't budge on, and knowledge of what their pain points were.

  • Simpler code organisation between modules and components.
  • Fewer modules. Why define an S3 bucket as its own module unless it is used in multiple places?
  • Simpler connectivity between parts (try just using Terraform).
  • Defer to companies with the staff and expertise to manage things like databases.

This may sound a little defeatist for hardcore infra fans, and it's certainly not the abstractly correct way to work, but it is pragmatic. Our team has so many things more important than infra. Maybe I won't be happy when providers inevitably screw up and our answer is to migrate from a backup, but we are less likely to be staying late to stop the business dying. That has happened twice and, TBH, although I don't mind the praise, I felt it was down to laziness on the part of the business. That does not sit well with me.

What I did

For the first iteration, I shipped all the persistent infrastructure we'd need in a single day, by shrinking what we own.

  • The greenfield app we are building uses Heroku as a core platform for app instances.
  • I'll be able to use the Heroku CLI to create persistent application services.
  • We'll stick to AWS for some things, like DNS and, for now, email.
  • We'll be using third-party, verified Terraform where possible, with git submodules as a lightweight vendoring tool.
  • Our secrets will need to live in encrypted files. I borrowed existing OpenSSL repo-secrets in a private repo.
  • I've upgraded Terraform to the latest version, along with providers.
  • Modules are focused and small by default, with best-practice settings that can be controlled.
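
As a rough sketch of how a vendored module might be consumed, assuming a third-party module has been added as a git submodule under a vendor/ directory. The path, module name and inputs below are all hypothetical and only illustrate the shape:

```hcl
# Assumption: the upstream module was vendored with something like
#   git submodule add <module-repo-url> vendor/terraform-aws-example-bucket
# so it can be referenced by local path and pinned by the submodule commit.
module "uploads_bucket" {
  source = "../vendor/terraform-aws-example-bucket"

  # Hypothetical inputs: small by default, with the "best practice"
  # switches exposed so they can be controlled without forking.
  name              = "my-app-uploads"
  enable_versioning = true
}
```

Upgrading the module then becomes a submodule bump in its own commit, which keeps the change small and easy to revert.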

I have a background leading software projects. I know that when too much changes, risk of not delivering goes through the roof, so I'm always a staunch advocate of the path most trodden in areas that do not matter. For most businesses the infrastructure configures commodity services, and so although it's integral, it's not important to be different in our infrastructure.

A bit more detail

I also set up Terraform components in stages, a bit like a bootloader. I'm unsure how much people will like that, but it does keep the code simpler.

To give faster feedback and avoid 400 errors, which I find a pain in the backside, I'm using data sources per stage to ask for details of services defined in prior stages. This helps keep things simple. Creating, modifying and deleting infrastructure can all be done with terraform apply, allowing simpler, focused, reversible changes per commit. If something does not exist, it will raise a 404, which the very sparse documentation advises means a prior stage needs setting up first.

Staged components

  1. Bootstrap component, which for now sets up a root DNS.
  2. Stage 1 components, things requiring or logically following a root DNS.
  3. Stage 2 components, things requiring or logically following stage 1 component parts.
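
A minimal sketch of what a stage 1 component could look like, assuming the bootstrap stage created a public Route 53 hosted zone. The domain, file name and record target are placeholders, not the real project's values:

```hcl
# stage-1/dns.tf (hypothetical file name)

# Look the zone up instead of creating it. If the bootstrap stage has not
# been applied yet, this lookup fails early with a "not found" style error
# rather than the stage failing part-way through an apply.
data "aws_route53_zone" "root" {
  name = "example.com."
}

# Something that logically follows the root DNS: a record inside that zone.
resource "aws_route53_record" "app" {
  zone_id = data.aws_route53_zone.root.zone_id
  name    = "app.${data.aws_route53_zone.root.name}"
  type    = "CNAME"
  ttl     = 300
  records = ["app-origin.example.net"] # placeholder target
}
```

Each stage keeps its own state, and the only coupling between stages is the lookup, so no stage needs to read another stage's state file.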

There are lots of things I could do, such as sprinkling modules with depends_on statements. Because we don't own all our modules and it's likely to take some time to get changes upstreamed, I've avoided that. I believe the 404s will help us know when something from a prior stage is missing.

I did have to fork, and send a change upstream for, an SES -> S3 module, which will be a placeholder for formal Gmail. It's a tiny change to allow me to take control of the SPF DNS record: a domain cannot have more than one SPF record, and the module sets one up itself, along with DKIM.
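
To illustrate why owning the SPF record matters, here is a sketch of a single TXT record that covers more than one sender. It assumes the forked module can be told not to create its own SPF record (how it does that is not shown here), and the domain and SPF value are illustrative only:

```hcl
data "aws_route53_zone" "root" {
  name = "example.com." # placeholder domain
}

# Exactly one SPF record for the name: every permitted sender is combined
# into a single "v=spf1 ..." value instead of one record per service.
resource "aws_route53_record" "spf" {
  zone_id = data.aws_route53_zone.root.zone_id
  name    = data.aws_route53_zone.root.name
  type    = "TXT"
  ttl     = 300
  records = ["v=spf1 include:amazonses.com include:_spf.google.com ~all"]
}
```

When the placeholder is swapped for the formal mail service later, only this one record needs to change.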

To Recap

  • Know the scope of works.
  • Aim to reduce the moving parts.
  • Try to think of your team.
  • Remember table manners, and don't rush to change without context.
  • Defer to service providers, but try to do it in a focused way, reducing entities and consolidating billing.
  • Document choices and be open to feedback.
  • Have components be high-level short-form syntactic sugar for your modules.

I have a talk to give at a local group this coming Wednesday, and some pairing sessions with coworkers, who I hope will be more interested in less cumbersome IaC with sane defaults and the ability to override and control without forking modules. Want a CNAME app origin with 4 hosts weighted? Send in weights and 4 names to the DNS record module with the cname type and the TTL you desire.
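
For a flavour of the weighted CNAME case, this is roughly what such a DNS record module might wrap, written against the AWS provider directly. The variable names, host names and weights are made up for illustration and are not the actual module's interface:

```hcl
variable "origins" {
  description = "Map of origin host name to routing weight (hypothetical input shape)."
  type        = map(number)
  default = {
    "app-1.example.net" = 40
    "app-2.example.net" = 30
    "app-3.example.net" = 20
    "app-4.example.net" = 10
  }
}

variable "ttl" {
  description = "TTL in seconds for each weighted record."
  type        = number
  default     = 60
}

data "aws_route53_zone" "root" {
  name = "example.com." # placeholder domain
}

# One CNAME per origin, all sharing the same name. Route 53 answers with
# each host in proportion to its weight.
resource "aws_route53_record" "app_origin" {
  for_each = var.origins

  zone_id        = data.aws_route53_zone.root.zone_id
  name           = "origin.${data.aws_route53_zone.root.name}"
  type           = "CNAME"
  ttl            = var.ttl
  set_identifier = each.key
  records        = [each.key]

  weighted_routing_policy {
    weight = each.value
  }
}
```

Changing the traffic split is then a one-line edit to the weights, and removing a host is a matter of deleting its map entry.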

Hopefully this helps others.