Docker - Pinning container dependencies
Recently I have been receiving a lot of Docker setups from third parties.
Some have been for projects I use or work on in CI environments. Some are part of client projects. Most have some things in common.
- They use a fraction of the useful features of Docker.
- They can bundle lots of parts into single images.
- They are not explicit about dependency management.
This post deals with dependency management: what to do, and what to try to avoid.
The mistakes
Publishing only to latest
When people publish Docker images only to latest, there is no way back from an accidental push. This is by far the easiest problem to overcome. Here are some tagging strategies.
- Deploying based on date-time.
- Deploying based on release version.
- Deploying based on code-name.
- Deploying based on commit ID.
Of the above, I believe that while code-name is the coolest, it is rarely applicable. Date-time can be too verbose and requires organisational knowledge of a complex web of systems and deployment dates.
Commit ID deploys rely on source code access, which leaves release version as the most broadly useful choice.
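As a sketch of what release-version tagging might look like, assuming a hypothetical image name and version:

```shell
# Build once, tag with the release version, and only additionally tag latest as a convenience.
docker build -t myapp:1.4.2 .
docker tag myapp:1.4.2 myapp:latest
docker push myapp:1.4.2
docker push myapp:latest
```

Consumers can then pin to myapp:1.4.2 and are unaffected if latest is later pushed over by mistake.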
Relying on Dockerfile
There are two primary ways to use Docker: one is to use a pre-built image, and the other is to build from a Dockerfile. Make no mistake: as with any recipe, building from a Dockerfile involves more risk than using a known-working image.
- With a pre-built image, you can export and import a known-working artifact.
- With a Dockerfile, you can see some of "how the sausage is made". That is not always important, and it can bring false confidence.
The upshot is that if you do not control the source of distribution, you are taking on risk.
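One way to reduce that risk with pre-built images is to pin them by digest rather than by tag, so you always get the exact artifact you tested. A minimal sketch, assuming a Postgres base image; the digest you record will be specific to your pull:

```shell
# Pull the tag once, then record the content-addressed digest it resolves to.
docker pull postgres:16
docker image inspect --format '{{index .RepoDigests 0}}' postgres:16
# Reference that digest from now on, in docker run or in a FROM line,
# e.g. postgres@sha256:<digest-from-above>
```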
Single-step builds
After a few versions, Docker introduced the concept of using a single Dockerfile for a single output via multi-stage builds. Quite simply, you can build several pieces of software in various (optionally named) stages, then use COPY --from to retrieve build artifacts from earlier stages. This results in a smaller distributable image, and it also does away with the odd and confusing incantations that building, installing, cleaning up and distributing in one step used to require. You can now name each stage, so intent is clearer.
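A minimal multi-stage sketch, assuming a hypothetical Go service (the module path, stage names and image tags are illustrative):

```dockerfile
# Stage 1: build the binary with the full toolchain.
FROM golang:1.22 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app ./cmd/app

# Stage 2: copy only the built artifact into a small runtime image.
FROM gcr.io/distroless/static-debian12
COPY --from=builder /out/app /app
ENTRYPOINT ["/app"]
```

The toolchain never ships in the final image; only the artifact copied out of the builder stage does.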
Bringing a Virtual Machine mindset to containers
Containers are not the same thing as virtual machines. They follow the same broad logic of deploying a general artifact, but that is where the similarity ends.
Some things about Virtual Machines
- They deploy complex environments, consisting of multiple complex parts.
- They duplicate system-level components such as the OS kernel.
- They require specialist hardware support to get near-native performance.
- Host-to-guest integration is limited and needs extra tooling.
- The guest machine has its own isolated permissions, firewall, everything.
Contrast with containers
- Designed to deploy a single process with its dependencies and configuration.
- Designed to have limited permissions.
- Designed to support per-instance external configuration management.
- They share system components with the host.
- They get generic hardware support at virtually native speeds.
- They deploy in any Linux environment, which means a VM under the hood on macOS and Windows.
Good practices
Using environment variables
One of the amazing things Docker has led to, for me, is a reliance on environment variables rather than complex configuration files for setting up software. Backing up a little: this was not a Docker invention and I am not pretending it was. But prior to containerization, you had businesses where staff knew about and believed in 12-factor apps, and you had the complicated mess that happens when they did not.
Prior to 2012 my own apps were usually distributed using complicated configuration files, which had to be manually built and verified in a labour-intensive process. People largely did not know better. While not a panacea, environment variables allow an application to pick up its runtime configuration at process launch. Perhaps you want to test against a MySQL database in one container and Postgres in another. The point is you can deploy one set of code and change the configuration per instance. This has enabled some of the greatest choice and reliability in modern software.
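As a sketch, assuming a hypothetical image and a DATABASE_URL variable the application reads:

```shell
# Same image, two instances, two different database backends.
docker run -d -e DATABASE_URL=mysql://app:app@mysql-host:3306/app myapp:1.4.2
docker run -d -e DATABASE_URL=postgres://app:app@pg-host:5432/app myapp:1.4.2
```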
Using build arguments
I have written about the difference between build-time arguments and environment variables in another post, so I will keep this brief. Essentially you can pass values which are only used at build time and are not part of the running configuration, which helps you avoid baking secrets into the image and allows even more customisation (though note that plain build arguments can still appear in docker history, so treat truly sensitive values with care).
One of the most useful things I use build arguments for is changing the base image to build from. If I want to know whether my Dockerfile will need maintenance across different runtime versions of Python, Ruby, Node, PHP, Java, or C and C++ compilers, I can easily build a new container passing a different argument. I use this pattern for a self-hosted CI pipeline I started for CODESIGN2: an exercise in increasing my own understanding, practicing Lean costing, and something I really enjoyed. You can also use these to pass in values such as third-party privileged access tokens at build time, without sending them to all consumers.
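A minimal sketch of that base-image pattern, assuming a hypothetical Python application (the default tag, file names and module name are illustrative):

```dockerfile
# The base image can be swapped at build time without editing the Dockerfile.
ARG BASE_IMAGE=python:3.12-slim
FROM ${BASE_IMAGE}
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "-m", "myapp"]
```

Building against a different runtime is then a one-liner: docker build --build-arg BASE_IMAGE=python:3.11-slim -t myapp:py3.11 .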
Avoiding process managers, certificate copies, etc.
Now this is more advice than a hard and fast rule, but if you find yourself copying certificates around or shoehorning process managers into Docker, it might not be the right tool for the job. Docker itself is, among other things, a process manager, so you are really working inside a "yo-dawg" meme by the time you go this far.
Similarly, the sidecar pattern of having a complementary Docker container manage this complexity can make troubleshooting easier. What this might look like is an internet-facing service that does not handle TLS itself; instead, a proxy terminates TLS and passes the relevant details along via headers, as it would in any proxied environment.
Tools like Kubernetes make this coordination between components quite easy, but you can get started with docker-compose, which lets you define multiple networks. That way you do not need to juggle ports or get messy to achieve a shared-nothing local architecture, and you can spin up multiple complementary micro-services side by side.
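A minimal docker-compose sketch of the sidecar idea, assuming hypothetical service names and images:

```yaml
version: "3.8"
services:
  app:
    image: registry.example.com/myapp:1.4.2   # the application itself, plain HTTP only
    networks:
      - internal
  proxy:
    image: nginx:1.25                          # terminates TLS and forwards to app
    ports:
      - "443:443"
    networks:
      - internal
networks:
  internal: {}
```

The proxy configuration (certificates, upstream) is left out here; the point is that only the proxy is exposed while the app stays on the internal network.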
Keeping your own registry / export backups
The ideal is to have a copy you control of any service you depend on. If my entire application rests on a Postgres database, I should probably either pull from a local Docker registry or regularly export my locally pulled image, so that in a disaster situation I am able to secure the continuity of my business or organisation.
This topic could get a post of its very own, and perhaps I will publish one, but the mark of a mature Information Technology service provider is having a degree of control if third parties become unavailable for whatever reason.
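A sketch of both approaches, assuming a hypothetical internal registry hostname:

```shell
# Keep a copy in a registry you control.
docker pull postgres:16
docker tag postgres:16 registry.internal.example:5000/postgres:16
docker push registry.internal.example:5000/postgres:16

# Or keep an offline export you can restore from later.
docker save -o postgres-16.tar postgres:16
docker load -i postgres-16.tar
```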
Maintaining your own Dockerfiles
Even if the content of your own Dockerfile is just a pull from an upstream image, with no changes to the filesystem or environment variables, I recommend keeping a copy.
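At its simplest, such a file might be nothing more than a pinned upstream reference (the tag here is illustrative):

```dockerfile
# A "wrapper" Dockerfile whose only job is to record exactly which upstream we depend on.
FROM postgres:16.2
```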
I did not always recommend this. When working at Kalo I actively pushed for us to own less Docker, due to the size of the team and time constraints. Some individuals had a propensity to add unnecessary complexity to Docker images, which added complexity elsewhere. It is still a relatively new technology, so I expect more nuance around this in coming years, but I have been using Docker since 2013.
Keeping a tool-agnostic textual record of how images are built, and where they pull from, means dependencies are easily discoverable, which helps when dealing with change. It also means you can add commit rules, which are a better tool for stopping people adding complexity.
In a recent interview task, I was handed a Docker assignment where pulling from upstream led to a difference in runtime behaviour between two containers. One ran the tooling for a language, and the other ran the program after a build pipeline. This led to over an hour of "Why is this broken in one but not the other?". If you can save co-workers, or your future self, this pain: please do.
Using Kubernetes or docker-compose
As much as part of me gets quite upset at the complexity of tooling, launching Docker processes on several machines, checking resource utilization, and so on is a complex task, and really it should be the job of a tool such as docker-compose or Kubernetes.
Personally, I have sided with the Swarm and docker-compose camp for years; however, I am now coming around to more modern tooling, even in-house, so that I can get access to many types of machine without overloading my laptop, and without standing up and configuring things every time I need them. I have tried "manual until it hurts" and have felt the pain of not using something like this, especially with micro-services or multiple client accounts, which I used to run via Vagrant.
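If Kubernetes is the destination, the same pinning habits carry over: reference an exact image version in the deployment rather than latest. A minimal sketch, with hypothetical names:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: registry.internal.example:5000/myapp:1.4.2   # pinned release version, not latest
          env:
            - name: DATABASE_URL
              value: postgres://app:app@db:5432/app
```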
Closing
Hopefully, I have stayed on topic and talked about Docker container dependencies. I have tried to keep this structured as a series of unique headings, so fragmentions or the new Chrome spec for text highlighting will work.
I hope you enjoyed this. I hope you have learned something or been given cause to introspect on your own ideas.
Thank you for reading!