Written by Adam Dorwart, an AI Engineer at Shield AI
The word DevOps, a portmanteau of "development" and "operations," has become an overloaded and overused term. If you ask a sampling of your colleagues what it actually means, you'll probably get vastly different answers. This post aims to clear up some of that confusion from our vantage point here at Shield AI.
First and foremost, DevOps is a philosophy. You may see some people with the title of "DevOps Engineer," but more than a role, DevOps is a culture: a way of thinking about how you operate as a software enterprise. DevOps grew out of a world in which software products had distinct phases of development and operation. Software Engineers would build the product and then hand it off to Operators or Sysadmins, who were responsible for deploying it and keeping it running. These tended to be two very different teams with two very different sets of interests.
Software Engineers wanted to build a better product and ship features quickly. Operators wanted to keep the product stable and reliable. These competing goals created silos and tension between the two groups, and that division could hurt both the team and the bottom line.
DevOps to Save the Day
One goal of DevOps is to tear down silos of competing interests. Specialists naturally emerge in each phase of the software lifecycle, and silos tend to form around them.
So what is the solution to the silos that can form? Deliberately bring disciplines together. Google reached this conclusion in 2003, when Ben Treynor created the Site Reliability Engineer (SRE) role, which Google has since described as one implementation of the DevOps philosophy. The premise of an SRE is to task a software engineer with operating a product. This kind of multidisciplinary bridge was crucial for Google to scale to running billions of containers a week with high availability.
DevOps is Kaizen
But I still haven't given you a concise way to describe DevOps, so let me try:
DevOps is a mindset of investing in your velocity and in becoming more effective at creating quality software, typically by unifying disciplines across the phases of the software lifecycle.
This has many parallels with Kaizen, the practice of continuously improving all functions of a business, popularized by Toyota's manufacturing lines. In a way, DevOps can be seen as simply Kaizen focused on the manufacture of software.
Applying Kaizen to your Software Operations
You might think of DevOps as a purely technical pursuit, but as you work toward improving your velocity, you may find that culture has a huge impact. I’ll detail a couple of ideas that we’ve found to be foundational.
Principles Over Process - Prefer Trust
Processes bring order and help your team navigate a vast array of choices when solving common problems. They provide a "beaten path" that both reduces cognitive overhead and helps you steer clear of the pitfalls of the past.
Over time, the problem space is sure to change, so it’s important that processes are routinely questioned and revisited. More important than any process are the principles that guide you down your path. Acknowledge from the outset that change is inevitable, and stay efficient by making sure your guiding principles are crystal clear. This clarity, in turn, lets you trust your team to make the right call when they deviate from a process, because everyone is guided by a shared vision.
Here are a few principles we use to guide our process:
Don’t break shared code - Most engineers are already familiar with continuous integration (CI), but CI systems can become complex over time. Have a mitigation strategy for when your CI isn’t working, and don’t let a pass or fail badge stop you from thinking about the effect of a change.
Maintain elegant code - Beyond whether the code works, we believe the quality of a code base strongly correlates with its correctness, performance, and maintainability. Broken windows theory applies to software as well. Encourage static analysis tools and style guidelines. Make it clear that care was given to the code’s function and form.
Prefer solutions rooted in fundamentals - Your code base will evolve over time. If something doesn’t meet requirements, it’s usually better to start with something designed to meet them than to adapt something deficient with a mess of changes. We believe that doing even the hard things right is how you go fast over the long term. Address technical debt routinely.
Automation Does Not Replace Critical Thinking
Automation is a crucial tool for anyone focused on DevOps, but without care, it can fall into the same traps as process. If process is a beaten path, then automation is a Ferrari on a paved road: you still need to ask whether it’s the right road. A good automated tool can hide the pain for a while, but the little things add up, so reflect regularly on whether your tools need to be adapted.
We experienced an example of this recently with our Kubernetes CI cluster. We started seeing an increasing number of slow or failed builds caused by network timeouts. Things had worked well for a while thanks to the robustness of Kubernetes on capable hardware. As it turned out, the root cause was the interaction between GitLab and the Kubernetes scheduler, which was placing some large jobs on overloaded nodes. We had assumed the scheduler wouldn’t do something like this, but after investigating, it was clear it needed a bit more help. We applied additional scheduling constraints and resource requests better tuned to our use case, and we’re now seeing dramatically higher reliability and a more sensible load distribution.
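The Kubernetes fix hinges on one detail that is easy to miss: the default Kubernetes scheduler decides placement from the resources pods declare as requests, not from live node utilization. The following Python is a deliberately simplified toy model of that idea, not the real kube-scheduler (which runs several scoring plugins and applies defaults for pods that declare no requests). The node sizes, job names, and CPU figures are hypothetical, and we assume for illustration that the heavy jobs initially declared no CPU requests.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    allocatable_cpu: float       # cores the scheduler may hand out
    requested_cpu: float = 0.0   # sum of CPU requests of jobs placed here
    actual_cpu_use: float = 0.0  # real load; the toy scheduler never reads it


@dataclass
class Job:
    name: str
    request_cpu: float  # declared CPU request (0.0 means none declared)
    real_cpu: float     # CPU the job actually consumes once running


def schedule(job: Job, nodes: list[Node]) -> Node:
    """Toy placement: keep nodes where the request still fits, then pick the
    node with the most unrequested CPU (ties go to the first node).
    Only declared requests are considered, never actual usage."""
    fits = [n for n in nodes if n.requested_cpu + job.request_cpu <= n.allocatable_cpu]
    if not fits:
        raise RuntimeError(f"no node can fit {job.name}")
    best = max(fits, key=lambda n: n.allocatable_cpu - n.requested_cpu)
    best.requested_cpu += job.request_cpu
    best.actual_cpu_use += job.real_cpu
    return best


def run(jobs: list[Job], node_count: int = 2, cpu: float = 8.0):
    """Place jobs on fresh identical nodes; return placements and real load per node."""
    nodes = [Node(f"node-{i}", cpu) for i in range(node_count)]
    placement = {job.name: schedule(job, nodes).name for job in jobs}
    return placement, {n.name: n.actual_cpu_use for n in nodes}


if __name__ == "__main__":
    # Hypothetical workload: two heavy builds and one light lint job.
    no_requests = [Job("big-build-a", 0.0, 6.0), Job("big-build-b", 0.0, 6.0), Job("lint", 0.0, 1.0)]
    with_requests = [Job("big-build-a", 6.0, 6.0), Job("big-build-b", 6.0, 6.0), Job("lint", 1.0, 1.0)]
    # Without requests every node looks empty, so all jobs pile onto node-0
    # (13.0 cores of real load on an 8-core node).
    print(run(no_requests))
    # With requests, the second heavy build no longer fits on node-0 and moves to node-1.
    print(run(with_requests))
```

In practice the equivalent fix lives in configuration (for example, CPU and memory requests on the CI job pods, plus node selectors or affinity rules), not in code; the toy only shows why declaring realistic requests changes where work lands.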
It’s important to catch issues before they become large problems. Measure both the effectiveness of a tool and the time spent interacting with it. Quantifying these things can be challenging (and is the subject of a whole other blog post!), but even a simple feedback form can go a long way. Use these metrics to help guide your focus.
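Tool metrics can also start small. As a hypothetical sketch, assuming your CI system can export per-job records with a duration, a status, and a failure reason (the `BuildRecord` fields here are illustrative, not any particular CI product's schema), a few lines of Python yield a failure rate, the share of failures that were timeouts rather than real test failures, and median and tail build times. A rising timeout share or p95 duration is the kind of early signal that could point to infrastructure trouble like the overloaded nodes described earlier.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from statistics import median


@dataclass
class BuildRecord:
    # Hypothetical fields; adapt to whatever your CI system actually exports.
    job: str
    duration_s: float
    status: str               # "success" or "failed"
    failure_reason: str = ""  # e.g. "timeout" or "test"; empty on success


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records: list[BuildRecord]) -> dict[str, float]:
    """Reduce raw build records to a handful of numbers worth tracking over time."""
    if not records:
        raise ValueError("no build records to summarize")
    durations = [r.duration_s for r in records]
    failures = [r for r in records if r.status == "failed"]
    timeouts = [r for r in failures if r.failure_reason == "timeout"]
    return {
        "builds": len(records),
        "failure_rate": len(failures) / len(records),
        "timeout_share_of_failures": len(timeouts) / len(failures) if failures else 0.0,
        "median_duration_s": median(durations),
        "p95_duration_s": percentile(durations, 95),
    }


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        BuildRecord("unit", 100.0, "success"),
        BuildRecord("unit", 110.0, "failed", "test"),
        BuildRecord("integration", 120.0, "success"),
        BuildRecord("integration", 900.0, "failed", "timeout"),
        BuildRecord("lint", 130.0, "success"),
    ]
    print(summarize(sample))
```

Nearest-rank is used for the percentile so the result is always an observed duration; tracking these numbers week over week matters more than any single snapshot.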
Break Down Walls, Encourage Autonomy
Trust in your team, based on clear principles and a shared vision, is key to moving efficiently. Make transparency your default and always encourage feedback. At Shield AI, we’re familiar with the challenges that arise from needing to track and control who accesses what, so strive to be as transparent as business imperatives allow. Even if the code you write can’t be open source, maintain a mindset that encourages collaboration, and be open to contributions from as many team members as possible. Being more inclusive helps prevent “us versus them” thinking on your team. Encourage direct communication whenever possible, so information gets where it’s needed as soon as possible. If you find gaps forming, the DevOps approach is to focus on them and task people to work on both sides of the challenge to build a bridge.
When you’re growing fast, it’s imperative that you leverage the intellect of your entire team. Some may say that bridging gaps is the work of a DevOps specialist, but demolishing silos and forging new paths is everybody’s work.