Bitfield Consulting

A reality check about cloud native DevOps

John Arundel is a consultant and author of Cloud Native DevOps with Kubernetes. We spoke with John about how adopting DevOps destroys walls, builds bridges, and influences company cultures for the better.

You’ve just co-authored (with Justin Domingus) a book entitled “Cloud Native DevOps with Kubernetes: Building, Deploying, and Scaling Modern Applications in the Cloud.” From the description of the book, it looks like it can take you from zero to hero in building and scaling apps the Kubernetes way. What do you think makes this book stand out from other books on the topic available on the market?

It’s the book that Justin and I wish we’d had when we started trying to deploy real apps on Kubernetes! The first and most important introductory book on Kubernetes that anybody should read is Kubernetes Up And Running. But what should come after that?

We wanted a book that answers the question “OK, I have Kubernetes; now what do I do with it?” We found lots of great tutorials and introductions and blog posts explaining how to set up a simple Kubernetes cluster. But it seemed like as soon as we started asking hard questions, the blog posts dried up.

We were trying to deploy real apps for clients; complicated, messy things which don’t look like a conference slide, and we had a ton of questions like:

  • Should we build our own cluster or use a managed Kubernetes service?

  • How big should the cluster be?

  • How should we deploy app updates to the cluster?

  • How do we configure replicas, shared volumes, load balancers, databases, IP addresses, DNS records, and all those fun things, and update them along with the app using infrastructure as code?

  • How do we set up the cluster and the app to be secure?

  • How do we integrate all this with our CI systems?

  • Which of the zillion Kubernetes add-on tools are actually helpful? How should we use them?

You get the idea. It’s funny that when you try and do something real with a technology, you realize how little you actually know about it. “Cloud Native DevOps with Kubernetes” is not a book about Kubernetes, in fact; it’s a book about doing DevOps, in a cloud-native context, where the de facto standard platform is Kubernetes. A few years from now that might not be the case, but we can envisage a future edition of the book which teaches the same principles and practices, only using some other software instead of Kubernetes.

Looking at the table of contents, one can notice that you’re going through a lot of Kubernetes-related topics, ranging from practical knowledge through continuous delivery all the way to monitoring and metrics. How would you define the ideal reader of the book? A cloud native beginner? A senior developer who wants to learn about Kubernetes? A CTO who wants to get a bit more than the general overview of Kubernetes and its purpose? Or is it targeted at people with hands-on experience who will put the theory into practice?

We discussed this a lot in the planning stages. It’s important to know your audience, or you may finish writing the book and find no one wants it. We’ve aimed the book at three main groups of people:

  • DevOps and infrastructure engineers who are responsible for operating applications in production, at scale, and who want to know where Kubernetes fits into this, what it can do for them and how to get the best out of it;

  • Developers who don’t have a dedicated ops team, but who need to deploy their apps to the cloud and run them reliably, scalably, without having to be an infrastructure genius. Kubernetes is great for this;

  • CTOs, engineering managers and tech leads who want to know the business angles: how is Kubernetes going to benefit my company? How is it going to help us ship working software faster, to more customers, with fewer defects, with lower fixed costs and better flexibility? What arguments will help sell the idea of adopting Kubernetes to my management?

We didn’t want to leave out anyone who hasn’t yet encountered containers, or even cloud infrastructure, so there are a couple of introductory chapters which explain the whole thing from scratch. We like practical books, so we included lots of hands-on tasks, like setting up a Kubernetes cluster on your laptop, building a containerized application in Go and deploying it to the cluster.

Justin and I make a good team in this regard. While I like to explore concepts and principles, Justin is a very pragmatic, get-things-done kind of guy. So I think that creative tension has helped us avoid getting bogged down in abstruse technical detail, and keep the book focused on real, practical and everyday problems and solutions without hand-waving.

We’re both “show me the code” types, too; it’s all very well saying airily that you can do X, Y and Z with Kubernetes, but how, exactly? Like, what do I type to make this work? To that end, we’ve included usable code examples in just about every chapter, and they’re all available in a GitHub repo that accompanies the book.

In one of the interviews you define DevOps very broadly: “If you wake up in the morning feeling depressed about another day as an unvalued cog in a pointless machine, your company doesn’t do DevOps.” This almost sounds as if DevOps is a solution to a lot of issues connected with company culture. Could you tell us a bit more about this approach and why aligning your company’s culture with DevOps best practices can increase happiness inside the company and create business value?

Let’s leave aside the term “DevOps” for a moment and think about specifics. DevOps is often associated with infrastructure as code. We have lots of great tools and practices in software engineering that we can use to collaborate on developing code, to learn and share knowledge across teams and to ship reliable, maintainable code. Now that more and more infrastructure is defined, provisioned and managed by code, what’s the real difference between that code and the application software that the infrastructure is there to support? I don’t see one.

Another example: when some feature blows up at 3:00 AM, some poor ops person with a pager, who never saw this code before, has to get out of bed and triage it. What sense does that make? Isn’t the person who wrote the code the most likely person to be able to fix it? Shouldn’t we close the loop between writing features and understanding how they perform in production? I think we should.

We talk a little in the book about the future of operations teams. Kubernetes can automate many of the processes that kept ops people busy in the past, but that doesn’t put them out of a job. Far from it. Instead, they can now do more interesting work, rotating around development teams bringing their knowledge and experience to bear on high-level problems, teaching and coaching, designing and maintaining internal tools and platforms. These are some very smart and skilled people. If parts of their work can be automated, that means they’re free to do things which bring more value to the business. They’ll also be happier and more fulfilled, which is what every manager wants for her team, isn’t it?

If your company doesn’t look like what I’ve described, it’s worth taking some time to think about why not. Talk to your people and find out what they want. Ask them what would need to change to make them excited about coming to work. If they sense that you’re genuinely interested in fixing a broken culture, you may find you’ve got all the help you need, right there in your own team.

You’ve worked as a consultant for various brands ranging from medium-sized companies to enterprises. I know that every scenario is different, but what would you identify as a common obstacle that your clients meet in adopting DevOps as you define it?

First, any criticism of the status quo is implicitly a criticism of the people who created it. But it’s possible for good people to do good things with good intentions, and still end up in a bad place. As the late, great Jerry Weinberg liked to say, “Things are the way they are because they got that way.” If you don’t change the way you make decisions, you can’t expect different results.

Second, it would be nice to think that you could just wave a wand and everybody in the organization would instantly understand your priorities, share your values and radically transform things to deliver them. But that doesn’t always happen.

While it’s always desirable to bring about change by explaining, motivating, inspiring and guiding people towards new attitudes, there will be people who don’t agree with you and aren’t willing to change.

If all other options have been exhausted, you may have to gently explain to them: “This is what the future of this organization looks like. If you don’t see yourself as part of that future, it may be time to start looking for another one.”

In other words, if you can’t change your people, you may eventually have to change your people.

A third obstacle is what I call “consultant fatigue.” Failing companies often start trying different things more or less at random, flailing around and half-implementing half-understood ideas. Is it any wonder that people in that situation become cynical and pessimistic? They start to develop antibodies to management medicines. As one group of consultants parachutes in, changes everything, creates chaos, fails to deliver real results and is replaced by another group, the organization stumbles on from one missed earnings forecast to the next.

Sometimes all it takes to put the light back in people’s eyes is a gleam of hope. Hearing about companies where things do work can make a big difference. When I talk to my clients about teams I’ve known that are happy, productive, and high-functioning, it can help dissolve years of encrusted cynicism. “It doesn’t have to be this way” is a powerful sentence. It can open minds, and doors.

In “Build bridges not walls: DevOps is about empathy and collaboration,” you write: “Developers, you might think your job ends with a ‘git push’. But software that doesn’t work in the real world is a waste of bits. (…) The truth is there was never a neat line between dev and ops. The overlap is precisely where things get interesting.”

You’ve been in the industry for some years now. How would you describe the current state of developers sharing responsibilities with their infrastructure colleagues? Does it work the other way round, with infrastructure teams doing pair programming with their dev teams?

The future is already here, as William Gibson noted; it’s just not evenly distributed. I know, and have worked with, many teams that are really flying high. They have wide-ranging and complementary skill sets, covering all aspects of development and operations. They’re constantly teaching and learning from each other, by pair programming, code reviews and other means. They collaborate in powerful ways, making sure everybody sees the big picture as well as the tiny piece they’re working on, and making everybody feel involved in decisions. Instead of little code fiefdoms, stoutly defended by local warlords, and which no one else dare touch, there’s a sense of collective ownership. When you feel valued, and you’re thoroughly invested in the success of what you’re all building together, there’s no need for consultants anymore.

It’s at this point that I like to quietly slip away. There are always other teams who need my help.

With Kubernetes being complex and giving birth to many “follow-up ecosystems,” and with the entire cloud native technology landscape growing at an incredible speed, is it even realistic to demand that development teams keep up with new developments in how the software they build can be deployed and maintained? It’s all about dividing responsibilities between teams, but this again might lead to working in silos. Is it possible to close the loop between the people who create software and those who deliver it?

Can everybody know everything about everything? No, indeed, and they shouldn’t try. I have a general idea of how engines work, and I know enough to be able to drive my car to the store and back. But I couldn’t build an engine, nor could I discourse insightfully on the relative merits of gasoline, hybrid, or even Wankel rotary engines.

The same applies to Kubernetes. Not only do you not have to know how it all works under the hood, it would actually be a waste of your time to learn (unless you’re a car mechanic). You should do only the things that only you can do. “Undifferentiated heavy lifting” — time-consuming work which doesn’t relate to your core business or unique strengths — should be outsourced. We argue strongly in the book that unless you’re in the business of running Kubernetes, you shouldn’t be in the business of running Kubernetes.

To overload that metaphor slightly, the Cloud Native DevOps book is a road map, not an engine manual. It’s about where Kubernetes can take you, not how to change the spark plugs. Yes, you’ll need people on your teams who are experienced and skilful at operating Kubernetes clusters and getting the best out of them, in the same way that you need people who understand programming languages and compilers or databases and load balancers. But these are all just tools. They’re not the thing; they’re the thing that gets us to the thing.

I noticed that the chapter on choosing a Continuous Delivery tool is missing a solution called Semaphore. What can we do to be included in the next edition of the book?

As a matter of fact, we’re big fans of Semaphore, but we faced a dilemma when writing this book. If we just listed all the tools available, that would be a ridiculously long and unhelpful book, not to mention instantly out of date. Similarly, if we just picked our favorite tools, we might justly be accused of leaving out some really important alternatives. And if we didn’t mention specific tools at all, we would leave a lot of questions unanswered.

What we’ve tried to do, therefore, is sample. In each particular category (managed services, Kubernetes add-ons, deployment tools, CI systems, metrics servers and so on) we’ve chosen to highlight a few representative examples, to give readers a flavor of what’s out there, while making it clear that there are many other options. I apologize humbly to anyone whose favorite tool or product we had to leave out for reasons of space.

It’s worth saying that the GitHub repo, which contains our code samples, is available as a free download, whether or not you buy the book, and it’s open source — we welcome issues and pull requests to add support for tools and options that we didn’t have space to cover in the book. For example, you could add a CI configuration example for Semaphore, a demo application in Python or Terraform examples for Microsoft Azure. We’ll then have a really comprehensive set of code samples which everybody can dip into and use for inspiration, or adapt to their own circumstances. 

Please check out the repo and let us know what you think. We’d love for you to help us make it better!