All Things Pivotal Podcast Episode #23–You are Going to Need a Platform, Operational Concerns of the Third Platform

April 8, 2015 Coté

Andrew Clay Shafer and Coté talk about one of Andrew’s works in progress, a short essay on the operational needs of a cloud platform. Based on a comprehensive view of devops, the essay is an early release of 12 high-level, generic requirements needed to operate applications in the cloud, deliberately shedding monikers and acronyms that name specific tools or technologies. In this context, Andrew and Coté cover people, process, technology, culture and reference plenty of top-notch thinkers.

 

The list so far is:

  1. role based access to resources—the right people should be able to do things, and the wrong people shouldn’t
  2. run specified bits on demand—take code, put it together with all the rest of the things it needs and get it running
  3. coordinate cross service configurations—in a service oriented world, services need to be configured to connect with each other
  4. route public requests to running bits—the next big thing needs access to the Internet
  5. read and write persistent data—data has to live somewhere
  6. add and remove resources—scaling is a great problem to have, but still
  7. record internal and external events—keep track of what changed, when and by who
  8. isolate resources and failures—without isolation and decoupling, that is one big distributed single point of failure
  9. measure performance/health—you can’t manage what you don’t measure
  10. detect and determine failure—sometimes, things get real… but how do you know
  11. recover failures—someone is going to have to clean this mess
  12. work tomorrow—when everything you’ve thought to be true has been shown not to be
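
As an illustration of the first item only, here is a minimal Python sketch of role based access to resources. The role names, the permission strings, and the `is_allowed` helper are all hypothetical; the essay deliberately avoids naming specific tools, and a real platform would back this with its own identity system.

```python
from typing import Dict, Set

# Hypothetical mapping from role name to the actions that role may perform.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "developer": {"app:deploy", "app:scale", "logs:read"},
    "auditor": {"logs:read"},
}


def is_allowed(roles: Set[str], action: str) -> bool:
    """Return True if any of the caller's roles grants the requested action.

    Unknown roles grant nothing, so access is denied by default.
    """
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

The deny-by-default choice is the point: the right people can do things because a role says so, and everyone else cannot.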

 

Transcript

Coté:
Hey Andrew. How’s it going this week?

Andrew Clay Shafer:
Just another week in paradise.

Coté:
You know, I was thinking. There’s something I wanted to ask you. You have, like myself, pretty broad experience in the IT world over a couple of decades now, which is odd to think about. I think you’ve had your head in a slightly different silo than I have, in the early days. I think you can answer something that’s been bugging me. What is up with people who don’t use the shift key? Like they never capitalize anything. They have good punctuation. They don’t capitalize the beginning of sentences. Am I over thinking this? Is there … I feel like a lot of technological people that I know, I shouldn’t say a lot, there’s a fair amount of them who don’t capitalize things. Because they’re sort of like programmers or operators, I know that they pay close attention to syntax. I feel like it must be a conscious choice. Right? They must have decided, sort of like: one day I decided I’m not going to put 2 spaces after a period. Done. I never do that. Right? So, I ask you again, what’s going on with people who don’t capitalize things?

Andrew Clay Shafer:
I think it’s just a hipster…I sometimes do that. It actually is conscious.

Coté:
See. This is … I’m not passing any judgment. I have no judgment to pass at all. I’m genuinely curious. When that affectation is applied intentionally, what semantic thing is going on there? Is it, what’s that fancy word, like the study of symbols? Semiotics? There’s some sort of semiotic thing going on there.

Andrew Clay Shafer:
Quite frankly, I don’t know how philosophical you want to get, in an encore performance here, but I don’t actually see the point of capitalization. Seems redundant.

Coté:
Now, that’s a statement right there. I like it. That’s something meaty. I think, this has been my theory.

Andrew Clay Shafer:
With punctuation and spacing, what purpose does capitalization …

Coté:
Yeah. Yeah. It can all be inferred, basically. Right? You know. This would also highlight why, if this was like 2002 and we were complaining about the kids with their T9 texting, that would be a whole other discussion of no capitalization. In this case, I think it’s this subset of people who are technologically inclined. I feel like the answer you just gave is probably what’s going on with a lot of them. It’s like, I want to have an economy in my writing that strips out anything that’s unnecessary.

Andrew Clay Shafer:
Yeah. It’s a protest against hierarchy.

Coté:
Namely the hierarchy of typefaces that are taller than others.

Andrew Clay Shafer:
Exactly. We don’t need a class system.

Coté:
We need a class system. That’s an entirely different type of class system we’re talking about. Not separating things out in their value. More logo space class system. So, on that topic … So, this week, you have a blog post that, or let’s call it a musing …

Andrew Clay Shafer:
Assuming that I actually finish it.

Coté:
This is true. Speaking of just easing into a transition from semantics: something only becomes a blog post if it is posted on a blog. We should be careful in calling it that. There is a, let’s call it an essay, if you will, that you’ve been writing. I think we’ve actually used this phrase a few times. It’s, basically, like whether you want to use contractions or not. We won’t get into that discussion, but you’re going to need a platform. You sort of read over it. It highlights a couple of things I wanted to talk about. The point of the piece is, basically, if you want to do all this fun cloud native application stuff, that is write software, in the contemporary day, there’s a bunch of stuff you’re going to need, operations wise. Let’s kind of go over those.

It’s beyond just application concerns. I think one of the reasons that I like this ever evolving essay is it starts to take a swag, if you will, at the moment, 11 sort of things that you care about, on the operations side. Before we get to that, in thinking about this, I was reflecting that despite the fact, and you know to sort of intentionally troll a little bit here, despite the fact that you work at a platform as a service company, you’re always talking about operations. You always bring the conversation back to that. I’m curious, why operations is such a, why you think it’s such a vital conversation, in the context of delivering software?

Andrew Clay Shafer:
First, let’s go back a little bit about why I even started thinking about this and started writing this sort of thing down. There kept being people or discussions, some of it’s blogs and Twitter, that were making statements. Obviously, there’s marketing departments and hyperbole involved about how this technology or that technology means you don’t need a platform. This also goes back to the last rant we had together, where I was, I don’t like some of the ambiguity in this language around platform as a service and PaaS and what those actually mean. How people are often making false equivalence comparisons. I wanted to really articulate that your choice is not between having a platform or not having a platform. There’s obviously options in the types of technology you’re going to choose. Your choice is really between having structured platforms, standard platforms, or having ad hoc platforms that you put together with some other pieces.

Coté:
Then, so, this word platform, in this context, what are you meaning by platform?

Andrew Clay Shafer:
Well, yeah, this goes back. We kind of went through this a little bit last time. To me, when people are talking about platform, as a service, and this also fits the NIST definitions, that is an application, a service, if you will, that has APIs that allow you to deploy some code into some environment. Then, hopefully do some other things to manage it, scale it, etc. I put a lot of emphasis on operations, I think for a few reasons. I never really aspired to be a software developer. I kind of found myself a software developer.

My track was in mathematics. I liked solving problems. People told me they’d pay me to program computers. So, hey, what are you going to do? You’re in this work. You’re seeing what’s going on. I happened to work for a startup that had some kind of e-commerce platform. Numerous times, when I was quote/unquote, a developer, I came to work and there was a bunch of emails because something that got deployed the day before or the night before brought the site down. The people on the east coast had woken up and they couldn’t sell their things. They were upset. I started to, as at least, the people that I respect the most in the industry, I started to really care about the combination of, not just the technology, but also this value that is being delivered.

I just started to realize, over time, that there’s a bunch of things that have to do with quality. There’s a bunch of things that have to do with operability. That was where the value’s created. People over-rotated, in my opinion, on this developers, developers, developers narrative. I don’t actually believe that developers are the new king makers. I’d love to debate your former colleague on that point. At the end of the day, if you run everything for time, let’s say you believe that a service will exist for some amount of time, then it costs you money to operate it and maintain it. If that fixed cost of operating over time is high, then, it will quickly dwarf whatever time you spent developing it. On some time line, it will always surpass whatever time you spend developing on it. Yeah. There’s all this pressure to go fast. There’s all this pressure to get value. At the same time, you have to understand that full system and the cost. That’s one of the hallmarks of my career, so far.
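
The cost argument here comes down to simple arithmetic: a one-time development cost is eventually overtaken by any recurring operating cost. A minimal Python sketch follows, assuming a flat monthly operating cost; the function name and the figures in the usage note are made up for illustration and do not come from the conversation.

```python
import math


def months_until_ops_exceeds_dev(dev_cost: float, monthly_ops_cost: float) -> int:
    """Return the first whole month in which cumulative operating cost
    is strictly greater than the one-time development cost.

    Assumes the operating cost is the same every month (a simplification).
    """
    if dev_cost < 0:
        raise ValueError("dev_cost must not be negative")
    if monthly_ops_cost <= 0:
        raise ValueError("monthly_ops_cost must be positive")
    # After floor(dev / ops) months the running total is at most dev_cost,
    # so the following month is the first one that strictly exceeds it.
    return math.floor(dev_cost / monthly_ops_cost) + 1
```

For example, with a hypothetical 120,000 spent on development and 10,000 a month to operate, `months_until_ops_exceeds_dev(120_000, 10_000)` returns 13: the totals are equal at month 12 and operations pulls ahead in month 13. The higher the monthly cost, the sooner that crossover arrives.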

I’ve really tried to be able to think holistically about systems. That’s not just the systems of the computers, and the disks, and the networks, and how all this stuff fits together, but the inputs and outputs, with respect to the business and the customers and the messaging and how all these things feed back on each other. Just, over time, that became a focus. I obviously got involved in Puppet. That really shifted the domain that I thought about all the time. From software e-commerce to how do you manage all these machines, how you think about deployment, how you think about the interface between developers and operations in a way that makes it so you have this shared boundary object that was often, especially early, some puppet code. If you have, this is kind of where the DevOps narrative emerges, if you have a bunch of friction, if you have a bunch of kind of impedance mismatch between those responsibilities, then it can really slow you down. It has impact on stability of the system. It has impact on the speed of the system. And, frankly, impact on quality of life. You say you focus on the operations. That just means that if you carried a pager, you don’t really want to wake up at 3 in the morning. You don’t really want anyone else to have to do that, either.

Coté:
Yeah. You know, you’re hitting on something that in the n-th time of reading through your little essay that was occurring to me early this morning. There’s two straw men of operations think, in my head. One of them is like, it’s almost like black box operations, where there’s this black box. My job as an operations person is to make sure that the lights are all green and the box is okay. Someone gives me this box. I sort of run it and make sure it’s okay. Every now and then, I’ll have to hack around it and figure things out and stuff like that, which is the bulk of what I’ve seen in my career in systems management. Like, the operations people, they don’t have this power of “no” that the DBA and the security person has, where they can basically just say, “No. You can’t do that. Done. Meeting’s over.” It’s more of the responsibility is just to keep this thing up and running, if you will.

There’s this other, more DevOps-y, if you will, or just healthy, where you’re doing operations where you’re not really like a Dr. No, but you, as an operations person, are being prescriptive about what a sustainable infrastructure is to the developers. Which is to say, “I don’t want to restrict your freedom, developers, and tell you you can’t do this thing because it’s not secure or you’re going to mess up my model in a database. You realize when you’re running this under situations x, y, and z, which may be a huge amount of scale that you need to achieve or making long term viable, there’s some issues that are going to come up. Let me just proffer what you might see as constraints, but things you need to make sure that you do. Boxes or sandboxes that you should be running in and operating in.”

At that point, operations has to sort of speak to developers in a way that they understand how they should be writing their application and boxing up their application, so to speak, so that it can run long term. Then, it doesn’t hit these problems that you’re saying, like, “If you write the application this way, then we’re going to have to buy a huge database license. That’s cost prohibitive.” Vice versa. That’s over the past 15 years or so, the shift that I’ve been slowly seeing at the large scale, where, it’s almost like … Again, this is why DevOps is such a good metaphor for it. The operations team is coming in and helping program what the application looks like. Otherwise, you know if there’s no operational concerns in a developer’s head, they just sort of go crazy. It’s buffet madness.

Andrew Clay Shafer:
Those crazy developers.

Coté:
I say that as a former developer myself. I remember when I was at BMC a long time ago, they made us take a 3 month tour of 3rd level support, as developers. Then, when you came back, day 1, after that, you were like, “Whoa, it’s time to write my code a lot different.”

Andrew Clay Shafer:
As an aside: If I was in charge of a company, I would make everyone, every exec, every everything, do at least 1 month in support, as part of onboarding.

Coté:
Exactly. Then, to stop my rambling to get back to your essay, this is part of what I try to communicate to people nowadays, your build isn’t done until an end-user is actually using it. Right?

Andrew Clay Shafer:
I would say your software is not done, this is what I would really want to be in every developer’s head, if I ruled the world, “You are not done, software is not done, until it’s end of lifed.” In a service oriented world …

Coté:
Indeed.

Andrew Clay Shafer:
If it’s running on a server somewhere, it’s not done. You’re not done.

Coté:
That’s right. This is the thing to communicate. Hopefully, you can stick in a developer’s head, as well as everyone else involved, that you care about the full life cycle of this thing you’re creating. Not just your little component of it. I think that’s a very small start to how you orient yourself around it.

With all of that …

Andrew Clay Shafer:
You know, we covered a lot there. Let’s kind of go back to the beginning. You started with this idea that there was a time, and I still think there’s some places where this is true, where you just have this thing set up. As long as the light is green, you’re golden. Frankly, that’s how a lot of industries still think about some of their infrastructure. Some spaces, maybe like the telco space, that’s actually still true for a lot of their components.

When you get to the big web … One of the things that we’re talking about here is this transition to the quote/unquote 3rd platform, which most people seem to be confused about. You can Google it and read the web results for yourself. There’s this transition that happened. Really it’s in that 15 year framing that you just gave us, where, at the beginning, software was shipped on some physical media. Software was this thing that you rushed to finish and eventually put on to a thing that would go and run on someone else’s computer, to transition to this software as a service oriented world, where the vast majority of software now, even if it’s going to run locally, you don’t ship it on a CD. It’s going to be through an app store. It’s going to be through some download. Transitioning from physical media to an as-a-service world changed the context.

On one hand, you didn’t have to fix what it was in time. You could update it more frequently. This enabled some of these things we’re seeing with continuous delivery. Enables some of these other options. The double-edged sword of that is now it has to run on your computers. Now it has to be there. If your servers aren’t up, your software does not exist. The building of these systems is, I wouldn’t say that it’s not scientific or engineering oriented, but there’s a mix of internal and external forces, that are making these things living, breathing, complex systems. If you think you can just back away and wait for the light to turn red and then do something about it, then you’re probably going to have catastrophic failures. Those people who made the leap … In many cases, these were not things that people did because they had a choice. They sat and thought really hard about it. It was an adaptation that was forced by their own evolution. The lessons that Amazon learned managing a massive web infrastructure, 15 years ago, are very similar to what a lot of these enterprises that are starting to get to that scale and trying to do these things with mobile and internet of things, are going through right now. That … Just to kind of unequivocally frame it.

I’ve said this many times before. People look at Amazon and they’re in awe of what they’re able to do, with Amazon Web Services. They’re definitely the leader in that space. They say, “Oh, you know, here’s this functionality. Here’s this feature.” They focus really heavily on the API that manages a hypervisor. They’re like, “We have a cloud.” Well, in reality, that is not Amazon’s advantage. The advantage that Amazon has is not that they were able to put an API in front of a hypervisor. The advantage that they have and that they continue to have is their focus on, 1, building a platform, and 2, operating it at a massive scale, for decades. People also see that. They think, “Okay, well, we focus on this feature.” What they’re missing in many cases is the culture that created that. In 2006, the same time that they had their very first release of EC2, the writing wasn’t quite on the wall yet. There’s an interview with Werner Vogels, which I often refer to, where he made the statement that, “Developers will run the code.” If you build it, you run it. They’d already shifted a lot of those operational responsibilities onto the developers. I don’t think you can argue with results.

Coté:
Indeed. You can press a button to get some detergent ordered for you.

Andrew Clay Shafer:
Is that real? I haven’t decided. Is that an April fools joke?

Coté:
I’ve got to research this. You know, it seems like if there were buttons strewn about your house, there would be a lot of packages coming, especially if you put them at the level that my 5 year old could press them. That would be very dangerous.

Andrew Clay Shafer:
I saw that going around. I wasn’t sure if that was from The Onion or if that was real.

Coté:
To your point, this is one of the things that is always, to sort of seemingly switch context a lot, when you’re doing M&A deals, one of the things that you classically undervalue is institutional knowledge. Whether you want to call it culture or people, it’s sort of just like, it’s the hardest thing to quantify and secure, as opposed to actual assets, IP, where there’s software or physical things or whatever. I think that is an important part of thinking in a new way, of doing cloud like stuff. Much of what we’re discussing is, you know, all these big cloud people, they screwed up a lot at the beginning and had all sorts of things. Maybe we should learn from that. There’s all … Instead of you having to screw up and figure out how to run things properly, there are lessons learned that we pulled from those companies. It may sound weird, compared to like keep the green light/black box situation.

There’s a reason we’re going over this stuff. It’s really important to make sure you learn from those lessons and follow those procedures. Just like you said, the idea, I think there was almost a disservice done over the past 5 or so years. This idea of “put a pager on the developer.” It’s almost like, in the 70s, we would say it was bumper sticker politics. It’s almost like a t-shirt level summary of what’s going on there. It’s more like “get your developers to own most of the code in production. Have them understand that” because that makes them write their software differently, essentially.

Andrew Clay Shafer:
Just following along the Werner Vogels quote, we can dig this up and put a link to it, he is basically arguing that without this operational responsibility, the developers don’t have a feedback loop. They don’t have a feedback loop into the impact of their code on both the infrastructure and also with the customers. It’s critical that you have a feedback loop on the impact that your actual delivered software has on the customer.

Coté:
Right. Right. It’s like the marrying of a phrase we haven’t used yet, product management, into this whole thing, which is a whole other dimension of fun.

Andrew Clay Shafer:
Well that goes back to, you know, systems thinking. What are your inputs and outputs, product management. I’ve argued many times the hardest thing in software is deciding what to build. Then, maintaining the fidelity of that decision and those ideas through the machinery that creates the software, so that you actually get what you wanted at the end.

Coté:
Indeed. Getting to the operational side of it. As we as an industry, and yourself as well, start to catalog: what are the new and modified types of operational concerns that we have in a situation like this? Let’s go into what some of those things are. What are the things that you want in a platform that allow you to, basically, have my 5 year old press a button and have Thomas the Train delivered 5 times in our house in 2 days?

Andrew Clay Shafer:
This list, there’s a couple of things that motivated this. Observations across a bunch of different projects. Having done automation with a bunch of different tools. All your favorite DevOps tools and configuration management. Every one of those projects always have this goal to be able to give developers, for the most part you want to be able to enable developers. You want to be able to enable speed. One of those things is you should be able to push a button, like you said, and have certain stocks. Over time, what I came to notice is there’s these patterns. In most cases, everyone sort of re-implements aspects of them for their own special needs. I articulated this list so far. I don’t think it’s necessarily complete. I think it’s getting pretty good. There’s 2 things.

On one side you have things that are really about the process and the workflow, which is definitely an advantage. If you have adopted, at least the way most people interpret the ITIL focus on minimizing your incidents, as opposed to what I consider the new DevOps focus of minimizing your time to recover, then your platform should enable a bunch of things that make it relatively safe for you to do experiments. And keep the thing going. So there’s basic principles of architecture, going back to probably my favorite talk of all time, one of my favorite talks, definitely on this topic, which everyone should watch. We should put a link to it. Joe Armstrong, who’s the primary author of Erlang, talking about his 6 laws of reliability for systems that never stop. It’s a great talk. It’s on InfoQ. He talks about these things like isolation, concurrency, being able to detect and determine failures. I think that your platform should basically have institutionalized those 6 laws, as much as possible, at the level that you understand them and are capable of implementing them. Then, you also have to deal with things like the people and the process. There’s the things like role based access to resources. The point where you have the ability to do something with your platform. Who should be able to do that? The point where you have thousands of developers.

You want to have them all have access to these different resources. Who should be able to do this? How you separate and manage those things. A small team, if you have 3 kids in a dorm room in Palo Alto and you’re trying to build the next big thing, then you might have different concerns and considerations for what that means, than if you’re a quote/unquote, enterprise. This is also an exercise kind of reflecting the other side of the 12 factor app discussion. There’s this manifesto, we’ll call it a manifesto, that was essentially Heroku marketing for the platform. How they thought about the world. What it meant to have a 12 factor app. They put that up there. It’s been up there for years. If you haven’t read the 12factor.net website, then I suggest you do that. That establishes a contract. The platform has a contract with the applications, to be able to do certain things for them. If you adhere to that contract, you’re going to get certain benefits.

What the 12 Factor App represents in my opinion, is if you adhere to these 12 factors, you’re going to be able to do a bunch of things very rapidly, that we can help you with from the platform perspective. We guarantee that we can keep those stateless containers. That’s one of the 12 factors. We can guarantee that we can scale those, on demand. We can recover those. You’re never going to have any of the problems that you get when you start to try to automate things that have a lot of state dependence or order dependence. They’ve tried to, with their 12 Factor manifesto, eliminate most of those things that you would have problems with, in a context of automation, so that they don’t have to deal with it. If you want to do those other things, that’s fine. You’re not going to be able to use something like Heroku or Cloud Foundry, for that matter. I think if you look at what’s happened. Heroku, they were the first one to kind of define this category. They weren’t the first one to put these kind of constraints on developers.

Many of those platforms were built inside of organizations. If you look at what’s going on inside of places like Amazon and Google, they had platforms with these qualities for a long time. They didn’t market them as products. You could get access to resources, as a developer, at a place like Google or Amazon, for years, if you’re credentialed and have the roles and the rights. You could have access to those resources. What we’re seeing now is an attempt to package those lessons and package those abilities, so that they can be consumed, as a standard, across these different enterprises that we’re working with.

Coté:
Right. As we’ve been talking about this whole time, the operations concerns are as challenging as the product concerns. Product concerns are challenging because it’s hard to come up with stuff people want to pay for, or whatever the transaction in question is. Right? Doing really good design or really good product development can be difficult. The uncountable failures are evidence of that. On the other hand, operating all that is equally tedious and difficult.

Andrew Clay Shafer:
It certainly can be.

Coté:
There’s a whole lot of concerns.

Andrew Clay Shafer:
It certainly can be. One of my favorite quotes in ways to frame things is, “The best ways to solve some problems is not to have them.” I think that …

Coté:
Exactly.

Andrew Clay Shafer:
… One of the things that a standard platform can help do, with a thoughtful platform, it can enable things on one side. Then, you can restrict the class of problems that are on the other side of that. At least in my opinion, there’s a whole host of things that people do with their infrastructure, and in their applications, which on the outside, if you are trying to set something up that has a lot of order dependency or is complicated to set up, then you have to imagine that it’s going to be as bad or worse to unravel all that and fix it, when it starts to break. Whenever I see stuff like that, personally, I don’t want to be responsible for operating it. I’ll just read the list so people have it in their head. We keep talking about it. Then, maybe they’ll have the context.

Hopefully, I can finish this and post it. Then, it will be much easier. People can read it. The first half are about enabling, so there’s this role based access to resources, being able to do deployments of arbitrary code on demand, being able to coordinate service configuration. Especially as you move to any sort of complicated microservice architecture, you want a way to do service discovery. You want to be able to route. This has to do with the way networks work. It doesn’t matter if you can get software running on a server, if you don’t have a way to get public requests to that server. How are you going to show everyone your new thing? You need to be able to read and write data. You need to be able to add and remove resources. That’s sort of the first half. I want to be able to scale up and down. I want to be able to do the deployments. I want to make sure the right people can do deployments. I want to make sure their data stays there and that people can access it from the outside world.

Then, the second half of the list is mostly about these failure things. I want to be able to isolate resources and failures. I don’t want to have the failure of 1 component or 1 application cascade into the whole platform collapsing. I want to be able to measure, monitor performance and health. I need to be able to detect and determine failure. Then, I need to be able to have some mechanism, as automated as possible, built into the platform to recover from failures. I’d argue you need to be able to do that on both the level of the application, and the platform should be able to recover from its own failures.
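
A rough way to picture the second half of the list (isolate, detect, record, recover) is a tiny supervisor that restarts failing tasks without letting one failure affect the others. This is a minimal Python sketch, not how any particular platform does it; `Supervisor`, `make_flaky`, and the restart limit are hypothetical names and choices made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Supervisor:
    """Runs tasks one by one, restarting each failed task a limited number of times."""

    max_restarts: int = 3
    events: List[str] = field(default_factory=list)  # record of what happened

    def run(self, name: str, task: Callable[[], str]) -> Optional[str]:
        # One initial attempt plus up to max_restarts retries.
        for attempt in range(1, self.max_restarts + 2):
            try:
                result = task()
            except Exception as exc:  # detect the failure
                self.events.append(f"{name}: attempt {attempt} failed: {exc}")
                continue  # recover by restarting the task
            self.events.append(f"{name}: attempt {attempt} succeeded")
            return result
        self.events.append(f"{name}: gave up")
        return None

    def run_all(self, tasks: Dict[str, Callable[[], str]]) -> Dict[str, Optional[str]]:
        # Each task is handled independently, so one broken task
        # cannot take the others down with it (isolation).
        return {name: self.run(name, task) for name, task in tasks.items()}


def make_flaky(failures_before_success: int) -> Callable[[], str]:
    """Build a task that raises the given number of times, then returns 'ok'."""
    remaining = {"count": failures_before_success}

    def task() -> str:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise RuntimeError("simulated crash")
        return "ok"

    return task
```

With `Supervisor(max_restarts=2)`, a task built with `make_flaky(2)` fails twice and then succeeds on the third attempt, while a task built with `make_flaky(10)` is abandoned after three attempts and the healthy tasks still complete. The `events` list stands in for the "record internal and external events" item.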

Coté:
Right.

Andrew Clay Shafer:
Then, finally, this one is maybe one of the more nebulous ones. It needs to work tomorrow. I think there’s a lot of software that gets written, this kind of goes back to these previous points where, you get it set up. It’s sort of the happy path. Then, once it breaks you’re kind of on your own to go figure out what’s wrong. There’s brittleness inherent in the architecture. It has to, at least from my opinion. You want to build stuff that will run forever. Going back to my friend, Joe Armstrong.

Coté:
So, that is … I mean, as all point 11s are, it’s an open ended one. Right? The “work tomorrow” one. I think it more points towards a mentality that you have. You need all of the tools at your disposal, whenever you’re writing the application, but also operating it, to make sure that you can fix it and run it. It’s not just a black box, if you will.

Andrew Clay Shafer:
I want it to work tomorrow. Obviously, I want to consider that possibility of failure, as a first class citizen, the ability to remedy a failure, as a primary consideration. For the most part, I want to eliminate as many failures as you can. Really build these. I don’t like this language but, a lot of times people will say, “anti-fragile.” Just the whole mentality around failure that comes out of the stuff that people like Netflix are doing with the Chaos Monkey, where you would purposely cause failure, inject failure into your processing, your platform, so that you can be sure that you’re relatively safe and sleep through the night. It’s all about sleeping.
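
The idea of purposely injecting failure can be sketched in a few lines of Python. This is only an illustration of the principle, not Netflix’s Chaos Monkey or its API; `InstancePool`, `reconcile`, and `inject_failure` are hypothetical names, and instances are just integers standing in for running processes.

```python
import random
from typing import List


class InstancePool:
    """Keeps a desired number of (simulated) instances running."""

    def __init__(self, desired: int) -> None:
        self.desired = desired
        self._next_id = 0
        self.instances: List[int] = []
        self.reconcile()

    def reconcile(self) -> int:
        """Start replacements until the pool is back at the desired size.

        Returns how many instances were started.
        """
        started = 0
        while len(self.instances) < self.desired:
            self.instances.append(self._next_id)
            self._next_id += 1
            started += 1
        return started


def inject_failure(pool: InstancePool, rng: random.Random) -> int:
    """Kill one randomly chosen instance on purpose and return its id."""
    victim = rng.choice(pool.instances)
    pool.instances.remove(victim)
    return victim
```

Running `inject_failure` on a schedule and checking that `reconcile` always brings the pool back to its desired size is the kind of routine experiment that lets a team trust recovery before a real failure happens at 3 in the morning.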

Coté:
That’s right. If you can’t sleep, you can’t really work tomorrow, very effectively.

Andrew Clay Shafer:
Then, if you do that a few days in a row, it’s yeah. All bets are off. I wanted to make one last comment. Something that’s in my head. This celebration of heroism that kind of got baked into operational culture. There’s a lot of people that are talking about this now. It also relates to other things that are really important in our industry. Aspects of mental health and well being. In a lot of cultures, the operations responsibilities were kind of relegated and treated like second class citizens. Frankly, if everything is working 100% of the time, no one knows they exist. As a consequence, I think some of these cultures adapted to that by celebrating heroism, where they would stay up and fix these problems, in some cases, that were self-inflicted, not necessarily by those individuals. Definitely by the organization. They would pull these all-nighters, maybe multiple days in a row, to fix this thing, in this hero-like manner. That was the only way that they could get recognition.

Coté:
Right. Yeah. It reminds me, oddly enough, you ever see that Wool series of books? Like sheep’s wool? It’s one of those books like, in the future humans live in silos because the outside world is trashed. You know that genre. Very close to the zombie genre. Right?

Andrew Clay Shafer:
Utopia.

Coté:
That’s right. Utopia, indeed. There’s one character in there who’s like this master mechanic. She’s in charge of keeping the generators running. Right? You could imagine, if a significant chunk of humanity is isolated to a silo. Right? There’s all sorts of good lexicographical crossover between this conversation and this anecdote. Like, you know you can’t just go get a new generator from Siemens or GE or whoever. You need to keep it up and running. At one point, she’s reflecting and she’s thinking like, “Oh, when I came down here, everyone was always reactive. They had a hero mentality to fixing things all the time, which was fine. I spent several years installing a proactive, preventative maintenance program. We knew something was going to break or would break. So, we would fix it. Then, it took a long time. People realized that was a better way of going about things. It allowed for sleeping more often and less heroics.” It is like that. I think you’re exactly right.

Back in the IT realm, that’s one of the big structural changes that you have to get over. Celebrating hero culture is often not too cool. Right? In the developer world, you sort of understand this. There’s still too much hero culture in the developer world. There’s that funny trick that you do in continuous integration stuff, where you do like blame-storming when the build gets broken. You’re not really like, “Hey, good job. You broke the build.” You’ve created an opportunity for heroism, if you will. I think a lot of these 11 things, if you will, they sort of are baking in the notion of, “how can you avoid hero culture and do a lot more preventative maintenance?” How can you bake that into your platform and celebrate like, you know … There’s been no accidents on this work site for 300 days.

Andrew Clay Shafer:
Yeah. Exactly. I want it to be really boring.

Coté:
That’s right. You don’t want to celebrate the hero moment of someone got their finger chopped off and we’ve prevented them from dying and returned them to work. It’d be better for them not to have a finger chopped off in the first place.

Andrew Clay Shafer:
“…then, I tied the tourniquet.”

Coté:
That’s right. “Let’s do a presentation on that.”

Andrew Clay Shafer:
Then, you were asking me about what is it going to finally … I pulled this up a while ago. It’s been out there for over two months, I think. I think it’s a combination of my, I want to say things a certain way. I want a certain quality. I didn’t feel like I fully had the things articulated that I wanted to articulate. It’s not just this list. There’s some framing and some narrative around it. I will … Basically, it’s a busy world. I have other responsibilities than just writing blog posts. I kind of had it there. I’ve got a bunch of these things worked out in my head now. I just have to go back and clean up. I think we’ll get it out there. I’m an artist, Coté. I need the … You can’t rush the art.

Coté:
As we know, if you are made to wait, it is to ensure you enjoy your dinner or whatever, your meal. I forget the exact quote.

Andrew Clay Shafer:
Perfect.

Coté:
On that note, there’s some buttons I need to press around here to go order up some Tide. I need to sort out if that’s true. We are recording on April Fools. There’s some good ones out there that I’ve been encountering. Hopefully, people can enjoy their April Fools posts.

Andrew Clay Shafer:
Yeah. This whole thing isn’t all a joke, right?

Coté:
I hope not. We probably won’t release this on April Fools. That way we’ll be clear of it being a joke.

Andrew Clay Shafer:
The blast radius.

Coté:
That’s right. Always contain problems. That’s one of the things we try to do in the podcasting world, as well. All right. With that, what you ought to do is subscribe to our Pivotal podcast, where you can find the show notes. If you go to blog.pivotal.io/podcasts, you can basically find all the information that you need for this. You can see all the previous episodes and things like that. We got a great reception to the first episode that Andrew and I did, recording this. That was nice. It’s been fun talking with people about it and seeing that they enjoy listening. We’ll be back very soon with some more observations.

Andrew Clay Shafer:
If there’s demand, we could do Andrew and Coté every week.

Coté:
That’s right. We’ll see how long we can go, before we start telling the same old stories again. At least we’ll try to use different words when we tell the stories.

Andrew Clay Shafer:
I got a lot of words.

Coté:
That’s right. That’s right. All right. We’ll see everyone next time. Bye bye.

Andrew Clay Shafer:
Till next time.
