For a software company, it might sound silly NOT to talk about software at a software conference with 1,500 people.
Yet that was the main point Andrew Clay Shafer made at the Cloud Foundry Summit held this past May. Instead of talking about Cloud Foundry, he talked about all the things surrounding Cloud Foundry. Offering his unique perspective on the topic, Shafer explained that technology doesn’t always pose the hardest problems to solve. Often, the harder problems are with people and process, behavior and culture, decision-making and organizational learning.
Shafer takes the audience through a fascinating and educational journey. He starts off by talking about three types of stone cutters—those who get paid to cut stones, those who are fascinated by the craft of cutting stones, and those who want to build a cathedral. Shafer is not only authentically excited to build a cathedral, he wants those who enter the cathedral to participate in co-authoring the hymnal for everyone to sing from. In this stone-cutter context, Shafer tells the story of his career, and the failures along the way. Importantly, Shafer shares key lessons learned, how he became connected to DevOps’ history, and what forces put him on a trajectory towards the future of DevOps, helping companies achieve a new level of performance.
One of Shafer’s most notable points is about Conway’s Law, which holds that for two separate software modules to interface correctly, the designers and implementers of each module must communicate with each other, and, therefore, the interface structure of a software system will reflect the social structure of the organization(s) that produced it. Shafer underscores how important behaviors, culture, and incentives are to making DevOps work. He goes on to share the stories of Flickr, which does 10 deploys per day, and Amazon, which is deploying every 11 seconds. He then places these facts in the context of what Amazon’s CTO, Werner Vogels, says about DevOps and culture: that the right cultural elements are required. Shafer does a great job of illustrating the principles, practices, and tools that are evolving within the DevOps realm, and he ends his talk with a great wrap-up on the risks and advantages of DevOps, while providing clarity on how DevOps, microservices, and platforms all work together.
If you want to implement DevOps in your organization, this video is a must-watch.
- Pivotal Cloud Foundry’s list of videos and Cloud Foundry Foundation videos
- Pivotal Cloud Foundry Product Information
- Pivotal Cloud Foundry Blog Articles
So, how’s the Cloud Foundry Summit treating everyone? So far so good? People are trickling in. So, I’ll go slow. This is a talk about a lot of things and I usually give a disclaimer that we’re going to go really fast. I have 30 minutes supposedly, but I’ll probably go long, and I’m going to go through about 90 some odd slides, and some of them have more bullet points than others, but the main thing I’m going to do is take you on a little journey, tell a little narrative, and for some of you, it’ll relate to things that you’ve experienced in your career, maybe from a similar role, or observed from different personas.
My career so far, I’ve kind of spanned many different roles inside of IT as a developer, system admin, a founder of a company and just thinking, really about how all these things fit together. So, the alternative title, if I can get this thing to change, there we go, is systems thinking is the new black. Just by a show of hands, did anyone come to last year’s summit at the Hilton? How many people saw my talk? How many people saw it on video? So this is all online, but the ideas in that talk I think are key to actualizing some of the things that you’re going to see in this talk and also some of the things you’re probably trying to do inside your organizations.
It definitely has a bunch of information about organizational learning and I use the stone cutter metaphor. This is another alternative title, and this is a stone cutter’s quest for nice things. You guys recall the morning session with Sam. I have a slightly different version, but it’s basically this idea of connecting to a higher purpose or not, right? There are different levels of understanding of your craft, and the lowest level is the stone cutter who gets paid to cut stones. He knows that he can make food for his family because he cuts stones. He has a skill.
The second level is the stone cutter who is just fascinated with the actual craft. If you ask him questions, he’ll get excited about how the stones are laid and the tools and how the servers are configured with this one configuration tool.
The last one is the one who’s connected to this higher purpose of the cathedral. This is our lowly stone cutter and in my career, and I’ll just go through this really fast, but I’ve been involved in a lot of startups so we’re going to actually talk through some of that.
Most people know me from Puppet. I did a lot of work, early days especially, on OpenStack. My background, and we’re going to go through a lot, but I did some writing for O’Reilly. This is a little book I helped write on web operations and there’s a lot of great stuff. I’m packing my house to move to Los Angeles from Pittsburgh and I came across this book just this weekend and I read this book. It’s from 2010, but it’s really stood up, and we’re five years out from that, and anyone who’s interested in really building a new future, we’re going to really see a lot of the ideas that are embodied in that book, but it’s a great reference if you’re interested in DevOps and operations and really imagine this stuff at scale.
I’ve been involved in the DevOps Movement and we’ll talk about that a little bit. I’ve been on the core organizing team of DevOps since the beginning of DevOps Days and we organize dozens of those around the world now. If you’ve not had the opportunity to participate in a DevOps Days, then I think it will really be eye opening to see the communities of practice that come together and openly share the information.
This whole thing is very similar. We’re trying to build a community of practice around Cloud Foundry, but traditionally a lot of the organizations that are quote unquote enterprise, they tend to hide information. They often hide information from themselves internally, but they also try to hide information from each other. Whereas one of the things that you saw in the DevOps communities is that, even though people are working on somewhat competitive services, they would often share the way that they were solving these common, undifferentiating problems with their infrastructure.
I work for Pivotal. If anyone needs a job, we’re hiring. I try to amuse myself on Twitter and amuse you too, but I can be reached there, and that’s probably the fastest way to get my attention. Even if you work with me and have my e-mail which I’m usually about 1,000 emails behind on.
All right, so if you saw my talk last year, this is funny. Three stone cutters walk into a Pareto Inefficient Nash Equilibrium. Go watch that talk and you can laugh later. I’m not going to talk about Cloud Foundry. What I’m actually going to talk about is all the stuff that kind of is around Cloud Foundry and, in relief, that silhouette will kind of have Cloud Foundry left over, if that makes sense.
I like to do this … let’s start with the conclusion. There’s a bunch of buzzword bingo crap that Gartner publishes and everyone does, and some of the words mean something, and sometimes they mean different things to different people. You get all sorts of strange conversations if you don’t start each conversation with the definitions. For this conversation, this is what we’re going to mean, and that’s fair, that’s fine, but I’m going to argue just one thing: that DevOps, continuous delivery, microservices, these are all really part of the same phenomenon that has happened. These things are enabling. These things are dynamic. These things are not really finished. What we’re able to do today will really evolve rapidly, especially as these communities cooperate and draw that innovation forward. We can come back and we’ll have a little bit more definition around each of these words by the end.
First, I’m going to tell you a story about me. In the beginning, but not the real beginning, I got a job as a developer. I had a job. I had a degree in math and I had a minor in CS and I had a professor, he was a theoretical chemist and he wanted to have this project. He’s like, “Can you do this?” I was like, “I have no idea how to do that.”
Yes, so he gave me tiny bits of money and I was really smart and I had Google, but I really had no idea what I was doing. I was sort of left to my own devices. There was this half cobbled together idea that this grad student had sort of implemented for this guy. His vision was he wanted to build a way for these theoretical chemists who were redoing a lot of the same calculations over and over to share these pre-calculated chemical, kinetic and thermodynamic principles.
His original idea was he wanted to have this database. You would have this database, but you were going to ship these synchronized databases across the community, but that’s a terrible idea when you could build an online system. This was right around 2001 so you’re starting to see internet stuff come online and I was like, “How about we build it as a web service?” I had convinced him that we should do that.
I learned how to do a bunch of stuff with my little red JSP book and my Google. I was able to build a bunch of stuff. I did this with no experience, really no one teaching me how to do it. I didn’t know anything about testing. I didn’t know anything about backups. I really didn’t know anything about servers so I was learning how to configure, patching, do the whatever thing to set up the Tomcat thing, to set up this other thing and then I would go writing a little bit of code and make this thing work. There was just me and my server that served the website off of my desk, literally. I made it work. It mostly worked. It worked enough that the guy got a $5 million grant to make it more real.
At the time, he was paying me like nothing and everyone was making a ridiculous amount of money doing the same exact kind of technology. So I had a conversation where I was like, “I think you should pay me more than nothing.” He was like, “Well, you know … ” I think in academics, people are really used to slave labor for some reason. He was like, “Well, you know, maybe … and there’s all this stuff going on and there’s all these jobs … ” and I was like, “I think what I should go do is go work for someone else because they’ll pay me more money.” I’m looking at the job postings.
In between that conversation and the date that I set to leave, we had September 11th, and, then, no one would talk to you after that. It just basically evaporated so there’s no work and every job that was posted had hundreds of resumes. It was ridiculous though, because you get in these conversations with the hiring manager and I always try to be proactive. I’m getting better or worse at that depending on how you look at it as my career goes along, but I wrote a letter to this guy. I was like, “I’m trying to make my way. Can you give me some advice on how to do this better? What about my resume?” He’s like, “We had all these things, hundreds of them, and we’re only going to interview a dozen people so we decided to interview people that only had … anyone who didn’t have 10 years of experience in Java. We didn’t interview them.” It’s like, “It’s 2001. Java came out in 1995. Good luck.”
Then I went to hide. I went to hide from the economy. This was around 2002. I went to grad school and I did a program at University of Utah called Computational Science. It was basically envisioned as this bridge between the computer scientists who didn’t know how to do math and the math people who didn’t know how to do computers. What I specialized in, or what I focused a lot on, is building models of bioelectric fields, and so we’d build these torso models with the tissue and resistance and conductance across the tissue and then we would do these visualizations. I TA’d this class to do visualization of the bioelectric fields and lots of stuff. It was pretty fun.
Did anyone go to grad school? Does anyone wish they were perpetually in grad school? If you could make money doing that, back to the slave labor thing? What I learned there, which was formative, was this idea of technical debt, because of this thing called SCIRun, which you can go see; it’s how you would be able to make these types of visualizations.
It was a project that had a five-year grant for millions of dollars, but the way that it was architected was, this is where Conway’s Law comes in which is … does anyone know Conway’s Law? I use it in every talk so I start to assume people know what it is. Conway’s Law is this idea that organizations will build systems that mirror the communication structures of the organization. If you communicate in a certain way, often this is true in almost every product I see, if you see API level problems, then if this group doesn’t talk to this group very well, you’re going to have that problem all the time.
In this project, although I did learn to use source control, you had a project that was this longstanding grant where the majority of the code was written by grad students who were each only there, maybe working on it, for a year. Over a five-year period, you can imagine that all their incentives and all that communication was perfectly aligned with building great software. It’s what I like to euphemistically refer to as Academic C. Maybe some of you have written some of this code before.
There is really no, what I consider, very mature process. In most cases, there’s very little regard for the future. It’s like you have some little bit, and the parts that I worked on were mostly about doing algebra, so people had implemented a bunch of naïve ways to do matrix multiplication and I basically replaced it all with the Fortran libraries and did a bunch of stuff with that, so that was mostly focused on the math part, but it was super fun.
I graduated, my wife started medical school and we had our first son in about a two-month span so that was the end of the rock and roll lifestyle, clearly. She decided that I should get a job and try to feed us. I sent out some resumes and put one on Dice.com and then a recruiter called me and he’s like, “Will you go to this interview?” It was like, “Sure.” He was like, “On your resume it says VTK,” which stands for Visualization Tool Kit, “and they’re looking for someone who knows GTK. That’s pretty close so go talk to them.” This is a true story.
I go to this group and I spent, I was there about 40 minutes and I talked to two people. One guy was in charge of marketing and what they were advertising for, what they were looking for at the time, was someone to come and work on user interfaces for this device I’m going to show you in a second.
They had a bunch of ideas about how it should work, and they had someone who didn’t know how to code that was trying to work on them and they were just kind of desperate at that point. I’d been working on user interfaces and usability with respect to visualization and I started … I have a character flaw, but I get really fixated on little things, and I want to go find out all this stuff. So, I got onto this stuff on usability and Jakob Nielsen and all these people that write about usability and UX, going back a decade so I was way into all these ideas about how to do this and so they got excited about that.
I went across the hall and I talked to the VP of Engineering. The VP of Engineering is looking at my stuff and he was like, “You’ve been doing math models and bioelectric fields on the torso of a human,” and said, “Aren’t you going to be bored working on user interfaces for our crappy little device?” I said, “I am pretty good at entertaining myself.” He said, “Come back on Monday.”
So, I joined this adventure where we spent $26 million and had really nothing to show for it in the end except for this Wikipedia article which you can go read if you look up Black Dog and Realm Systems.
I have this little thing, we made the boards, we have a custom kernel around this thing all the way up through this operating system. This is some of the specs for the device. We had this big vision about solving identity and we were actually doing some things that are very much the same problems that everyone’s trying to solve with the IoT. You had to provision this thing and what we did was build an on-the-fly Debian packaging of the policy so your administrator could configure all these things that you’re supposed to have access to in the back end. We’d make on-demand Debian packaging of that. When your key got online, it would download just the things for you. It was fun, and it was one of the best engineering teams I ever worked with. And, it was really formative for me.
I worked with this guy who wrote the book on Make. He literally wrote this book on GNU Make, and he taught me a bunch of things about how to think about problems. It wasn’t so much like, “Do this, do this, do this.” It was like “you should not implement something until you’ve thought about three ways to do it. You should test these things.”
One time, we were cross-compiling these things for this little chip on our Linux machine and it’s funny now because everyone’s all excited about containers, but we would have these chroot environments to go build this stuff and put it onto these devices. And, he’s like, “You’re not testing your code for this interface for this other thing.” I was like, “It’s impossible.” He’s like, “No, it’s not.” I was like, “It’s impossible. I tried.” He’s like, “Okay. I’m going to come and we’re going to pair and we’re going to make it work.” I was like, “Okay.”
He came over to my desk and after literally two hours one day, I’m trying to get all this testing going, with Boost and C++, to do this thing and test it. He had these high aspirations and finally he just tapped out. He’s like, “Fine. You don’t have to test that.” I was like, “I wanted to. I really wanted to. Couldn’t do it.”
The thing that was interesting, it was a perfect display of Conway’s Law again and there were all these little Game of Thrones ideas about the different teams and everyone on this thing. And, I kind of broke it accidentally because I was working on this C code and platform code on this one side and then there’s this Java side where they did the actual server stuff to do the configuration.
I’d implemented a bunch of stuff that needed to have the server put together those configs and then push them to the devices. I went and talked to this guy for weeks or whatever and he’s like, “That’s awesome.” I was like, “When do you think we’re going to see this system work end-to-end?” He was like, “It’s not in my sprint. It’s not in my stories.” I was like, “What the heck are you talking about? Why do you think I spent all this time explaining how this works? Don’t you get this … we sit and we hear the vision for what this product’s supposed to do. Why aren’t you connected to this cathedral we’re trying to build?”
My solution was, basically, to just go implement it on the Java server, which made everyone mad. Well, not everyone. It made some people very happy because now this thing worked, but it made the people who thought that “it was their code” very unhappy, and, as a result, the Vice President of Engineering decided it was better for everyone to get work done and not have this ownership. So, he dissolved all the teams and everyone just became one pool.
Then it was like a free-for-all for me. I was like, “I’m going to learn how the kernel works, and I can go do that stuff.” The next week, I would just grab a story from whatever part of the system I wanted to. So, that was a really rapid iteration of my leveling up on how computers work.
There’s stuff that I wouldn’t have been able to do inside of another organization. One of the things this did for me is, I disassociated my identity from any aspect of anything. I don’t care if you’re a sysadmin or you’re a developer, you’re a Java guy or a C++ guy. It’s just something you just do. You want to make it work, you make it work. If you attach your identity to things, I think that’s limiting and it’s bad. That kind of comes back to play in the DevOps story.
The next thing that happens is that company runs out of money, and they go from basically 100 people down to 15 people in six months. I was in the layoffs, between my friends who were getting two weeks’ severance and my friends who missed paychecks. So, my group was just, “Hey, don’t come back.” I didn’t ever miss paychecks.
Here’s another aside: because we developed all this expertise on flash, the team that we worked on there eventually built Fusion-io. Does anyone know who Fusion-io is?
Depending on when I signed on, I would have been the fifth to the ninth employee at Fusion-io. Their deal was, “We know we just unraveled and you didn’t get paychecks, and your friends didn’t get paychecks. We might be able to pay you in nine months when we raise money.” These guys across town had just raised their $5 million A round, and they’re like, “We’ll pay you a six-figure salary.” That was a really easy decision for my wife to make.
I’m losing my mic I think. No, that was just the card. If everyone can still hear me … This was the opposite. The first team I worked on was a dream team from an engineering perspective, and we had this deep expertise across a wide bunch of things, but we had this pathological understanding of our business. In this other startup, we had actually a very clear business mission, and we had pathological technology. We had a CTO who had a CS degree, and had never worked for anyone, had never had any mentoring. He could type 120 words a minute, and he could fix any problem in the universe with another if or six ifs. That was the code base.
His specialty was the if-else plinko. It was in three places across JSPs and code: the if-else plinko had to magically construct the string that got put straight into the JDBC to call the database, then come back out. And then you had to have the same if-else plinko to recover the result, and it was terrible.
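For contrast, here is a minimal sketch of building such a query in one place with bound parameters, so that values never get concatenated into the SQL string. It is not the code from the talk: the `orders` table, its columns, and both function names are hypothetical, and Python's built-in `sqlite3` module stands in for JDBC.

```python
import sqlite3


def find_orders(conn, customer=None, status=None):
    """Build one parameterized query; values are bound, never concatenated.

    The table and column names are hypothetical, chosen only for illustration.
    """
    clauses = []
    params = []
    if customer is not None:
        clauses.append("customer = ?")
        params.append(customer)
    if status is not None:
        clauses.append("status = ?")
        params.append(status)

    sql = "SELECT id, customer, status FROM orders"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY id"
    # The driver substitutes the placeholders, so odd input stays plain data.
    return conn.execute(sql, params).fetchall()


def make_demo_db():
    """Create a small in-memory database to exercise find_orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, status) VALUES (?, ?)",
        [("alice", "open"), ("bob", "open"), ("alice", "shipped")],
    )
    return conn
```

Because the branching lives in a single function and the result comes back as ordinary rows, there is no second copy of the same if-else chain needed to interpret the result.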
I had a contest once at a conference about who saw the worst code ever. After about an hour into this meal, everyone was just tapping out. They were like, “No, we win.” I wasn’t even to the good parts yet. It was amazing.
What I learned there was the opposite lesson, but I was really empowered. It was one of the first places where I wasn’t just a developer. All of a sudden, I could do things that other people couldn’t do. I was put in charge of a bunch of stuff because I proved I could solve these problems really rapidly. The CEO, he couldn’t undermine his CTO, but he knew they were in trouble so he put me in this position to act as a check-and-balance. I learned a bunch of stuff about team dynamics and more Conway’s Law, but what we built looked suspiciously like this Rube Goldberg machine, and we had a lot of automation. We were deploying, at that time, this is 2006ish, our infrastructure was roughly 50 servers. Each of those were 8 cores. They were pretty beefy servers for the day. We were doing about $30 million in revenue, not revenue, but transactions through it. We were taking a small piece of that.
This is how we did stuff. I didn’t know anything other than what I’d learned, and most of what I’d learned wasn’t really about doing these large scale deployments on servers up to that. Those aren’t really large scale in retrospect, but we shared a bunch of stuff. There’s the do-it-now.sh or do-it-now 5. You just keep changing things, and you share them. This is what things looked like.
That story, that narrative is really every day you go to work. And, if you did a deployment the night before, then you’re often greeted by a bunch of emails from your east coast clients who hadn’t been able to do transactions on their eCommerce store because you broke the way checkout works or something. This was a pretty common … Has anyone ever lived this movie? Have any of you ever seen this?
As it turns out, my roommate from Reed College was Luke, and Luke had already thought a little bit about this. I had been part of the Puppet community, and I’d made some commits to it. But, it wasn’t really a business yet. As I told you a bit more of my story, it was hard to convince my wife the way this should work is I’m going to go with my roommate from college, and we’re going to make software that we will give away for free, and it’s going to be awesome. It’s going to work out. While she’s doing her 80-hour-a-week medical school stuff.
The thing that really happened mentally for me is, and a lot of this I’ll give credit to Luke’s perspective, and there’s a ton of people that influence how I think about this stuff, some of them are here. I’ll get to Velocity in a minute, but we really wanted to change the relationship between people and computers. We wanted to change the way that people thought about their computers. We didn’t want to have the do it 5 as the way that people thought about it. We wanted to get it so you could consistently manage these complex systems to scale in a way that basically very few people outside of some of these other organizations I’m going to talk about in a second were able to do.
What turns out, and this is part of the Cloud Foundry story, is you actually need to change the relationship between people and other people as much or more than you need to change the relationship between people and their computers. Because, going back to Conway’s Law, if you have a bunch of different ideas about how the world works, then you’re probably, or God forbid different incentives, then you’re probably not going to do the greatest job with your computers.
This is this DevOps story. Everyone’s read the Gartner Report on DevOps and how you need to change your culture, but I don’t think you can really talk about DevOps if you don’t talk about Velocity. I think I’m going to bring up some ideas that people haven’t thought of before, at least not articulated.
This is a conference that I consider the first DevOps conference. This is certainly a slide that was … this is from Velocity 2009. John Allspaw and Paul Hammond talking about 10 deploys per day at Flickr. They had these slides, and, for some of us who have been around them and been around this conversation, it was just like, “Paul and John. They do this thing.” And, there have been a bunch of these threads, but for people who have not been in that world, who hadn’t thought this way, hadn’t seen this stuff, it was like [head exploding sound]. We’ll come to more of that.
This is actually the first recorded use of the word DevOps, and these are from that session at Velocity. And, this is my Twitter. This is July 3rd, 2009. I just pasted that image from Twitter this morning. We’re having the same conversation, but this basically hasn’t really changed since then. It’s just propagating through the rest of the industry.
The thing I want to draw attention to with respect to Velocity is that Velocity was started with a guy from Amazon and a guy from Google as the chairs, putting together this program around web operations and performance. What that represented was, as I mentioned earlier, these communities of practice coming together and sharing their ideas, sharing their successes, sharing their failures. Some of the most interesting things, and this is interesting, especially in a culture where it’s very blameful, is that you share your failures. Others understand how you failed so they don’t fail again, so they don’t fail like you.
Everyone hears about Amazon and there’s this bookstore, has anyone ever ordered anything from this bookstore? My wife gets three boxes a day from Amazon. This is the last public thing I heard them say, and this is a couple years old now, but according to my sources, this is actually an order of magnitude off now. So, they’re closer to a deployment every second. They might not say that out loud, but that’s roughly what I would imagine. You got to understand, and we’ll get to this in a minute, is that’s not monolithic deployment. That’s like thousands and thousands of services each being updated independently.
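To put those rates in perspective, here is a small back-of-the-envelope sketch. The 11-second interval is the figure cited in the introduction above and the one-second interval is the speaker's estimate; the service count of 1,000 in the usage note is purely an assumption for illustration, since the talk only says "thousands and thousands."

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def deploys_per_day(seconds_between_deploys):
    """Total deployments per day for a given average gap between deployments."""
    return SECONDS_PER_DAY / seconds_between_deploys


def per_service_deploys_per_day(seconds_between_deploys, service_count):
    """Average deployments per service per day if deploys are spread evenly.

    service_count is a hypothetical input; the even spread is an assumption.
    """
    return deploys_per_day(seconds_between_deploys) / service_count
```

Calling `deploys_per_day(11)` gives roughly 7,855 deployments a day, and `deploys_per_day(1)` gives 86,400. Spread evenly over an assumed 1,000 services, the one-second rate is still only about 86 deployments per service per day, which is why the aggregate number says more about independent services than about any single monolith.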
This is a quote from Werner Vogels. Who knows who Werner is? It tells you on the slide. So, hopefully you can figure it out. I won’t read all of it, but if you haven’t, this is from 2006. So, this is three years before anyone said the word DevOps. He said, “Amazon, if you build it, you run it.”
This brings developers into contact with the day-to-day operations of their software. It also brings them into day-to-day contact with the customer. It will change the dynamic of your organization very, very quickly if the person who is going to feel the pain for a bad decision on their code actually gets paged for that bad decision. I don’t think anything changes things faster. It’s one thing to connect incentives and a bunch of stuff, but if you move pain to the right places, it goes away.
Just a quick aside, does anyone know the experiment where they give the guys the shocker, just like a classic California experiment? Humans are weird, man. What are we doing? It’s like, “You’re just going to shock the guy?”
Everyone looks at Amazon and they rush to copy this. It’s like, “There’s the cloud.” Amazon’s advantage is not the fact that they put an API in front of hypervisors. I think that if you look at some of the stuff that’s happened with other attempts to make open-source solutions to that problem, you realize that’s not the hard part. These are superficial features, and, really, the big advantage that Amazon has is the process and culture that produced that.
Those are artifacts of this other thing, and they have a huge advantage against most other people in this space. Although, there’s a short list of people that could compete with them, with respect to operating a massive web infrastructure at scale. I don’t know what just happened. The cost of operating that is something that they are able to press down and down and down which is why they can provide the level of service they can at the prices that they can. That’s why you’re seeing roughly 30% decrease in the cost of their services year to year over the last four or five years.
This is from O’Reilly Radar, which is also related to Velocity, and they’re arguing here that operations is the secret sauce. They’re talking about startups so what you’re seeing on the left is what we’ll call the do-it-5.sh solution to operations. On the right, it’s more of the configuration management style, very policy-driven, collapsing the complexity with systems that can enforce across your full infrastructure.
The bottom, what they’re trying to show, is the servers. So, you did a bunch of work, you started your deployment, that pink there is the first deployment to production. As you grow servers, if you’re using traditional IT, then the cost of doing that is going to continue to go up linearly where if you’ve done this work to make the configuration a non-issue, then it’s still going to go up, but your linear factor is much, much lower. So, your overall total cost of ownership is much lower.
The thing that you have to understand if you’re building services, which it seems takes a lot of people a while to get, is that day 2 matters. Deployment is the price you pay to get to your real problems. If you think about it, everyone over-rotates on developers, developers, developers. If you do the math about what it costs to own something, then, on some timeline, the cost of ownership, if it’s expensive to operate, will dwarf the development costs. If the operating cost is high, the time it takes to equal the development cost is very, very short, but for whatever reason, many organizations don’t internalize that.
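The cost argument can be sketched as simple arithmetic. This is a toy model under stated assumptions, not data from the talk: operating cost is taken to grow linearly with server count, and every dollar figure in the usage note is hypothetical.

```python
def operating_cost(servers, cost_per_server):
    """Monthly operating cost under a simple linear model (an assumption)."""
    return servers * cost_per_server


def months_until_ops_exceeds_dev(dev_cost, monthly_ops_cost):
    """Smallest whole number of months after which cumulative operating
    cost is strictly greater than the one-time development cost."""
    if monthly_ops_cost <= 0:
        raise ValueError("monthly_ops_cost must be positive")
    return int(dev_cost // monthly_ops_cost) + 1
```

With 50 servers, a hypothetical $1,000 per server per month for hand-managed operations gives `operating_cost(50, 1000)`, or $50,000 a month, while a hypothetical $100 per server with configuration management gives $5,000. Against a hypothetical $600,000 development cost, `months_until_ops_exceeds_dev(600000, 50000)` is 13 months, versus 121 months at the lower rate. Both lines are still linear; only the slope differs, which is the point of the chart described above.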
What I want people to understand here is, these are really things that emerge from principles. When you see someone using Puppet, that’s not DevOps. If you see someone using any tool, Cloud Foundry, doesn’t matter. That’s not DevOps. The principles are the greatest thing, if you understand that, then you can flexibly change to whatever you have to deal with. The practices that will naturally emerge as you understand these things are one thing, and then tools are the last artifact. That’s the lowest level thing about what we’re talking about.
Let’s rewind. I’m going to keep going, and I’m going to go really fast. Let’s rewind that. Software in the beginning. We had a bunch of stuff we shipped on CDs. It was hard to change after release. It runs on other people’s computers. You don’t have to really worry about bugs because it’s out there on someone else’s computer. The processes don’t run really long. You turn off your computer, and you come back so it doesn’t matter if it has memory leaks so much.
There’s no real uptime. Who made some memory leaks before? This is all to me, there’s this story about, and this is what’s powerful about software development is you can take ideas and you can just make stuff. You manifest ideas as code. That’s powerful. This is the process and traditional thing. You have a good idea, request a server, you get a purchase order, you wait, you wait, and the server arrives, the server gets power to the network, the server gets operating system, start to configure for the deployment.
The sysadmin, and he keeps all this stuff running, he doesn’t care about your application. He’s not paid to care. Other people need their servers too, and he’s a cost center anyway. So, he doesn’t care. He has to worry about not just all these servers, but also probably the e-mail, and, maybe, God forbid, printers. You shift to servers. You’re services now. We’re not going to ship CDs, we’re going to have services. The Internet changes all this. We run things on other computers, they’re our computers. We can change those computers any time we want. We still have to worry about bugs. And, processes now run a really long time, and, now, time becomes everything. That’s the transition that not everyone in the industry has made, but the big web lives and dies by it. We’ll get to this platform story in a minute.
This is bigger, faster, this is a Google data center. You always hear people say, “We can’t do this. We’re the enterprise. Our ways are different.” We have this thing, and it’s like what our strategy is. We’re going to just slow everything down. This moving slow is an advantage how? I don’t know.
This is a classic slide from Velocity conference talks I give and use all over. Developers and operations, what you’re going to do is you’re going to put a wall between them, probably a ticket system, and then everyone’s unhappy, and, then, this is pretty much what happens.
Are misaligned incentives an advantage? That’s really not how the web was built. No one at Google or Amazon, there was an infosec, and there’s a bunch of other stuff. So, there’s definitely still gating. It’s not a free-for-all. We’ll get to that in a minute, but it’s not a competitive advantage if you have to go through this kind of process.
What’s evolved or what we’re talking about now is these narratives around Platform, DevOps, Continuous Delivery, Microservices. Who’s read any of these books? Continuous Delivery is a great book. Jez Humble is a great writer. The Phoenix Project has really rewritten The Goal, which is the Theory of Constraints, and framed it with IT. And Release It! is a bunch of patterns for … it’s kinda the stuff you see in Netflix open source with respect to some of these patterns.
We have this new narrative that’s emerging, and we have a bunch of new tools. This is definitely part of my story. Now, we have a new process, and, the process is, I have a good idea, and I get a server, and now I can make cloud API calls. I get a server in minutes now, and I run my configuration tools, and in minutes, everything’s up, and that’s pretty cool.
Principles, practice, tools. Then there’s this guy … you notice who this is? Now he’s a venture capitalist. So, he turned to the dark side, but he used to lead a lot of the cloud stuff in Netflix. These are straight-up stolen from some of his presentations. So, he’s like, “This is what I learned. Speed wins. Remove friction from the product. High trust, low process, no hand-off between teams.” I think this is the hardest thing for a lot of the people who have really ingrained their identity in these processes to overcome. “Freedom and responsibility culture.” We’re not going to cover everything with all these rules. What we’re going to create is a bunch of highly-empowered people that can take responsibility for our success or failure. “Don’t redo things that you don’t have to do. Use simple patterns automated by tooling.” This is word-for-word off of his presentation. Self service cloud. This is the key. “Self service cloud makes impossible things instant.”
This is something that every DevOps project always aspired to do, be able to give you self service, but you rarely get there because there’s all these other things you have to solve. It’s one thing to configure servers. How are you going to solve the role-based access? How are you going to do the API, to do the orchestration, to then get in the server, and then you run your Puppet? There’s a bunch of other stuff, and everyone aspires to do that, but very few people actually get there. I’ve gotten there before with those tools. I think an integrated solution is a better way to go.
Here’s another thing I want everyone in this room to internalize, especially if you work for an enterprise: “But we are an enterprise. We don’t have the talent to do this.” This is a quote from Adrian as well. “But Netflix has a superstar development team and we don’t.” This is what he told them, “We hired them from you. We hired all those people from you and we got out of their way.”
This is the impact of batch size. So, I want people to understand when we start talking about rapid changes to these infrastructures, it’s actually safer. If you imagine risk accumulating as you write code, and, then, every time you do a deployment, it comes back down. If you do big batches, the amount of risk you’re exposing yourself to is higher. Your exposure to the risk is less frequent, but the actual total risk is higher. Just thinking about it mentally, if I do a deployment of something I wrote an hour ago and there’s an actual problem, it’s not a big mystery where that code is. I can go fix it immediately. So, you’re minimizing your time to recover, and you’re not that worried about the impact because you know you can minimize it very fast. What’s happening in these web companies is their time to recover goes to almost zero. Plus, they have continuous delivery. It’s predicated on continuous integration. If you don’t have a culture of testing, if you don’t have a culture of monitoring, don’t do this stuff, but as you get those capabilities, you should aspire to do this because it’s both faster and safer. You want to go faster because John Boyd. I don’t have time to get to that, and I’m way over time and still have more slides. “It is not necessary to change. Survival is not mandatory.” W. Edwards Deming.
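The batch-size argument can be sketched as a simple sawtooth: undeployed risk grows while code is written and drops back to zero at each deployment. This is a rough sketch under stated assumptions, not a model from the talk. It assumes risk accrues at a constant, made-up rate of one unit per hour, and the function name `risk_profile` and its parameters are hypothetical.

```python
def risk_profile(total_hours, batch_hours, risk_per_hour=1.0):
    """Sawtooth model of undeployed risk.

    Risk accumulates every hour that code is written and is reset to zero
    whenever a deployment happens (once every `batch_hours`).
    """
    pending = 0.0     # risk sitting in code that has not been deployed yet
    peak = 0.0        # largest amount of undeployed risk ever held
    exposure = 0.0    # pending risk summed over every hour (risk-hours)
    deployments = 0
    for hour in range(1, total_hours + 1):
        pending += risk_per_hour
        peak = max(peak, pending)
        exposure += pending
        if hour % batch_hours == 0:
            pending = 0.0
            deployments += 1
    return {"deployments": deployments, "peak": peak, "exposure": exposure}


if __name__ == "__main__":
    # The same 160 hours of work, shipped in one batch, daily, or hourly.
    for batch in (160, 8, 1):
        print(batch, risk_profile(160, batch))
```

Under these assumptions, the same 160 hours of work shipped as a single batch carries far more peak and cumulative undeployed risk than shipping every hour, even though the hourly case deploys 160 times instead of once.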
Netflix built a platform to enable self-service deployment. They built a platform to deploy and operate microservices. They built a platform to continuously deliver software. They built a platform that could protect itself from failure. What Netflix did not do is build a platform for general ad-hoc automation. That’s another thing people need to understand because the platform makes promises and the constraints are the contract that allows the platform to keep those promises.
We start thinking about 12-Factor Apps, and that’s the contract that Heroku decided to make with their platform, and there’s a bunch of other platform contracts you can imagine, but if you just allow anything, you didn’t really solve anything either. You’re right back to trying to figure out what’s wrong and you have a problem. If you collapse what you’re able to do, those constraints are actually enabling because of the 80-20 rule that you can get 80% of the benefit from 20% of the features most of the time.
DevOps refers to the practices and tools that emerge from high-performing organizations, and continuous delivery is a consequence of that practice. This is not possible with gating and handoffs. We could debate continuous because there’s a quantum of delivery, but in reality, you want to shrink your batch size, and you want to shrink the cost of delivery because it’s not possible, it’s untenable, if the fixed cost of deployment is high. If you have a high fixed cost of deployment, and then you do it all the time, that is expensive.
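The arithmetic behind that last point is short enough to write out. The sketch below is illustrative only: the function name `yearly_deploy_overhead` is hypothetical, and the person-hour figures and deployment counts are invented assumptions, not numbers from the talk.

```python
def yearly_deploy_overhead(deploys_per_year, fixed_hours_per_deploy):
    """Person-hours per year spent on the fixed overhead of deploying."""
    return deploys_per_year * fixed_hours_per_deploy


if __name__ == "__main__":
    # Hypothetical: a gated, hand-off-heavy release costs 8 person-hours,
    # an automated pipeline costs 0.25 person-hours (15 minutes).
    for label, fixed in (("gated", 8), ("automated", 0.25)):
        for deploys in (4, 250):  # quarterly vs. roughly once per working day
            hours = yearly_deploy_overhead(deploys, fixed)
            print(label, deploys, "deploys/year ->", hours, "person-hours")
```

With a high fixed cost, going from quarterly to daily deployment multiplies the overhead many times over. With a low fixed cost, daily deployment under these assumed numbers costs about as much as a handful of gated releases.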
Microservices is just a description of the post-cloud, post-DevOps, post-continuous delivery architecture. It’s a natural evolution. So, this baseline operational capability that you get when you can do deployment on demand … the other thing this is giving you is the team dynamics to leverage Conway’s Law because you’re decoupling the amount of people that need to be involved by enforcing the contract and the API as your communication. You can decouple so each of those can be deployed independently. That’s the whole point of microservices. You have loose coupling. If you have to deploy them all together, you don’t really have microservices. What you have is a monolith that you broke apart. You didn’t actually get it to microservices because you’re not decoupled.
I’m going to argue that continuously delivered microservices are the natural evolution for services that need to run at scale and be changed frequently. A bunch of people built platforms to do this, but those are one-off platforms. Continuous delivery is a why, and DevOps is a how, and microservices is a what, but for all of them, you basically need a platform. Until you can take arbitrary code, and put it on a server and have it run, you can’t do continuous delivery.
Until you can have something that is relatively monitored and safe and maybe even self-healing, you can’t really do microservices.
To be able to do that, to build all the scaffolding around doing that, that looks suspiciously like a platform. Do you want to build one by yourself? There’s this thing. Maybe you’ve heard of it. Maybe you don’t want to build one by yourself, but maybe we could build one together, and this is the new process.
You have a good idea? Push your code to the platform. It’s running in seconds, self service, self-healing, and we all live happily ever after.
This is like the takeaway. No one set out to do microservices, continuous delivery, or any of this stuff. These were natural consequences. So, don’t fixate on the words. It’s annoying to me when people are, “We’re so agile.” It’s like, “You guys suck at software. What are you talking about? Don’t tell me how DevOps you are. Tell me how much you’re kicking ass for your company.”
This is actually not the end because what’s happening now with the Internet of Things and Big Data and everything getting bigger and faster, that means that in the next five years, I say the average, if you just start to do math on time series and sensors and all this stuff, the average Internet of Things deployment is going to put enterprises on the scale that Google had to be at maybe five, ten years ago. It’s just math.
If you are still stuck in the past with your processes and your thoughts about how to do this stuff, you’re going to be at a huge disadvantage, and someone else is going to build that future for you. I work for Pivotal. I work on Cloud Foundry. I get a little excited about stuff, and I’m way over time, but thanks for sharing this time with me. I’m happy to answer any questions you might have and thank you.
About the Author
Abby Kearns