Deploying distributed software is a non-trivial challenge. It is one of those things that is hard to do at scale, and harder to do as scale grows. Even more challenging than deploying software in a distributed system is updating that software whilst systems are running. This takes complexity to a whole new level.
Fortunately, there is an open-source tool called BOSH to come to the rescue! BOSH is used “under the covers” of Pivotal CF to deploy and update the infrastructure components of the system. It also provides the cloud-agnostic layer, with support for providers including VMware, AWS and OpenStack.
In this episode, we look at what BOSH is and how it works, and journey into the world of canary updates.
PLAY EPISODE #16
- Subscribe to the feed: http://pivotalsoftwarepodcast.libsyn.com/rss
- Feedback: podcast@pivotal.io
- Links Referred to in the Show:
Welcome to the All Things Pivotal podcast, the podcast at the intersection of agile, cloud, and big data. Stay tuned for regular updates, technical deep dives, architecture discussions, and interviews. Please share your feedback with us by emailing podcast@pivotal.io.
Hello everyone and welcome back to the All Things Pivotal podcast. Sensational to have you listening again. Hope you’re enjoying the series. My name’s Simon Elisha, CTO and director of field engineering here in Melbourne, Australia. It is a beautiful sunny day in Melbourne and a good chance to talk a bit of technology.
Let’s talk about a little thing called BOSH. Now, if you’ve ever had anything to do with Cloud Foundry, or Pivotal Cloud Foundry in particular, the word BOSH may have come up in the conversation. Depending on which country you’re from, the word bosh can mean nothing at all, or it can mean rubbish. In this case BOSH is anything but rubbish; it’s actually very, very handy. It is an open source tool chain for release engineering, deployment, and lifecycle management of large-scale distributed services. That could be interesting, and could be useful.
Now, many people, when they hear the word BOSH … that’s B-O-S-H … say, ‘Ah, what does BOSH stand for?’ It’s an acronym, actually a self-referencing (recursive) acronym. BOSH stands for BOSH Outer Shell, which in itself is BOSH. In true computer geek humor, hilarious, didn’t stop laughing until I started. That’s what BOSH stands for. Now you know that it doesn’t actually stand for anything in particular, though I’m sure you could come up with some ideas of your own.
What is interesting about BOSH is not so much the name but what it does, because it tackles a notoriously challenging problem and handles it very, very well. Deploying distributed systems is a nontrivial problem. Getting systems to run in a consistent way across all the nodes is hard in the first instance. What is even harder is upgrading those systems live, while the system is running, in a seamless and reliable way. Really, really difficult. That is what BOSH is here to help us with.
The other thing that BOSH does really well is that it does all this in an infrastructure-as-a-service (IaaS) agnostic way. It is designed to be completely neutral as to what you’re deploying upon. This means that we can extend it to work on different providers, be they on-premises clouds, public cloud providers, or anything in between, as and when needs dictate. Currently BOSH supports VMware vSphere, Amazon Web Services, OpenStack, and vCloud Air: four providers that it works with today.
What this means is that you can use it and operate it in exactly the same way across all those providers. Why would that be useful? Let’s say we wanted to deploy Pivotal CF in our on-premises infrastructure, but we also wanted to use Amazon Web Services for development and test workloads. Wouldn’t it be nice if we could do both in the same, consistent way? Consistency is the watchword of the way BOSH tries to create the world for you.
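To make the provider-neutral idea concrete, here is a minimal Python sketch of the pattern: deployment logic talks to one abstract interface, and each infrastructure plugs in its own implementation. In BOSH this role is played by a per-IaaS Cloud Provider Interface (CPI), but the class names, the `create_vm` method and the stem cell names below are hypothetical and do not reproduce BOSH’s actual CPI.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Hypothetical provider-neutral interface; each IaaS supplies its own subclass."""

    @abstractmethod
    def create_vm(self, stemcell: str, vm_name: str) -> str:
        """Create a VM from a stem cell and return a provider-specific VM id."""


class VSphereProvider(CloudProvider):
    def create_vm(self, stemcell: str, vm_name: str) -> str:
        # A real implementation would call the vSphere API here.
        return f"vsphere:{vm_name}:{stemcell}"


class AWSProvider(CloudProvider):
    def create_vm(self, stemcell: str, vm_name: str) -> str:
        # A real implementation would call the EC2 API here.
        return f"aws:{vm_name}:{stemcell}"


def deploy(provider: CloudProvider, stemcell: str, instances: int) -> list[str]:
    """Identical deployment logic, whichever provider is plugged in."""
    return [provider.create_vm(stemcell, f"worker-{i}") for i in range(instances)]


# Same call, two infrastructures; only the provider and its stem cell differ.
on_prem = deploy(VSphereProvider(), "vsphere-stemcell", 2)
dev_test = deploy(AWSProvider(), "aws-stemcell", 2)
print(on_prem)   # ['vsphere:worker-0:vsphere-stemcell', 'vsphere:worker-1:vsphere-stemcell']
print(dev_test)  # ['aws:worker-0:aws-stemcell', 'aws:worker-1:aws-stemcell']
```

Adding a new cloud means writing one more subclass; `deploy` itself never changes.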
Let me roll into what BOSH does and how it does it, and we’ll deconstruct it a little bit. Every BOSH deployment has three main components: a stem cell, a release, and a manifest. The term stem cell is a really useful one because it borrows a thought process from how the human body is built. A stem cell is a cell that can become any other specific cell in the body. Similarly, a stem cell in the BOSH world is a blank, could-be-anything kind of thing. A stem cell is a virtual machine template. BOSH clones new VMs from the stem cell to create the VMs that it needs for a deployment.
What is contained in the stem cell is an operating system and an embedded BOSH agent that allows BOSH to control the VMs that it cloned from that particular stem cell. Now, VMs cloned from a stem cell can have different CPU, memory, storage, and network settings, and they can have different software packages installed. The only dependency, the only thing that’s consistent, is which cloud infrastructure the stem cell is tied to. You’ll have a stem cell, for example, for VMware, a stem cell for AWS, a stem cell for OpenStack, etc.
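The relationship between a stem cell and the VMs cloned from it can be sketched as plain data. This is a minimal Python model under stated assumptions: the `Stemcell`, `VM` and `clone` names, the fields, and the version string are hypothetical. It only illustrates that clones may differ in sizing, network and packages, while the infrastructure must match the stem cell’s.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Stemcell:
    """VM template: an operating system plus an embedded BOSH agent, built for one IaaS."""
    os: str
    iaas: str
    version: str


@dataclass
class VM:
    """A VM cloned from a stem cell; sizing, network and packages can all vary."""
    stemcell: Stemcell
    cpu: int
    memory_mb: int
    disk_gb: int
    network: str
    packages: list[str] = field(default_factory=list)


def clone(stemcell: Stemcell, target_iaas: str, **settings) -> VM:
    """Clone a VM from a stem cell; the one hard constraint is the infrastructure."""
    if stemcell.iaas != target_iaas:
        raise ValueError(f"stem cell built for {stemcell.iaas!r} cannot run on {target_iaas!r}")
    return VM(stemcell=stemcell, **settings)


aws_stemcell = Stemcell(os="ubuntu-trusty", iaas="aws", version="3000")
router = clone(aws_stemcell, "aws", cpu=2, memory_mb=2048, disk_gb=10,
               network="public", packages=["router"])
database = clone(aws_stemcell, "aws", cpu=8, memory_mb=32768, disk_gb=500,
                 network="private", packages=["postgres"])
```

Cloning `aws_stemcell` onto `"vsphere"` raises `ValueError`, which mirrors needing a separate stem cell per cloud.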
Now, the release is a collection of source code, configuration files, and startup scripts, with a version number that identifies these components. Really, the release is saying: here’s all the stuff I need you to build to go ahead and deploy a particular service. This is typically at the infrastructure level. It is less about applications in the traditional sense, or customer-facing applications; it is typically for infrastructure deployment. What happens is that jobs, which are sets of configuration files and scripts, run the binaries from a package on the virtual machines that were created from stem cells.
The final component is the manifest. The manifest is just a YAML file that defines the layout and properties of the deployment. Now, the manifest is important because now we have a representation of our environment that we can put into source control. I’m a massive advocate of putting everything into source control, because by putting things into source control you create this infrastructure as code world that is so much easier because we know now how things are configured, why they’re configured a particular way, and what changed between configurations.
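Because the manifest is plain YAML, it diffs cleanly under source control. To stay in one language, the minimal Python sketch below represents a small manifest as a dict loosely modeled on BOSH’s manifest structure (`name`, `releases`, `stemcells`, `update`, `instance_groups`; the exact keys vary between BOSH versions). The deployment name, release name and version values are made up, and `flatten` and `diff` are hypothetical helpers that report every setting that changed between two versions of the file.

```python
from __future__ import annotations

import copy
from typing import Any


def flatten(node: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into {"dotted.path": leaf_value}."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return {prefix: node}
    flat: dict[str, Any] = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def diff(old: dict, new: dict) -> dict[str, tuple[Any, Any]]:
    """Return {path: (old_value, new_value)} for every leaf that differs."""
    a, b = flatten(old), flatten(new)
    return {p: (a.get(p), b.get(p)) for p in sorted(a.keys() | b.keys()) if a.get(p) != b.get(p)}


manifest_v1 = {
    "name": "my-deployment",
    "releases": [{"name": "my-release", "version": "1.0"}],
    "stemcells": [{"alias": "default", "os": "ubuntu-trusty", "version": "3000"}],
    "update": {"canaries": 1, "max_in_flight": 2,
               "canary_watch_time": "1000-30000", "update_watch_time": "1000-30000"},
    "instance_groups": [{"name": "router", "instances": 2, "stemcell": "default",
                         "jobs": [{"name": "router", "release": "my-release"}]}],
}
manifest_v2 = copy.deepcopy(manifest_v1)
manifest_v2["instance_groups"][0]["instances"] = 4
print(diff(manifest_v1, manifest_v2))  # {'instance_groups.0.instances': (2, 4)}
```

In practice `git diff` on the YAML file gives you the same answer; the point is that the whole deployment is described by data you can version and compare.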
What BOSH does really well is take that YAML file and initiate a new deployment: the BOSH director, which controls everything, receives the new version of the deployment manifest and creates a new deployment using it. It knows what it needs to go ahead and build. Basically, it creates runnable software for you from the release that you define, which makes it nice and easy.
Let’s look at the process that actually takes place. The first thing we do is create a stem cell. That’s something you create in an isolated environment, with exactly what you want to have running on it, so you have complete control and confidence that what is running in your environment is as you defined it. You upload the stem cell, then you upload the release, that is, all the other bits and pieces that make up the environment. In the case of Cloud Foundry that’ll be things like the Cloud Controller, the health manager, the router, all these different components. Then you define the deployment with a manifest. Then, step four, you deploy. And of course step five is profit, as in all of these lists.
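The upload-then-deploy sequence can be modeled with a toy, in-memory director. This is a hypothetical Python sketch, not BOSH’s code. With the real BOSH CLI the steps roughly correspond to its upload-stemcell, upload-release and deploy commands, and the exact syntax differs between CLI versions. Here `ToyDirector` refuses to deploy a manifest whose stem cell or release hasn’t been uploaded, then creates one VM per requested instance. The release name and versions are made up.

```python
from __future__ import annotations


class ToyDirector:
    """Hypothetical, in-memory stand-in for the BOSH director's deploy workflow."""

    def __init__(self) -> None:
        self.stemcells: set[tuple[str, str]] = set()  # (os, version)
        self.releases: set[tuple[str, str]] = set()   # (name, version)
        self.vms: dict[str, list[str]] = {}           # instance group -> VM names

    def upload_stemcell(self, os_name: str, version: str) -> None:
        self.stemcells.add((os_name, version))

    def upload_release(self, name: str, version: str) -> None:
        self.releases.add((name, version))

    def deploy(self, manifest: dict) -> None:
        # Refuse to deploy anything whose stem cell or release has not been uploaded.
        for sc in manifest["stemcells"]:
            if (sc["os"], sc["version"]) not in self.stemcells:
                raise RuntimeError(f"stem cell {sc['os']}/{sc['version']} not uploaded")
        for rel in manifest["releases"]:
            if (rel["name"], rel["version"]) not in self.releases:
                raise RuntimeError(f"release {rel['name']}/{rel['version']} not uploaded")
        # Create one VM per requested instance in each instance group.
        for group in manifest["instance_groups"]:
            self.vms[group["name"]] = [f"{group['name']}/{i}" for i in range(group["instances"])]


manifest = {
    "stemcells": [{"alias": "default", "os": "ubuntu-trusty", "version": "3000"}],
    "releases": [{"name": "cf", "version": "200"}],
    "instance_groups": [
        {"name": "cloud_controller", "instances": 2},
        {"name": "router", "instances": 2},
        {"name": "health_manager", "instances": 1},
    ],
}
director = ToyDirector()
director.upload_stemcell("ubuntu-trusty", "3000")  # upload the stem cell
director.upload_release("cf", "200")               # upload the release
director.deploy(manifest)                          # deploy using the manifest
```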
Let’s think about what this actually means. Because a BOSH deployment is divided into a stem cell, a release, and a manifest, you can make changes to one aspect of the deployment without having to change everything else. This is really nice.
Let’s talk about moving between clouds, a nontrivial exercise. You can keep exactly the same release; you simply swap in the stem cell specific to the new cloud, and tweak the manifest to change which provider you’re using. Again, what you are deploying doesn’t change. What it’s being deployed onto changes, but the mechanism by which it is deployed remains the same.
If you want to scale up an application, you can keep the same release, you can use the same … oh, it’s hard to say … use the same stem cell, and you just change one line in the manifest to say, Hey, I would like to have more instances of a particular capability, and BOSH will make it so. To update or roll back an application, you simply apply the newer or older release version, use the same stem cell, use the same manifest, and BOSH will, again, make it so, putting the deployment into the state it needs to be in. BOSH is doing a lot of that underlying infrastructure heavy lifting for you, without you having to worry about it.
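The “change one thing and BOSH makes it so” idea is essentially desired-state reconciliation. Below is a greatly simplified Python sketch of that idea. The `plan` function and the state shape are hypothetical. It compares the current and desired deployment and lists the actions needed. Scaling touches only the instance count, while a release change such as a rollback updates the existing instances in place.

```python
from __future__ import annotations


def plan(current: dict, desired: dict) -> list[str]:
    """Hypothetical planner: list the actions needed to move `current` to `desired`.

    Each state has the form {"stemcell": str, "release": str, "instances": {group: count}}.
    """
    actions: list[str] = []
    groups = sorted(current["instances"].keys() | desired["instances"].keys())
    for group in groups:
        have = current["instances"].get(group, 0)
        want = desired["instances"].get(group, 0)
        # Scaling: create or delete only the difference.
        if want > have:
            actions += [f"create {group}/{i}" for i in range(have, want)]
        elif want < have:
            actions += [f"delete {group}/{i}" for i in range(want, have)]
        # A release or stem cell change updates every instance that survives scaling.
        if current["release"] != desired["release"] or current["stemcell"] != desired["stemcell"]:
            actions += [f"update {group}/{i}" for i in range(min(have, want))]
    return actions


v1 = {"stemcell": "3000", "release": "1.1", "instances": {"router": 2}}
scaled = {**v1, "instances": {"router": 4}}  # one-line manifest change
rolled_back = {**v1, "release": "1.0"}       # same stem cell, same layout, older release
print(plan(v1, scaled))       # ['create router/2', 'create router/3']
print(plan(v1, rolled_back))  # ['update router/0', 'update router/1']
```

Because each of the three components can change independently, each scenario produces a small, predictable set of actions.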
If you’ve used BOSH in the past you’ve probably used the command line. If you’ve used Pivotal CF and its Ops Manager component, you’ll know that Ops Manager provides a GUI that abstracts what’s happening underneath at the BOSH level and takes care of creating those manifests, etc., for you.
One really powerful capability of BOSH that I wanted to home in on for this episode is the ability to handle updates. If you’ve ever worked in an operational environment you’ll know that updating the system is usually the most difficult, hairy, challenging, risk-prone, annoying process you have to do. As a result, people tend to do it less often; as a result of that, they become less practiced at it; and as a result, it tends not to work so well. I’m a big advocate of updating frequently and getting really good at it.
There are clever techniques you can use to make updating software easier. One of the fantastic ways to do that is to use what are called canary updates. Canary updates are a tool used by pretty much most of the internet giants: Google, Facebook, Amazon, etc. They all use this technique, and it’s a very worthy one. As with most techniques that we use in IT, it was developed and invented far, far away from IT, for completely other purposes.
If you’re not familiar with the origin of the canary deployment, canaries were used in the days of coal mining, when miners would go down into the mines without respiratory gear and be mining away. One of the challenges of mining coal is that gas is often released as you mine, and gas and breathing often do not go well together; people would die. The miners would take a caged canary down with them, and the canary would be leading the pack, if you like, happily singing along in its cage. If they looked up and the canary was dead, they would all flee frantically from the mine, knowing that noxious gases had been released. Because the canary is far more susceptible to those gases than human beings are, they would have time to get out and escape.
Basically, it gives you a chance to understand whether there is a problem and you need to take action. If you’re a Simpsons fan, as I used to be back in the day, you’ll remember the famous episode where they had the canary in the coal mine and the canary died of natural causes. That’s a different situation, but I’m using it nonetheless.
Let’s spin back to BOSH. What can BOSH do in terms of a canary deploy? The manifest has a section called the update block. It’s probably my favorite part of the BOSH manifest. In the update block we define how many canaries we want, that is, how many instances we want to deploy the update to first, to see if it worked. Isn’t that a nice idea? Say I deploy my application or my update through BOSH, this is a 100-node cluster that I’m managing, and I say, okay, go ahead and update ten. The canary deploy will go ahead and update ten, and we’ll watch them and make sure they’re healthy, because BOSH checks the health of systems. In fact, if BOSH detects that virtual machines are failing for some reason, it will restart them for you; it does health management for you as well.
In the canary deploy it will say, Hey, I’m going to wait for a designated amount of time to declare the update healthy or unhealthy. If the job is unhealthy, I do not continue with the deploy. Isn’t that good? Instead of rolling out a change across 100 nodes, 10 at a time, and bang, bang, bang, they all break and now I’ve got a broken system, because of my canary deploy I’ve only deployed to a few, and I can roll back that change and fix it.
The other thing we can specify is the maximum number of in-flight changes: the maximum number of non-canary instances to update in parallel. Once the canary update has taken place and we’re comfortable with it, we can update the rest of the system in portions rather than doing the whole lot at once. Again, it’s this rolling upgrade process, which is the preferred process if you’re doing in-place upgrades of systems. That’s a little capability, if you like, of BOSH: the canary update.
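The canary-then-batches logic can be simulated in a few lines. The sketch below borrows the names of two real update-block settings, `canaries` and `max_in_flight`. BOSH also has watch-time settings such as `canary_watch_time` and `update_watch_time`. Beyond those names, this is a simplified Python simulation, not BOSH’s implementation. The `rolling_update` function and its callbacks are hypothetical, instances in a batch are updated one after another rather than truly in parallel, and health is checked immediately instead of over a watch window.

```python
from __future__ import annotations

from typing import Callable


def rolling_update(
    instances: list[str],
    canaries: int,
    max_in_flight: int,
    update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> tuple[list[str], str]:
    """Simplified canary-then-batches rollout.

    Returns (instances_updated, status), where status is "ok" or "halted".
    """
    if canaries < 0 or max_in_flight < 1:
        raise ValueError("canaries must be >= 0 and max_in_flight >= 1")
    updated: list[str] = []

    def run_batch(batch: list[str]) -> bool:
        for name in batch:  # a real system would update the batch in parallel
            update(name)
            updated.append(name)
        # Stand-in for the watch time: check the health of every instance in the batch.
        return all(is_healthy(name) for name in batch)

    # Phase 1: canaries. Stop immediately if any canary is unhealthy.
    if not run_batch(instances[:canaries]):
        return updated, "halted"
    # Phase 2: the remaining instances, at most max_in_flight at a time.
    rest = instances[canaries:]
    for start in range(0, len(rest), max_in_flight):
        if not run_batch(rest[start:start + max_in_flight]):
            return updated, "halted"
    return updated, "ok"


nodes = [f"node-{i}" for i in range(100)]

# A healthy release: 10 canaries, then the other 90 in batches of at most 20.
good, good_status = rolling_update(nodes, canaries=10, max_in_flight=20,
                                   update=lambda n: None, is_healthy=lambda n: True)
print(len(good), good_status)  # 100 ok

# A broken release: the canaries fail, so only 10 of the 100 nodes are touched.
bad, bad_status = rolling_update(nodes, canaries=10, max_in_flight=20,
                                 update=lambda n: None, is_healthy=lambda n: False)
print(len(bad), bad_status)  # 10 halted
```

The blast radius of a bad release is bounded by `canaries`, and after that by `max_in_flight`. That is exactly the trade-off the update block lets you tune.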
Really, it provides a very effective mechanism to manage distributed systems and to update the software on them. It is open-source software, it’s on GitHub, and I’ll provide the link for you as well. It is part of the magic and the secret sauce that works behind the scenes of Cloud Foundry to make it work on different cloud providers and across cloud providers, and to make deploying Pivotal CF straightforward in many environments.
I hope that’s been interesting and useful to you. That’s BOSH, the BOSH Outer Shell, and a little bit about how software can be deployed efficiently in distributed systems. Again, we love that you’re listening to the podcast, giving some great feedback and some great suggestions for new episodes. If you have a suggestion, please email us at podcast@pivotal.io. We love to get your feedback, your suggestions, your recommendations, etcetera. In the meantime, please share it with others, let people know the podcast exists, and keep on building.
Thanks for listening to the All Things Pivotal podcast. If you enjoyed it, please share it with others. We love hearing your feedback, so please send any comments or suggestions to podcast@pivotal.io.
About the Author: Simon Elisha