BOSH Fundamentals for PKS Administrators

March 29, 2019 Tyler Britten

BOSH is a Cloud Foundry Project that automates the deployment and ongoing operations of distributed systems. While it can be used for almost anything, it’s primarily used by Cloud Foundry for both the Application Runtime (CFAR) and Container Runtime (CFCR).

Longtime Cloud Foundry operators are likely familiar with BOSH. It powers the platforms that support workloads for many Fortune 500 customers. But if you’ve just recently deployed Pivotal Container Service (PKS) for your Kubernetes environment, you might want to know a bit more about the release toolchain that’s powering your deployment.

BOSH fully automates the deployment of your PKS clusters. It also restarts failed components and recovers or replaces failed VMs. BOSH even handles seamless cluster upgrades for you!

Since I hadn’t worked with BOSH in a while, I figured I’d document my recent experiences with the tech. This isn’t a comprehensive BOSH guide. It’s more of a “cheat sheet” for a few helpful key commands and concepts you’ll likely need if you run PKS.

 

Getting Started with BOSH

BOSH is a powerful, and therefore sometimes complex, system for automatically deploying and managing platforms like PAS and PKS. You don’t need a PhD in BOSH to get a lot of value out of it. Let’s first set the table with some high-level concepts:

  • Ops Manager - This is the first thing you deploy when you install PKS. It is where you configure your underlying infrastructure integration and where we’re going to interact with BOSH.

  • Director - The Director is the core orchestrating component in BOSH. It controls VM creation and deployment, as well as other software and service lifecycle events. Once you configure the infrastructure tile (in my case GCP) in Ops Manager, the Director is deployed in its own VM.

  • BOSH CLI - The command line tool used to manage BOSH.

  • Task Logs - Exactly as it sounds. Any work the Director does (even just listing resources) is a task, and each one is numbered and logged.

  • Deployments - A collection of resources defined in a manifest. In the case of PKS, each cluster is a deployment.

  • Instances - The VMs that BOSH has created and is managing.

 

Setting up the BOSH CLI

We’re going to use the CLI to interact with BOSH. You can download it and set it up on your own machine, but we’re going to use the copy installed on the Ops Manager.

If we ssh into the Ops Manager VM and run the bosh env command:

$ bosh env
Expected non-empty Director URL

Exit code 1

$

You can see that we don’t have the proper configuration for the CLI to work. The easiest place to get the config is through the Ops Manager GUI:

 

After clicking on the BOSH Director tile and selecting the Credentials tab, you’ll see an item appropriately named “BOSH Commandline Credentials.” If we click on that, we’ll see some JSON that looks like this:

{"credential":"BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=xxxxxxxxxxxxxx BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT=10.0.0.10 bosh "}

So the value of the “credential” key is what we need to run our BOSH CLI. If we copy and paste it into a shell prompt and add the env subcommand:

~$ BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=xxxxxxxxxxxxxx BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT=10.0.0.10 bosh env

Using environment '10.0.0.10' as client 'ops_manager'



Name      p-bosh
UUID      xxxxxxxxxx
Version   268.2.1 (00000000)
CPI       google_cpi
Features  compiled_package_cache: disabled
         config_server: enabled
         local_dns: enabled
         power_dns: disabled
         snapshots: disabled
User      ops_manager


Succeeded

~$

Well, we can see that worked, but who wants to type that whole string every time? There are a number of ways to set those BOSH environment variables, but to me the easiest is to alias the whole string:

alias bosh="BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=xxxxxxxxxxxxxx BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT=10.0.0.10 bosh"

If you want to make that persist, you can add the alias command as the last line of your ~/.bashrc. Now if we just run `bosh env`, we can see we’re all set.
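If you would rather export the variables than alias the command, here is a minimal sketch of one way to do it. The function name load_bosh_env is hypothetical, and the sketch assumes that none of the values contain spaces, which holds for the credential string shown above.

```shell
# Hypothetical helper: export every BOSH_* assignment found in the
# "credential" string copied from Ops Manager, ignoring the trailing "bosh".
load_bosh_env() {
  set -f                          # no filename globbing while splitting
  for word in $1; do              # split the string on whitespace
    case "$word" in
      BOSH_*=*) export "$word" ;; # e.g. BOSH_ENVIRONMENT=10.0.0.10
    esac
  done
  set +f
}

# Same placeholder values as the credential string above.
load_bosh_env "BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=xxxxxxxxxxxxxx BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT=10.0.0.10 bosh "
echo "$BOSH_ENVIRONMENT"
```

With the variables exported, a plain `bosh env` in the same shell should pick them up without the long prefix.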

 

Basic CLI Commands

Now that our CLI is working, let’s cover some important commands you’ll want to know.

bosh vms

This command lists every deployment the Director knows about and is managing, along with each deployment’s instances.

~$ bosh vms
Using environment '10.0.0.10' as client 'ops_manager'

Task 667
Task 666
Task 667 done

Task 666 done

Deployment 'pivotal-container-service-xxxxx'

Instance                            Process State  AZ             IPs        VM CID     VM Type  Active
pivotal-container-service/xxxxxxxx  running        us-central1-a  10.0.0.11  vm-xxxxxx  large    true

1 vms

Deployment 'service-instance_5a677c82-5be2-42f0-ac73-010e53953a21'

Instance                                     Process State  AZ             IPs         VM CID                                   VM Type      Active
master/5efdd3d8-ccdd-4523-8dde-65cd4eaf8c75  running        us-central1-b  10.0.11.10  vm-6f5cae68-a151-4310-4cb9-43c989b79701  medium.disk  true
worker/90ad1b1f-bb69-4af5-8780-a930639fcf49  running        us-central1-b  10.0.11.12  vm-dc87366d-659e-4047-4f3f-f2eef5a968aa  medium.disk  true
worker/ee0176c5-26b6-4923-b5e8-02fc67a1e7da  running        us-central1-a  10.0.11.11  vm-ecf472bc-12a1-46eb-6b2c-0f50163433c2  medium.disk  true
worker/f9dd3292-1960-478a-9cf5-071b59c4d843  running        us-central1-c  10.0.11.13  vm-f7073cf3-b271-4292-50de-8eb8c1c5daf0  medium.disk  true

4 vms


Succeeded

~$

Here’s the output from my PKS environment. Some things to look at:

  • At the top, you can see the task number. As I mentioned earlier, every operation is a task, has a number, and is logged. It’s useful to know the number of a given task for troubleshooting as you’ll see later.

  • There are two deployments. The first is PKS itself, which has a single API VM (this is what you interact with when you run PKS CLI commands). The second is a single PKS cluster with one master and three workers. (I’ve x’ed out the UUIDs on the PKS instance since I’m keeping it, but the PKS cluster is already gone, so I won’t bother hiding those UUIDs.)

  • For the VMs, the name on the left side (Instance) is what BOSH calls it and the name on the right side (VM CID) is what the cloud provider (in this case GCP) calls it. You’ll need the former to interact with VMs using BOSH commands and the latter if you need to do something on the cloud provider side (adjust security groups, etc.).

  • The ‘Process State’ column shows the state of the instance’s processes as reported by the BOSH agent to the Director. In this case, running indicates the agent is up, is communicating with the Director, and reports its processes as healthy.
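Looking up the VM CID for a given instance is easy to script. The sketch below is only an illustration: vm_cid_for is a hypothetical helper name, it reads rows shaped like the `bosh vms` output above on stdin, and it assumes GCP-style CIDs that start with "vm-".

```shell
# Hypothetical helper: read `bosh vms` rows on stdin and print the VM CID
# for the instance named in $1. Assumes CIDs that start with "vm-".
vm_cid_for() {
  awk -v inst="$1" '
    $1 == inst {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^vm-/) { print $i; exit }
    }'
}

# Two rows copied from the output above.
printf '%s\n' \
  'master/5efdd3d8-ccdd-4523-8dde-65cd4eaf8c75  running us-central1-b 10.0.11.10 vm-6f5cae68-a151-4310-4cb9-43c989b79701  medium.disk true' \
  'worker/ee0176c5-26b6-4923-b5e8-02fc67a1e7da  running us-central1-a 10.0.11.11 vm-ecf472bc-12a1-46eb-6b2c-0f50163433c2  medium.disk true' |
  vm_cid_for worker/ee0176c5-26b6-4923-b5e8-02fc67a1e7da
```

Matching on the "vm-" prefix rather than a fixed column position keeps the lookup working even if a state value contains a space.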

bosh tasks

This will show you a list of currently running tasks and details about them. If you want to see recently completed tasks you can add the `-r` option.

bosh task [taskid]

This will show you a running log of whatever task is in progress, similar to a docker logs -f command. If the task finishes (even if another one starts immediately after), the command will exit. If there is no task running, unsurprisingly it will respond with ‘No task found.’

If you optionally add a task ID number (remember earlier when I said knowing the task ID would be useful?) it will show you the log of that completed task.

In either situation, you can also add the --debug flag for the full debug details.
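If you have saved some CLI output and want the task numbers out of it, a small filter is enough. This is a sketch only: task_ids is a hypothetical helper, and it assumes the "Task NNN" line format shown in the `bosh vms` output earlier.

```shell
# Hypothetical helper: print the task numbers from lines of the form
# "Task 667", skipping the matching "Task 667 done" lines.
task_ids() {
  sed -n 's/^Task \([0-9][0-9]*\)$/\1/p'
}

# Task lines copied from the `bosh vms` output shown earlier.
printf '%s\n' 'Task 667' 'Task 666' 'Task 667 done' '' 'Task 666 done' | task_ids

# Each number can then be passed to: bosh task <id> --debug
```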

bosh ssh

If I want to ssh into one of the VMs for whatever reason, I’ll use the bosh ssh command. This logs in with a temporary SSH key and a home directory that gets wiped out. The syntax is:

bosh ssh -d <deployment> <instance>

Example:

~$ bosh ssh -d service-instance_5a677c82-5be2-42f0-ac73-010e53953a21 master/5efdd3d8-ccdd-4523-8dde-65cd4eaf8c75

bosh delete-deployment

Let’s say you misconfigured your cloud credential permissions, which led to a failed PKS deployment that PKS is unable to delete (of course, that would never happen, as you follow all the documentation perfectly). You’ll be able to clean it up using bosh delete-deployment like so:

bosh delete-deployment -d <deployment>

This command can fail: it tries to gracefully clean up the instances, and if it can’t, it stops. You can add --force at the end to have it ignore those errors and proceed.
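The graceful-then-forced sequence can be sketched as a small wrapper. The function name delete_deployment is hypothetical, and the sketch assumes you are comfortable skipping the confirmation prompt with -n (--non-interactive); drop that flag if you would rather confirm by hand.

```shell
# Hypothetical wrapper: try a graceful delete first, and retry with --force
# only if that fails. -n (--non-interactive) skips the confirmation prompt.
delete_deployment() {
  bosh -n -d "$1" delete-deployment ||
    bosh -n -d "$1" delete-deployment --force
}

# Usage (the deployment name is a placeholder):
# delete_deployment <deployment>
```

Falling back to --force automatically is a design choice, not a recommendation. Since --force ignores cleanup errors, it may be worth checking the failed task’s log before retrying.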

bosh stop --hard

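This command shuts everything down in the deployment: it stops the processes and, with --hard, deletes the VMs while keeping their persistent disks. It’s very useful if you’re running on a public cloud and want to save money when you’re not using it. Running bosh start against the same deployment brings it back. The syntax is simple: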

bosh stop -d <deployment> --hard
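For a cluster you park overnight, the stop and the later start pair up naturally. The two function names below are hypothetical, and -n (--non-interactive) is used on the assumption that you want to skip the confirmation prompt.

```shell
# Hypothetical helpers for parking a deployment and bringing it back.
# stop --hard deletes the VMs but keeps their persistent disks.
park_deployment() {
  bosh -n -d "$1" stop --hard
}

# start recreates the VMs and starts their processes again.
resume_deployment() {
  bosh -n -d "$1" start
}

# Usage (the deployment name is a placeholder):
# park_deployment <deployment>
# resume_deployment <deployment>
```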

bosh recreate

This command allows you to force recreation of a whole deployment or individual instances.

bosh recreate -d <deployment> 

Adding --fix will recover an instance with an unresponsive agent instead of erroring out.
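To recreate a single instance rather than the whole deployment, pass the instance name from the `bosh vms` output after the command. The wrapper below is a sketch: recreate_instance is a hypothetical name, and -n (--non-interactive) again assumes you want to skip the confirmation prompt.

```shell
# Hypothetical wrapper: recreate one instance of a deployment, recovering it
# even if its agent is unresponsive (--fix).
# $1 = deployment name, $2 = instance name as shown by `bosh vms`
recreate_instance() {
  bosh -n -d "$1" recreate --fix "$2"
}

# Usage (both names are placeholders):
# recreate_instance <deployment> <instance>
```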

These are just a few basic commands to get you started. The full BOSH CLI reference is available here. Hopefully, this will act as a good quick reference when you’re working with your PKS deployments. For more details on the BOSH architecture, there are some good diagrams available in the documentation.

If you’d like to learn more about BOSH and how it can help automate other operational tasks in your environment, check out this great whitepaper or watch this detailed talk about BOSH by Pivotal’s CTO of Cloud, Colin Humphreys.

About the Author

Tyler Britten

Tyler has spent the last 18 years working with cloud, virtualization, and infrastructure technologies. Prior to joining Pivotal, Tyler worked in technical marketing and developer advocacy roles for Red Hat, IBM, and EMC. He also worked as a consultant and a network engineer for a Fortune 1000 company. When the computers are off, he likes to spend time outdoors hiking, biking, snowboarding, or tailgating at a Penn State game. He also has his pilot’s license and enjoys cooking and brewing beer.
