Just a few weeks ago, Cloud Foundry Summit took place in Santa Clara, and one of the most compelling talks was Running Cloud Foundry in a Compliance and Security Focused Environment by Diego Lapiduz and Bret Mogilefsky of 18F.
If you haven’t heard of the organization, 18F has the onerous job of shepherding the digital transformation of the U.S. government. One of its best-known projects is Cloud.gov, a Cloud Foundry-based platform for government innovation that is built “by government developers, for government developers.” The platform is helping many federal agencies move into the modern world of cloud development. The two speakers showed how Cloud Foundry’s development paradigm is transforming a 6- to 14-month compliance effort into a continuous, agile process that can support daily and weekly builds.
That’s right—the US government is using Cloud Foundry to satisfy compliance and deliver daily builds.
Shipping Code Is A Huge Bottleneck Because Of Compliance
The presentation started with the problem at hand. Mogilefsky described it as “having a freeway run into a country lane” because the compliance process brought releases to a standstill.
To give you an idea of what it is like to address compliance in a government organization, teams typically write 200 to 1000 pages of documentation covering their whole stack for every single software release. Before an app can receive its stamp of approval, known as authority to operate (ATO), compliance experts spend 6 to 14 months reviewing the app and its documentation against 4000 pages of regulations.
Not only is the process extremely tedious, it stifles the stream of innovation and distances developers from important feedback loops. This process is also poor for team momentum and morale.
So, 18F decided it was high “time to disrupt infrastructure and compliance in government,” and worked to automate or “continuously deliver” many aspects of compliance.
Automating The IaaS
To start, 18F chose Terraform by HashiCorp to automate both their AWS infrastructure and their Concourse.ci setup. Concourse.ci, in turn, automates app deployment. Lapiduz pointed out that this tooling, alongside Cloud Foundry, gives them the ability to migrate applications across clouds, which is one of the most valuable capabilities they see.
Lapiduz further explained how BOSH and Concourse automate the PaaS layer, sharing how they use the public Cloud Foundry release and their own custom manifests from GitHub. With GitHub, they get one authoritative source for everything, and they know about any changes to any repos as well as pull requests, etc.
This BOSH-Concourse combination provides many important benefits. First, pipelines and deployments are repeatable, consistent, and can be triggered at any time—they are automated across the stack. There is also a commit history and audit trail for all repos, BOSH logs, and Concourse logs. So, they know exactly what is in production and what changed. The compliance group usually works in other environments that require extensive log analysis, but cloud.gov’s logs support auditing quite easily.
In fact, this setup supports and automates change review board processes in depth. “That kind of blows their minds in federal compliance and architecture,” explained Mogilefsky. Given the multi-cloud support, the cloud.gov team can also quickly redeploy to other regions if disaster strikes. The automation also satisfies regulatory controls and security requirements for “no humans in production,” making iterations much faster with much less effort.
Altogether, automation has removed variability and waste at Cloud.gov across the board.
Given security’s role in compliance, the team also relies on BOSH stemcells for full automation here. A stemcell provides an OS skeleton, a minimal set of common utilities and configuration files, and the BOSH agent, which makes for secure configuration by default. They use Cloud Foundry’s User Account and Authentication Server (UAA) for identity management, primarily as an OAuth2 provider. Lapiduz described user management as “fantastic,” with well-organized permissions. Their UAA setup delegates user access to an upstream SAML provider that implements multi-factor and biometric authentication.
Their team also uses BOSH releases for all security components, allowing an iterative approach to release, test, and then apply security to all servers with absolute consistency. Intrusion protection and detection, hardening scripts, vulnerability scans, SSH compliance checks, integrity checks, and log analysis components are consistently built into each server. And, the team can track problems back to source stemcells or buildpacks, rebuild, retest, and automatically deploy across hundreds or thousands of servers.
With this automation in place, security can be addressed continuously. In fact, they were recently able to respond to a vulnerability and deploy the fix to production in one day.
It may not sound intuitive, but 18F’s approach to automating compliance documentation is probably their most innovative practice. In a nutshell, their ultimate goal is continuous compliance.
While the team can make things easier by using templates or copy and paste from previous documents to create the 200-1000 pages of necessary documentation, they saw an opportunity to really change the game by eliminating documentation as a separate process. The 18F team decided to re-think compliance documentation and treat it like code—just like tests and infrastructure can be treated like code. They architected the documentation “app” so that output could be configured and composed from different pieces of content, like standards, certification, and components in the stack.
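To make the idea of composable documentation a little more concrete, here is a minimal sketch in Python. It is not 18F’s actual tooling or file format: the control IDs (`EX-1`, `EX-2`), the component names, and the `compose` function are all hypothetical, and plain dictionaries stand in for the content files. It only illustrates how a document could be assembled from a standard, a certification, and the components in the stack.

```python
from typing import Dict, List

# Hypothetical content pieces; in practice these would live in version-controlled files.
# A "standard" maps control IDs to their requirement text.
standard: Dict[str, str] = {
    "EX-1": "Accounts are managed centrally.",
    "EX-2": "Audit events are logged.",
}

# A "certification" is the list of controls a given approval requires.
certification: List[str] = ["EX-1", "EX-2"]

# Each "component" in the stack documents how it satisfies specific controls.
components: Dict[str, Dict[str, str]] = {
    "identity-service": {"EX-1": "Delegates login to an upstream identity provider."},
    "logging-service": {"EX-2": "Ships platform logs to a central store."},
}


def compose(
    standard: Dict[str, str],
    certification: List[str],
    components: Dict[str, Dict[str, str]],
) -> List[str]:
    """Assemble document lines for every control the certification requires."""
    lines: List[str] = []
    for control_id in certification:
        lines.append(f"{control_id}: {standard[control_id]}")
        satisfied = False
        # Sort component names so the output is stable from build to build.
        for name in sorted(components):
            narrative = components[name].get(control_id)
            if narrative is not None:
                lines.append(f"  - {name}: {narrative}")
                satisfied = True
        if not satisfied:
            # Make documentation gaps visible instead of silently skipping them.
            lines.append("  - (no component documents this control)")
    return lines


if __name__ == "__main__":
    print("\n".join(compose(standard, certification, components)))
```

Because each piece is separate, swapping in a different certification or adding a component changes the output without anyone rewriting the whole document by hand.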
They decomposed the documentation into modules, “coded” them in YAML, stored them in GitHub, and made them all part of the build pipeline through Concourse.ci. Developers and documentation team members create branches, make commits, track changes, and discuss alternatives just as they do with code. The YAML files also render into the FedRAMP HTML template, as shown at compliance.cloud.gov. They have designed the documentation to include tests, based on behavior-driven development, that run and confirm that what is in the docs is true in the system, for example, that an admin console in AWS is switched on. This means auditors can validate the tests once and avoid repetitive manual reviews. It also confirms that the documentation matches the system state with each and every build.
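As a rough illustration of documentation that tests itself, the sketch below pairs each documented claim with the setting it describes and the value the docs promise, then compares that against the observed system state. Everything here is an assumption made for the example: the `DocumentedClaim` class, the control IDs, the setting names, and the hard-coded `observed_state` are hypothetical, and a real pipeline would query the live system rather than a dictionary.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class DocumentedClaim:
    """One statement from the compliance docs, tied to a checkable setting."""
    control_id: str   # hypothetical control identifier
    narrative: str    # the sentence an auditor would read
    setting: str      # key in the observed system state the sentence is about
    expected: Any     # value the documentation promises


def failing_claims(claims: List[DocumentedClaim], observed: Dict[str, Any]) -> List[str]:
    """Return the control IDs whose documented value does not match the system.

    A setting that is missing from the observed state counts as a failure.
    """
    return [c.control_id for c in claims if observed.get(c.setting) != c.expected]


# Hypothetical claims; the real content would be loaded from the docs repository.
claims = [
    DocumentedClaim("EX-1", "Console audit logging is switched on.",
                    "console_audit_logging", True),
    DocumentedClaim("EX-2", "Operators cannot SSH into production hosts.",
                    "operator_ssh_enabled", False),
]

# Stand-in for state gathered from the running platform during a build.
observed_state = {"console_audit_logging": True, "operator_ssh_enabled": True}

print(failing_claims(claims, observed_state))  # ['EX-2']
```

In a build pipeline, a non-empty result would fail the build, so the documentation and the system cannot drift apart without someone noticing.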
Watch The Video:
- Download the slides
- Check out 18F’s open source project on GitHub at 18F Compliance Toolkit
- Learn more about Cloud Foundry or Pivotal Cloud Foundry from product pages or the blog
- Check out all the Cloud Foundry Summit videos
About the Author: Adam Bloom