Cloud-Native Operations in 6 Parts. Super Suit Not Required

February 8, 2017 John Allwright

Operations teams are the unsung superheroes of the software economy.

They design, procure, install and configure the infrastructure. On day 1 they deploy and test the apps on this infrastructure. From day 2 onwards they provide 24x7 on-call support for the apps in production, fire-fighting when the lights blink red.

Epic stuff.

Yet in the eyes of the CEO, IT operations is still regarded as a cost center to be shrunk, whose advice is only sought when the system's down or there's a security breach.

Pivotal’s Casey West captures the sentiment in his talk:

“Good Job Patching the Kernel” - No CEO Ever

For developers, cloud-native practices have driven a revolution in their day-to-day work, made possible by the availability of ubiquitous, cheap computing power - the cloud. Cloud-native development brings together agile development practices, microservice architectures, and continuous integration/deployment. All in all, it's a host of business-transforming changes that speed up software development, turning it into a key tool for driving business strategy.

CEO, attention grabbed.

Meanwhile, the operations team are expected to deploy and operate the developers’ code, keeping it secure, reliable and available to customers whenever they demand it. It’s no surprise that the only time the CEO wants to hear from operations is when there’s a security breach or a service is crashing or slow. The wrong type of CEO visibility.

There’s a natural tension between developers wanting to go faster and faster and operations, whose only means to stability and predictability is to slam on the developer brakes. Enter self-service cloud resources, and developers bypass in-house IT for the path of least resistance, growing the rogue-IT estate. IT is leaving the datacenter; cloud-native is the opportunity for the operations team to reclaim it as they evolve into more interesting and strategic roles.

High-performing operations teams have innovated in waves to keep step with the growing role of software in business. Colin Humphreys, Pivotal’s Cloud CTO, thinks it’s time to bring on the cloud-native wave. Based on his 20 years of experience in operations, he's seen a common progression to cloud-native maturity:

  • Manual server configuration
  • Repeatable shell scripting
  • Single server configuration with Puppet/Chef/Ansible
  • Cloud-native operations: Day 1, Day 2 and beyond with BOSH

Pivotal’s philosophy for cloud-native operations is based on our agile experience at Pivotal Labs, our experience running Pivotal Web Services at scale, customer and partner best practices including Google’s SRE, and a wealth of Pivots’ experience from all corners of the industry.

We’ll share our thoughts on cloud-native operations as a series of blog posts, webinars and whitepapers over the coming months. It’s a rapidly changing space and we’d love to get your thoughts and comments as we embark on the journey together.

Out of the gate, we have a 6-part look at cloud-native operations:

Part 1: The Cloud-Native Ops Opportunity

Maybe you’ve already optimized your software-development lifecycle and your delivery pipeline? Great! But what if your day 2 reality is still full of bespoke scripting, manual intervention, and late-night outages? That’s a sure sign your business is missing out on the benefits of cloud-native operability and your operations team are stuck toiling on pager alerts rather than advancing their careers with more interesting and impactful roles.

In this first installment we’ll explore the ‘why’ of cloud-native operations for CEOs, operations and development teams.

Part 2: Automated Ops - Freedom to Innovate

The age of the cloud unleashed unprecedented compute scale and, in turn, has made possible many new and innovative solutions. IoT, Big Data and Machine Learning demand server resources orders of magnitude larger than 10-15 years ago, yet established operations practices aren’t ready for this explosion in scale, continuing to treat servers as Pets instead of Cattle.

With containers, you could argue that we also need to extend the Pets and Cattle analogy to Ants. But the message is clear: we’re now operating at a scale, speed, and complexity level that only computers can deal with. Manual monitoring and intervention is too slow and brings a proportional increase in the probability of failures through human error.

Such a high level of repetition or "toil" is also tedious for teams.  

Cloud-native operations can help recast operations roles to be more interesting and rewarding. Using software to manage servers, we can scale massively and direct human interactions at a meta level, with a focus on making incremental improvements, improving processes, and aligning with strategic goals.
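To put rough numbers on the human-error argument above, here is a minimal sketch. It assumes each manual intervention fails independently with the same small probability; the 1% figure is purely illustrative and not taken from any study or from Pivotal's data.

```python
def prob_at_least_one_error(per_task_error_rate: float, task_count: int) -> float:
    """Chance that at least one of `task_count` independent manual tasks goes wrong.

    Assumption (illustrative only): every task fails independently with the
    same probability `per_task_error_rate`.
    """
    if not 0.0 <= per_task_error_rate <= 1.0:
        raise ValueError("per_task_error_rate must be between 0 and 1")
    if task_count < 0:
        raise ValueError("task_count must be non-negative")
    # P(no errors at all) = (1 - p) ** n, so P(at least one) is the complement.
    return 1.0 - (1.0 - per_task_error_rate) ** task_count


if __name__ == "__main__":
    # Hypothetical 1% error rate per manual server touch.
    for servers in (1, 10, 100, 1000):
        risk = prob_at_least_one_error(0.01, servers)
        print(f"{servers:>5} manual touches -> {risk:.1%} chance of at least one error")
```

Under these assumptions the risk grows roughly in proportion to the number of manual touches while that number is small, then heads toward near-certainty as the fleet grows, which is the case for handing the repetition to software.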

From a recent McKinsey research article on automation:

As roles and processes get redefined, the economic benefits of automation will extend far beyond labor savings. Particularly in the highest-paid occupations, machines can augment human capabilities to a high degree, and amplify the value of expertise by increasing an individual’s work capacity and freeing the employee to focus on work of higher value.

In this post, I’ll look at how a cloud-native platform automates the undifferentiated heavy lifting of traditional, repetitive ops tasks and how this transforms the role of the ops team from keeping the lights on to driving business success. I’ll also look at how cloud-native operations open up the possibility of evolving the procedural Infrastructure-as-Code to the declarative Infrastructure-is-Code with BOSH.
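As a rough illustration of the procedural versus declarative distinction, here is a minimal sketch. It is not BOSH's API or manifest format; the function names and the instance-group dictionaries are hypothetical and exist only for this example. The operator declares the desired state, and software works out the steps needed to get there.

```python
from typing import Dict, List, Tuple

# (verb, instance group, count) - a hypothetical action format for this sketch.
Action = Tuple[str, str, int]


def plan(desired: Dict[str, int], actual: Dict[str, int]) -> List[Action]:
    """Compare declared instance counts with what is running and list the changes."""
    actions: List[Action] = []
    for group in sorted(set(desired) | set(actual)):
        want = desired.get(group, 0)
        have = actual.get(group, 0)
        if want > have:
            actions.append(("create", group, want - have))
        elif want < have:
            actions.append(("delete", group, have - want))
    return actions


def apply_actions(actual: Dict[str, int], actions: List[Action]) -> Dict[str, int]:
    """Return the state that results from carrying out the planned actions."""
    result = dict(actual)
    for verb, group, count in actions:
        delta = count if verb == "create" else -count
        result[group] = result.get(group, 0) + delta
        if result[group] == 0:
            del result[group]  # nothing left running in this group
    return result


if __name__ == "__main__":
    desired = {"web": 3, "worker": 2}   # what the operator declares
    actual = {"web": 1, "cache": 1}     # what is currently running
    steps = plan(desired, actual)
    print(steps)
    print(apply_actions(actual, steps))
```

A procedural script would hard-code the individual create and delete steps. In this sketch the steps are derived from the declaration, so running the plan again against a converged system yields no further actions.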

Part 3: Seeding & Feeding Cloud-native Operations

How to get started with cloud-native operations? There are many parallels with the DevOps movement as it approaches its own chasm-crossing moment. Enterprises appreciate the potential in DevOps, embarking on the bi-modal IT journey with “tiger teams” embracing the approach while the rest of the company continues with the old “throw it over the wall” mentality for putting apps into production.

Yet taking the movement from “tiger team” to company-wide adoption too often stalls at this point. Many C-suite execs aren't willing to take on the widespread organizational, cultural, technology and process change that mission-critical applications require. And yet that’s what they must do, because by not making this change they’re risking the future of the business and their personal success. We often see our customers hire external IT execs with a track record of cloud-native success to cross this chasm, overcoming internal barriers to jumpstart the move to cloud-native.

Pivotal’s experience changing the software development culture and practices at Fortune 1000 companies provides us with a proven model that we’re applying to cloud-native operations. In this post, I’ll dig into what that approach looks like and discuss how operations teams are transformed and their culture scaled across entire businesses.

Part 4: Cloud-native Operations - Architecture and Monitoring

When it comes to infrastructure design, once again we look to agile development practices. Software-defined datacenter and cloud models convert hardware into APIs. Long-range planning of infrastructure architecture for capital budgets and space planning is a thing of the past. Elastic infrastructure via API supports a more agile approach and the concept of JEA - just enough architecture - as a base configuration that can be evolved and improved over time. Cloud-native platforms and technologies like BOSH enable a malleable, dynamic architecture with predictability and security.

Part 5: Cloud-native Operations and the Developer Dance

In the world of the cloud-native platform, developers take the limelight, building and deploying software quickly and frequently to drive the business' digital transformation agenda. Meanwhile, operations embrace DevOps culture, process, and tools to meet the needs of developers.

The operations team gets to watch the developers’ dance, taking satisfaction in their elegant, streamlined cloud-native platform that helps developers succeed. The elimination of on-call rotas and regular all-nighters spent triaging the gnarliest OS patch conflicts is a major bonus too!

In this post, I’ll discuss the elements of the developer experience that cloud-native ops provide and how that aligns directly with business goals and success.

Part 6: Real World Cloud-native Operations - Customer Stories

To wrap up, we’ll look at the real-world experience of deploying cloud-native operations among Pivotal's customers, including Ford, GE, FedEx, The Home Depot, Comic Relief and Allstate. How do they approach operations? How do they organize their teams? What hurdles have they overcome so far, and how? What success have they seen? How has cloud-native transformed the operations teams and their profile within the business?

Thanks for reading and I hope you’ll stick around for the journey and hang up your Super Suit too!

Sign up here to get notified each time a new operations article is published. We’d love to hear from you!


About the Author

John Allwright

John works in product at Pivotal, on a mission to help organizations get the most out of software and the cloud. His current focus is Kubernetes and the Pivotal Container Service (PKS). Prior to Pivotal, John launched application platforms at IBM, BEA Systems and Microsoft.
