How Pivotal Cloud Foundry Transformed IT at Comcast

July 9, 2015 Abby Kearns

Deploying Pivotal Cloud Foundry within an enterprise as large and multifaceted as Comcast presents a number of challenges. The platform must scale while being capable of serving a number of application needs, spread across multiple teams. During a talk at the Cloud Foundry Summit in May, Comcast’s Neville George, Sam Guerrero, Tim Leong, and Sergey Matochkin discussed the decisions the Cloud Architecture team made when they introduced Pivotal Cloud Foundry to the operations team.

Leong opened the presentation with a discussion of how the team managed custom URLs, while leveraging Pivotal Cloud Foundry’s global availability features. Not only did the platform need to route traffic assigned to custom domains, but it also had to support SSL authentication. This presented a need to manage multiple SSL certificates within a scalable HAProxy layer. The solution that Comcast’s team landed upon was to control HAProxy Config and Custom Host Header replacement using Puppet.

Matochkin followed with a discussion of how the team used Pivotal Cloud Foundry’s managed services to provide the company’s developers with self-service development environments. The earliest managed services that Comcast’s developers needed were Logger and an Outbound Proxy layer, which were key to ensuring the security of their applications. The team followed three principles when creating their framework: it had to be easy to use, support the entire life cycle of applications, and be updatable without major disruptions or data loss. To achieve these goals, the team chose Pivotal Cloud Foundry, in conjunction with Docker and OpenStack.

The presentation concluded with a look at how the team implemented Pivotal Cloud Foundry from an engineering perspective, courtesy of Sam Guerrero and Neville George. As in many enterprises with a mature IT infrastructure, transitioning to a new architecture is a large and imposing project. Pivotal Cloud Foundry represents a significant shift for IT teams, from provisioning VMs for developers to a self-service model that empowers developers to build customer-facing applications in a rapid, agile manner. These capabilities are bolstered by the platform’s automation and its ability to monitor performance and predict potential outages.

For more details on how Comcast is embracing Platform as a Service, check out the transcript, or watch the full Summit talk below:

Transcript

Tim Leong:
Where application development teams use our platforms to run and develop key applications that some of you might be familiar with as Comcast customers. So these platforms can include things like OpenStack or VMware and obviously Cloud Foundry. Just a quick note, next week for the OpenStack summit we’re all also going to be present there, so anybody who’s attending that, we look forward to seeing you at that conference as well.

So anyway, I sit on our Cloud Architecture team. We provide strategic direction for cloud services. And it was actually our team that made the decision to go with Cloud Foundry as opposed to some of the other PaaS providers that exist out there. I welcome any conversation about why we made that decision with any of you throughout the conference. What I’m going to talk to you about is a challenge we came up against in supporting custom URLs for our customers.

So I’ll be talking to you about that in a little bit. Sergey is our application platform architect, and what he does, is he works with our development teams and makes sure they are leveraging proper architectures and design patterns that fit well within the cloud. I think everybody’s aware of the 12 factor app. And Sergey is the champion for that in our company, so he’s going to talk to you a little bit about some of the custom service brokers that he wrote that provide a lot of value to our development teams. Sam and Neville are our cloud engineers and they work on our engineering team and they’re going to talk to you about what it’s like to take Cloud Foundry and run it within an engineering team and the kind of change in mindset that takes. That will be pretty interesting as well.

First challenge I’m going to talk to you about is custom URLs. This seems like a relatively easy problem, but it added some complexity for us. Obviously Cloud Foundry supports custom domains within Cloud Foundry. It allows people to choose their own host name so that their URLs can be whatever they want to make them.

However, when you add things like global availability, making sure that a single site can be hosted on multiple Cloud Foundry instances, with the URL hosted at a GSLB layer so it can be globally or geographically available, that can present some challenges. Once that URL makes it down to Cloud Foundry, how does it route that traffic now that it’s trying to handle a URL that is foreign to it, and that has to be supported on both sites? And then how do you enable SSL for a situation like that? And then also, how do you make it on demand?

So the first thing I’m going to talk to you about is HTTP host header replacement. Basically, when that URL makes it down to a local Cloud Foundry instance, we have our load balancer layer do header replacements on the HTTP layer. What that does is allow Cloud Foundry to understand where to route that traffic, based on how our HAProxy layers translate one URL into a locally hosted URL. And that enables GSLB support, so that people can have a globally available URL that translates properly once it makes it down.

And then multiple SSL certificates: when you have multiple URLs that need SSL enablement, you’re going to have a bunch of certificates. Those certificates need to be hosted on your HAProxy layers, and there are going to be multiple certificates for a single HAProxy layer. So that presented some challenges for us as well.

How do we get around that? What we do is leverage Puppet. Puppet is responsible for making sure that HTTP header replacements are properly injected into the HAProxy configs. We put Hiera in front of that so that the values are stored in a database. What that enables is that you can put any web server in front of that, any UI that you want to put in front of your Hiera database, and it will dynamically update the database, dynamically update Puppet, and then update the HAProxy layer. This works well for HTTP headers, and we can make it on-demand for our customers. And it also works with SSL certificates. So if our users need SSL certificates that are custom or specific to their application, they can do it through that service as well. And as long as your HAProxy layer supports SNI, you can support multiple certificates for a single IP that’s hosted on your HAProxy layer. So that’s the first challenge I wanted to talk to you about, and next I’m going to pass it off to Sergey, who’s going to talk to you about some of the really cool work he’s doing with custom services and custom service brokers.
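The host header replacement and SNI setup Tim describes can be sketched as a small template function. This is only an illustration: the talk does not show the actual Puppet templates or HAProxy configuration, so the domain names, the certificate directory, the frontend and backend names, and the choice of the `http-request set-header` directive are all assumptions. The dictionary stands in for values that would be looked up through Hiera.

```python
# Hypothetical mapping of customer-facing hosts to locally hosted
# Cloud Foundry routes; in the setup described, such values would
# come from Hiera rather than being hard-coded.
CUSTOM_HOSTS = {
    "www.example.com": "myapp.cf-site1.example.internal",
    "shop.example.com": "shop.cf-site1.example.internal",
}


def render_frontend(custom_hosts, cert_dir="/etc/haproxy/certs/"):
    """Render an HAProxy frontend that rewrites custom Host headers."""
    lines = [
        "frontend cf_custom_domains",
        "    bind *:80",
        # Pointing crt at a directory lets HAProxy pick a certificate
        # per requested name via SNI, so one IP can serve many certs.
        f"    bind *:443 ssl crt {cert_dir}",
    ]
    for custom, local in sorted(custom_hosts.items()):
        # Replace the foreign Host header with the locally routable one.
        lines.append(
            f"    http-request set-header Host {local} "
            f"if {{ hdr(host) -i {custom} }}"
        )
    lines.append("    default_backend cf_routers")
    return "\n".join(lines) + "\n"


print(render_frontend(CUSTOM_HOSTS))
```

A configuration management tool would write this text to the HAProxy config file and reload the service whenever the mapping changes, which is what makes the feature on-demand.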

Sergey Matochkin:
Thank you, Tim. Hello everybody. My name is Sergey Matochkin, and I work on the cloud architecture team, mostly responsible for the layer between Cloud Foundry and our developers, our developer community. Today I want to focus on one aspect of Cloud Foundry: managed services and the managed services API. Cloud Foundry provides a great, very convenient way to create managed services like MongoDB, RabbitMQ, you name it. So Cloud Foundry comes … And managed services can be created with just one command line or a few API calls.

Our development team started to release Cloud Foundry to our development community at Comcast. They immediately started to use it, and they saw the value for the development process because it gives them the freedom to create a service right away, use it, and remove it when they don’t need it. So it’s completely self-service; they don’t need help from anybody. But once they got attached to managed services, they started coming back to us and asking, “Is Kafka supported in managed services?” or whether something else was supported. So we quickly realized that there is a good demand for managed services, and that we need to expand our library of managed services. It is something we need to create on our own.

The first couple of managed services that everybody asked for, and that we absolutely felt needed to be created right away, are Logger and Outbound Proxy. Logger is sort of obvious: Cloud Foundry has a log aggregator, but the actual consumers need to be able to store these application logs somewhere and be able to access and search them. The second thing is a proxy layer. The proxy layer is required for increasing the security of our applications, because we want to very strictly control communication between our applications and partner sources or third parties like Amazon Web Services and such.

So, understanding the need to extend our managed service library in Cloud Foundry, we developed three principles that we need to follow to create our framework for extending the library. It should be easy and simple to use, because we need to continue extending the library. And last, but not least, is support for the service life cycle. Particularly, we need to be able to update our applications without any major disruptions or data loss. So with this in mind, we decided that we want to mix three building blocks together to build our framework for managed services. And those building blocks are Cloud Foundry, Docker and OpenStack.

Well, OpenStack is a very convenient infrastructure as a service platform that allows us to add compute, storage, or networking resources to our managed services platform as needed. So that is a perfect tool to support our guiding goals. Docker is here … Well, it’s just because it’s Docker, right? Everybody loves Docker. We want to have Docker here. That is actually a justified joke. We were able to justify the presence of Docker here. The justification is, first, you can develop Docker containers and guarantee that they will run consistently across different deployments. Second, Docker provides just the right level of isolation that we need, and it’s very economical to run because we can run multiple Docker containers on the same VM; it doesn’t have much overhead. Also, Docker is convenient because it helps to support the application life cycle. We can do updates, and we can use Docker images to manage our service life cycle.

So with these building blocks, we need to put some glue together to build the solution. Here on the right you can see a pool of VMs that we run in OpenStack, and each of the VMs at any point in time can run several Docker containers. Each Docker container actually represents a service. To manage the pool we have created the Docker pool controller. The Docker pool controller is responsible for tracking and managing all the resources in the pool, including VMs, Docker images, Docker containers, port allocations, and storage. All this is managed by the pool controller, which contains three elements: the container manager, the database of the resources, and the capacity manager. The capacity manager provides constant … It’s visibility into the capacity of the pool, and it ensures that at any point in time we have enough resources in the pool to spin up more services, to spin up more containers. This way we don’t need to wait for a new VM to boot; we have already prepared enough resources for the next few services to start. And the container manager is the core of the solution.
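The capacity manager’s job of keeping spare room in the pool can be sketched as a headroom calculation. This is a minimal illustration under stated assumptions, not Comcast’s implementation: the `PoolVM` type, the notion of fixed-size container slots, and the `headroom` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PoolVM:
    """A VM in the pool, tracked by memory only for simplicity."""
    name: str
    total_mb: int
    used_mb: int = 0

    @property
    def free_mb(self):
        return self.total_mb - self.used_mb


def vms_to_add(pool, container_mb, headroom, vm_size_mb):
    """Return how many VMs to boot so `headroom` containers can start now."""
    slots_per_vm = vm_size_mb // container_mb
    if slots_per_vm == 0:
        raise ValueError("vm_size_mb is smaller than container_mb")
    # Count how many more containers the current pool could host.
    free_slots = sum(vm.free_mb // container_mb for vm in pool)
    missing = max(0, headroom - free_slots)
    # Ceiling division: a partially needed VM still has to be booted.
    return -(-missing // slots_per_vm)
```

Running a check like this periodically, and booting the returned number of VMs ahead of time, is one way to make sure a new service never has to wait for a VM to come up.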

The container manager is actually responsible for bringing up new Docker containers and the services inside them, or tearing them down, based on the request from the consumer of this resource. And the consumer is … Actually, this element you see here, altogether, is a service broker. For those who are not familiar with the service broker interface and API in Cloud Foundry: the Cloud Foundry controller is on the top, and when it needs to provision a service, it talks through the service broker API. The service broker API is very simple; it’s literally like five RESTful calls that need to be implemented. The service broker API defines how the Cloud Foundry controller requests new services. That API is easy to use, but it has nothing to do with actually provisioning infrastructure. That’s why we put the Docker pool controller in place to manage all the infrastructure elements. And once we have the Docker pool controller, implementing a new horizontal here … this is our services library … becomes a trivial task. Just as an example, this is a technical conference, right? I want to show an example of a request and response to the Docker pool controller. In this case the service broker is asking: go and create a new Docker container using this specific image, Comcast layer for this example. Allocate 1GB of memory for this container and expose a couple of ports, port 80 and 5000, to the consumer.
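The five calls Sergey refers to correspond to the catalog, provision, deprovision, bind, and unbind endpoints of the version 2 service broker API. The sketch below maps those routes onto an in-memory broker to show how little the API itself demands. The class, the `logger` service, the plan ID, and the credential URI are hypothetical, and the catalog is abbreviated compared with what a real broker must return.

```python
class InMemoryBroker:
    """Toy broker: the five v2 routes mapped to method names."""

    ROUTES = {
        ("GET", "/v2/catalog"): "catalog",
        ("PUT", "/v2/service_instances/:instance_id"): "provision",
        ("DELETE", "/v2/service_instances/:instance_id"): "deprovision",
        ("PUT", "/v2/service_instances/:instance_id"
                "/service_bindings/:binding_id"): "bind",
        ("DELETE", "/v2/service_instances/:instance_id"
                   "/service_bindings/:binding_id"): "unbind",
    }

    def __init__(self):
        self.instances = {}

    def catalog(self):
        # Abbreviated, hypothetical catalog entry.
        return {"services": [{
            "id": "logger-service",
            "name": "logger",
            "plans": [{"id": "logger-small", "name": "small"}],
        }]}

    def provision(self, instance_id, plan_id):
        # A real broker would ask the infrastructure layer for a container.
        self.instances[instance_id] = {"plan_id": plan_id, "bindings": {}}
        return {}

    def bind(self, instance_id, binding_id):
        # Hypothetical credentials handed to the bound application.
        creds = {"uri": f"http://logger.example.internal/{instance_id}"}
        self.instances[instance_id]["bindings"][binding_id] = creds
        return {"credentials": creds}

    def unbind(self, instance_id, binding_id):
        self.instances[instance_id]["bindings"].pop(binding_id)
        return {}

    def deprovision(self, instance_id):
        self.instances.pop(instance_id)
        return {}
```

Keeping the broker this thin is what makes the design in the talk attractive: all the infrastructure work sits behind a separate controller, so adding a new service mostly means adding a catalog entry and an image.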

When the Docker pool manager gets this request, it checks the inventory of the resources available. It identifies a VM that can run that specific image and that has enough memory and resources. It allocates ports for port mapping and starts a new Docker container. Then it returns information back to the requester about how the container can be accessed. Not exactly … Not the container, but the services. It provides entry points to all the mapped services back to the requester. So that was a sample call on the Docker pool controller API.

With all these elements in place, we now have enough to fulfill all three of our goals. We can very easily extend our library, our offerings for managed services, because implementation of this layer becomes trivial. And we do all the provisioning of the actual infrastructure through a very simple, straightforward API. We have scalability thanks to OpenStack and the capacity manager. And we have the ability to manage the life cycle of our services through the mechanics provided by Docker. That’s it on this part, and the next section I want to pass to my friend Sam. Sam is from the engineering team, and he’s going to talk about how the introduction of Cloud Foundry as a platform service changed the mindset of engineering and support. Thank you.
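The request handling described for the Docker pool controller (check inventory, pick a VM, allocate ports, return entry points) can be sketched as follows. The slides with the real request and response format are not part of the transcript, so the field names, the image name `example/logger`, the host addresses, and the starting host port are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class VM:
    host: str
    free_mb: int
    images: set
    next_port: int = 32000  # hypothetical start of the host port range


# Hypothetical request: one image, 1 GB of memory, ports 80 and 5000.
REQUEST = {"image": "example/logger", "memory_mb": 1024, "ports": [80, 5000]}


def create_container(pool, request):
    """Pick a suitable VM, reserve memory and ports, return entry points."""
    for vm in pool:
        has_image = request["image"] in vm.images
        if has_image and vm.free_mb >= request["memory_mb"]:
            vm.free_mb -= request["memory_mb"]
            endpoints = {}
            for port in request["ports"]:
                # Map each container port to the next free host port.
                endpoints[port] = f"{vm.host}:{vm.next_port}"
                vm.next_port += 1
            # A real controller would start the Docker container here.
            return {"host": vm.host, "endpoints": endpoints}
    raise RuntimeError("no VM in the pool can satisfy the request")
```

The response gives the caller one entry point per exposed port, which matches the point made in the talk that the requester gets access information for the services rather than for the container itself.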

Sam Guerrero:
Have a pretty busy slide there, so I’ll give you some time to take pictures. So hello, my name is Sam Guerrero, and as Tim mentioned, I work on the Cloud engineering team, along with my colleague, Neville George. Today, I want to spend a little time talking to you about our experience, from an engineering perspective, with implementing Cloud Foundry. First I’d like to thank everyone for the opportunity to share a little bit of our story with you today. This is my first Cloud Foundry Summit and I’m really excited to be here.

So at Comcast we have a really small engineering team compared to the enormous virtual footprint that we have. So the thought of bringing in a new architecture was a little daunting for us at first. You know, we thought, there are a lot of things that may change for a service model that’s been really successful for us. But I need to remind myself, that’s kind of what I was thinking 12 years ago when I was handed eight servers and asked to see if I could get VMware ESX to run on them.

So over the last few years, we’ve been the infrastructure as a service team, where the focus has really been how quickly can we deploy VMs, and then how can we automate those processes? Well, that’s great for most teams and it’s really an attainable goal. But it leaves our developers and application owners, our customers, with quite a few tasks to complete after receiving their VM or group of VMs. So, as I’m sure most of you know, receiving a new VM kind of leaves you with a little bit of a black hole. I mean, you have a nice VM, but there are quite a few things to do with it after that. So we wanted to kind of change that for our customers.

With Cloud Foundry we’ve introduced a paradigm shift in thinking for our architecture and engineering teams. You know, we want to change our mentality so that we really focus more on the end product of the services we provide, versus just kind of deploying a VM quickly. We have to really focus on lowering those barriers to innovation for our product teams and our developers. So with Cloud Foundry we really introduced a self-service model to our application and development teams. That’s really decreased the time between release cycles for these teams and really helped them out. But the key to that agility is really careful coordination between developers, architecture and engineering. You know, we have to be more involved in the end now, to make sure that we are part of that process, to offer more of a holistic service model and service offering. And we do that by kind of inserting ourselves further along the assembly line, if you will. With that self-service model, what it’s doing for us is actually allowing us to be more engaged. We can no longer say that it’s okay to give our customers a brand new car where they have to go home and assemble the transmission before they can drive it.

We believe that if we make our factory better, everything else will improve. So we have had some technical challenges, not difficulties but challenges, as with most new things, when we introduced Cloud Foundry. One of those challenges has been having to maintain our CMDB to really reflect back from Cloud Foundry to our applications. Before, it was really easy. We had an application that we would map to a VM, and then we’d map that to an application owner or group. Another thing is, you know, with network, we’ve had to really expand a lot of the services we provide by getting more involved with firewalls and GSLB and load balancing. Things that, you know, we really didn’t do before. It was really more on the application owner to figure out how to get their VMs to run. And then finally, you know, just maintaining Cloud Foundry itself. Learning how to deploy buildpacks and create custom buildpacks. How to introduce new stacks. How we were going to keep up with the releases of Cloud Foundry in general, which can be a little bit on the aggressive side for a team like ours, since we really weren’t heavily involved in a lot of open source or community-driven projects in the past. So a lot of that was new to us.

We found that these technical challenges weren’t really as big as we thought they would be. And they’ve actually given us a lot of new opportunities that we didn’t really expect. We’ve learned to really interface more with our customers, whereas before we were just kind of in our engineering hole. We gave you a platform, and it was kind of your VM to take care of from then on. It’s also helped us understand more about how the products and services that we provide really go to the end line, what we’re trying to really do at Comcast. It’s helped us understand what our applications do and how they affect the business, and how we’re more a part of that process now.

And it’s also helped us become more T-shaped engineers. You know, it’s really increased the set of skills that we have, and it’s really helped us kind of develop and learn this new model that we’re now part of, this DevOps model, which is a really exciting place to be right now. So our experience with Cloud Foundry so far, from an engineering perspective, has really been positive. I mean, it’s really helped us learn a lot of new things, and it’s helped us really focus and learn about, you know, all these products, and really the end goal of agile product development and time to market. With that, I’d like to thank you one more time, and I’ll pass the mic over to my friend, Neville George.

Neville George:
Thank You. Hi everybody, hopefully you guys can hear me. So my name is Neville, I work on the Cloud Services engineering team, along with Sam. I have to say Sam’s a very nice guy. Every time Tim and Sergey come up with ideas we still have to support them and keep our sanity. So it’s really, real nice of him to do that. So what I will do today is talk about some of the operational aspects of Cloud Foundry that we have in our Cloud Foundry environment at Comcast and some of the tools and things that we have done in our environment in order to support the Cloud Foundry instance that we have at Comcast. I’ll talk about some of the proactive monitoring stuff and also about visibility into your environment, as related to Cloud Foundry. And how they have helped us, what we have done, what are the tools that we have used in order to support the environment. So starting off with proactive monitoring. The success of any engineering team is the ability to actually prevent an outage, right? Proactively monitoring, looking at the key performance indicators, to know what is building up in order to make an outage. In addition, you know it would be great if you could actually reach out proactively to your customers or even better, if you can resolve problems.

Say, for example, our customers are developing, they are innovating, and they are starting to run out of quotas. If we can proactively manage that and make sure that they have enough space and stuff like that, it definitely helps avoid that midnight escalation call saying, “Hey, we’re running out of space,” and things like that. Also, irrespective of how proactively you manage an environment, it’s inevitable that there will be outages. So when an outage occurs, the most important thing is to make sure that it doesn’t occur again, right? What are the additional configurations that we can add to proactively manage all these things before we actually complete handing this off to the operational team?

So we have actually chosen Nagios for our proactive management. There’s a lot of information available for you to configure what you want to monitor and things like that. Now it might seem very simple, but in a very traditional company, most of the time you have off-the-shelf monitoring tools that are run by a monitoring team that has an SLA and an intake process, and all this takes time, right? So what we have done here goes back to what Sam mentioned, the T-shaped person. We manage the complete instance of Nagios. And we make sure that we set up all the counters and key performance indicators that we need to monitor. So in case there is a problem, and we feel that, you know, hey, x is not being monitored, we’d actually be able to do that in, say, five minutes, as opposed to the OLAs and SLAs associated with a team that is outside our control.

So moving on, we’ll talk about the visibility of the environment. It’s very important that we understand what is out in our environment and things like that. Cloud Foundry has a great CLI that you can use to get a lot of information. The only problem is that it’s not a single pane where you can see everything and click through everything. So we have had the same problems. What we found is a tool, the admin UI tool. It’s available in the Cloud Foundry incubator.
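The quota example from the proactive monitoring discussion can be sketched as a Nagios-style check. Nagios plugins report state through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus a one-line message. The thresholds, the function names, and the idea of passing usage figures on the command line are assumptions for illustration; the talk does not describe the actual checks Comcast configured.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_quota(used_mb, limit_mb, warn=0.80, crit=0.90):
    """Classify memory quota usage for one organization."""
    if limit_mb <= 0:
        return UNKNOWN, "QUOTA UNKNOWN - no memory limit reported"
    ratio = used_mb / limit_mb
    summary = f"{used_mb}/{limit_mb} MB used ({ratio:.0%})"
    if ratio >= crit:
        return CRITICAL, f"QUOTA CRITICAL - {summary}"
    if ratio >= warn:
        return WARNING, f"QUOTA WARNING - {summary}"
    return OK, f"QUOTA OK - {summary}"


def main(argv):
    """Entry point for a plugin wrapper: argv holds used and limit in MB."""
    code, message = check_quota(int(argv[0]), int(argv[1]))
    print(message)
    return code
```

A small wrapper script would call `sys.exit(main(sys.argv[1:]))` so that Nagios sees the exit code. Because the team runs its own Nagios instance, adding a check like this is a matter of minutes rather than a request to another team.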

Before I move on, a show of hands on how many of you know about the admin UI tool. Okay, great, great. We have a few of us. But for everybody who doesn’t know, it provides a GUI interface for knowing your organizations, your spaces, who has access to your spaces, how many spaces you have, what the quotas are, what the DEAs are, how they are being utilized, the utilization metrics of your DEAs, and how many applications are running on them. You can also … It also shows you the growth of your environment, in terms of organizations and spaces, and how your environment has been growing over a period of time. It also aids in certain operational aspects. So you could create organizations using the tool, you could apply quotas to your organization, and things like that. So it has been a very useful tool for us. That pretty much is everything that I had on the slide for us to talk about. I would like to close by saying Cloud Foundry has been great for Comcast. Having the T-shaped people, as well as having the run-your-own-business kind of mentality, has definitely helped us make it better. So that is the end of the presentation.
