All Things Pivotal Podcast Episode #12: Logging–How Hard Could it Be?

January 20, 2015 Simon Elisha

featured-pivotal-podcastWhen you are operating applications—be it one, or 1,000—logging is a key element of your operational model. Logs enable you to understand what is happening with all components of your system. More importantly, it gives you a baseline with which to determine what is “normal” in your environment. Without this baseline, the first you know about your system behaving outside the “normal” is via angry phone calls or aggravated social media feedback!

Too often logging is an “afterthought” incomplete and poorly configured. Common issues include elements of the system using different timezones for logging, or not logging at all!

But having comprehensive, consistent logging does not need to be difficult—in fact, it should be “baked in” to how all applications are deployed in your environment.

In this week’s episode, we look at the Cloud Foundry Loggregator component—what it does, and how you can connect it to 3rd-party log aggregation services.

PLAY EPISODE #12

 

RESOURCES:

Transcript

Announcer:
Welcome to the All Things Pivotal podcast, a podcast at the intersection or agile, cloud, and big data. Stay tuned for regular updates, technical deep dives, architecture discussions, and interviews. Please share your feedback with us by e-mailing podcast@pivotal.io.

Simon Elisha:
Hello everyone. Welcome back to the All Things Pivotal podcast. My name is Simon Elisha, CTO and senior manager of field engineering here in beautiful Australia and New Zealand. As always, fantastic that you could listen in. I hopefully have a nice bit of information to share with you today. In fact, today we’ll be talking a bit about logging. As many people say, logging, how hard could it be? Logs are really, really important in your life from an operational and support perspective. If you don’t have visibility of logs in terms of what your application is doing and what the system your application is running on is doing, you’re really flying blind.

This is the telemetry. This is the information flow that we rely upon to adequately monitor, diagnose, and resolve issues in our environment. Having good logs is a fundamental to having a well rounded IT department and having an organization that can tolerate the rapid change and rapid pressures that are on applications today. Too often I see organizations where logging is an after thought. It’s bolted on later on or it’s not comprehensive. There are massive gaping black holes because particular applications are either not logging or more commonly, those logs are not being collected.

Even more commonly, they do not share the time zone settings of the log entries, which could be a nightmare. What do we do from a pivotal CF or a Cloud Foundry perspective to handle application logging? There are two considerations. One is the way we gather the logs together. The second is where do we store them for the long term analysis of those logs? The first thing I wanted to talk about today was the loggregator. The loggregator is the is the Cloud Foundry component that does the logging for you. It’s responsible for logging. It provides a stream of log output from both your application and from the Cloud Foundry system components that work with your applications as well.

It’s logging the applications running on the platform and all the logs from the platform itself, which is nice. It puts them all into one place so that you can get access to them. What’s in a typical log line? You have four key components. The first is the time stamp. This will typically be a UTC time stamp. I recommend everyone to operate their world from an IT perspective in UTC and then adjust accordingly for that. The second component is the log type, which is an origin code. We’ll explain what those codes are shortly. The third is the channel, so it’s either going to be one of two, standard app or standard error.

The fourth part is the message itself. The message part is the opaque part to the loggregator. Essentially loggregator accepts the log entry from the application or the component itself and puts it into the message component. The nice thing is that it will not do anything to that message component, so the application itself can include its own time stamp data, its own other type of information. Whenever it’s touched or modified by the loggregator, the loggregator simply passes it through transparently. I’ve mentioned that there’s a log type field also known as an origin code. These are three letter codes that get applied to the front of the log line that indicate what that log is for.

If we were looking through streams of logs pausing, it could be quite difficult. Having some sort of key value or some sort of information can really help. What are these codes? Let’s run through them quickly. The first kind is API and this is when you make a call to the platform itself to make a change in application state. It could be making an update. It could be changing an action, etc. This will be prefaced by the code API. We also have a code called STG, which really is logged by the droplet execution agent whenever something happens in terms of staging or restarting an app. These are highlight things you want to look at. You say ah ha, something happened to my application at that point.

The DEA itself, the droplet execution agent will also be emitting logs when it starts or stops an application, and this is prefaced by the DEA code. The router is the thing that handles traffic between the user and the application. That’s HTTP, HTTPS requests from the application. All of that information in terms of that request is stored and logged, and that is prefaced by [RTR throughout our log 04:45]. The loggregator itself may do some logging and that will be LGR for loggregator. Logs [inaudible 04:53]. It’s a meta-log type situation. Then obviously every application will admit its own logs as the developer specifies to standard error or standard app. That will be prefaced with the APP or app code.

Also what we’ll do is [inaudible 05:11] to the code which instance of the app was running. That will be a number. Zero is the first, one is the second, etc. You can not just see logs coming out of your app in general. You’re seeing logs coming out of your app specifically by the instance that’s emitting that particular information. When you’re logging from your application, really it’s logging from two places, standard error and standard app. That’s the two places you want to go to. Other than that, it will not log. You’ve got to make sure that those get logged. It’s very easy to view the logs in real time from real platform. You can use the command CF logs and you can see all the logs.

You can tail it, you can dump it, you can filter it, etc. The traditional filter that people will use in the first instance is obviously by the app name. You could use CF logs app name and then only see the logs related to that particular application. You can see that real time stream. This is really useful because you can see what’s happening in real time in your environment and very quickly you could do some quick triage. You can check what’s going on. You can do a sanity check to see how there’s crazy errors being thrown throughout my environment or for this particular application or are things running normally, etc.

What I find most people use logs for is a more longitudinal study of what’s happening in their environment. What’s happening in the long term and then what looks out of the ordinary? Without creating an adequate baseline in your environment of what normal looks like, it’s very difficult to pick abnormal except when users start ringing angrily. Really you want to be more proactive than that and be ahead of the game. Know what looks abnormal as soon as it happens rather than being at the end of that feedback loop. What this means is we need to keep some historical information about those logs. We need to have historical reference to the logs that were collected.

Typically the way people will do this is to use a third party log management service. Fortunately Cloud Foundry supports many different kinds of log services. You can use tools like log entries, log stash, paper trails, [inaudible 07:15] just to name a few. You basically establish a connection to their services by creating those services for yourself. You may create a user ID. You may have an account. You may have an enterprise based installation of that particular service. That’s all fine. Essentially what you do is you create a situation where you can drain the logs from the loggregator into that management service.

Really it’s a four step procedure. The first is to configure that service itself to treat your application as a source of data. It needs to recognize it and know it can accept it. The second is to create the user provided service instance with a syslog drain. This basically allows you to bind that particular service to an application and send it off to the drain location, which is your loggregation service. The third part is to bind the instance, the applications to that service. The common way to do that is to indicate that service binding is part of the manifest file that gets pushed with the application.

What this means is that every time the application gets deployed on the platform, it will automatically bind to the relevant logging service and be sending those logs from all the instances in the application to the logging service automatically. The last thing to do is to just make sure it’s working. You want to test these things once you’ve set them up the first few times to make sure that your logs are draining appropriately. When the logs are being sent to the third party services, loggregator formats the data according to the syslog protocol, which is defined in our old friend RFC5424.

We know that it’s coming in a format that the services are expecting. We’re not doing any ETL, any pausing, any fiddling, any dodgy work with these logs. They’re just being passed through in a standard appropriate way. What’s the message here? The message is is that logs are being gathered for your applications in pivotal CF all the time. You can view them. You can analyze them. You can filter them. You can access them whenever you want. You can send them off to any particular service that you’d like and then use them in a long term study. As I mentioned, it is vital that you having logging set up and ready for all your applications, that you are looking at these logs on a regular basis so that you can establish what a baseline looks like.

You can take action when these logs move outside the norm. Only by having a consistent baseline can you know what different looks like. It’s very important to be ahead of this. It’s a little project for you if you’re running an environment. Whether you’re using pivotal CF or not, make sure you’ve got your logging in place and make sure that you have that baseline set up. I hope that’s been useful for you. As always, links in the share notes as to the steps that are spoken about so you can do it yourself and till then, I look forward to speaking to you soon. Keep on going.

Announcer:
Thanks for listening to the All Things Pivotal podcast. If you enjoyed it, please share it with others. We love hearing your feedback, so please send any comments or suggestions to podcast@pivotal.io.

About the Author

Simon Elisha is CTO & Senior Manager of Field Engineering for Australia & New Zealand at Pivotal. With over 24 years industry experience in everything from Mainframes to the latest Cloud architectures - Simon brings a refreshing and insightful view of the business value of IT. Passionate about technology, he is a pragmatist who looks for the best solution to the task at hand. He has held roles at EDS, PricewaterhouseCoopers, VERITAS Software, Hitachi Data Systems, Cisco Systems and Amazon Web Services.

More Content by Simon Elisha
Previous
Re-Architecting Genomics Pipelines to Handle the Rising Wave of Data
Re-Architecting Genomics Pipelines to Handle the Rising Wave of Data

Genomic data is quickly becoming a central part of the next generation of medicine. The future of treatment...

Next
GemFire XD 1.4 Now Available for Download
GemFire XD 1.4 Now Available for Download

The latest release of GemFire XD 1.4 is now available for download. Its biggest improvements include single...

How do you measure digital transformation?

Take the Benchmark