Implementing logging in our applications is a key component in making them more observable. Maintaining consistent structure and instrumenting logging from the very first function or service will help us gain the visibility we need into how our code is performing.
But how do we do this?
The practical answer to this question depends of course on our code — what programming language are we using, what specific logging framework are we working with, and plenty more. However, there is a set of general best practices that can and should be adhered to in order to ensure logging is embedded correctly in our application.
What are we logging?
Some recommend logging as much as possible as a best practice. Fact is, that in some cases, we’ve seen this “log as if there is no tomorrow” approach result in: a) log noise comprised of unimportant data, and b) needlessly expensive logging costs for storage and retention.
Once you’ve understood what you’re aiming to achieve with your logging, you can begin devising a strategy that defines what you want to log, and perhaps even more importantly — what you don’t want to log. If you are planning to build alerts on top of the logs, for example, try and log actionable data.
What logging framework do we use
Logging frameworks give developers a mechanism to implement logging in a standard and easy-to-configure way. They allow developers to control verbosity, define log levels, configure multiple connections or targets, devise a log rotation policy, and so forth.
You could build your own logging framework, but why do that if there are already tools out there that are easy to use, have community support, and most importantly — do the job perfectly well?
So one of the most important decisions you will make is which logging library, or framework, to use. This task can be complicated at times and pretty time consuming as there are a large number of these tools available, but key considerations here should be ease of use, community, feature-richness, and the impact on your application’s performance.
The more standardized your logs, the easier it is to parse them and subsequently analyze them. The logging framework you end up using will help you with this, but there is still plenty of work to be done to ensure your log messages are all constructed the same way.
For starters, be sure that developers understand when to use each log level. This will help avoid situations in which the same log message is assigned the same severity or where critical events go unnoticed because of the wrong severity assigned to it.
Second, create a standard for formatting and naming fields. Decide, for example, if the field containing the request is “request” or “requestUrl”. Choose the format you want to use for the timestamp field. Think about whether you want to format your logs in JSON or as key=value pairs (this will depend on how you intend on collecting and analyzing the logs).
How do we format our logs?
Formatting structures your logs. Structuring, in turn, helps both machines and humans read the data more efficiently.
In this context, the most commonly used formatting methods are JSON and KVPs (key=value pairs). Below are examples of the same log message written in both format types.
"@timestamp": "2017-07025 17:02:12",
"message": "connection refused",
2017-07025 17:02:12 level=error message="connection refused"
service="listener" thread=125 customerid=776622 ip=220.127.116.11
Both formats will help you achieve the same purpose — making the logs human readable and enable more efficient parsing and analysis, but which one you choose to use will depend on the analysis tool you want to use. If it’s the ELK Stack, for example, JSON is the format that you should probably try.
Being concise and logging short messages is in general a good law to abide by. But there is a huge difference between writing concise logs and writing incomprehensible logs.
Consider this log message:
12-19-17 13:40:42:000 login failed.
Not very insightful, right? But how about:
12-19-17 13:40:42:000 userId=23 action=login status=failure
In logging, context is everything. Adding contextual information to your log messages creates a story and allows you, and any other party in your organization, to more easily understand and analyze the data.
Part of the context that can be added to logs are fields containing metadata. Common examples are application name, function name, class name, and so on.
Using unique identifiers
When troubleshooting a specific event using logs, one can easily get lost in the data. Without having some kind of map to use as reference, especially microservice-based architectures, it’s virtually impossible to track specific actions across all the different services involved in the transaction.
Adding unique tags or IDs to the logs, when possible, will help you navigate within the data by following specific identifying labels that are passed through the different processing steps. These could be user IDs, transaction IDs, account IDs, and others.
Once your application is shipping logs, the next practical step is to figure out how to efficiently aggregate, process, and analyze it. This implies selecting a tool, and in this context the most popular open source tool of choice is the ELK Stack — Elasticsearch, Logstash, and Kibana.
The ELK Stack is relatively easy to set up, provides users with the tools to collect log data from multiple data sources, store it in one centralized location, and analyze it with queries and visualizations.
Depending on how you are outputting your logs, you will need to figure out how to construct your data pipeline. Keep it as simple as possible — ELK handles JSON documents quite well, and if you consistently adhere to a united structure, then you will see some pretty good results.
As your code and application grows, the challenge of maintaining a production-grade ELK Stack grows with it. There are various hosted ELK solutions that will help you concentrate on coding and developing new functionality instead of maintaining the stack itself. Logz.io is one such solution, offering a bunch of features missing in the open source version of ELK on top of the stack, such as alerts, security, Live Tail and S3 archiving.
Planning is Key
Logging your application requires some thought, and the earlier you plan and strategize instrumentation the better. The list of best practices above contains some basic steps that can be implemented from the start.
More often than not, developers apply logging as an afterthought or after a critical issue already impacts the application. Why wait for that to happen before implementing a system for keeping track of your product?