Observability has recently become quite the buzzword, filling headlines in DevOps and IT publications. Industry experts such as Charity Majors, CTO and co-founder of Honeycomb, and Cindy Sridharan, to name a few, have been spreading the word about the importance of observability. They have made it clear that it is more than a passing trend: it is an approach that DevOps organizations need to adopt.
The increasing complexity of software systems is one of the main reasons that observability is much more than a buzzword or trend. Daniel “Spoons” Spoonhower, co-founder and CTO of Lightstep, recently described the growing need for observability. He pointed out that whether organizations are migrating to microservices, building on Kubernetes, or using AWS Lambda, teams are all relying on software that they didn’t write and don’t control. Most of us use hosted storage systems to build projects in the cloud, run managed open source systems as part of production and deployment, or rely on third-party APIs to build applications. Spoonhower explained that integrating all of these processes and platforms leads to what he calls “layers of distinct ownership,” which require teams to understand the relationships between interdependent components.
What Is Observability?
The term observability comes from control theory, where it refers to how well the internal states of a system can be inferred from its external outputs. As software systems become distributed across an ever-growing number of platforms, tools, and teams, development organizations are adopting observability to gain both comprehensive and granular visibility into their systems.
James Governor explained in a recent RedMonk post that observability helps teams understand both the overall health of a system and the health of each component in that system, which makes troubleshooting easier. As Honeycomb’s Guide to Achieving Observability puts it: "Observability is about being able to ask arbitrary questions about your environment without---and this is the key part---having to know ahead of time what you wanted to ask."
Observability vs. Monitoring: What Observability Is Not
Another way to understand observability is to distinguish it from monitoring. Many people mistakenly think of monitoring as observability’s less capable predecessor, perhaps because the term ‘monitoring’ was popular before observability became a trendy buzzword, but that is not the case.
Cindy Sridharan explains that monitoring and observability are two separate and complementary practices. Monitoring provides a “panoramic view of systems’ performance and behavior in the wild,” helps us understand “the shortcomings and the evolving needs of a system,” and, according to Sridharan, is “best suited to report the overall health of systems.” Observability, on the other hand, “aims to provide highly granular insights into the behavior of systems along with rich context, perfect for debugging purposes.”
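To make the distinction concrete, here is a minimal Python sketch; it is not taken from Sridharan or from any particular tool. It assumes each request is recorded as a hypothetical "wide event" with several context fields. The monitoring-style check is one predefined aggregate (an error rate) that reports overall health, while the observability-style query lets us slice the same events by any field chosen at question time. The field names, values, and the 25% threshold are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical wide events: one record per request, carrying rich context.
events = [
    {"service": "checkout", "region": "us-east", "customer": "c-17", "status": 200, "duration_ms": 120},
    {"service": "checkout", "region": "eu-west", "customer": "c-42", "status": 500, "duration_ms": 950},
    {"service": "search", "region": "eu-west", "customer": "c-42", "status": 500, "duration_ms": 870},
    {"service": "search", "region": "us-east", "customer": "c-08", "status": 200, "duration_ms": 80},
]


def error_rate(records):
    """Monitoring-style check: a single predefined aggregate of overall health."""
    errors = sum(1 for r in records if r["status"] >= 500)
    return errors / len(records)


def breakdown(records, key, predicate=lambda r: True):
    """Observability-style question: count matching events grouped by any field."""
    return Counter(r[key] for r in records if predicate(r))


def failed(record):
    return record["status"] >= 500


if error_rate(events) > 0.25:  # illustrative alert threshold
    print("ALERT: error rate", error_rate(events))
    # Follow-up questions nobody had to anticipate when the events were emitted.
    print(breakdown(events, "region", failed))
    print(breakdown(events, "customer", failed))
```

The alert only says that something is wrong; the ad hoc breakdowns are what point toward a specific region and customer.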
The Three Pillars of Observability
Many define observability as the sum of three key components: metrics, logs, and distributed traces:
# Metrics are numeric values measured over a period of time. They can help us determine the health of our system and are often used to trigger alerts.
# Logs are timestamped records of events that can provide us with a detailed account of what happened in our application.
# Distributed traces represent the execution of a request as it passes through the components of a distributed system. They help us see the end-to-end flow of an execution path through that system.
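The sketch below shows roughly what each of the three signal types looks like as data. It is a minimal, standard-library-only Python illustration under stated assumptions: `RequestCounter`, `log`, and `Span` are hypothetical names, not the API of OpenTelemetry or any other observability library. The counter is a metric measured over a time window, the log is a timestamped JSON record of one event, and the spans share a `trace_id` and link to their parent through `parent_id`, which is what lets a tracing backend reconstruct the end-to-end path of a request.

```python
import json
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RequestCounter:
    """Metric (hypothetical): counts events so a rate can be computed over a window."""
    window_start: float = field(default_factory=time.time)
    count: int = 0

    def increment(self) -> None:
        self.count += 1

    def rate_per_second(self, now: float) -> float:
        elapsed = max(now - self.window_start, 1e-9)  # avoid division by zero
        return self.count / elapsed


def log(level: str, message: str, **context) -> str:
    """Log (hypothetical): a timestamped record of a single event, as a JSON line."""
    record = {"ts": time.time(), "level": level, "msg": message, **context}
    line = json.dumps(record)
    print(line)
    return line


@dataclass
class Span:
    """Trace span (hypothetical): one timed operation within a request."""
    name: str
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start: float = field(default_factory=time.time)
    end: Optional[float] = None

    def child(self, name: str) -> "Span":
        # A child span keeps the trace_id and points back at its parent.
        return Span(name=name, trace_id=self.trace_id, parent_id=self.span_id)

    def finish(self) -> None:
        self.end = time.time()


# Simulate one request flowing through two hypothetical services.
requests = RequestCounter()
root = Span(name="frontend: GET /checkout", trace_id=uuid.uuid4().hex)
requests.increment()
payment = root.child("payment-service: charge card")
log("INFO", "charging card", trace_id=root.trace_id, span_id=payment.span_id)
payment.finish()
root.finish()

for span in (root, payment):
    print(span.name, "parent:", span.parent_id)
```

Including the `trace_id` in the log record is one common way to correlate logs with traces, though real systems differ in how they propagate this context between services.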
All three of these signals help us gain visibility into our systems and find the root cause of a problem, but some have argued that observability is much more than the sum of these three parts.
Brian Cox points out that “the goal of an observability team is not to collect logs, metrics or traces. It is to build a culture of engineering based on facts and feedback, and then spread that culture within the broader organization.” Sridharan adds that observability “is not about logs, metrics, or traces, but about being data driven during debugging and using the feedback to iterate on and improve the product.”
Ben Sigelman, CEO and co-founder of Lightstep, also rejects the idea that observability is simply the combination of metrics, logs, and distributed traces, stating that none of them “directly addresses a particular pain point, use case, or business need.” According to Sigelman, when it comes to observability, “what matters is what you do with the data, not the data itself.”
Observability: Taking a Closer Look at Our Distributed Systems
As systems become more complex, it becomes harder to get a clear picture of what is happening or a comprehensive view of how the system is behaving. The scale of our systems keeps growing, and reviewing dashboards and sorting through logs is no longer enough to ensure that we can address issues as soon as they arise.
Describing how organizations that use observability can measure success, Spoonhower lists three outcomes: better and more frequent deployments; quicker on-call responses, because teams can shorten the time it takes to understand and fix the underlying cause of a problem, or even fix problems before they happen; and optimized performance that provides a better user experience for customers.
Confident and frequent deployment, swift troubleshooting, and a positive user experience are successes that we all work hard to achieve. Embracing observability and implementing the right tools, processes, mindset, and teamwork will help organizations take their DevOps game one step further.