Monitoring vs. Logging Checklist
How to choose what to use
To meet critical SLAs and maintain reliability, modern digital enterprises running applications both in the cloud and on premises must measure the performance of their essential services, distributed applications, and infrastructure. For developers and DevOps and TechOps engineers, it can be hard to know when to use metrics and when to use log monitoring to isolate code performance anomalies and to proactively monitor and baseline scaled-out, dynamic, distributed applications.
Metrics describe numeric measurements over time. A metric data point includes the metric name, the measured value, the timestamp, the metric source, and an optional tag. Metrics carry small pieces of information and are much lighter than logs. Logs, unlike metrics, contain textual information about an event that occurred. They are meant to convey detailed information about application, user, or system activity. The primary purpose of logs is troubleshooting a specific issue after the fact, such as a code error, an exception, or a security issue.
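To make the difference concrete, here is a minimal Python sketch of the two record shapes described above. The class and field names (MetricPoint, LogRecord, level, message) are illustrative assumptions and do not follow the schema of any particular monitoring product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetricPoint:
    """One numeric measurement in time (hypothetical schema)."""
    name: str          # measured metric name, e.g. "http.request.latency_ms"
    value: float       # metric data value
    timestamp: float   # seconds since the Unix epoch
    source: str        # host, container, or service that emitted the point
    tags: dict = field(default_factory=dict)  # optional tags


@dataclass(frozen=True)
class LogRecord:
    """One textual event record (hypothetical schema)."""
    timestamp: float   # seconds since the Unix epoch
    source: str        # host, container, or service that emitted the event
    level: str         # severity, e.g. "ERROR"
    message: str       # free-form text describing what happened


# Example values are invented for illustration only.
point = MetricPoint(
    name="http.request.latency_ms",
    value=182.0,
    timestamp=1700000000.0,
    source="checkout-service",
    tags={"region": "eu-west"},
)
record = LogRecord(
    timestamp=1700000000.0,
    source="checkout-service",
    level="ERROR",
    message="Payment provider timed out after 3 retries for order 1234",
)
```

The metric is a fixed set of small fields around one number, while the log carries free-form text whose length and structure vary from event to event.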
This checklist is designed to help you select the right approach for your environment and your specific application or service.
Metric-based Analytics
Use metrics if you:
- Need to continuously measure and get split-second insights from your cloud application code performance, business KPIs, and infrastructure metrics at high scale. The almost instant insights are essential for digital businesses generating revenue from customer-facing applications.
- Are concerned with CPU, memory, or storage consumption, in particular, when you are developing and monitoring complex distributed applications requiring benchmarking and storing large code performance data sets. As numeric measurements, metrics can be highly compressed.
- Run many microservices and containers.
- Use messaging pipelines, such as Kafka, for your application monitoring data.
- Work for an organization with many developers who need to collaborate and share metrics analysis and dashboards (such as self-service analytics for engineering teams).
- Need to apply complex processing on your code performance measurements or business KPI data such as using aggregates, histograms (distributions), and other mathematical transformations.
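As a rough illustration of the last item, the following Python sketch computes a few aggregates and a fixed-width histogram over latency measurements using only the standard library. The function names, the choice of latency in milliseconds, and the 50 ms bucket width are assumptions made for this example.

```python
import statistics
from collections import Counter


def summarize(latencies_ms):
    """Return basic aggregates for a non-empty list of latency samples."""
    ordered = sorted(latencies_ms)
    return {
        "count": len(ordered),
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "max": ordered[-1],
    }


def histogram(latencies_ms, bucket_width_ms=50):
    """Count non-negative samples per bucket, keyed by each bucket's lower bound."""
    counts = Counter(
        (int(value) // bucket_width_ms) * bucket_width_ms
        for value in latencies_ms
    )
    return dict(sorted(counts.items()))


# Sample values are invented for illustration only.
samples = [12, 48, 51, 99, 100, 260]
summary = summarize(samples)
distribution = histogram(samples)
```

Because every sample is a plain number, these transformations stay cheap to compute even over large data sets, which is harder to achieve when the same information has to be parsed out of log text first.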
Log-based Analytics
Use logs if you:
- Need to analyze only unstructured text-based data from your applications and infrastructure.
- Can afford application performance data under-sampling and coarser monitoring.
- Don’t need to develop or run highly distributed applications that require high scalability.
- Are developing monolithic applications that typically do not need the frequent code updates that call for continuous monitoring.
- Are not concerned with slower processing of your application performance data, such as in batch-like processing.
Both Metric and Log-based Analytics
Use both if you:
- Need to process both continuous metric data events and logs. Metrics analytics gives you a first, single-pane-of-glass view across the entire application stack. You can then use log monitoring to deep-dive into a specific issue and investigate its root cause after it has happened.
- Need proactive query-driven smart alerting.
- Are implementing DevOps principles and continuous delivery of your code.
- Need to troubleshoot and deep-dive into a particular system, such as storage or network, after an issue has occurred and generated a log.
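The combined workflow, metrics first and logs second, can be sketched as follows. This is a simplified Python illustration under stated assumptions: metric points are (timestamp, value) pairs, logs are (timestamp, message) pairs, and the function names, threshold, and padding parameter are hypothetical rather than taken from any real tool.

```python
def breach_window(points, threshold):
    """Return (start, end) timestamps spanning every point above threshold.

    Returns None when no point exceeds the threshold.
    """
    breached = [ts for ts, value in points if value > threshold]
    if not breached:
        return None
    return min(breached), max(breached)


def logs_in_window(logs, window, padding_s=0):
    """Return log entries whose timestamp falls inside the padded window."""
    start, end = window
    return [
        (ts, message)
        for ts, message in logs
        if start - padding_s <= ts <= end + padding_s
    ]


# Invented sample data: CPU utilization points and log lines.
cpu_points = [(0, 0.20), (10, 0.95), (20, 0.97), (30, 0.30)]
log_lines = [
    (5, "cache warm-up started"),
    (12, "worker pool exhausted"),
    (25, "worker pool recovered"),
]

window = breach_window(cpu_points, threshold=0.90)
suspects = logs_in_window(log_lines, window) if window else []
```

The metric series narrows the search to a time range, and only the logs inside that range are read for root-cause details.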
. . .
Keep in mind that what matters is the needs of the system you are monitoring. Different applications or services in your organization may well use slightly different approaches to monitoring. There is no absolute right or wrong answer: do what works for you and your team.