Monitoring and observability in practice
We need to start by describing what monitoring and observability
of a platform mean, because there are many different definitions.
Monitoring is the process of gathering metrics about the IT
environment and running applications and observing system performance,
while observability is a measure of how well the internal
states of a system can be inferred from knowledge of its external
outputs (a definition taken from control theory). It sounds a bit difficult
because of its mathematical origin, but it translates easily into IT
use cases.
We can start by considering the use case of a large and complex
Hadoop cluster that consists of several dozen nodes. Using the available
monitoring tools incorrectly, for example by analyzing too many data
points, would cause unnecessary alerts and false alarms, so we would not
have a clear view of the situation. We can call this low observability of
the infrastructure. If we want to achieve high observability, we need
to provide well-matched metrics and correctly configured alerts. The
goal is to deliver information about the current status of each component.
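
One way to cut down on unnecessary alerts is to fire only when a metric
stays above its threshold for several consecutive samples. The sketch
below is a minimal illustration of that idea, not a feature of any
particular monitoring tool: the class name SustainedThresholdAlert, the
0.9 threshold and the three-sample window are all assumptions chosen for
the example.

```python
from collections import deque


class SustainedThresholdAlert:
    """Fires only when a metric exceeds a threshold for N consecutive samples.

    Hypothetical helper for illustration; real monitoring systems usually
    offer an equivalent "sustained for" setting on their alert rules.
    """

    def __init__(self, threshold: float, required_samples: int) -> None:
        self.threshold = threshold
        # Keep only the most recent `required_samples` breach flags.
        self.window = deque(maxlen=required_samples)

    def observe(self, value: float) -> bool:
        """Record one sample and return True if the alert should fire."""
        self.window.append(value > self.threshold)
        window_is_full = len(self.window) == self.window.maxlen
        return window_is_full and all(self.window)


# Assumed CPU utilisation samples from a single node (fractions of 1.0).
cpu_alert = SustainedThresholdAlert(threshold=0.9, required_samples=3)
samples = [0.95, 0.40, 0.92, 0.93, 0.97]
fired = [cpu_alert.observe(sample) for sample in samples]
```

With these samples the single spike at the start does not raise an alert;
only the last sample, which completes three breaches in a row, does.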
A good example is even a simple data processing job written
in Spark or Flink that rewrites data from location A to B. Gathering
its metrics, setting up alerts, or creating a dashboard with a simple
runtime visualization are quite simple tasks. However, to achieve
observability we should also collect metrics about the amount of processed
data, JVM statistics, and metrics about the underlying
infrastructure. Real-life data pipelines are more complex,
so we suggest thinking about how to observe each part of the system.
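
To make the "amount of processed data" idea concrete, here is a minimal
sketch of a job that rewrites records from A to B while counting what it
handles. It uses plain in-memory lists as stand-ins for the two locations,
and the names JobMetrics and copy_records are hypothetical. A real Spark or
Flink job would report such values through the framework's own metrics
system, and JVM and infrastructure metrics would come from separate
sources that are not shown here.

```python
import time
from dataclasses import dataclass, field


@dataclass
class JobMetrics:
    """Counters describing how much data a job has processed (illustrative)."""

    records_processed: int = 0
    bytes_processed: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def record(self, payload: bytes) -> None:
        """Account for one record that was written to the destination."""
        self.records_processed += 1
        self.bytes_processed += len(payload)

    def snapshot(self) -> dict:
        """Return the current values in a form that could be exported."""
        return {
            "records_processed": self.records_processed,
            "bytes_processed": self.bytes_processed,
            "elapsed_seconds": time.monotonic() - self.started_at,
        }


def copy_records(source: list, sink: list, metrics: JobMetrics) -> None:
    """Copy every record from source (A) to sink (B), updating the metrics."""
    for payload in source:
        sink.append(payload)
        metrics.record(payload)


# In-memory stand-ins for location A and location B.
location_a = [b"alpha", b"beta", b"gamma"]
location_b = []
metrics = JobMetrics()
copy_records(location_a, location_b, metrics)
report = metrics.snapshot()
```

A snapshot like this can be compared with the expected input size, which
says more about the health of the job than its runtime alone.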