Being able to easily understand the status of your infrastructure and systems is essential for ensuring the reliability and stability of your services. Real-time, accurate information about the health and performance of your services and applications not only helps your team react to issues, it also gives them the security to make changes with confidence. If done right, it also helps your team proactively discover changes in the environment before they affect their applications. One of the best ways to gain this insight is with a robust monitoring system that gathers metrics, visualizes data, and alerts operators when things appear to be broken.

In this section, we will discuss what metrics, monitoring, and alerting are. We will talk about why they are important, what types of opportunities they provide, and the types of data you may wish to track. I will try to introduce each topic so that it can stand alone, but to get the most out of them they should be considered as parts of a whole. I hope you enjoy exploring this space.