Monitoring
Our Monitoring Implementation
We use StatusCake to monitor the uptime of our websites. In our current configuration, StatusCake simply sends requests to our URLs and sends alerts when a response comes back with an HTTP status code that indicates a failure.
We've configured StatusCake to send emails to the project owners, as well as Slack messages to appropriate, project-specific channels.
There's definitely room for improvement here, but StatusCake is a good starting place. Here are some things that we might consider when improving on this foundation:
- StatusCake is a black-box solution: it has no visibility into the internals of our program and only provides data on how our website looks to users. It would be nice to have a monitoring solution that combines black-box reporting with logs and stack traces.
- StatusCake, as it's configured right now, only checks HTTP status codes. However, it's possible for our web server to return a successful status code along with an empty or incorrect page. Ideally, we would check the response content, in addition to the status code, to mitigate this scenario.
- We're not collecting our logs or runtime metrics in any meaningful way. Collecting them should be a crucial next step, since it would aid disaster response.
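To make the second point concrete, here is a minimal sketch of a check that looks at the page content as well as the status code. It is not how StatusCake works internally, and it isn't part of our current setup. The names `CheckResult`, `evaluate_response`, and `check_url`, the 10-second timeout, and the idea of matching on a single expected string are all assumptions made for illustration.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass


@dataclass
class CheckResult:
    healthy: bool
    reason: str


def evaluate_response(status: int, body: str, expected_text: str) -> CheckResult:
    """Judge a response by its status code AND its content."""
    if not 200 <= status < 300:
        return CheckResult(False, f"unexpected status {status}")
    if not body.strip():
        # A 2xx response with nothing in it still counts as a failure.
        return CheckResult(False, "empty body")
    if expected_text not in body:
        return CheckResult(False, f"missing expected text {expected_text!r}")
    return CheckResult(True, "ok")


def check_url(url: str, expected_text: str, timeout: float = 10.0) -> CheckResult:
    """Fetch a URL and evaluate it; network failures count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return evaluate_response(response.status, body, expected_text)
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return CheckResult(False, f"unexpected status {exc.code}")
    except OSError as exc:
        # Covers URLError (DNS failure, refused connection) and timeouts.
        return CheckResult(False, f"request failed: {exc}")
```

A call such as `check_url("https://example.com", "Example Domain")` (a placeholder URL and string, not one of our sites) would report the page as unhealthy if the server answered with a 200 but the expected text was missing. Keeping `evaluate_response` separate from the network call means the pass/fail rules can be tested without making any requests.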
Other Readings
- Top 14 Monitoring tools that every DevOps needs
- Monitoring in the DevOps Pipeline
- DevOps monitoring tools
- Google Analytics Adapts to GDPR, But Questions Remain
- Nagios
- What is Nagios?
- Zabbix vs Nagios Comparison for Network and Bandwidth Monitoring
- What is Sensu?
- Sysdig
- New Relic: Change the way you monitor infrastructure
- Google Stackdriver
- Introducing New Relic Applied Intelligence
- Comparison of 18 APM & Application Monitoring Tools
- BigPanda
Quiz
There is no quiz available for this module.