Monitoring

Our Monitoring Implementation

We use StatusCake to monitor the uptime of our websites. In our current configuration, StatusCake simply requests our URLs and sends an alert when a response comes back with an HTTP status code that indicates a failure.

We've configured StatusCake to send emails to the project owners, as well as Slack messages to appropriate, project-specific channels.

There's definitely room for improvement here, but StatusCake is a good starting place. Here are some things that we might consider when improving on this foundation:

  • StatusCake is a black-box solution: it has no visibility into the internals of our program and only tells us how our website looks to users. It would be nice to have a monitoring solution that combines black-box reporting with logs and stack traces.
  • StatusCake, as it's configured right now, only checks HTTP status codes. However, our web server could be responding with a success status and an empty page. Ideally, we would test the content of the response, in addition to its status code, to catch this scenario.
  • We're not collecting our logs or runtime metrics in any meaningful way. Collecting them should be a crucial next step, since they would aid disaster response.
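
To make the second point concrete, here is a minimal sketch of a check that looks at page content as well as the status code. It is not how StatusCake works and not something we run today. The function names, the expected text, and the example URL are all hypothetical, and it uses only the Python standard library.

```python
import urllib.error
import urllib.request


def evaluate_response(status, body, required_text):
    """Return (healthy, reason) for an already-fetched response.

    Hypothetical helper: a page is healthy only if the status is 2xx,
    the body is not blank, and the body contains required_text.
    """
    if not 200 <= status < 300:
        return False, f"unexpected status {status}"
    if not body.strip():
        return False, "empty body"
    if required_text not in body:
        return False, f"missing expected text {required_text!r}"
    return True, "ok"


def check_url(url, required_text, timeout=10.0):
    """Fetch url and evaluate both its status code and its content."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return evaluate_response(response.status, body, required_text)
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return False, f"unexpected status {error.code}"
    except OSError as error:
        # URLError and socket timeouts are both OSError subclasses.
        return False, f"request failed: {error}"
```

A call such as check_url("https://example.com/", "Welcome") would return a (healthy, reason) pair that could be turned into an alert. The expected text should be something that only appears when the page has rendered properly, so that a blank or error page served with a 200 status is still reported as a failure.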