Docker setup – part 7: Monitor uptime / status page

When running a blog or webshop, uptime monitoring becomes essential. We do not watch our own site 24/7, so we need some help to make sure we are notified if anything breaks. Many different tools exist for this. One tool that I think solves the problem cheaply and easily is Uptime Robot, a hosted service that monitors the uptime of our site. This article explains how it works, and I will also touch on a few alternatives.

When monitoring uptime for a service, we need to consider a few different requirements.

  1. We do not want to run the monitoring service on the same server as the service being monitored: if that server goes down, the monitor goes down with it
  2. We would like a (public) status page that can give an overview of the uptime
  3. If our service becomes inaccessible, we must get notified so we can resolve the issue fast
  4. As a bonus, we would like to monitor the response speed of the service if possible

Monitoring services

Many different services exist, with different feature sets. To name a few: Cachethq.io, Uptimerobot.com, Status Page, and Pingdom.

Cachethq.io is an open source system that you host yourself, which makes point 1 from the list above a bit difficult to meet. You could of course host it on AWS in a different region, for example, or with a separate cloud provider. But because it is self-hosted, it needs updating and monitoring of its own. For hobby use I find it a bit overkill, but for a larger team it could be a good solution.

Uptime Robot is a hosted monitoring service that lets you monitor up to 50 services for free. It offers a public status page to communicate uptime, as well as downtime notifications.

Status Page is made by Atlassian and is an excellent product. It lets you publish incident reports for downtime events on the status page, and of course it is also a monitoring service. It currently costs $29 per month for the smallest plan.

Pingdom is another monitoring system. It does not have a public status page and offers monitoring only, but it covers many different aspects of a service, from plain uptime to more in-depth functional tests.

 

Because Uptime Robot is both hosted and free, I believe it is the product to choose for most simple use cases. Below I describe the thoughts that went into this choice. The other tools are excellent alternatives, but Uptime Robot covers all my needs for free, and that simply cannot be beaten.

Where to host the monitoring service

When possible, I opt not to host my own monitoring service. Because its purpose is to notify us when downtime happens, it is essential that the monitoring service itself is stable. In my experience it is also a service that is quickly forgotten, so I would rather let others handle its hosting and maintenance. That frees precious time to develop our own service.

This rules out services like Cachethq.io, because it is self-hosted software.

Notification of downtime

Depending on how critical the service is, we would like to be notified in different ways. Most monitoring services I have seen support both SMS and email, among many other channels. SMS often costs extra, while email is part of the package. For my purposes email notifications are fine: I usually carry my smartphone and get notified when an email arrives. For a more important service, it may matter that multiple people are informed when downtime happens. Uptime Robot, for example, handles all of this well. It supports 11 different notification methods, including Twitter, Slack, email, SMS, and iPhone push messages.

The different monitoring solutions check the service at different intervals. Uptime Robot's free plan checks every 10 minutes. This might be a bit too slow for some services, but for a hobby project I think it is more than enough.
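To get a feeling for what a check interval means in practice, here is a small back-of-the-envelope sketch in Python. It assumes, for simplicity, that an outage starts at a random moment between two checks, that a single failed check triggers the notification, and that the check itself takes no time. The function names are my own and are not part of any monitoring API.

```python
def average_detection_delay(interval_minutes):
    """Average wait until the next check, if the outage starts at a random moment."""
    return interval_minutes / 2


def chance_outage_is_seen(outage_minutes, interval_minutes):
    """Probability that at least one check falls inside the outage window."""
    return min(1.0, outage_minutes / interval_minutes)


interval = 10  # minutes, the check interval mentioned above
print(average_detection_delay(interval))     # 5.0
print(chance_outage_is_seen(3, interval))    # 0.3
print(chance_outage_is_seen(15, interval))   # 1.0
```

Under these assumptions, a 10 minute interval means that downtime is noticed after 5 minutes on average, and that a short 3 minute outage will often go unnoticed. For a hobby project that is usually an acceptable trade-off.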

Public status page

For support purposes, I like to have a public status page. It lets users of my service see whether the monitoring system knows about the downtime. This reduces the number of support requests, because users can see that I have been notified and am working to resolve the issue. And hopefully, if the service is stable, it earns bragging rights with a high uptime percentage 🙂

A status page can look like this.

 

Response speed monitoring

Most services evolve over time, and changes in the underlying system cause the response time to fluctuate. The response time can also change because of capacity problems on the servers that run the service. A simple first step to monitor this is to track the response time of the service. Uptime Robot records the response time as part of the monitoring process and plots it on the status page.
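To make the idea concrete, here is a minimal sketch of what a single check with response time measurement could look like, using only the Python standard library. This is not how Uptime Robot implements it. The function name check_url, the 10 second timeout, and the rule that any status below 400 counts as "up" are assumptions of mine.

```python
import time
import urllib.error
import urllib.request


def check_url(url, timeout=10.0):
    """Send one GET request and return (is_up, status_code, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # include the body download in the timing
            status = response.status
    except urllib.error.HTTPError as error:
        status = error.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        status = None  # no answer at all: DNS failure, refused, timeout
    elapsed = time.monotonic() - start
    is_up = status is not None and 200 <= status < 400
    return is_up, status, elapsed
```

Calling check_url on the site at a fixed interval and storing the elapsed values would give a simple response time series to plot.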

As shown below:

My Status page

You can find the status page for this site here, powered by Uptime Robot.

Final thoughts

Most monitoring services allow monitoring of specific URLs, which enables many different use cases. One example is a script, exposed at a URL, that checks whether the backup files are newer than a day, so that we get a warning if the backup procedure fails. Only our imagination is the limit here.
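The backup example could be sketched roughly like this in Python. It is only an illustration under a few assumptions: the backup directory /var/backups/blog, the port 8080, and all function and class names are hypothetical, and it relies on the monitor treating a 503 response as downtime, which is how HTTP monitors commonly behave.

```python
import os
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKUP_DIR = "/var/backups/blog"  # hypothetical location of the backup files
MAX_AGE_SECONDS = 24 * 60 * 60    # "newer than a day"


def newest_backup_age(directory, now=None):
    """Return the age in seconds of the newest file, or None if there are no files."""
    now = time.time() if now is None else now
    with os.scandir(directory) as entries:
        mtimes = [entry.stat().st_mtime for entry in entries if entry.is_file()]
    return now - max(mtimes) if mtimes else None


def backup_is_fresh(directory, max_age=MAX_AGE_SECONDS, now=None):
    """True if the directory holds at least one file younger than max_age."""
    try:
        age = newest_backup_age(directory, now)
    except OSError:
        return False  # a missing or unreadable directory counts as a failed backup
    return age is not None and age <= max_age


class BackupHealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        fresh = backup_is_fresh(BACKUP_DIR)
        # 200 tells the monitor everything is fine, 503 should trigger an alert
        self.send_response(200 if fresh else 503)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"backup ok\n" if fresh else b"backup stale\n")


def serve(port=8080):
    """Serve the health check until interrupted."""
    HTTPServer(("", port), BackupHealthHandler).serve_forever()
```

Running serve() on the server and pointing a monitor at that URL would turn a failed backup into an ordinary downtime notification.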

 


Also published on Medium.