We do not log in to our servers every day to check how the resource usage is. Just like with uptime monitoring we need a system to help us monitor if everything is inside reasonable limits so we can scale the servers if required. And detect any potential problem before it becomes a problem.
This article extends the setup explained in the previous article.
Briefly, the setup consists of a load balancer, an HTTP server, and a PHP-fpm backend, all running in a Docker Swarm environment as explained here.
Previously the load balancer was bound to the manager node in the Docker swarm because it needed access to the Let’s Encrypt certificate files. To prepare for a fully replicated and fault tolerant design, this needs to be fixed so it can run from any node.
Because of the mesh network in Docker swarm, the load balancer does not need to run on the manager node where the external IP is bound. It can run on any host; the mesh network will route the request to the right container. But that requires us to replicate the Let’s Encrypt certificates and make sure they can be renewed and reloaded independently of which host the load balancer is running. This article explains how I changed that and moved renewal into Docker.
This article builds on the platform described in the last seven parts, a WordPress setup running on AWS using Docker. In this article, we will look into how to improve uptime and scalability for the service by replicating it across multiple servers. To allow for replication, several challenges need to be solved. How to handle this is covered in this article, including comments on a few problems I found like Docker nodes running out of memory and how to fix it. And network problems in Docker swarm mode.