I sincerely apologise for the extended downtime our site experienced over the past few days. I understand how frustrating it can be when you’re unable to access the blog content, and I want to be completely transparent about what happened and what we’ve done to prevent similar issues in future.
What Went Wrong
The outage was caused by a critical error that occurred whilst launching a Podman container on our server. This error didn’t just affect the specific container we were deploying—it unfortunately disrupted the entire Podman service, bringing down all containerised applications and rendering our site inaccessible.
It took a few days to investigate the root cause and work on a solution, because the complexity of the issue meant that a simple fix wasn’t possible.
What We’ve Done
Rather than implement a temporary patch, we’ve taken this opportunity to completely rebuild our infrastructure from the ground up. Here’s what we’ve implemented:
Complete Server Rebuild: We’ve rebuilt our entire server infrastructure to ensure a clean, stable foundation.
Migration to Docker: We’ve moved away from Podman and migrated all our containerised services to Docker, which offers greater stability and better community support for our use case.
Operating System Upgrade: As part of the rebuild, we’ve upgraded our server to Rocky Linux 9.6, providing enhanced security, performance improvements, and better long-term support.
Moving Forward
These infrastructure improvements position us much better for the future. The new setup is more robust, easier to maintain, and should significantly reduce the likelihood of similar issues occurring again.
Our Commitment
I know that downtime affects you as well. I’m committed to maintaining the highest possible uptime and will continue investing in our processes and procedures to provide you with a reliable blog service.
Thank you for your patience during this challenging period. I’m pleased to be back online and look forward to serving you with improved reliability.