Root Cause
The outage was caused by an unannounced change to Cloudflare load balancer settings, made on Cloudflare's side. We use multiple Cloudflare accounts, and they share global load balancers deployed in the hostinger.com zone. Accounts on the Cloudflare Business plan stopped working because they no longer had access to the load balancers deployed on hostinger.com, and they began returning an “Origin DNS error”.
Resolution and Recovery
Once we identified the cause, we upgraded all non-working accounts to the Cloudflare Enterprise plan. The full rollout took about 1 hour, so affected domains experienced rolling downtime, each remaining unavailable until its account was upgraded.
Some non-Hostinger domains (for example, zyro.com) required a separate load balancer to be created. After the new load balancer was deployed, all services were restored for zyro.com as well.
Corrective and Preventative Measures
We are consulting with Cloudflare about this, as the change appears to have been triggered internally by their development teams, and the Cloudflare Support team was not aware of it. We are also starting to manage all Cloudflare settings via Terraform (we already use Terraform to manage GCP resources), so that a settings change across multiple Cloudflare accounts can be rolled out very quickly. This should reduce the time to recovery when manual intervention is required (for example, an urgent change to the www CNAME record for all hostinger.tld domains).
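
To illustrate the kind of bulk change described above, here is a minimal sketch in Python (not Terraform, and not our actual tooling) that builds a plan of www CNAME updates across zones in several accounts. The Zone and RecordChange types, the account labels, the inventory contents, and the lb.example.net target are all hypothetical and exist only for this example. The script only prints the plan; it does not call Cloudflare.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A DNS zone as it might appear in an inventory (hypothetical shape)."""
    account: str  # label of the Cloudflare account that owns the zone
    name: str     # zone name, for example "hostinger.com"


@dataclass(frozen=True)
class RecordChange:
    """One planned DNS record update; nothing is applied by this script."""
    account: str
    zone: str
    record: str
    record_type: str
    target: str


def plan_www_cname_change(zones, new_target, name_prefix="hostinger."):
    """Build the list of www CNAME updates for every matching zone.

    Only zones whose name starts with `name_prefix` are included, so
    unrelated domains are left untouched. Results are sorted by zone name
    to keep the plan stable between runs.
    """
    changes = []
    for zone in sorted(zones, key=lambda z: z.name):
        if not zone.name.startswith(name_prefix):
            continue
        changes.append(
            RecordChange(
                account=zone.account,
                zone=zone.name,
                record=f"www.{zone.name}",
                record_type="CNAME",
                target=new_target,
            )
        )
    return changes


if __name__ == "__main__":
    # Hypothetical inventory: account labels and the target are made up.
    inventory = [
        Zone("account-a", "hostinger.com"),
        Zone("account-b", "hostinger.lt"),
        Zone("account-c", "zyro.com"),
    ]
    for change in plan_www_cname_change(inventory, "lb.example.net"):
        print(f"{change.account}: {change.record} "
              f"{change.record_type} -> {change.target}")
```

Keeping the plan as plain data, separate from whatever applies it, makes it possible to review an urgent multi-account change before it is rolled out, which is the same benefit we expect from a Terraform plan.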