We have confirmed that the outage was isolated to external connectivity and was not caused by a service or server failure. Upon contacting our rack space supplier and on-site personnel, we learned that scheduled maintenance was being carried out on one of the three power feeds to the data centre. Our rack is served by two of these feeds, one of which was affected by the maintenance.
Unfortunately, we were not notified of this planned work. Had we been informed, we would have arranged for staff to be present on site to proactively manage any potential impact.
Although our infrastructure is designed with power redundancy (dual power supplies and failover hardware where dual PSUs aren't available), the outage highlighted a single point of failure in our core networking stack.
Specifically, a core switch lost power. This switch has only a single PSU and provides uplink connectivity to two key routers. Its loss severed the connection between our backend infrastructure and our external-facing routers, resulting in a total network outage.
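
The failure mode described above can be checked for ahead of time with a simple model of which devices draw from which power feeds. The following is a minimal sketch only: the device names, feed labels ("A", "B") and link layout are hypothetical stand-ins, not our actual rack inventory. It removes one feed at a time and reports any feed whose loss cuts the backend off from every external-facing router.

```python
from collections import deque

# Hypothetical inventory: device -> set of power feeds its PSUs draw from.
# Names and feed assignments are illustrative, not the real rack layout.
POWER = {
    "backend": {"A", "B"},
    "core-switch": {"A"},  # single PSU, so it depends on a single feed
    "router-1": {"A", "B"},
    "router-2": {"A", "B"},
}

# Hypothetical undirected links between devices.
LINKS = [
    ("backend", "core-switch"),
    ("core-switch", "router-1"),
    ("core-switch", "router-2"),
]


def powered_devices(power, feed_down):
    """Return the devices that still have at least one live feed."""
    return {dev for dev, feeds in power.items() if feeds - {feed_down}}


def reachable(links, alive, start):
    """Breadth-first search over links whose endpoints are both powered."""
    if start not in alive:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for a, b in links:
            if node in (a, b):
                other = b if node == a else a
                if other in alive and other not in seen:
                    seen.add(other)
                    queue.append(other)
    return seen


def feeds_that_isolate(power, links, source, targets):
    """List the feeds whose loss cuts `source` off from every target."""
    feeds = sorted(set().union(*power.values()))
    bad = []
    for feed in feeds:
        alive = powered_devices(power, feed)
        if not (reachable(links, alive, source) & set(targets)):
            bad.append(feed)
    return bad


if __name__ == "__main__":
    print(feeds_that_isolate(POWER, LINKS, "backend", ["router-1", "router-2"]))
```

In this illustrative layout, losing feed "A" is reported as isolating the backend, because the single-PSU core switch is the only path to both routers. Moving that switch to dual feeds, or adding a second switch on the other feed, would make the reported list empty.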