Kart Technologies Ltd - Power outage causing total loss of external network connectivity – Incident details
All systems operational
Power outage causing total loss of external network connectivity
Resolved
Major outage
Started about 2 months ago
Lasted about 3 hours
Affected
Hosting Services
Major outage from 7:20 AM to 10:30 AM
Customer Portal
Major outage from 7:20 AM to 10:30 AM
Web Services
Major outage from 7:20 AM to 10:30 AM
DNS Services
Major outage from 7:20 AM to 10:30 AM
Database Services
Major outage from 7:20 AM to 10:30 AM
Email Services
Major outage from 7:20 AM to 10:30 AM
Updates
Resolved
Power has been rerouted via a different feed to our rack, and services have been restored.
Update
Our engineers are on site at the data centre.
Identified
We have confirmed that the outage was isolated to external connectivity and was not caused by a service or server failure. After contacting our rack space supplier and on-site personnel, we learned that scheduled maintenance was being carried out on one of the three power feeds to the data centre. Our rack is served by two of these feeds, one of which was affected by the maintenance.
Unfortunately, we were not notified of this planned work. Had we been informed, we would have arranged for staff to be present on site to proactively manage any potential impact.
Although our infrastructure is designed with power redundancy (dual power supplies and failover hardware where dual PSUs aren't available), the outage highlighted a single point of failure in our core networking stack.
Specifically, a core switch that provides uplink connectivity to two key routers has only a single PSU, and it lost power when the affected feed went down. This severed the connection between our backend infrastructure and our external-facing routers, resulting in a total network outage.
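The failure mode described above can be checked for in advance with a simple model of which device draws from which feed. The following is a minimal sketch only: the device names, feed names, and link layout are hypothetical placeholders chosen to mirror the description, not our actual inventory. It removes one feed at a time and tests whether the backend can still reach the outside world.

```python
from collections import deque

# Hypothetical inventory: device -> power feeds it draws from.
# The core switch has a single PSU, so it depends on one feed only.
POWER_FEEDS = {
    "backend": {"feed-1", "feed-2"},
    "core-switch": {"feed-1"},
    "router-a": {"feed-1", "feed-2"},
    "router-b": {"feed-1", "feed-2"},
}

# Hypothetical network links (undirected).
LINKS = [
    ("backend", "core-switch"),
    ("core-switch", "router-a"),
    ("core-switch", "router-b"),
    ("router-a", "internet"),
    ("router-b", "internet"),
]

EXTERNAL = "internet"  # assumed to be always available


def powered_devices(failed_feed):
    """Return the devices that still have at least one live feed."""
    return {dev for dev, feeds in POWER_FEEDS.items() if feeds - {failed_feed}}


def has_external_path(failed_feed, start="backend"):
    """Breadth-first search from `start` to EXTERNAL over powered devices."""
    alive = powered_devices(failed_feed) | {EXTERNAL}
    if start not in alive:
        return False
    neighbours = {}
    for a, b in LINKS:
        if a in alive and b in alive:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == EXTERNAL:
            return True
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


if __name__ == "__main__":
    for feed in ("feed-1", "feed-2"):
        status = "OK" if has_external_path(feed) else "TOTAL OUTAGE"
        print(f"{feed} down: {status}")
```

In this toy model, losing the feed that powers the single-PSU switch cuts every path to the routers even though the routers themselves stay powered, which matches the behaviour seen during the incident.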
Investigating
Our monitoring systems have detected a complete loss of network connectivity affecting all services hosted on our platform.