Network Outage
Incident Report for dataforest / Avoro / PHP-Friends
Postmortem

On 23.11.2024 at 23:12 (CET), our network was hit by a sophisticated attack directly targeting our edge routing infrastructure. Because the attack was so complex, our initial debugging mistakenly concluded that we were dealing with a severe hardware failure in our edge routing equipment and that redundancy mechanisms were not functioning as expected. We manually triggered failovers to standby hardware to rule out defective components, but without success. We then intentionally disconnected our network from the internet in order to isolate and restore components one by one. This approach proved effective, and services began to recover at 23:41. IPv6 connectivity was restored quickly, while the last IPv4 prefixes came back online at 23:55. Unfortunately, the attackers quickly adjusted their methods and temporarily bypassed the filters we had just implemented. As a result, another brief IPv4 outage occurred between 00:22 and 00:35, while IPv6 connectivity has remained stable since the first fix was implemented.

In the six hours since the incident began, our team has been working through the night with our upstream providers to implement permanent fixes against these new attack vectors. Although the network has faced further attacks in the meantime, it has remained stable, as our solutions have proven effective. Nevertheless, we will remain on high alert over the coming hours and days so that we can respond swiftly to any new attack patterns.

This incident is the first outage of our entire edge routing infrastructure since the launch of AS58212 nearly five years ago, and it affected every single dataforest customer, including IP transit services across all datacenters. Such an outage falls far short of the standards we set for ourselves, and we deeply apologize for the disruption caused to our customers. It is particularly frustrating that this attack succeeded even though we mitigate hundreds of similar attacks every day without our customers noticing. In this case, the attack managed to overwhelm our routers because our filtering was insufficient for an attack of this unprecedented complexity. Although our upstream providers should not have forwarded the malicious traffic at all, we must admit that we should have covered this rare but possible scenario ourselves as an additional safeguard, and we are sorry for this mistake.

Please note that this incident was purely a reachability issue. There was no power outage, no hacking attempt, and no data breach.

Posted Nov 24, 2024 - 05:28 CET

Resolved
This incident has been resolved.
Posted Nov 24, 2024 - 05:24 CET
Monitoring
Another fix was implemented a few minutes ago, and operation has been stable since then. We are working on a permanent solution to prevent further outages. If any occur, we will report them here.
Posted Nov 24, 2024 - 00:42 CET
Identified
We are currently seeing another outage and are working on a fix.
Posted Nov 24, 2024 - 00:26 CET
Monitoring
We have implemented a fix. The network has been stable for about five minutes now. IPv6 was less affected than IPv4 and remained available for most of the incident, while IPv4 was heavily affected, resulting in a major outage over a longer period of time. A post-mortem will follow tomorrow. Please rest assured that we are monitoring the status of our edge routing devices 24/7 and will take action immediately if needed.
Posted Nov 24, 2024 - 00:05 CET
Identified
We have identified the issue and are working on a solution.
Posted Nov 23, 2024 - 23:56 CET
Investigating
We are currently experiencing a major outage of our edge routing and are investigating the situation.
Posted Nov 23, 2024 - 23:18 CET
This incident affected: Interxion FRA8 (Edge Network).