[#1052] RFO - Packet Loss Affecting NZS Datacenter Prefixes (Status)
Posted: 2026-02-28 17:29
Start: 2026-02-25 05:20:00
End: 2026-02-25 06:39:00
Affects: NZS Datacenter - Outgoing Routing via R1
Estimated Start: 05:20
Traffic Mitigated: 06:18 (R2 became master)
Redundancy Restored: 06:39
Summary
Intermittent packet loss affected several NZS Datacenter prefixes routed outbound via R1. Because R1 load-balances traffic internally, the loss was flow-dependent: only specific SrcIP-DstIP pairings were affected. Average packet loss for affected flows during the window was ~3%.
Timeline (GMT+1 - NL Time)
05:20 -- Issue onset (estimated)
05:20-06:18 -- Investigation: no CRC errors, alarms, or logs; slightly reduced bandwidth observed
06:18 -- Traffic migrated to R2 (master); packet loss immediately ceased
06:18-06:39 -- Deep diagnostics on R1; issue isolated to Switch Fabric Module (SFM) links
06:39 -- Faulty SFM links disabled; redundancy restored; monitoring continued
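For reference, the durations implied by the timeline can be derived directly from its timestamps. The snippet below is only an illustrative calculation; the variable names are chosen here and do not come from any tooling used during the incident.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"
onset = datetime.strptime("2026-02-25 05:20", FMT)      # estimated issue onset
mitigated = datetime.strptime("2026-02-25 06:18", FMT)  # traffic moved to R2
restored = datetime.strptime("2026-02-25 06:39", FMT)   # redundancy restored

# Whole minutes between the timeline milestones
impact_minutes = int((mitigated - onset).total_seconds() // 60)
diagnostics_minutes = int((restored - mitigated).total_seconds() // 60)
total_minutes = int((restored - onset).total_seconds() // 60)

print(f"packet loss window:    {impact_minutes} min")
print(f"diagnostics on R1:     {diagnostics_minutes} min")
print(f"onset to redundancy:   {total_minutes} min")
```

This gives a packet loss window of 58 minutes, followed by 21 minutes of diagnostics without redundancy, for 79 minutes in total.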
Root Cause
Failing/degraded internal Switch Fabric Module (SFM) links in router R1. The platform normally operates with 20 fabric links; two had been shut down internally, leaving 18 active. Because traffic is load-balanced across fabric paths, only flows hashed onto the degraded links experienced loss. The fault was intermittent and did not trigger system logs or automated alerts.
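The flow-dependent behaviour can be illustrated with a minimal sketch. It assumes a simple hash over the SrcIP-DstIP pair and two degraded links (20 normal minus 18 active). The actual hash function and link selection used by R1 are not documented here, so the CRC32 hash, the link indices, and the function names below are all hypothetical.

```python
import zlib

FABRIC_LINKS = 20    # normal number of fabric links, per the root cause above
DEGRADED = {3, 11}   # hypothetical indices for the two degraded links

def fabric_link(src_ip: str, dst_ip: str, n_links: int = FABRIC_LINKS) -> int:
    """Map a SrcIP-DstIP pair to a fabric link (illustrative hash, not R1's)."""
    key = f"{src_ip}>{dst_ip}".encode()
    return zlib.crc32(key) % n_links

def is_affected(src_ip: str, dst_ip: str) -> bool:
    """A flow is exposed to loss only if it hashes onto a degraded link."""
    return fabric_link(src_ip, dst_ip) in DEGRADED

# Every packet of a flow hashes to the same link, so a given flow is either
# always exposed to the degraded links or never exposed.
flows = [(f"192.0.2.{i}", f"198.51.100.{j}")
         for i in range(1, 51) for j in range(1, 51)]
affected = [flow for flow in flows if is_affected(*flow)]
print(f"{len(affected)} of {len(flows)} sample flows map to a degraded link")
```

Under these assumptions, and with an evenly distributing hash, roughly 2 in 20 flows would land on a degraded link while all others see no loss. That matches the pattern of specific address pairings being affected. The share is an estimate from the sketch, not a figure measured during the incident.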
Resolution
Traffic remains stable via R2. With traffic offloaded from R1, the faulty SFM links were identified through controlled testing and administratively disabled so that a failover cannot reactivate them.
Preventative Actions
Hardware replacement is being expedited.
Migration to a new routing platform is in progress, targeted for introduction within the next ~2 months.
Continued monitoring to confirm stability.