Ring VPLS Firmware Upgrades (Maintenance)


[#665] Ring VPLS Firmware Upgrades (Maintenance)

Posted: 2018-06-08 07:56

Start: 2018-08-01 01:00:00
End: 2018-11-03 05:00:00

Affects: (See message for more information)

To improve our network quality and resolve minor bugs in the current firmware, we will be performing firmware upgrades on the following dates and times:

Nikhef - 1 August 2018 at 1 AM CEST (Completed)
Affected: Locally connected customers. Some peering/transit traffic, which will be redirected over other connections.

EQX-AM5 - 3 August 2018 at 1 AM CEST (Completed)
Affected: Locally connected customers. Some peering/transit traffic, which will be redirected over other connections.

BNR - 6 August 2018 at 1 AM CEST (Completed)
Affected: Internet customers. Locally connected customers. Some peering/transit traffic, which will be redirected over other connections.

NZS - Rescheduled to: 4 October 2018 at 4 AM CEST (Completed)
Affected: Network connectivity at NZS.

DBA - Rescheduled to: 11 October 2018 at 1 AM CEST (Completed)
Affected: Network connectivity at DBA.

DBC - Rescheduled to: 18 October 2018 at 1 AM CEST (Completed)
Affected: Network connectivity at DBC.

GSA - Rescheduled to: 27 October 2018 at 1 AM CEST (Completed)
Affected: Internet customers. Locally connected customers. Some peering/transit traffic, which will be redirected over other connections.

EQX-AM7 - Rescheduled to: 3 November 2018 at 1 AM CET (Completed)
Affected: Internet customers. Locally connected customers. Some peering/transit traffic, which will be redirected over other connections.


Update 2018-08-03 01.00:
While upgrading the EQX-AM5 VPLS switches, everything initially appeared to go smoothly. However, after both switches were upgraded, traffic on both became intermittent, leading to random packet loss. Engineers are currently on-site debugging this issue.

Update 2018-08-03 03.51:
We have reverted the switch to the previous firmware, which restored all connectivity. We will send our debug logs to the vendor so they can investigate why the upgrade failed.

Update 2018-08-03 09.16:
Because a possible firmware bug caused intermittent packet loss after the EQX-AM5 upgrade, we have decided to postpone the upgrades at our major locations. BNR has not been postponed at this time, as its upgrade is part of the required debugging. As the Nikhef upgrade is complete and we are not experiencing any issues there, no downgrade is currently planned for that site.

We are awaiting more details from our vendor before continuing with the firmware upgrades at our major locations. Once the vendor has investigated the issue, we will schedule the new firmware upgrades and update this post.

Update 2018-08-09 11.23:
On Monday 13 August 2018, we will continue with the firmware upgrade of EQX-AM5 and debug the issue together with the vendor. All traffic will be balanced over our other transit providers and peers. The firmware upgrade and debug sessions will take place during the day.

We do not expect the maintenance to cause disruptions, as traffic will be offloaded beforehand.

Update 2018-08-13 17.32:
We were able to debug and resolve the issues encountered previously. The firmware of EQX-AM5 has been updated. We will monitor this site closely over the coming days. We will reschedule the maintenance windows for the remaining switches and send an update once the new schedule is available.

Update 2018-08-15 12.15:
We have rescheduled the firmware upgrades for our Ring infrastructure. For the new dates, please refer to the schedule at the top of this post.

Update 2018-09-22 02.20:
While flashing the switches in EQX-AM7, the first switch got stuck in a boot loop after being flashed with the new firmware. After two reboot attempts it booted with the new firmware, but this caused packet loss on ports on that same switch. After we flashed it back to the old firmware and rebooted it, the stack became stable again. We have collected diagnostics from these switches, which will be sent to our vendor for debugging.

Update 2018-09-25 13.50:
We have changed the time of the NZS firmware upgrade so that we can perform debug operations and tests in case we encounter the same issue as with EQX-AM7. We will keep any downtime to a minimum.

Update 2018-10-04 15.00:
The upgrade at NZS was completed last night.

Update 2018-10-05 08.50:
As the alternative firmware upgrade method is working as expected, we will use this method for the remaining firmware upgrades. As the EQX-AM7 upgrade was delayed, we have now scheduled its maintenance for 3 November 2018 at 1 AM CET.

Update 2018-10-11 03.38:
The upgrade at DBA has been finished.

Update 2018-10-18 02.10:
The firmware upgrades at DBC have finished. We are seeing packet loss on connectivity towards the DBC routers.

Update 02.30:
The 2x 100 Gbps ports connecting the ring to each router (4 ports in total) have been disabled. The packet loss issue is resolved. We are further debugging the problematic ports.

Update 03.55:
Debugging has finished; a hard reset of the 400 Gbps bridge appears to have resolved the issue. This has so far been done for ring - R2. We are waiting for traffic and BGP sessions to be fully restored. Once that is done, we will offload traffic from R1 to R2 and do the same for ring - R1.

Update 04.25:
The 400 Gbps bridge for ring - R1 has been hard reset as well.

Update 04.35:
Maintenance for DBC is now complete.

Update 2018-10-26 03.50:
Maintenance for GSA is completed.

Update 2018-11-03 03.08:
Maintenance for EQX-AM7 is completed.