This morning I hit the worst case I could imagine when updating switch firmware, but read for yourself:
I updated the firmware on 4 client switches (ProCurve 2848). Everything worked fine before and right after the update. About 1 hour after the update, Nagios reported the first servers as down; some came back up and went down again. Then my Outlook and Firefox connections started flapping, and some colleagues reported problems too. A firmware update and, 1 hour later, problems in the LAN? => The new firmware was on the primary image of the switches, so I tried to restore the old firmware from the secondary image. My SSH sessions were terminated again and again, so I went to the server room. Looked at the switches => the status lights were blinking like a disco. Heavy traffic? No, no… I connected a notebook to the serial port and looked at the logs => “Excessive Broadcasts”. Although I had restored the old firmware version and rebooted the switches, the problems continued. I disconnected all uplinks from the backbone switches and reconnected them one by one. One switch was under high load even without an uplink, so I had a look at its logs: 2 ports with excessive broadcasts => LOOP! A damn bad coincidence: a firmware update and, not long afterwards, a loop in the network. It was the first time this had happened.
Now I’m looking for a way to get these broadcast problems reported, or to have the affected ports automatically disabled by the switches. I quickly searched the net and found some options:
fault-finder bad-driver sensitivity high
fault-finder bad-transceiver sensitivity high
fault-finder bad-cable sensitivity high
fault-finder too-long-cable sensitivity high
fault-finder over-bandwidth sensitivity high
fault-finder broadcast-storm sensitivity high
fault-finder loss-of-link sensitivity high
fault-finder duplex-mismatch-HDx sensitivity high
fault-finder duplex-mismatch-FDx sensitivity high
By default they are all (and were in my case) set to “medium”.
It seems these options only set the sensitivity level for detecting and logging the problems; they do not appear to disable a port.
Then I spotted something much more interesting in the documentation of some other (newer) HP switches:
Usage: [no] loop-protect <…> [[ethernet] PORT-LIST [receiver-action <send-disable|no-disable>]| [transmit-interval <1-10>]| [disable-period <0-604800>]| [trap <loop-detected>]
Description: Configure Loop protection on the switch.
Parameters:
- ethernet PORT-LIST Port(s) to configure loop protection on. By default loop protection is disabled on a port
- receiver-action Sets the loop detected action per port. When a loop is detected the port that received the loop protection packet determines the action taken. If send-disable is selected the port that transmitted the packet will be disabled. If no-disable is selected, the port will not be disabled. The default action is ‘send-disable’.
- trap <loop-detected> Configure Loop protection traps. The following traps are generated by Loop protection – ‘loop-detected’ signifies that a loop was detected on a port.
- disable-timer <0-604800> (default:0) Sets the time in seconds to disable a port for when a loop has been detected. A value of 0 disables the auto reenable functionality. By default the timer is disabled.
- transmit-interval <1-10> (default:5) Time in seconds between transmission of loop protection packets.
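To make the parameters above a bit more concrete, here is a minimal sketch in Python that assembles the corresponding command lines and checks the documented value ranges. It is not tested against a real switch. The helper name `loop_protect_commands` and the port list `1-44` are my own assumptions, and the keyword `disable-timer` is taken from the parameter list (the usage line says `disable-period`), so the exact spelling should be checked in the CLI help of the actual firmware.

```python
def loop_protect_commands(ports, receiver_action="send-disable",
                          disable_timer=0, transmit_interval=5, trap=True):
    """Build loop-protect CLI lines from the parameters quoted above.

    Hypothetical helper: it only formats strings, it does not talk to a switch.
    """
    if receiver_action not in ("send-disable", "no-disable"):
        raise ValueError("receiver_action must be send-disable or no-disable")
    if not 0 <= disable_timer <= 604800:
        raise ValueError("disable_timer must be within 0-604800 seconds")
    if not 1 <= transmit_interval <= 10:
        raise ValueError("transmit_interval must be within 1-10 seconds")

    lines = [
        # Per-port setting: what happens when a loop is detected.
        f"loop-protect {ports} receiver-action {receiver_action}",
        # Seconds between loop protection packets.
        f"loop-protect transmit-interval {transmit_interval}",
        # Seconds a port stays disabled; 0 means no automatic re-enable.
        # Keyword assumed from the parameter list, not verified on a switch.
        f"loop-protect disable-timer {disable_timer}",
    ]
    if trap:
        lines.append("loop-protect trap loop-detected")
    return lines


if __name__ == "__main__":
    # Example: client ports 1-44 (assumed), re-enable a port after 5 minutes.
    for line in loop_protect_commands("1-44", disable_timer=300):
        print(line)
```

Leaving the uplink ports out of the port list is a deliberate choice in this example, so that a detected loop would only ever shut down a client port.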
Here are some interesting links:
- HP Forums: Excessive Broadcast Problem with Procurve 2626
- HP Forums: switch loops
- 5300xl Release Notes (Page 47)
- 6200,5400,3500 CLI Reference Guide
I have to read some additional documentation on this feature, because I still have some questions:
- Do the switches only detect loops on themselves, or also loops involving ports on other switches?
- Is this feature new? Is it tested enough to give it a try in a production network?
- Are there known issues?
I’ll look for this feature on the ProCurve 2848 tomorrow. I hope it is there (maybe not in the old firmware, but in the new one?), because this option looks really nice. Sending traps would be nice too: I could forward them to Nagios and alert the admins whenever such a trap occurs.
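Until traps are available, one possible stopgap would be a small Nagios plugin that scans the switch logs for the message I saw on the serial console. The sketch below assumes the switch logs are already forwarded to a text file on the Nagios host. That setup, the file path in the usage comment and the function name `check_log` are all assumptions of mine. The only thing taken from the incident is the phrase “Excessive Broadcasts”; the exit codes 0 (OK) and 2 (CRITICAL) are the usual Nagios plugin convention.

```python
import sys

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def check_log(lines, needle="excessive broadcasts"):
    """Return (exit_code, message) for an iterable of log lines.

    Matching is a case-insensitive substring search, so it makes no
    assumption about the rest of the log line format.
    """
    hits = [ln.strip() for ln in lines if needle in ln.lower()]
    if hits:
        return CRITICAL, (f"CRITICAL - {len(hits)} line(s) mention "
                          f"'{needle}'; last: {hits[-1]}")
    return OK, f"OK - no '{needle}' entries found"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: check_switch_log.py /var/log/switches.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        code, message = check_log(fh)
    print(message)
    sys.exit(code)
```

As written, the check flags any matching line anywhere in the file, so in practice it would need log rotation or a time window to clear the alert again.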