Apr/074
HP ProCurve: Network Loop Detection
This morning brought the worst case I can imagine when updating switch firmware, but read for yourself:
Updated the firmware on 4 client switches (ProCurve 2848). Everything worked fine before and right after the update. Just 1 hour later, Nagios reported the first servers as down; some came back up and went down again. Then my Outlook and Firefox started flapping, and some colleagues reported problems too. A firmware update and, an hour later, problems in the LAN? => The new firmware was in the primary image on the switches, so I tried to restore the old firmware from the secondary image. My SSH sessions kept getting terminated, so I went to the server room.
Looked at the switches => the status lights were blinking like a disco. Heavy traffic? No, no… Connected a notebook to the serial port and looked at the logs => “Excessive Broadcasts”. Although I had restored the old firmware version and rebooted the switches, the problems continued. Disconnected all uplinks from the backbone switches and reconnected them one by one. One switch was under high load even without its uplink. Had a look at its logs: 2 ports with excessive broadcasts => LOOP! A damn bad coincidence: a firmware update and, shortly afterwards, a loop in the network. It was the first time this had happened.
Now I’m looking for a way to get these broadcast problems reported, or to have the affected ports disabled automatically by the switches. A quick search of the net turned up some options:
fault-finder bad-driver sensitivity high
fault-finder bad-transceiver sensitivity high
fault-finder bad-cable sensitivity high
fault-finder too-long-cable sensitivity high
fault-finder over-bandwidth sensitivity high
fault-finder broadcast-storm sensitivity high
fault-finder loss-of-link sensitivity high
fault-finder duplex-mismatch-HDx sensitivity high
fault-finder duplex-mismatch-FDx sensitivity high
By default they are all set to “medium” (as they were in my case).
It seems these options only set the sensitivity with which the problems are detected and logged.
Then I spotted something much more interesting in the documentation of some other (newer) HP switches: loop-protect:
Usage: [no] loop-protect <…> [[ethernet] PORT-LIST [receiver-action <send-disable|no-disable>]| [transmit-interval <1-10>]| [disable-period <0-604800>]| [trap <loop-detected>]
Description: Configure Loop protection on the switch.
Parameters:
- ethernet PORT-LIST Port(s) to configure loop protection on. By default loop protection is disabled on a port.
- receiver-action Sets the loop detected action per port. When a loop is detected the port that received the loop protection packet determines the action taken. If send-disable is selected the port that transmitted the packet will be disabled. If no-disable is selected, the port will not be disabled. The default action is ‘send-disable’.
- trap <loop-detected> Configure Loop protection traps. The following traps are generated by Loop protection – ‘loop-detected’ signifies that a loop was detected on a port.
- disable-timer <0-604800> (default:0) Sets the time in seconds to disable a port for when a loop has been detected. A value of 0 disables the auto reenable functionality. By default the timer is disabled.
- transmit-interval <1-10> (default:5) Time in seconds between transmission of loop protection packets.
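To make the parameters above more concrete, here is a minimal Python sketch that renders a set of loop-protect command lines and rejects values outside the documented ranges. The helper name `loop_protect_commands` and the port list `1-48` are my own illustrations, not something from the HP documentation. The command forms are only my reading of the usage text quoted above and have not been checked on a real switch. Note that the usage line says `disable-period` while the parameter list says `disable-timer`; the sketch assumes `disable-timer` and keeps the keyword configurable.

```python
def loop_protect_commands(ports, receiver_action="send-disable",
                          transmit_interval=5, disable_timer=0,
                          trap=True, timer_keyword="disable-timer"):
    """Render loop-protect CLI lines (hypothetical helper, syntax assumed
    from the usage text quoted in the post)."""
    # Value ranges and defaults are taken from the parameter list above.
    if receiver_action not in ("send-disable", "no-disable"):
        raise ValueError("receiver_action must be send-disable or no-disable")
    if not 1 <= transmit_interval <= 10:
        raise ValueError("transmit_interval must be 1-10 seconds")
    if not 0 <= disable_timer <= 604800:
        raise ValueError("disable_timer must be 0-604800 seconds")

    commands = [
        f"loop-protect {ports} receiver-action {receiver_action}",
        f"loop-protect transmit-interval {transmit_interval}",
        # 0 means the port is not re-enabled automatically.
        f"loop-protect {timer_keyword} {disable_timer}",
    ]
    if trap:
        commands.append("loop-protect trap loop-detected")
    return commands


if __name__ == "__main__":
    # Example: protect ports 1-48 and re-enable a disabled port after 5 minutes.
    for line in loop_protect_commands("1-48", disable_timer=300):
        print(line)
```

Whether a given firmware accepts exactly these lines is something to check against the CLI help of the switch itself before pasting anything into a production configuration.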
Here are some interesting links:
- HP Forums: Excessive Broadcast Problem with Procurve 2626
- HP Forums: switch loops
- 5300xl Release Notes (Page 47)
- 6200,5400,3500 CLI Reference Guide
I have to read some additional documentation on this feature, as I still have some questions:
- Do the switches only detect loops on themselves, or also loops involving ports on other switches?
- Is this feature new? Is it tested well enough to give it a try in a production network?
- Are there known issues?
I’ll look for this feature on the ProCurve 2848 tomorrow. I hope it is there (maybe not in the old firmware but in the new one?), because this option looks really nice. Sending traps would be nice too: I could send them to Nagios, or have Nagios check the switches for such traps, and alert the admins whenever one occurs.
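As a rough idea of what the Nagios side could look like, here is a minimal Python sketch that classifies switch messages and maps them to the standard Nagios plugin states (0 = OK, 1 = WARNING, 2 = CRITICAL). How the messages get there (trap handler, fetched log) is left open. The function name `classify`, the match patterns, and the sample lines are assumptions for illustration. Only the “Excessive Broadcasts” wording comes from the logs I saw; the real trap or log text would have to be checked on the switch.

```python
import re

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Assumed patterns: the actual message text depends on switch and firmware.
PATTERNS = [
    (CRITICAL, re.compile(r"\bloop\b", re.IGNORECASE)),
    (WARNING, re.compile(r"excessive broadcasts?", re.IGNORECASE)),
]


def classify(lines):
    """Return (nagios_status, message) for a list of log or trap lines."""
    worst = OK
    hits = []
    for line in lines:
        for status, pattern in PATTERNS:
            if pattern.search(line):
                worst = max(worst, status)
                hits.append(line.strip())
                break  # one match per line is enough
    if worst == OK:
        return OK, "OK - no loop or broadcast messages"
    label = "CRITICAL" if worst == CRITICAL else "WARNING"
    return worst, f"{label} - {len(hits)} matching message(s), first: {hits[0]}"


if __name__ == "__main__":
    import sys
    # Hypothetical sample lines, not real switch output.
    sample = [
        "port 7 is now on-line",
        "port 12 - Excessive Broadcasts detected",
    ]
    status, message = classify(sample)
    print(message)
    sys.exit(status)
```

A real check would read the lines from wherever the traps or logs end up instead of using the hard-coded sample.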

08:36 on April 21st, 2008
How can we verify whether our loop protection is working correctly on the 5400 switches?
12:00 on April 21st, 2008
I think the best way to verify it is to produce a loop manually. If your loop protection is properly configured, you can do this in a production environment.
I tested the loop protection by watching the logfile while plugging both ends of a cable into the same switch. The loop should be detected and logged in less than 5 seconds (the transmit-interval).
Beware: if the loop is not detected, unplug the cable quickly so that you don’t take your switch out of service.
16:21 on August 19th, 2008
We are using loop-protect and it works fine.