[Pacemaker] Fencing order
pavel at levshin.spb.ru
Mon Mar 21 11:06:13 EDT 2011
Today we had a network outage. Quite a few problems suddenly arose in
our setup, including a crashed corosync, the known notify bug in the
DRBD RA, and a problem with the VirtualDomain RA timing out on stop.
But the fencing behaviour was particularly strange.
Initially, one node (wapgw1-1) dropped out of the cluster. When the
connection was restored, corosync died on that node. It was considered
"offline unclean" and was scheduled to be fenced. Fencing by HP iLO did
not work (I do not yet know why). The second-priority fencing method is
meatware, and that took time.
The second node, wapgw1-2, hit the DRBD notify bug and failed to stop
some resources. It was "online unclean" and was also scheduled to be
fenced. HP iLO was available for this node, but it was not STONITHed
until I manually confirmed STONITH for wapgw1-1.
When I confirmed the first node's restart, the second node was fenced
automatically.
Is this ordering intended behaviour or a bug?
This is Pacemaker 1.0.10 with corosync 1.2.7, on a three-node cluster.