[Pacemaker] [RFC PATCH] Try to fix startup-fencing not happening

Simone Gotti simone.gotti at gmail.com
Thu Mar 17 18:54:28 EDT 2011


Hi,

When using corosync + pcmk v1 (starting both corosync and pacemakerd),
and I think also when using heartbeat or anything other than cman as
quorum provider, the CIB at startup will not contain a <node_state/>
entry for the nodes that are not in the cluster.

When using cman as quorum provider, instead, there will be a
<node_state/> for every node known by cman, because
lib/common/ais.c:cman_event_callback calls crm_update_peer for every
node reported by cman_get_nodes.

Something similar will happen when using corosync+pcmkv1 if corosync is
started on N nodes but pacemakerd is started only on N-M nodes.

All of this will break 'startup-fencing' because, from my understanding,
the logic is this:

1) At startup all the nodes are marked as unclean (in
lib/pengine/unpack.c:unpack_node).
2) lib/pengine/unpack.c:unpack_status iterates only over the
<node_state/> entries present in the CIB status section, first resetting
each node to clean and then marking it unclean if some conditions are
met.
3) In pengine/allocate.c:stage6 all the unclean nodes are fenced.

In the cases above, the CIB status section has a <node_state/> even for
nodes where pacemakerd is not running, so startup fencing won't happen:
the node is reset to clean and there isn't any condition in
unpack_status that will mark it as unclean again.
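
To make the three steps and the failure case concrete, here is a rough
Python model of the logic as I understand it. It is only a sketch, not
Pacemaker code: the function name, the data layout and the "failed" flag
(standing in for "some conditions are met") are all made up for
illustration.

```python
# Simplified model of the startup-fencing logic described above.
# NOT Pacemaker code; every name here is hypothetical.

def nodes_to_fence(configured_nodes, node_states, startup_fencing=True):
    """Return the sorted names of the nodes that would be fenced.

    configured_nodes: node names from the CIB configuration
    node_states: dict mapping node name -> dict of status flags, one entry
                 per <node_state/> found in the CIB status section
    """
    # Step 1: with startup fencing, every node starts out unclean.
    unclean = {name: startup_fencing for name in configured_nodes}

    # Step 2: only nodes that have a status entry are visited. Each one is
    # reset to clean, then marked unclean again if a condition is met.
    for name, state in node_states.items():
        if name not in unclean:
            continue
        unclean[name] = False
        if state.get("failed", False):
            unclean[name] = True

    # Step 3: whatever is still unclean gets fenced.
    return sorted(name for name, dirty in unclean.items() if dirty)


nodes = ["node1", "node2"]

# node2 never joined and has no status entry: it stays unclean.
print(nodes_to_fence(nodes, {"node1": {}}))               # ['node2']

# node2 has a status entry (as with cman) but nothing marks it unclean,
# so it is silently treated as clean and never fenced.
print(nodes_to_fence(nodes, {"node1": {}, "node2": {}}))  # []
```

The second call is the problematic case: the mere presence of a status
entry is enough to cancel the startup fencing.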


I'm not an expert in this code. I discarded the option of registering
at startup only the active nodes, rather than all the nodes known by
cman, as it wouldn't fix the corosync+pcmkv1 case.

Instead I tried to understand when a node that has its status in the CIB
should be startup fenced, and a possible solution is in the attached
patch. I noticed that when crm_update_peer inserts a new node, that node
doesn't have the expected attribute set. So, if startup-fencing is
enabled, I'm going to set the node as expected up.
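
The idea can be sketched roughly as follows. This is a toy Python model,
not the patch itself: the function names and the dict layout are
hypothetical, and the rule in is_unclean (a node expected up but not
actually online is unclean) is my assumption about how the expected
state would then be used.

```python
# Toy model of the idea behind the patch. NOT the patch itself;
# all names and the data layout are hypothetical.

def new_peer_state(online, startup_fencing):
    """Model the status entry created when a peer is first registered."""
    state = {"online": online}
    if startup_fencing:
        # Proposed change: a newly inserted node is recorded as expected up.
        state["expected"] = "up"
    return state


def is_unclean(state):
    # Assumption: a node that is expected up but is not online is unclean.
    return state.get("expected") == "up" and not state["online"]


# A node known to the membership layer but without pacemakerd running:
print(is_unclean(new_peer_state(online=False, startup_fencing=True)))   # True
# Same node with startup-fencing disabled: no expected state, not unclean.
print(is_unclean(new_peer_state(online=False, startup_fencing=False)))  # False
```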


Thanks!
Bye!

-- 
Simone Gotti

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fix_startup_fencing_try01-20110317-01.patch
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20110317/4f661658/attachment-0002.ksh>

