[Pacemaker] [RFC PATCH] Try to fix startup-fencing not happening
simone.gotti at gmail.com
Fri Mar 18 21:30:28 EDT 2011
On 03/17/2011 11:54 PM, Simone Gotti wrote:
> When using corosync + pcmk v1 as the quorum provider, starting both
> corosync and pacemakerd (and, I think, also when using heartbeat or
> anything other than cman), at startup the CIB will contain no
> <node_state/> entry for the nodes that are not in the cluster.
> Instead, when using cman as the quorum provider, there will be a
> <node_state/> entry for every node known by cman, because
> lib/common/ais.c:cman_event_callback calls crm_update_peer for every
> node reported by cman_get_nodes.
> Something similar will happen when using corosync+pcmkv1 if corosync is
> started on N nodes but pacemakerd is started only on N-M nodes.
> All of this breaks 'startup-fencing' because, as I understand it, the
> logic is this:
> 1) At startup all the nodes are marked (in
> lib/pengine/unpack.c:unpack_node) as unclean.
> 2) lib/pengine/unpack.c:unpack_status iterates only over the
> <node_state/> entries present in the CIB status section, first resetting
> each node to clean and then marking it unclean again if certain
> conditions are met.
> 3) In pengine/allocate.c:stage6, all the unclean nodes are fenced.
> Under the conditions above, the CIB status section also contains a
> <node_state/> entry for nodes where pacemakerd is not running, so startup
> fencing never happens for them: no condition in unpack_status marks
> them as unclean.
> I'm not an expert on this code. I discarded the approach of registering
> at startup only the active nodes, instead of all the nodes known by
> cman, because it wouldn't fix the corosync+pcmkv1 case.
> Instead, I tried to work out when a node that has a status entry in the
> CIB should be startup-fenced; a possible solution is in the attached
> patch.
> I noticed that when crm_update_peer inserts a new node, that node
> doesn't have the 'expected' attribute set. So, if startup-fencing is
> enabled, I set the node's expected state to up.
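A simplified model of the three quoted steps may make the failure easier to see. This is a hypothetical sketch, not Pacemaker code: model_node_t, its fields and the single "expected up but not a member" condition are assumptions standing in for the real, more involved checks in unpack_status. Only the overall flow follows the description above: every node starts unclean, only nodes with a <node_state/> entry are reset and re-checked, and whatever is still unclean would be fenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified node record; not Pacemaker's real node type. */
typedef struct {
    const char *uname;
    bool has_node_state; /* a <node_state/> entry exists in the CIB status */
    bool expected_up;    /* the entry's expected state is "up" */
    bool is_member;      /* the node is really active in the cluster */
    bool unclean;        /* computed: true means it would be fenced */
} model_node_t;

/* Step 1 (cf. unpack_node): every configured node starts out unclean. */
void model_unpack_nodes(model_node_t *nodes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        nodes[i].unclean = true;
    }
}

/* Step 2 (cf. unpack_status): only nodes with a <node_state/> entry are
 * visited; each is reset to clean, then marked unclean again if a
 * condition holds. The single condition used here is an assumption. */
void model_unpack_status(model_node_t *nodes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!nodes[i].has_node_state) {
            continue; /* never visited: stays unclean from step 1 */
        }
        nodes[i].unclean = false;
        if (nodes[i].expected_up && !nodes[i].is_member) {
            nodes[i].unclean = true;
        }
    }
}

/* Step 3 (cf. stage6): count the nodes that are still unclean. */
size_t model_fence_count(const model_node_t *nodes, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].unclean) {
            count++;
        }
    }
    return count;
}

/* Runs the three steps in order and returns how many nodes get fenced. */
size_t model_startup_fencing(model_node_t *nodes, size_t n)
{
    assert(nodes != NULL || n == 0);
    model_unpack_nodes(nodes, n);
    model_unpack_status(nodes, n);
    return model_fence_count(nodes, n);
}
```

In this model, a node whose <node_state/> entry was created by crm_update_peer without an expected state, and which is not running pacemakerd, ends up clean and is never fenced. Setting its expected state to up, as the first patch does when startup-fencing is enabled, makes the same node unclean again.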
Thinking a little more about this, I believe the cman case and the
pcmkv1 case are quite different.
It's probably valid to have cman + pacemaker running on some nodes and
only cman running on the others.
So, as a first step, it would be better to make the cman integration
behave like the other cases, and then look at some problems that, as far
as I can tell, are already present in all the implementations (I have
some corner cases in mind that I'd like to explain in the coming days).
The attached patch tries to add only the active nodes to the CIB status
section at startup.
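Registering only the active nodes amounts to filtering the node list before each node is added. The sketch below is hypothetical: cluster_node_t, its is_member flag and register_active_nodes are made-up names. In Pacemaker the list would come from cman_get_nodes and each accepted node would be passed to crm_update_peer, whose real signature is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one entry reported by the quorum provider. */
typedef struct {
    const char *name;
    bool is_member; /* assumed flag: node is currently an active member */
} cluster_node_t;

/* Stores the names of active members only in out[0..cap-1] and returns
 * how many were registered. Following the reasoning in this thread,
 * skipped nodes get no <node_state/> entry, so they keep the unclean
 * state assigned at startup and remain candidates for startup fencing. */
size_t register_active_nodes(const cluster_node_t *nodes, size_t n,
                             const char **out, size_t cap)
{
    size_t count = 0;
    assert(nodes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!nodes[i].is_member) {
            continue; /* known to cman but not active: do not register */
        }
        assert(count < cap); /* caller must provide enough room */
        out[count++] = nodes[i].name;
    }
    return count;
}
```

Filtering at registration time leaves unpack_status untouched; the trade-off, as noted earlier in the thread, is that it only addresses the cman case and not corosync+pcmkv1.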