[Pacemaker] pacemaker with cman and drbd when primary node panics or poweroff
Gianluca Cecchi
gianluca.cecchi at gmail.com
Wed Mar 5 12:34:25 CET 2014
On Mon, Mar 3, 2014 at 9:29 PM, Digimer wrote:
> Two possible problems;
>
> 1. cman's cluster.conf needs the '<cman two_node="1" expected_votes="1" />'.
>
> 2. You don't have fencing setup. The 'fence_pcmk' script only works if
> pacemaker's stonith is enabled and configured properly. Likewise, you will
> need to configure DRBD to use the 'crm-fence-peer.sh' handler and have the
> 'fencing resource-and-stonith;' policy.
>
> digimer
>
Thanks for your answer, digimer.
So the no-quorum-policy=ignore part applies only to resources, while
cluster.conf has to be set up as in RHCS for cluster membership?
I think the problem is partly due to the missing stonith configuration,
but the drbd crm-fence-peer.sh script actually plays an important part
too. See below.
As my test nodes are vSphere VMs, I have installed
VMware-vSphere-CLI-5.5 so that in the next few days I can test the
fence_vmware agent and stonith
(I verified that the basic "status" and "off" commands work).
Some details about the configuration:
Note that my hostnames are
node01.localdomain.local
node02.localdomain.local
with their IPs on the 192.168.33.x network.
I also have another network interface, for which I used these names:
iclnode01
iclnode02
with their IPs on the 192.168.230.x network,
and I want to use it for drbd and cluster communication.
As drbd needs hostnames in its config (at least that is what I read), I
have configured the drbd resource this way:
on node01.localdomain.local {
    address 192.168.230.221:7788;
    meta-disk internal;
}
using the hostname (node01.localdomain.local) but the IP on the other
network (the one of iclnode01).
Is this correct?
I also put the iclnode01 and iclnode02 names in cluster.conf,
so pacemaker knows the nodes as iclnode01/02.
So in a normal situation, crm_mon gives:
"
Online: [ icloveng01 icloveng02 ]
Master/Slave Set: ms_OvirtData [OvirtData]
Masters: [ icloveng01 ]
Slaves: [ icloveng02 ]
"
If I power off the slave host (node02), the drbd resource on the master
(node01) is still stopped, and with it the whole group, because
crm-fence-peer.sh adds this kind of constraint when it runs:
Mar 5 10:42:39 node01 crm-fence-peer.sh[18113]: invoked for res0
Mar 5 10:42:39 node01 cibadmin[18144]: notice: crm_log_args: Invoked: cibadmin -C -o constraints -X <rsc_location rsc="ms_MyData" id="drbd-fence-by-handler-res0-ms_MyData">#012 <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-res0-rule-ms_MyData">#012 <expression attribute="#uname" operation="ne" value="node01.localdomain.local" id="drbd-fence-by-handler-res0-expr-ms_MyData"/>#012 </rule>#012</rsc_location>
Mar 5 10:42:39 node01 crmd[1972]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Mar 5 10:42:39 node01 stonith-ng[1968]: notice: unpack_config: On loss of CCM Quorum: Ignore
Mar 5 10:42:39 node01 cib[1967]: notice: cib:diff: Diff: --- 0.142.3
Mar 5 10:42:39 node01 cib[1967]: notice: cib:diff: Diff: +++ 0.143.1 7a98665c4dd4697f6ed0be42e8c49de5
Mar 5 10:42:39 node01 cib[1967]: notice: cib:diff: -- <cib admin_epoch="0" epoch="142" num_updates="3"/>
Mar 5 10:42:39 node01 cib[1967]: notice: cib:diff: ++ <rsc_location rsc="ms_MyData" id="drbd-fence-by-handler-res0-ms_MyData">
Mar 5 10:42:39 node01 cib[1967]: notice: cib:diff: ++ <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-res0-rule-ms_MyData">
Mar 5 10:42:39 node01 cib[1967]: notice: cib:diff: ++ <expression attribute="#uname" operation="ne" value="node01.localdomain.local" id="drbd-fence-by-handler-res0-expr-ms_MyData"/>
Mar 5 10:42:39 node01 cib[1967]: notice: cib:diff: ++ </rule>
Mar 5 10:42:39 node01 cib[1967]: notice: cib:diff: ++ </rsc_location>
Mar 5 10:42:39 node01 pengine[1971]: notice: unpack_config: On loss of CCM Quorum: Ignore
Mar 5 10:42:39 node01 pengine[1971]: notice: LogActions: Demote MyData:0#011(Master -> Slave iclnode01)
Note that it uses the hostname (node01.localdomain.local), not the
intracluster node name (iclnode01).
So, since it assigns -INFINITY to every node whose name differs from
node01.localdomain.local, it demotes iclnode01 itself, which was the
master....
I hope this makes my point clearer...
I walked through the script and found that
it gets the cluster properties this way:
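To make the effect of that rule concrete, here is a minimal sketch of how an operation="ne" expression on #uname plays out. It is only an illustration: banned_from_master is a hypothetical helper, not part of crm-fence-peer.sh or Pacemaker, and it assumes that #uname is simply compared as a string against each cluster node name.

```shell
#!/bin/bash
# Hypothetical helper (not from crm-fence-peer.sh): list the nodes that a
# rule of the form '#uname ne <fencing_value>' with score -INFINITY would
# ban from the Master role, assuming a plain string comparison.
banned_from_master() {
    local fencing_value=$1
    shift
    local node
    for node in "$@"; do
        # The rule matches every node whose name differs from fencing_value.
        if [[ "$node" != "$fencing_value" ]]; then
            echo "$node"
        fi
    done
}

# Value written by the handler vs. the node names the cluster knows:
banned_from_master node01.localdomain.local iclnode01 iclnode02
```

With the names from this setup both iclnode01 and iclnode02 are printed, i.e. no node is left that may hold the Master role. If the value were iclnode01 instead, only iclnode02 would be listed.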
dc-version=1.1.10-14.el6_5.2-368c726
cluster-infrastructure=cman
stonith-enabled=false
last-lrm-refresh=1393868222
no-quorum-policy=ignore
default-resource-stickiness=200
The constraint is created as:
<rsc_location rsc=\"$master_id\" id=\"$id_prefix-$master_id\">
  <rule role=\"$role\" score=\"-INFINITY\" id=\"$id_prefix-rule-$master_id\">
    <expression attribute=\"$fencing_attribute\" operation=\"ne\" value=\"$fencing_value\" id=\"$id_prefix-expr-$master_id\"/>
  </rule>
</rsc_location>"
and $fencing_value is the key variable here,
but it is assigned in a way I don't completely understand (the #uname
case in particular...):
if [[ $fencing_attribute = "#uname" ]]; then
    fencing_value=$HOSTNAME
elif ! fencing_value=$(crm_attribute -Q -t nodes -n $fencing_attribute 2>/dev/null); then
    fencing_attribute="#uname"
    fencing_value=$HOSTNAME
fi
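As a rough check of how the template and the assignment fit together, the sketch below renders the same XML with the values seen in the logs. render_constraint is a hypothetical stand-in written for this mail, not a function from crm-fence-peer.sh, and it only prints the XML instead of calling cibadmin.

```shell
#!/bin/bash
# Hypothetical helper: print the constraint XML for the given values,
# following the template quoted from the script.
render_constraint() {
    local master_id=$1 id_prefix=$2 role=$3
    local fencing_attribute=$4 fencing_value=$5
    cat <<EOF
<rsc_location rsc="$master_id" id="$id_prefix-$master_id">
  <rule role="$role" score="-INFINITY" id="$id_prefix-rule-$master_id">
    <expression attribute="$fencing_attribute" operation="ne" value="$fencing_value" id="$id_prefix-expr-$master_id"/>
  </rule>
</rsc_location>
EOF
}

# Values taken from the log excerpt; fencing_value is what uname -n
# returned on node01.
render_constraint ms_MyData drbd-fence-by-handler-res0 Master \
    '#uname' node01.localdomain.local
```

The printed XML should correspond to what cibadmin logged, with value="node01.localdomain.local" in the expression.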
Inside the script, HOSTNAME is set statically:
HOSTNAME=$(uname -n)
so I suppose there is not much room to customize it,
and I have to use the hostname and its network for my intracluster communication???
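A quick way to see whether a node is affected might be to compare the name the handler uses with the name the cluster uses. The sketch below assumes that crm_node -n prints the cluster's name for the local node; compare_node_names itself is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: report whether the name the handler would use
# (uname -n) equals the name the cluster uses for this node.
compare_node_names() {
    local handler_name=$1 cluster_name=$2
    if [[ "$handler_name" == "$cluster_name" ]]; then
        echo "OK: handler and cluster agree on '$cluster_name'"
    else
        echo "MISMATCH: handler uses '$handler_name', cluster uses '$cluster_name'"
        return 1
    fi
}

# On a live node (needs Pacemaker's crm_node on PATH):
#   compare_node_names "$(uname -n)" "$(crm_node -n)"
```

On node01 in this setup the two arguments would be node01.localdomain.local and iclnode01, so the helper would report a mismatch.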
BTW, inside the script there are some uses of crm_attribute that I
can't find in the manual page
(e.g. option -t with the "status" and "nodes" values, which appear
neither in the manual page nor in the --help output).
BBTW: clustering is becoming more and more difficult and
confusing: I'm still trying to use cman instead of corosync with
pacemaker, as recommended for 6.5, and I just found that in RHEL 7 beta
cman is gone completely and corosync is back.... ;-)
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/7-Beta/html/High_Availability_Add-On_Reference/s1-configfileoverview-HAAR.html
Gianluca