[Pacemaker] pacemaker with cman and drbd when primary node panics or poweroff

Gianluca Cecchi gianluca.cecchi at gmail.com
Wed Mar 5 12:34:25 CET 2014


On Mon, Mar 3, 2014 at 9:29 PM, Digimer wrote:
> Two possible problems;
>
> 1. cman's cluster.conf needs the '<cman two_node="1" expected_votes="1" />'.
>
> 2. You don't have fencing setup. The 'fence_pcmk' script only works if
> pacemaker's stonith is enabled and configured properly. Likewise, you will
> need to configure DRBD to use the 'crm-fence-peer.sh' handler and have the
> 'fencing resource-and-stonith;' policy.
>
> digimer
>


Thanks for your answer, digimer.
So the no-quorum-policy=ignore setting only applies to resources, while
cluster.conf still has to be configured as in RHCS for cluster membership?
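
For point 1 of the quoted reply, a quick check that the cman element carries
both attributes could look like the sketch below. The function name
has_two_node is made up for illustration, the grep patterns assume each
attribute sits on the same line as the opening <cman tag, and the demo runs
against a temporary file rather than the real /etc/cluster/cluster.conf.

```shell
#!/bin/sh
# has_two_node FILE
# Succeeds if FILE has a <cman ...> element with two_node="1" and
# expected_votes="1" (hypothetical helper, not part of cman or pacemaker).
has_two_node() {
    grep -Eq '<cman[^>]*two_node="1"' "$1" &&
        grep -Eq '<cman[^>]*expected_votes="1"' "$1"
}

# Demo on a throwaway file; on a real node pass /etc/cluster/cluster.conf.
demo_conf=$(mktemp)
printf '<cluster>\n  <cman two_node="1" expected_votes="1" />\n</cluster>\n' > "$demo_conf"
if has_two_node "$demo_conf"; then
    echo "two_node settings present"
else
    echo "two_node settings missing"
fi
rm -f "$demo_conf"
```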

I think the problem is partly due to the missing stonith configuration,
but the drbd crm-fence-peer.sh script also plays an important part.
See below.
As my test nodes are vSphere VMs, I have installed
VMware-vSphere-CLI-5.5 so that in the next few days I can test the
fence_vmware agent and stonith
(I verified that the basic "status" and "off" commands work).

Some details about the configuration:
Note that my hostnames are
node01.localdomain.com
node02.localdomain.com
with their IPs on the 192.168.33.x network

I also have another network interface, for which I used these names:
iclnode01
iclnode02
with their IPs on the 192.168.230.x network,
and I want to use it for drbd and cluster communication.

As drbd needs hostnames in its config (at least that is what I read), I
have configured the drbd resource this way:
on node01.localdomain.local {
 address 192.168.230.221:7788;
 meta-disk internal;
 }

using the hostname (node01.localdomain.local) but the IP on the other
network (the one of iclnode01).
Is this correct?

I also put the iclnode01 and iclnode02 names in cluster.conf,
so pacemaker knows the nodes as iclnode01/02.

So in a normal situation, crm_mon gives:
"
Online: [ icloveng01 icloveng02 ]

 Master/Slave Set: ms_OvirtData [OvirtData]
     Masters: [ icloveng01 ]
     Slaves: [ icloveng02 ]
"

Suppose I power off the slave host (node02). I still get a stop of the
drbd resource on the master, node01, and therefore of the whole group,
because when crm-fence-peer.sh runs it adds this kind of constraint:

Mar  5 10:42:39 node01 crm-fence-peer.sh[18113]: invoked for res0
Mar  5 10:42:39 node01 cibadmin[18144]:   notice: crm_log_args: Invoked: cibadmin -C -o constraints -X <rsc_location rsc="ms_MyData" id="drbd-fence-by-handler-res0-ms_MyData">#012  <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-res0-rule-ms_MyData">#012    <expression attribute="#uname" operation="ne" value="node01.localdomain.local" id="drbd-fence-by-handler-res0-expr-ms_MyData"/>#012 </rule>#012</rsc_location>
Mar  5 10:42:39 node01 crmd[1972]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Mar  5 10:42:39 node01 stonith-ng[1968]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Mar  5 10:42:39 node01 cib[1967]:   notice: cib:diff: Diff: --- 0.142.3
Mar  5 10:42:39 node01 cib[1967]:   notice: cib:diff: Diff: +++ 0.143.1 7a98665c4dd4697f6ed0be42e8c49de5
Mar  5 10:42:39 node01 cib[1967]:   notice: cib:diff: -- <cib admin_epoch="0" epoch="142" num_updates="3"/>
Mar  5 10:42:39 node01 cib[1967]:   notice: cib:diff: ++ <rsc_location rsc="ms_MyData" id="drbd-fence-by-handler-res0-ms_MyData">
Mar  5 10:42:39 node01 cib[1967]:   notice: cib:diff: ++         <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-res0-rule-ms_MyData">
Mar  5 10:42:39 node01 cib[1967]:   notice: cib:diff: ++ <expression attribute="#uname" operation="ne" value="node01.localdomain.local" id="drbd-fence-by-handler-res0-expr-ms_MyData"/>
Mar  5 10:42:39 node01 cib[1967]:   notice: cib:diff: ++         </rule>
Mar  5 10:42:39 node01 cib[1967]:   notice: cib:diff: ++       </rsc_location>
Mar  5 10:42:39 node01 pengine[1971]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Mar  5 10:42:39 node01 pengine[1971]:   notice: LogActions: Demote MyData:0#011(Master -> Slave iclnode01)


Note that it uses the hostname (node01.localdomain.local), not the
intracluster node name (iclnode01).
So, since it assigns -INFINITY to every node whose #uname differs from
node01.localdomain.local, it demotes iclnode01 itself, which was the
master....
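
To make the effect of that rule concrete, here is a small sketch that mimics
the "#uname ne <value>" comparison in plain shell. It is only an illustration
of the string comparison, not how pacemaker evaluates rules internally; the
function name banned_nodes is made up, and the node names are the ones from
my setup.

```shell
#!/bin/sh
# banned_nodes FENCING_VALUE NODE...
# Prints every node name that differs from FENCING_VALUE, i.e. every node
# that a rule '#uname ne FENCING_VALUE' with score -INFINITY would ban from
# the Master role. (Hypothetical helper, for illustration only.)
banned_nodes() {
    value=$1
    shift
    for node in "$@"; do
        if [ "$node" != "$value" ]; then
            echo "$node"
        fi
    done
}

# The script uses $(uname -n) on node01 as the fencing value, while the
# cluster knows the nodes under their intracluster names.
banned_nodes "node01.localdomain.local" iclnode01 iclnode02
```

With these inputs both iclnode01 and iclnode02 are printed, which matches the
demotion seen in the log: no cluster node name equals the hostname, so no
node is left that may hold the Master role.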

I hope this makes my point clearer.
I walked through the script and found the following.

It gets the cluster properties this way:
dc-version=1.1.10-14.el6_5.2-368c726
cluster-infrastructure=cman
stonith-enabled=false
last-lrm-refresh=1393868222
no-quorum-policy=ignore
default-resource-stickiness=200

The constraint is built as:

<rsc_location rsc=\"$master_id\" id=\"$id_prefix-$master_id\">
  <rule role=\"$role\" score=\"-INFINITY\" id=\"$id_prefix-rule-$master_id\">
    <expression attribute=\"$fencing_attribute\" operation=\"ne\" value=\"$fencing_value\" id=\"$id_prefix-expr-$master_id\"/>
  </rule>
</rsc_location>"

and $fencing_value is the value that matters here,

but it is assigned in a way I don't completely understand (the #uname
case in particular):

                if [[ $fencing_attribute = "#uname" ]]; then
                        fencing_value=$HOSTNAME
                elif ! fencing_value=$(crm_attribute -Q -t nodes -n $fencing_attribute 2>/dev/null); then
                        fencing_attribute="#uname"
                        fencing_value=$HOSTNAME
                fi

Inside the script HOSTNAME is set statically:
HOSTNAME=$(uname -n)

so there is not much room to customize it, I suppose.
Do I have to use the hostname and its network for my intracluster
communication???
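
A minimal sketch of how the mismatch could be detected on a node before
relying on the handler. The function name names_match is made up for
illustration; the commented-out live invocation assumes that crm_node -n
prints the name the cluster uses for the local node.

```shell
#!/bin/sh
# names_match LOCAL_HOSTNAME CLUSTER_NODE_NAME
# Returns 0 if both names are identical, 1 otherwise.
# (Hypothetical helper, not part of drbd or pacemaker.)
names_match() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: uname -n is '$1' but the cluster node name is '$2'"
        return 1
    fi
}

# On a live cluster node this would be called as:
#   names_match "$(uname -n)" "$(crm_node -n)"
# Here the values from my setup are hard-coded so the sketch runs anywhere.
names_match "node01.localdomain.local" "iclnode01" || true
```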

BTW, inside the script there are some uses of crm_attribute that I
can't find documented
(e.g. option -t with the "status" and "nodes" values, which appear in
neither the manual page nor the --help output)

BBTW: clustering is becoming more and more difficult and confusing.
I'm now trying to use cman instead of corosync with pacemaker, as
recommended for 6.5, and I just found that in the RHEL 7 beta cman is
gone completely and corosync is back.... ;-)
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/7-Beta/html/High_Availability_Add-On_Reference/s1-configfileoverview-HAAR.html

Gianluca


