[Pacemaker] node offline after fencing (pacemakerd hangs)
Ulrich Leodolter
ulrich.leodolter at obvsg.at
Tue Jul 17 13:24:29 UTC 2012
Hi,
I have set up a very basic 2-node cluster on RHEL 6.3.
The first thing I tried was to set up a stonith/fence_ipmilan
resource.
Fencing seems to work: if I kill corosync on one node,
it is rebooted (via IPMI) by the other node.
But after the restart the cluster doesn't come back to normal
operation; it looks like pacemakerd hangs and the
node status stays offline.
I found only one way to fix the problem:
killall -9 pacemakerd
service pacemaker start
After that both nodes are online. Below you can see my
cluster configuration and the corosync.log messages, which
repeat forever while pacemakerd hangs.
I am new to Pacemaker and followed the "Clusters from Scratch"
guide for the first setup; the fence_ipmilan information
is from Google :-)
Can you give me any tips? What is wrong with this basic cluster
config? I don't want to add more resources (KVM virtual
machines) until fencing is configured correctly.
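In case it is useful: the IPMI side itself can be checked by hand with the fence agent, using a status query only (no reboot). Something like this, assuming the same address and credentials as in my config below:

```shell
# Query pcmk1's BMC over IPMI lanplus and print the chassis power state.
# -a: BMC address, -l: login, -p: password, -P: use lanplus, -o: action, -v: verbose
fence_ipmilan -a 192.168.120.171 -l pcmk -p xxx -P -o status -v
```

If this status query fails, the cluster's fence device can't work either, so it is worth confirming before debugging Pacemaker itself.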
Thanks,
Ulrich
[root at pcmk1 ~]# crm configure show
node pcmk1 \
attributes standby="off"
node pcmk2 \
attributes standby="off"
primitive p_stonith_pcmk1 stonith:fence_ipmilan \
params auth="password" ipaddr="192.168.120.171" passwd="xxx" lanplus="true" login="pcmk" timeout="20s" power_wait="5s" verbose="true" pcmk_host_check="static-list" pcmk_host_list="pcmk1" \
meta target-role="started"
primitive p_stonith_pcmk2 stonith:fence_ipmilan \
params auth="password" ipaddr="192.168.120.172" passwd="xxx" lanplus="true" login="pcmk" timeout="20s" power_wait="5s" verbose="true" pcmk_host_check="static-list" pcmk_host_list="pcmk2" \
meta target-role="started"
location loc_p_stonith_pcmk1_pcmk1 p_stonith_pcmk1 -inf: pcmk1
location loc_p_stonith_pcmk2_pcmk2 p_stonith_pcmk2 -inf: pcmk2
property $id="cib-bootstrap-options" \
expected-quorum-votes="2" \
dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
no-quorum-policy="ignore" \
cluster-infrastructure="openais"
rsc_defaults $id="rsc-options" \
resource-stickiness="200"
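One thing I notice when comparing with "Clusters from Scratch": my property section has no explicit stonith-enabled setting. If I understand the guide correctly, it should be switched on once the fence devices exist; a sketch in crm shell syntax:

```shell
# Enable fencing cluster-wide (per the "Clusters from Scratch" guide);
# without this, configured stonith resources may not be used as expected.
crm configure property stonith-enabled="true"
```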
/var/log/cluster/corosync.log:
Jul 13 11:29:41 [1859] pcmk2 crmd: info: do_dc_release: DC role released
Jul 13 11:29:41 [1859] pcmk2 crmd: info: do_te_control: Transitioner is now inactive
Jul 13 11:29:41 [1854] pcmk2 cib: info: set_crm_log_level: New log level: 3 0
Jul 13 11:30:01 [1859] pcmk2 crmd: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped (20000ms)
Jul 13 11:30:01 [1859] pcmk2 crmd: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Jul 13 11:30:01 [1859] pcmk2 crmd: notice: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
Jul 13 11:30:01 [1859] pcmk2 crmd: info: do_election_count_vote: Election 8 (owner: pcmk1) lost: vote from pcmk1 (Uptime)
Jul 13 11:30:01 [1859] pcmk2 crmd: notice: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
--
Ulrich Leodolter <ulrich.leodolter at obvsg.at>
OBVSG