[Pacemaker] Stonith issue with fence_virsh
Beo Banks
beo.banks at googlemail.com
Wed Oct 23 08:57:28 UTC 2013
Hi,

I want to test the failover capabilities of my cluster.
I ran pkill -9 corosync on the second node, and on the first node I saw that it tries to STONITH node2, but it ends with "Too many failures to fence zarafa02 (11), giving up".
Fencing from the command line works without any problems:
fence_virsh -a host2 -l root -x -k /root/.ssh/id_rsa -o reboot -v -n zarafa02
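A minimal Python sketch, assuming the usual fence-agents conventions, of how the short options in the command above correspond to the key=value parameters that fence agents also accept on standard input (the form in which a cluster passes parameters to a fence device). The option table covers only the options used above, the helper name cli_to_stdin is hypothetical, and writing flag options as "1" is an assumption of this sketch.

```python
import shlex

# Short fence-agent options used in the manual test, mapped to the
# key=value names the agent accepts on stdin (assumed mapping).
SHORT_TO_STDIN = {
    "-a": "ipaddr",
    "-l": "login",
    "-x": "secure",         # flag: connect via ssh
    "-k": "identity_file",
    "-o": "action",
    "-n": "port",           # the libvirt domain (plug) to fence
    "-v": "verbose",        # flag
}
FLAGS = {"-x", "-v"}


def cli_to_stdin(cmdline: str) -> dict[str, str]:
    """Translate a fence_virsh command line into stdin-style key=value pairs."""
    argv = shlex.split(cmdline)
    params: dict[str, str] = {}
    i = 1  # skip the agent name itself
    while i < len(argv):
        opt = argv[i]
        if opt not in SHORT_TO_STDIN:
            raise ValueError(f"unmapped option: {opt}")
        key = SHORT_TO_STDIN[opt]
        if opt in FLAGS:
            params[key] = "1"  # placeholder value for a flag (assumption)
            i += 1
        else:
            if i + 1 >= len(argv):
                raise ValueError(f"missing value for {opt}")
            params[key] = argv[i + 1]
            i += 2
    return params


if __name__ == "__main__":
    cmd = "fence_virsh -a host2 -l root -x -k /root/.ssh/id_rsa -o reboot -v -n zarafa02"
    for k, v in cli_to_stdin(cmd).items():
        print(f"{k}={v}")
```

The resulting names (ipaddr, login, secure, identity_file, action) are the same ones used as params of the stonith primitives in the configuration, so the manual test and the cluster resource can be compared field by field.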

Setup:
2x KVM guests (zarafa01 = node1, zarafa02 = node2)
2x KVM hosts
RHEL 6.4
Pacemaker, Corosync, DRBD

I hope somebody can help me with this issue. There is also a second issue: after running fence_virsh from the command line, the Pacemaker service is not up on the second node.

node1 (zarafa01), /var/log/messages:
Oct 23 09:35:28 zarafa01 pengine[2866]: warning: stage6: Scheduling Node zarafa02 for STONITH
Oct 23 09:35:28 zarafa01 pengine[2866]: notice: LogActions: Stop drbd_mysql:1#011(zarafa02)
Oct 23 09:35:28 zarafa01 pengine[2866]: notice: LogActions: Stop drbd_zarafa:1#011(zarafa02)
Oct 23 09:35:28 zarafa01 pengine[2866]: notice: LogActions: Stop apache:1#011(zarafa02)
Oct 23 09:35:28 zarafa01 pengine[2866]: notice: LogActions: Stop stonith-zarafa01#011(zarafa02)
Oct 23 09:35:28 zarafa01 pengine[2866]: warning: process_pe_message: Calculated Transition 183: (null)
Oct 23 09:35:28 zarafa01 crmd[29263]: notice: te_fence_node: Executing reboot fencing operation (124) on zarafa02 (timeout=60000)
Oct 23 09:35:28 zarafa01 stonith-ng[2863]: notice: handle_request: Client crmd.29263.8f8f06d0 wants to fence (reboot) 'zarafa02' with device '(any)'
Oct 23 09:35:28 zarafa01 stonith-ng[2863]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for zarafa02: 88604a94-8e2e-4ce4-9d08-85559e339f8e (0)
Oct 23 09:35:28 zarafa01 crmd[29263]: notice: process_lrm_event: LRM operation drbd_mysql_notify_0 (call=710, rc=0, cib-update=0, confirmed=true) ok
Oct 23 09:35:28 zarafa01 crmd[29263]: notice: process_lrm_event: LRM operation drbd_zarafa_notify_0 (call=712, rc=0, cib-update=0, confirmed=true) ok
Oct 23 09:36:40 zarafa01 stonith-ng[2863]: error: remote_op_done: Operation reboot of zarafa02 by zarafa01 for crmd.29263@zarafa01.88604a94: Timer expired
Oct 23 09:36:40 zarafa01 crmd[29263]: notice: tengine_stonith_callback: Stonith operation 5/124:183:0:cf74ef64-3995-414e-8ebd-ebacc89ace85: Timer expired (-62)
Oct 23 09:36:40 zarafa01 crmd[29263]: notice: tengine_stonith_callback: Stonith operation 5 for zarafa02 failed (Timer expired): aborting transition.
Oct 23 09:36:40 zarafa01 crmd[29263]: notice: tengine_stonith_notify: Peer zarafa02 was not terminated (st_notify_fence) by zarafa01 for zarafa01: Timer expired (ref=88604a94-8e2e-4ce4-9d08-85559e339f8e) by client crmd.29263
Oct 23 09:36:40 zarafa01 crmd[29263]: notice: run_graph: Transition 183 (Complete=9, Pending=0, Fired=0, Skipped=9, Incomplete=11, Source=unknown): Stopped
Oct 23 09:36:40 zarafa01 pengine[2866]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 23 09:36:40 zarafa01 pengine[2866]: warning: pe_fence_node: Node zarafa02 will be fenced because the node is no longer part of the cluster
Oct 23 09:36:40 zarafa01 pengine[2866]: warning: determine_online_status: Node zarafa02 is unclean
Oct 23 09:37:52 zarafa01 crmd[29263]: notice: tengine_stonith_callback: Stonith operation 6 for zarafa02 failed (Timer expired): aborting transition.
Oct 23 09:37:52 zarafa01 crmd[29263]: notice: tengine_stonith_notify: Peer zarafa02 was not terminated (st_notify_fence) by zarafa01 for zarafa01: Timer expired (ref=b13b2562-4124-4e6c-acca-e1114f7d9b98) by client crmd.29263
Oct 23 09:37:52 zarafa01 crmd[29263]: notice: run_graph: Transition 184 (Complete=9, Pending=0, Fired=0, Skipped=9, Incomplete=11, Source=unknown): Stopped
Oct 23 09:37:52 zarafa01 pengine[2866]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 23 09:37:52 zarafa01 pengine[2866]: warning: pe_fence_node: Node zarafa02 will be fenced because the node is no longer part of the cluster
Oct 23 09:37:52 zarafa01 pengine[2866]: warning: determine_online_status: Node zarafa02 is unclean
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: determine_online_status: Node zarafa02 is unclean
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action drbd_mysql:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action drbd_mysql:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action drbd_mysql:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action drbd_mysql:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action drbd_zarafa:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action drbd_zarafa:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action drbd_zarafa:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action drbd_zarafa:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action apache:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action apache:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action stonith-zarafa01_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action stonith-zarafa01_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:43:52 zarafa01 pengine[2866]: notice: LogActions: Stop apache:1#011(zarafa02)
Oct 23 09:43:52 zarafa01 pengine[2866]: notice: LogActions: Stop stonith-zarafa01#011(zarafa02)
Oct 23 09:43:52 zarafa01 crmd[29263]: notice: te_fence_node: Executing reboot fencing operation (124) on zarafa02 (timeout=60000)
Oct 23 09:43:52 zarafa01 pengine[2866]: warning: process_pe_message: Calculated Transition 190: (null)
Oct 23 09:43:52 zarafa01 stonith-ng[2863]: notice: handle_request: Client crmd.29263.8f8f06d0 wants to fence (reboot) 'zarafa02' with device '(any)'
Oct 23 09:43:52 zarafa01 stonith-ng[2863]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for zarafa02: de24f595-81e3-49f5-8886-07c8c1b22ec7 (0)
Oct 23 09:43:52 zarafa01 crmd[29263]: notice: process_lrm_event: LRM operation drbd_mysql_notify_0 (call=752, rc=0, cib-update=0, confirmed=true) ok
Oct 23 09:43:52 zarafa01 crmd[29263]: notice: process_lrm_event: LRM operation drbd_zarafa_notify_0 (call=754, rc=0, cib-update=0, confirmed=true) ok
Oct 23 09:44:04 zarafa01 rsyslogd-2177: imuxsock lost 92458 messages from pid 1927 due to rate-limiting
Oct 23 09:44:04 zarafa01 rsyslogd-2177: imuxsock begins to drop messages from pid 1927 due to rate-limiting
Oct 23 09:45:02 zarafa01 rsyslogd-2177: imuxsock lost 13836 messages from pid 1927 due to rate-limiting
Oct 23 09:45:03 zarafa01 rsyslogd-2177: imuxsock begins to drop messages from pid 1927 due to rate-limiting
Oct 23 09:45:04 zarafa01 stonith-ng[2863]: error: remote_op_done: Operation reboot of zarafa02 by zarafa01 for crmd.29263@zarafa01.de24f595: Timer expired
Oct 23 09:45:04 zarafa01 crmd[29263]: notice: tengine_stonith_callback: Stonith operation 12/124:190:0:cf74ef64-3995-414e-8ebd-ebacc89ace85: Timer expired (-62)
Oct 23 09:45:04 zarafa01 crmd[29263]: notice: tengine_stonith_callback: Stonith operation 12 for zarafa02 failed (Timer expired): aborting transition.
Oct 23 09:45:04 zarafa01 crmd[29263]: notice: tengine_stonith_notify: Peer zarafa02 was not terminated (st_notify_fence) by zarafa01 for zarafa01: Timer expired (ref=de24f595-81e3-49f5-8886-07c8c1b22ec7) by client crmd.29263
Oct 23 09:45:04 zarafa01 crmd[29263]: notice: run_graph: Transition 190 (Complete=9, Pending=0, Fired=0, Skipped=9, Incomplete=11, Source=unknown): Stopped
Oct 23 09:45:04 zarafa01 crmd[29263]: notice: too_many_st_failures: Too many failures to fence zarafa02 (11), giving up
Oct 23 09:45:08 zarafa01 rsyslogd-2177: imuxsock lost 178501 messages from pid 1927 due to rate-limiting
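The two fencing attempts whose start (te_fence_node) and failure (remote_op_done ... Timer expired) both appear in the log can be timed with a short Python sketch. The timestamps are copied by hand from the log above, and the year 2013 is taken from the message date because syslog omits it; the helper name elapsed_seconds is hypothetical. Both attempts run 72 seconds before "Timer expired" is reported, against the timeout=60000 (60 s) that te_fence_node logs.

```python
from datetime import datetime

# (fencing operation started, "Timer expired" reported), copied from the log
ATTEMPTS = [
    ("Oct 23 09:35:28", "Oct 23 09:36:40"),
    ("Oct 23 09:43:52", "Oct 23 09:45:04"),
]
TIMEOUT_MS = 60000  # from "te_fence_node: ... (timeout=60000)"


def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between two syslog timestamps; the year is assumed to be 2013."""
    fmt = "%Y %b %d %H:%M:%S"
    t0 = datetime.strptime(f"2013 {start}", fmt)
    t1 = datetime.strptime(f"2013 {end}", fmt)
    return int((t1 - t0).total_seconds())


if __name__ == "__main__":
    for start, end in ATTEMPTS:
        secs = elapsed_seconds(start, end)
        print(f"{start} -> {end}: {secs} s (logged timeout {TIMEOUT_MS // 1000} s)")
```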
crm configure show
node zarafa01 \
attributes standby="off"
node zarafa02 \
attributes standby="off"
primitive apache ocf:heartbeat:apache \
params configfile="/etc/httpd/conf/httpd.conf" \
op monitor interval="60s" \
op start interval="0" timeout="40s" \
op stop interval="0" timeout="60s"
primitive drbd_mysql ocf:linbit:drbd \
params drbd_resource="mysql" \
op start interval="0" timeout="240" \
op stop interval="0" timeout="100" \
op monitor interval="59s" role="Master" timeout="30s" \
op monitor interval="60s" role="Slave" timeout="30s"
primitive drbd_zarafa ocf:linbit:drbd \
params drbd_resource="zarafa" \
op start interval="0" timeout="240" \
op stop interval="0" timeout="240" \
op monitor interval="59s" role="Master" timeout="30s" \
op monitor interval="60s" role="Slave" timeout="30s"
primitive mysql_fs ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/data/mysql" fstype="ext4" options="noatime" \
op start interval="0" timeout="240" \
op stop interval="0" timeout="100" \
op monitor interval="30s" timeout="40s"
primitive mysql_ip ocf:heartbeat:IPaddr2 \
params ip="0.0.0.0" iflabel="MYSQL" cidr_netmask="20" nic="eth0" \
op monitor interval="30s"
primitive mysqld lsb:mysqld \
op monitor interval="10" timeout="30" \
op start interval="0" timeout="500" \
op stop interval="0" timeout="500"
primitive stonith-zarafa01 stonith:fence_virsh \
params pcmk_host_list="zarafa01" pcmk_host_check="static-list" action="reboot" ipaddr="host01" secure="true" login="root" identity_file="/root/.ssh/id_rsa" \
op monitor interval="300s" \
op start interval="0" timeout="60s" \
meta failure-timeout="180s"
primitive stonith-zarafa02 stonith:fence_virsh \
params pcmk_host_list="zarafa02" pcmk_host_check="static-list" action="reboot" ipaddr="host02" secure="true" delay="5" login="root" identity_file="/root/.ssh/id_rsa" \
op monitor interval="300s" \
op start interval="0" timeout="60s" \
meta failure-timeout="180s"
primitive zarafa-dagent lsb:zarafa-dagent \
op monitor interval="30" timeout="30" \
meta target-role="Started"
primitive zarafa-gateway lsb:zarafa-gateway \
op monitor interval="30" timeout="30"
primitive zarafa-ical lsb:zarafa-ical \
op monitor interval="30" timeout="30"
primitive zarafa-indexer lsb:zarafa-indexer \
op monitor interval="60" timeout="60" \
op start interval="0" timeout="120" \
op stop interval="0" timeout="120"
primitive zarafa-licensed lsb:zarafa-licensed \
op monitor interval="30" timeout="30"
primitive zarafa-monitor lsb:zarafa-monitor \
op monitor interval="30" timeout="30"
primitive zarafa-server lsb:zarafa-server \
op monitor interval="30" timeout="90" \
meta target-role="Started"
primitive zarafa-spooler lsb:zarafa-spooler \
op monitor interval="30" timeout="30"
primitive zarafa_fs ocf:heartbeat:Filesystem \
params device="/dev/drbd1" directory="/data/zarafa" fstype="ext4" \
op start interval="0" timeout="240" \
op stop interval="0" timeout="100" \
op monitor interval="30s" timeout="40s" \
meta target-role="Started"
primitive zarafa_ip ocf:heartbeat:IPaddr2 \
params ip="0.0.0.1" iflabel="ZARAFA" cidr_netmask="20" nic="eth0" \
op monitor interval="30s" \
meta target-role="Started"
group mysql mysql_fs mysql_ip mysqld \
meta target-role="Started"
group zarafa zarafa_fs zarafa_ip zarafa-server zarafa-spooler zarafa-dagent zarafa-licensed zarafa-monitor zarafa-gateway zarafa-ical zarafa-indexer \
meta target-role="Started"
ms ms_drbd_mysql drbd_mysql \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
ms ms_drbd_zarafa drbd_zarafa \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
clone apache_clone apache
location cli-prefer-mysql mysql \
rule $id="cli-prefer-rule-mysql" inf: #uname eq zarafa01
location drbd-fence-by-handler-mysql-ms_drbd_mysql ms_drbd_mysql \
rule $id="drbd-fence-by-handler-mysql-rule-ms_drbd_mysql" $role="Master" -inf: #uname ne zarafa01
location drbd-fence-by-handler-zarafa-ms_drbd_zarafa ms_drbd_zarafa \
rule $id="drbd-fence-by-handler-zarafa-rule-ms_drbd_zarafa" $role="Master" -inf: #uname ne zarafa01
location preferred_on_mysql mysql 100: zarafa01
location preferred_on_zarafa zarafa 100: zarafa01
location stonith-by-zarafa01 stonith-zarafa02 -inf: zarafa02
location stonith-by-zarafa02 stonith-zarafa01 -inf: zarafa01
colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
colocation zarafa_on_drbd inf: zarafa ms_drbd_zarafa:Master
order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
order zarafa_after_drbd inf: ms_drbd_zarafa:promote zarafa:start
order zarafa_after_mysql inf: mysql:start zarafa:start
property $id="cib-bootstrap-options" \
dc-version="1.1.8-7.el6-394e906" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
stonith-enabled="true" \
cluster-recheck-interval="5min" \
no-quorum-policy="ignore" \
last-lrm-refresh="1382443560" \
maintenance-mode="off"
rsc_defaults $id="rsc-options" \
resource-stickiness="200" \
failure-timeout="10min" \
migration-threshold="3"
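To repeat the manual test with exactly the values the cluster has configured, the params of stonith-zarafa02 can be turned back into a fence_virsh command line. This is a hypothetical Python sketch (parse_params and to_cli are illustrative names). It maps only ipaddr, login, identity_file, action and secure onto the short options used in the manual test, passes the node name with -n as the manual test did, and reports everything else (the pcmk_* attributes, which Pacemaker interprets itself, and delay) as unmapped instead of guessing an option for it.

```python
import shlex

# params of the stonith-zarafa02 primitive, copied from the configuration above
PARAMS_LINE = (
    'pcmk_host_list="zarafa02" pcmk_host_check="static-list" '
    'action="reboot" ipaddr="host02" secure="true" delay="5" login="root" '
    'identity_file="/root/.ssh/id_rsa"'
)

# Instance attributes mapped onto the fence_virsh short options used above
VALUE_OPTS = {"ipaddr": "-a", "login": "-l", "identity_file": "-k", "action": "-o"}
FLAG_OPTS = {"secure": "-x"}


def parse_params(line: str) -> dict[str, str]:
    """Split crm-style name="value" pairs into a dict (shlex strips the quotes)."""
    return dict(tok.split("=", 1) for tok in shlex.split(line))


def to_cli(params: dict[str, str], target: str) -> tuple[str, list[str]]:
    """Build a fence_virsh command line; also return the attributes left unmapped."""
    argv = ["fence_virsh"]
    unmapped = []
    for name, value in params.items():
        if name in VALUE_OPTS:
            argv += [VALUE_OPTS[name], value]
        elif name in FLAG_OPTS:
            if value.lower() in ("true", "1", "yes", "on"):
                argv.append(FLAG_OPTS[name])
        else:
            unmapped.append(name)
    argv += ["-n", target]  # node name used as the domain to fence, as in the manual test
    return shlex.join(argv), unmapped


if __name__ == "__main__":
    cmd, unmapped = to_cli(parse_params(PARAMS_LINE), "zarafa02")
    print(cmd)
    print("not mapped:", ", ".join(unmapped))
```

For stonith-zarafa02 this prints fence_virsh -o reboot -a host02 -x -l root -k /root/.ssh/id_rsa -n zarafa02, which can be run by hand and compared with the manual command that worked.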
crm status
Last updated: Wed Oct 23 10:51:51 2013
Last change: Wed Oct 23 10:12:17 2013 via cibadmin on zarafa01
Stack: classic openais (with plugin)
Current DC: zarafa01 - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
21 Resources configured.
Online: [ zarafa01 zarafa02 ]
Resource Group: mysql
mysql_fs (ocf::heartbeat:Filesystem): Started zarafa01
mysql_ip (ocf::heartbeat:IPaddr2): Started zarafa01
mysqld (lsb:mysqld): Started zarafa01
Master/Slave Set: ms_drbd_mysql [drbd_mysql]
Masters: [ zarafa01 ]
Stopped: [ drbd_mysql:1 ]
Resource Group: zarafa
zarafa_fs (ocf::heartbeat:Filesystem): Started zarafa01
zarafa_ip (ocf::heartbeat:IPaddr2): Started zarafa01
zarafa-server (lsb:zarafa-server): Started zarafa01
zarafa-spooler (lsb:zarafa-spooler): Started zarafa01
zarafa-dagent (lsb:zarafa-dagent): Started zarafa01
zarafa-licensed (lsb:zarafa-licensed): Started zarafa01
zarafa-monitor (lsb:zarafa-monitor): Started zarafa01
zarafa-gateway (lsb:zarafa-gateway): Started zarafa01
zarafa-ical (lsb:zarafa-ical): Started zarafa01
zarafa-indexer (lsb:zarafa-indexer): Started zarafa01
Master/Slave Set: ms_drbd_zarafa [drbd_zarafa]
Masters: [ zarafa01 ]
Stopped: [ drbd_zarafa:1 ]
Clone Set: apache_clone [apache]
Started: [ zarafa01 ]
Stopped: [ apache:1 ]
stonith-zarafa02 (stonith:fence_virsh): Started zarafa01

Thanks,
beo