[Pacemaker] Resource does not auto recover from failed state
tetsuo shima
tetsuo.41.shima at gmail.com
Thu Aug 29 10:04:53 UTC 2013
Update: I tried to narrow down the problem by running two Wheezy virtual
machines configured with Debian pinning like this:
# cat /etc/apt/preferences
Package: *
Pin: release a=wheezy
Pin-Priority: 900

Package: *
Pin: release a=squeeze
Pin-Priority: 800
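To double-check which version the pinning will pull in, something
like this can serve as a sanity check:
# apt-cache policy pacemaker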
# aptitude install corosync
# aptitude install pacemaker/squeeze
So the installed versions are:
root@pcmk2:/etc/corosync# dpkg -l | grep pacem
ii pacemaker 1.0.9.1+hg15626-1
amd64 HA cluster resource manager
root@pcmk2:/etc/corosync# dpkg -l | grep corosync
ii corosync 1.4.2-3
amd64 Standards-based cluster framework (daemon and modules)
ii libcorosync4 1.4.2-3
all Standards-based cluster framework (transitional package)
and the problem did not occur:
root@pcmk1:~/pacemaker# crm_mon -1
============
Last updated: Thu Aug 29 05:53:50 2013
Stack: openais
Current DC: pcmk1 - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ pcmk2 pcmk1 ]
ip (ocf::heartbeat:IPaddr2): Started pcmk1
Clone Set: mysql-mm (unmanaged)
mysql:0 (ocf::heartbeat:mysql): Started pcmk2 (unmanaged)
mysql:1 (ocf::heartbeat:mysql): Started pcmk1 (unmanaged)
root@pcmk2:/etc/corosync# /etc/init.d/mysql stop
[ ok ] Stopping MySQL database server: mysqld.
root@pcmk1:~/pacemaker# crm_mon -1
============
Last updated: Thu Aug 29 05:55:39 2013
Stack: openais
Current DC: pcmk1 - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ pcmk2 pcmk1 ]
ip (ocf::heartbeat:IPaddr2): Started pcmk1
Clone Set: mysql-mm (unmanaged)
mysql:0 (ocf::heartbeat:mysql): Started pcmk2 (unmanaged) FAILED
mysql:1 (ocf::heartbeat:mysql): Started pcmk1 (unmanaged)
Failed actions:
mysql:0_monitor_15000 (node=pcmk2, call=5, rc=7, status=complete): not
running
root@pcmk2:/etc/corosync# /etc/init.d/mysql start
[ ok ] Starting MySQL database server: mysqld ..
[info] Checking for tables which need an upgrade, are corrupt or were
not closed cleanly..
root@pcmk1:~/pacemaker# crm_mon -1
============
Last updated: Thu Aug 29 05:56:34 2013
Stack: openais
Current DC: pcmk1 - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ pcmk2 pcmk1 ]
ip (ocf::heartbeat:IPaddr2): Started pcmk1
Clone Set: mysql-mm (unmanaged)
mysql:0 (ocf::heartbeat:mysql): Started pcmk2 (unmanaged)
mysql:1 (ocf::heartbeat:mysql): Started pcmk1 (unmanaged)
-----
What I noticed:
with pacemaker 1.1.7, crm_mon reports 3 resources configured, while
1.0.9 reports only 2 (for the exact same configuration).
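A quick way to compare what each version enumerates as resources
(just a sanity check, I have not dug into why the counts differ) is
something like:
# crm_resource --list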
2013/8/27 tetsuo shima <tetsuo.41.shima at gmail.com>
> Hi list!
>
> I'm having an issue with corosync; here is the scenario:
>
> # crm_mon -1
> ============
> Last updated: Tue Aug 27 09:50:13 2013
> Last change: Mon Aug 26 16:06:01 2013 via cibadmin on node2
> Stack: openais
> Current DC: node1 - partition with quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 2 Nodes configured, 2 expected votes
> 3 Resources configured.
> ============
>
> Online: [ node2 node1 ]
>
> ip (ocf::heartbeat:IPaddr2): Started node1
> Clone Set: mysql-mm [mysql] (unmanaged)
> mysql:0 (ocf::heartbeat:mysql): Started node1 (unmanaged)
> mysql:1 (ocf::heartbeat:mysql): Started node2 (unmanaged)
>
> # /etc/init.d/mysql stop
> [ ok ] Stopping MySQL database server: mysqld.
>
> # crm_mon -1
> ============
> Last updated: Tue Aug 27 09:50:30 2013
> Last change: Mon Aug 26 16:06:01 2013 via cibadmin on node2
> Stack: openais
> Current DC: node1 - partition with quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 2 Nodes configured, 2 expected votes
> 3 Resources configured.
> ============
>
> Online: [ node2 node1 ]
>
> ip (ocf::heartbeat:IPaddr2): Started node1
> Clone Set: mysql-mm [mysql] (unmanaged)
> mysql:0 (ocf::heartbeat:mysql): Started node1 (unmanaged)
> mysql:1 (ocf::heartbeat:mysql): Started node2 (unmanaged) FAILED
>
> Failed actions:
> mysql:0_monitor_15000 (node=node2, call=27, rc=7, status=complete):
> not running
>
> # /etc/init.d/mysql start
> [ ok ] Starting MySQL database server: mysqld ..
> [info] Checking for tables which need an upgrade, are corrupt or were
> not closed cleanly..
>
> # sleep 60 && crm_mon -1
> ============
> Last updated: Tue Aug 27 09:51:54 2013
> Last change: Mon Aug 26 16:06:01 2013 via cibadmin on node2
> Stack: openais
> Current DC: node1 - partition with quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 2 Nodes configured, 2 expected votes
> 3 Resources configured.
> ============
>
> Online: [ node2 node1 ]
>
> ip (ocf::heartbeat:IPaddr2): Started node1
> Clone Set: mysql-mm [mysql] (unmanaged)
> mysql:0 (ocf::heartbeat:mysql): Started node1 (unmanaged)
> mysql:1 (ocf::heartbeat:mysql): Started node2 (unmanaged) FAILED
>
> Failed actions:
> mysql:0_monitor_15000 (node=node2, call=27, rc=7, status=complete):
> not running
>
> As you can see, every time I stop MySQL (which is unmanaged), the resource
> is marked as failed:
>
> crmd: [1828]: info: process_lrm_event: LRM operation mysql:0_monitor_15000
> (call=4, rc=7, cib-update=10, confirmed=false) not running
>
> When I restart the resource:
>
> crmd: [1828]: info: process_lrm_event: LRM operation mysql:0_monitor_15000
> (call=4, rc=0, cib-update=11, confirmed=false) ok
>
> The resource remains in a failed state and does not recover until I
> manually clean up the resource.
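>
> (By "clean up" I mean something along these lines, using the crm
> shell; exact syntax may differ between crmsh versions:)
>
> # crm resource cleanup mysql-mm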
>
> # crm_mon --one-shot --operations
> ============
> Last updated: Tue Aug 27 10:17:30 2013
> Last change: Mon Aug 26 16:06:01 2013 via cibadmin on node2
> Stack: openais
> Current DC: node1 - partition with quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 2 Nodes configured, 2 expected votes
> 3 Resources configured.
> ============
>
> Online: [ node2 node1 ]
>
> ip (ocf::heartbeat:IPaddr2): Started node1
> Clone Set: mysql-mm [mysql] (unmanaged)
> mysql:0 (ocf::heartbeat:mysql): Started node1 (unmanaged)
> mysql:1 (ocf::heartbeat:mysql): Started node2 (unmanaged) FAILED
>
> Operations:
> * Node node1:
> ip: migration-threshold=1
> + (57) probe: rc=0 (ok)
> mysql:0: migration-threshold=1 fail-count=1
> + (58) probe: rc=0 (ok)
> + (59) monitor: interval=15000ms rc=0 (ok)
> * Node node2:
> mysql:0: migration-threshold=1 fail-count=3
> + (27) monitor: interval=15000ms rc=7 (not running)
> + (27) monitor: interval=15000ms rc=0 (ok)
>
> Failed actions:
> mysql:0_monitor_15000 (node=node2, call=27, rc=7, status=complete):
> not running
>
> ---
>
> Here are some details about my configuration:
>
> # cat /etc/debian_version
> 7.1
>
> # dpkg -l | grep corosync
> ii corosync 1.4.2-3
> amd64 Standards-based cluster framework
>
> # dpkg -l | grep pacem
> ii pacemaker 1.1.7-1
> amd64 HA cluster resource manager
>
> # crm configure show
> node node2 \
> attributes standby="off"
> node node1
> primitive ip ocf:heartbeat:IPaddr2 \
> params ip="192.168.0.20" cidr_netmask="255.255.0.0" nic="eth2.2755"
> iflabel="mysql" \
> meta is-managed="true" target-role="Started" \
> meta resource-stickiness="100"
> primitive mysql ocf:heartbeat:mysql \
> op monitor interval="15" timeout="30"
> clone mysql-mm mysql \
> meta is-managed="false"
> location cli-prefer-ip ip 50: node1
> colocation ip-on-mysql-mm 200: ip mysql-mm
> property $id="cib-bootstrap-options" \
> dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> no-quorum-policy="ignore" \
> last-lrm-refresh="1377513557" \
> start-failure-is-fatal="false"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="1" \
> migration-threshold="1"
>
> ---
>
> Does anyone know what is wrong with my configuration?
>
> Thanks for the help,
>
> Best regards.
>
>
>