[Pacemaker] cluster misbehaving after update
Xzarth
xzarth at gmail.com
Fri Aug 2 06:57:25 UTC 2013
On 08/02/2013 02:16 AM, Andrew Beekhof wrote:
> On 01/08/2013, at 10:24 PM, Xzarth <xzarth at gmail.com> wrote:
>
>> Hi,
>>
>> I updated from pacemaker 1.0.9 to 1.1.7
> Distro? Seems strange to be upgrading to a release from 1.5 years ago.
> We're up to 1.1.10 now
>
I'm on Debian. I have one cluster on stable (wheezy) and one on oldstable
(squeeze), installed from backports. The behavior is the same on both.
>> After the update, the cluster behaves differently than before. I have a
>> resource with migration-threshold="1"; once that resource failed,
>> everything used to migrate to another node (which is what I would expect).
>> After the upgrade, once that resource fails, the cluster stops any resources
>> that depend on it and just hangs there. What changed, since I
>> haven't touched the config?
> Can you attach the result of cibadmin -Ql when the cluster is in this state?
>
Here it is (attached).
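
To make the behavior I expected concrete, here is a minimal sketch of how I understand migration-threshold and failure-timeout from the documentation. It is a toy model, not Pacemaker's actual policy engine code: the names FailureState, record_failure and node_allowed are made up for illustration, and it ignores that real failure expiry is only evaluated when the cluster rechecks its state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailureState:
    """Failure bookkeeping for one resource on one node (illustrative only)."""
    fail_count: int = 0
    last_failure: Optional[float] = None  # seconds on an arbitrary clock


def record_failure(state: FailureState, now: float) -> None:
    """Count one more failure and remember when it happened."""
    state.fail_count += 1
    state.last_failure = now


def node_allowed(state: FailureState, migration_threshold: int,
                 failure_timeout: float, now: float) -> bool:
    """Return True if the node may still host the resource in this toy model."""
    # Assumption: failures older than failure-timeout are treated as if they
    # had not occurred; a failure-timeout of 0 disables expiry.
    if (failure_timeout and state.last_failure is not None
            and now - state.last_failure >= failure_timeout):
        return True
    # Assumption: a migration-threshold of 0 disables the feature.
    if migration_threshold <= 0:
        return True
    # The node becomes ineligible once the fail count reaches the threshold.
    return state.fail_count < migration_threshold


# fonulator is configured with migration-threshold="1" failure-timeout="60s".
state = FailureState()
record_failure(state, now=0.0)
print(node_allowed(state, 1, 60, now=10.0))  # False: should move away
print(node_allowed(state, 1, 60, now=61.0))  # True: failure has expired
```

With a threshold of 1, a single monitor failure should make the node ineligible for that resource, which is why I expected the whole voip group to move to the other node.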
>>
>> Here is the config:
>>
>> node $id="1bb92e1d" asttest1 \
>> attributes standby="off"
>> node $id="5e583c54" asttest2 \
>> attributes standby="off"
>> node asttest1
>> node asttest2
>> primitive asterisk lsb:asterisk-11.0.1 \
>> op start interval="0" timeout="15s" \
>> op stop interval="0" timeout="15s" \
>> op monitor interval="1s" timeout="15s" start-delay="10"
>> primitive dahdi lsb:dahdi \
>> op start interval="0" timeout="15s" \
>> op stop interval="0" timeout="15s" \
>> op monitor interval="1s" timeout="15s"
>> primitive drbd ocf:linbit:drbd \
>> params drbd_resource="r0" \
>> op monitor interval="29s" role="Master" \
>> op monitor interval="31s" role="Slave"
>> primitive fonulator lsb:fonulator \
>> op start interval="0" timeout="20s" \
>> op stop interval="0" timeout="20s" \
>> op monitor interval="1s" timeout="20s" start-delay="30" \
>> meta migration-threshold="1" failure-timeout="60s"
>> primitive fs_drbd ocf:heartbeat:Filesystem \
>> params device="/dev/drbd/by-res/r0" directory="/mnt/drbd" fstype="ext3" \
>> op start interval="0" timeout="60s" start-delay="1" \
>> op stop interval="0" timeout="60s" start-delay="1" \
>> op monitor interval="1s" timeout="40s" start-delay="30" \
>> meta is-managed="true" target-role="Started"
>> primitive httpd lsb:apache2 \
>> op start interval="0" timeout="20s" \
>> op stop interval="0" timeout="20s" \
>> op monitor interval="1s" timeout="20s" start-delay="10"
>> primitive iax2_mon lsb:iax2_mon \
>> op start interval="0" timeout="20s" \
>> op stop interval="0" timeout="20s" \
>> op monitor interval="60s" timeout="20s" start-delay="30" \
>> meta failure-timeout="60s"
>> primitive ip_voip_route_default ocf:heartbeat:Route \
>> params destination="default" gateway="10.2.4.1" \
>> op monitor interval="1s" timeout="20s"
>> primitive ip_voip_route_test1 ocf:heartbeat:Route \
>> params destination="X.X.X.X/32" gateway="X.X.X.X" \
>> op monitor interval="1s" timeout="20s"
>> primitive ip_voip_route_test2 ocf:heartbeat:Route \
>> params destination="X.X.X.X/32" gateway="X.X.X.X.1" \
>> op monitor interval="1s" timeout="20s"
>> primitive ip_voip_eth0 ocf:heartbeat:IPaddr2 \
>> params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="1" \
>> op monitor interval="1s" timeout="20s"
>> primitive ip_voip_eth1 ocf:heartbeat:IPaddr2 \
>> params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="2" \
>> op monitor interval="1s" timeout="20s"
>> primitive ip_voip_eth2 ocf:heartbeat:IPaddr2 \
>> params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="3" \
>> op monitor interval="1s" timeout="20s"
>> primitive ip_voip_eth3 ocf:heartbeat:IPaddr2 \
>> params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="4" \
>> op monitor interval="1s" timeout="20s"
>> primitive ip_voip_eth4 ocf:heartbeat:IPaddr2 \
>> params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="5" \
>> op monitor interval="1s" timeout="20s"
>> primitive ip_voip_eth5 ocf:heartbeat:IPaddr2 \
>> params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="6" \
>> op monitor interval="1s" timeout="20s"
>> primitive ip_voip_eth6 ocf:heartbeat:IPaddr2 \
>> params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="7" \
>> op monitor interval="1s" timeout="20s"
>> primitive ip_voip_eth8 ocf:heartbeat:IPaddr2 \
>> params ip="X.X.X.X" cidr_netmask="24" nic="eth8" iflabel="1" \
>> op monitor interval="1s" timeout="20s"
>> primitive mysqld lsb:mysql \
>> op monitor interval="1s" timeout="15s" start-delay="10"
>> primitive tftp lsb:tftp-srce \
>> op start interval="0" timeout="20s" \
>> op stop interval="0" timeout="20s" \
>> op monitor interval="60s" timeout="10s" start-delay="10"
>> group ip_voip_addresses_p ip_voip_eth0 ip_voip_eth8 ip_voip_eth1 ip_voip_eth2 ip_voip_eth3 ip_voip_eth4 ip_voip_eth5 ip_voip_eth6 \
>> meta ordered="false" collocated="true" priority="8"
>> group ip_voip_routes ip_voip_route_test1 ip_voip_route_test2 \
>> meta ordered="false" collocated="true" priority="9"
>> group voip mysqld dahdi fonulator asterisk iax2_mon httpd tftp \
>> meta ordered="true" collocated="true" priority="10"
>> ms ms_drbd drbd \
>> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
>> clone cl_route ip_voip_route_default \
>> meta target-role="Started"
>> colocation fs_colocation inf: fs_drbd ms_drbd:Master
>> colocation ip_colocation inf: ip_voip_addresses_p fs_drbd
>> colocation ip_route_colocation inf: ip_voip_routes ip_voip_addresses_p
>> colocation voip_colocation inf: voip ip_voip_addresses_p
>> order fs_order inf: ms_drbd:promote fs_drbd:start
>> order ip_order inf: fs_drbd:start ip_voip_addresses_p:start
>> order ip_route_order inf: ip_voip_addresses_p:start ip_voip_routes:start
>> order voip_order inf: ip_voip_routes:start voip:start
>> property $id="cib-bootstrap-options" \
>> dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>> cluster-infrastructure="openais" \
>> stonith-enabled="false" \
>> expected-quorum-votes="2" \
>> last-lrm-refresh="1375355273" \
>> no-quorum-policy="ignore" \
>> symmetric-cluster="true"
>>
>>
>> And here is the state of the cluster after the resource fails:
>>
>> ============
>> Last updated: Thu Aug 1 13:26:41 2013
>> Last change: Thu Aug 1 13:07:53 2013
>> Stack: openais
>> Current DC: asttest1 - partition with quorum
>> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
>> 4 Nodes configured, 2 expected votes
>> 24 Resources configured.
>> ============
>>
>> Online: [ asttest1 asttest2 ]
>> OFFLINE: [ asttest1 asttest2 ]
>>
>> Resource Group: voip
>> mysqld (lsb:mysql): Started asttest1
>> dahdi (lsb:dahdi): Started asttest1
>> fonulator (lsb:fonulator): Stopped
>> asterisk (lsb:asterisk-11.0.1): Stopped
>> iax2_mon (lsb:iax2_mon): Stopped
>> httpd (lsb:apache2): Stopped
>> tftp (lsb:tftp-srce): Stopped
>> Resource Group: ip_voip_routes
>> ip_voip_route_test1 (ocf::heartbeat:Route): Started asttest1
>> ip_voip_route_test2 (ocf::heartbeat:Route): Started asttest1
>> Resource Group: ip_voip_addresses_p
>> ip_voip_eth0 (ocf::heartbeat:IPaddr2): Started asttest1
>> ip_voip_eth8 (ocf::heartbeat:IPaddr2): Started asttest1
>> ip_voip_eth1 (ocf::heartbeat:IPaddr2): Started asttest1
>> ip_voip_eth2 (ocf::heartbeat:IPaddr2): Started asttest1
>> ip_voip_eth3 (ocf::heartbeat:IPaddr2): Started asttest1
>> ip_voip_eth4 (ocf::heartbeat:IPaddr2): Started asttest1
>> ip_voip_eth5 (ocf::heartbeat:IPaddr2): Started asttest1
>> ip_voip_eth6 (ocf::heartbeat:IPaddr2): Started asttest1
>> Clone Set: cl_route [ip_voip_route_default]
>> Started: [ asttest2 asttest1 ]
>> Stopped: [ ip_voip_route_default:2 ip_voip_route_default:3 ]
>> fs_drbd (ocf::heartbeat:Filesystem): Started asttest1
>> Master/Slave Set: ms_drbd [drbd]
>> Masters: [ asttest1 ]
>> Slaves: [ asttest2 ]
>>
>> Failed actions:
>> fonulator_monitor_1000 (node=asttest1, call=85, rc=7, status=complete): not running
>>
>>
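
One thing I notice in the output above: the config lists asttest1 and asttest2 twice (once with a $id and once without), and crm_mon reports "4 Nodes configured" with both names under Online and OFFLINE. I don't know whether that is related to the problem. As a rough sketch, the duplicates can be spotted in "crm configure show" output like this; the helper names node_names and duplicated are my own, and the regex only covers the two node line shapes that appear in my config.

```python
import re
from collections import Counter

# Matches 'node $id="..." NAME' as well as plain 'node NAME'.
NODE_LINE = re.compile(r'^node\s+(?:\$id="[^"]*"\s+)?(\S+)')


def node_names(config_text):
    """Collect node names from crm shell config text, in order of appearance."""
    names = []
    for line in config_text.splitlines():
        # Drop any mail quote prefix (">> ") and surrounding whitespace.
        match = NODE_LINE.match(line.lstrip("> ").strip())
        if match:
            names.append(match.group(1))
    return names


def duplicated(names):
    """Return the sorted names that occur more than once."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


# The node section of the config quoted above.
SAMPLE = """\
node $id="1bb92e1d" asttest1 \\
attributes standby="off"
node $id="5e583c54" asttest2 \\
attributes standby="off"
node asttest1
node asttest2
"""

print(duplicated(node_names(SAMPLE)))  # ['asttest1', 'asttest2']
```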
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cibadmin_Ql
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20130802/be735f8f/attachment-0004.ksh>