[Pacemaker] Resource group restarts every 15 minutes

Sun Nov 20 18:48:56 EST 2011

On 11/21/2011 12:06 AM, Charles Ulrich wrote:
> Hello,
> 
> First off, I'm brand new to pacemaker and all of its tools. I'm trying
> to come up to speed as quickly as I can, but understand that my
> knowledge is probably lacking in some key areas. As Murphy would have
> it, I've come across a problem that Google has not been able to help
> me with.
> 
> Here's the setup: Two machines, eldon and elisa, with heartbeat and
> drbd configured. eldon is running a resource group called "www", which
> contains apache, an IP address, and /dev/www mounted from a drbd
> device. (There's a "mysql" resource group on elisa, but that appears
> to be functioning normally for now.)
> 
> Here's the problem: The www resource group on eldon keeps getting
> restarted every 16 minutes. (Up for 15, down for 1.) Based on the logs
> on elisa, I believe this is happening whenever the
> cluster-recheck-interval is hit, which defaults to 15 minutes. I
> believe that Pacemaker thinks the configuration (or something) in the
> resource group changed and initiates a restart at every recheck
> interval. These are the log messages from elisa that led me down this
> line of reasoning:
> 
> Nov 19 13:44:02 elisa pengine: [1460]: notice: check_rsc_parameters: Forcing restart of www on eldon, type changed: Filesystem -> <null>
> Nov 19 13:44:02 elisa pengine: [1460]: notice: check_rsc_parameters: Forcing restart of www on eldon, class changed: ocf -> <null>
> Nov 19 13:44:02 elisa pengine: [1460]: notice: check_rsc_parameters: Forcing restart of www on eldon, provider changed: heartbeat -> <null>
> 
> What might be causing this? I've included all of the relevant
> information that I can think of below. If there's anything else I can
> provide that would help, let me know. If it's an RTFM thing, I'd be
> grateful if you could also point me towards the right FM to R.

Yes, the 15-minute interval is due to cluster-recheck-interval. The only
time I saw similar behavior was after changing the provider of a resource
that was already running; in that case it restarted on every monitor
event. By the way, you may also want to enable monitoring for all of
your resources.
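
As a rough sketch of what enabling monitoring could look like for the
primitives in the www group: the interval and timeout values below are
illustrative assumptions, not taken from your setup and not tested on it.
The line starting with # is only an annotation for this mail.

```
# Sketch only: monitor interval/timeout values are assumed, tune them for your environment.
primitive fs_www ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/www" directory="/var/www" fstype="ext4" options="noatime,nodev,nosuid" \
        op monitor interval="20s" timeout="40s" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60"
primitive ip_www ocf:heartbeat:IPaddr2 \
        params ip="207.179.127.50" \
        op monitor interval="10s" timeout="20s"
primitive apache lsb:apache2 \
        op monitor interval="30s"
```

Without a recurring monitor operation the cluster only probes a resource
once at startup, so a failed filesystem, IP address or apache would
otherwise go unnoticed.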

The only solution I found was to restart Pacemaker so that it starts
with a clean status section.
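
A less drastic step that could be tried first is a cleanup of the
affected resource, which clears its recorded operation history from the
status section. Whether that is enough for this particular problem is an
assumption on my part, not something confirmed here:

```
# Assumption: untested for this problem; removes fs_www's operation history from the status section.
crm resource cleanup fs_www
```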

I don't know how you ran into this problem. How did you create the www
group, and did you do anything "unusual" to the fs_www resource? Did
you rename any resources?

Regards,
Andreas


> 
> node eldon \
>         attributes standby="off"
> node elisa \
>         attributes standby="off"
> primitive apache lsb:apache2
> primitive drbd_mysql ocf:linbit:drbd \
>         params drbd_resource="mysql" \
>         op monitor interval="15s" \
>         op start interval="0" timeout="240" \
>         op stop interval="0" timeout="100"
> primitive drbd_www ocf:linbit:drbd \
>         params drbd_resource="www" \
>         op monitor interval="15s" \
>         op start interval="0" timeout="240" \
>         op stop interval="0" timeout="100"
> primitive fs_mysql ocf:heartbeat:Filesystem \
>         params device="/dev/drbd/by-res/mysql" directory="/var/lib/mysql" fstype="ext4" options="noatime,nodev,nosuid,noexec" \
>         op start interval="0" timeout="60" \
>         op stop interval="0" timeout="60"
> primitive fs_www ocf:heartbeat:Filesystem \
>         params device="/dev/drbd/by-res/www" directory="/var/www" fstype="ext4" options="noatime,nodev,nosuid" \
>         op start interval="0" timeout="60" \
>         op stop interval="0" timeout="60"
> primitive ip_mysql ocf:heartbeat:IPaddr2 \
>         params ip="10.0.2.10"
> primitive ip_www ocf:heartbeat:IPaddr2 \
>         params ip="207.179.127.50"
> primitive mysqld lsb:mysql
> group mysql fs_mysql ip_mysql mysqld \
>         meta target-role="Started" is-managed="true"
> group www fs_www ip_www apache \
>         meta target-role="Started" is-managed="true"
> ms ms_drbd_mysql drbd_mysql \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> ms ms_drbd_www drbd_www \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> location loc_mysql mysql 200: elisa
> location loc_www www 200: eldon
> colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
> colocation www_on_drbd inf: www ms_drbd_www:Master
> order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
> order www_after_drbd inf: ms_drbd_www:promote www:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         no-quorum-policy="ignore" \
>         stonith-enabled="false"
> 
> 
> crm(live)# status
> ============
> Last updated: Sat Nov 19 13:34:25 2011
> Stack: openais
> Current DC: elisa - partition with quorum
> Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
> 
> Online: [ eldon elisa ]
> 
>  Resource Group: mysql
>      fs_mysql	(ocf::heartbeat:Filesystem):	Started elisa
>      ip_mysql	(ocf::heartbeat:IPaddr2):	Started elisa
>      mysqld	(lsb:mysql):	Started elisa
>  Master/Slave Set: ms_drbd_mysql
>      Masters: [ elisa ]
>      Slaves: [ eldon ]
>  Master/Slave Set: ms_drbd_www
>      Masters: [ eldon ]
>      Slaves: [ elisa ]
>  Resource Group: www
>      fs_www	(ocf::heartbeat:Filesystem):	Started eldon
>      ip_www	(ocf::heartbeat:IPaddr2):	Started eldon
>      apache	(lsb:apache2):	Started eldon
> 
> Failed actions:
>     drbd_mysql_monitor_0 (node=elisa, call=2, rc=6, status=complete): not configured
>     drbd_mysql_monitor_0 (node=eldon, call=2, rc=6, status=complete): not configured
>     fs_mysql_start_0 (node=eldon, call=8, rc=5, status=complete): not installed
> 
> I've also uploaded the syslogs of the restart event here (they're
> rather large and I don't wish to spam the mailing list further than
> necessary):
> 
>   eldon: http://pastebin.com/raw.php?i=p6Kmct9f
>   elisa: http://pastebin.com/raw.php?i=mwddDxKi
> 
> Many thanks,
> Charles
> 
