[Pacemaker] Resource group restarts every 15 minutes

Sun Nov 20 20:16:49 EST 2011

On Mon, Nov 21, 2011 at 10:48 AM, Andreas Kurz <andreas at hastexo.com> wrote:
> On 11/21/2011 12:06 AM, Charles Ulrich wrote:
>> Hello,
>>
>> First off, I'm brand new to pacemaker and all of its tools. I'm trying
>> to come up to speed as quickly as I can, but understand that my
>> knowledge is probably lacking in some key areas. As Murphy would have
>> it, I've come across a problem that Google has not been able to help
>> me with.
>>
>> Here's the setup: two machines, eldon and elisa, with heartbeat and
>> drbd configured. eldon is running a resource group called "www", which
>> contains apache, an IP address, and /var/www mounted from a drbd
>> device. (There's a "mysql" resource group on elisa, but that appears
>> to be functioning normally for now.)
>>
>> Here's the problem: The www resource group on eldon keeps getting
>> restarted every 16 minutes. (Up for 15, down for 1.) Based on the logs
>> on elisa, I believe this is happening whenever the
>> cluster-recheck-interval is hit, which defaults to 15 minutes. I
>> believe that Pacemaker thinks the configuration (or something) in the
>> resource group changed and initiates a restart at every recheck
>> interval. These are the log messages from elisa that led me down this
>> line of reasoning:
>>
>> Nov 19 13:44:02 elisa pengine: [1460]: notice: check_rsc_parameters: Forcing restart of www on eldon, type changed: Filesystem -> <null>
>> Nov 19 13:44:02 elisa pengine: [1460]: notice: check_rsc_parameters: Forcing restart of www on eldon, class changed: ocf -> <null>
>> Nov 19 13:44:02 elisa pengine: [1460]: notice: check_rsc_parameters: Forcing restart of www on eldon, provider changed: heartbeat -> <null>
>>
>> What might be causing this? I've included all of the relevant
>> information that I can think of below. If there's anything else I can
>> provide that would help, let me know. If it's an RTFM thing, I'd be
>> grateful if you could also point me towards the right FM to R.
>
> Yes, the 15-minute cycle is due to cluster-recheck-interval. I have
> only seen similar behavior after changing the provider of a resource
> that was already running; in that case it restarted on every monitor
> event. By the way, you may also want to enable monitoring for all of
> your resources.
>
> The only solution I found was to restart Pacemaker so that it starts
> with a clean status section.
>
> I don't know how you ran into this problem. How did you create the
> www group, and did you do anything "unusual" to the fs_www resource?
> Did you rename any resources?
>
> Regards,
> Andreas
>
>
>>
>> node eldon \
>>         attributes standby="off"
>> node elisa \
>>         attributes standby="off"
>> primitive apache lsb:apache2
>> primitive drbd_mysql ocf:linbit:drbd \
>>         params drbd_resource="mysql" \
>>         op monitor interval="15s" \
>>         op start interval="0" timeout="240" \
>>         op stop interval="0" timeout="100"
>> primitive drbd_www ocf:linbit:drbd \
>>         params drbd_resource="www" \
>>         op monitor interval="15s" \
>>         op start interval="0" timeout="240" \
>>         op stop interval="0" timeout="100"
>> primitive fs_mysql ocf:heartbeat:Filesystem \
>>         params device="/dev/drbd/by-res/mysql" directory="/var/lib/mysql" fstype="ext4" options="noatime,nodev,nosuid,noexec" \
>>         op start interval="0" timeout="60" \
>>         op stop interval="0" timeout="60"
>> primitive fs_www ocf:heartbeat:Filesystem \
>>         params device="/dev/drbd/by-res/www" directory="/var/www" fstype="ext4" options="noatime,nodev,nosuid" \
>>         op start interval="0" timeout="60" \
>>         op stop interval="0" timeout="60"
>> primitive ip_mysql ocf:heartbeat:IPaddr2 \
>>         params ip="10.0.2.10"
>> primitive ip_www ocf:heartbeat:IPaddr2 \
>>         params ip="207.179.127.50"
>> primitive mysqld lsb:mysql
>> group mysql fs_mysql ip_mysql mysqld \
>>         meta target-role="Started" is-managed="true"
>> group www fs_www ip_www apache \
>>         meta target-role="Started" is-managed="true"
>> ms ms_drbd_mysql drbd_mysql \
>>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
>> ms ms_drbd_www drbd_www \
>>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
>> location loc_mysql mysql 200: elisa
>> location loc_www www 200: eldon
>> colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
>> colocation www_on_drbd inf: www ms_drbd_www:Master
>> order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
>> order www_after_drbd inf: ms_drbd_www:promote www:start
>> property $id="cib-bootstrap-options" \
>>         dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
>>         cluster-infrastructure="openais" \
>>         expected-quorum-votes="2" \
>>         no-quorum-policy="ignore" \
>>         stonith-enabled="false"
>>
>>
>> crm(live)# status
>> ============
>> Last updated: Sat Nov 19 13:34:25 2011
>> Stack: openais
>> Current DC: elisa - partition with quorum
>> Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
>> 2 Nodes configured, 2 expected votes
>> 4 Resources configured.
>> ============
>>
>> Online: [ eldon elisa ]
>>
>>  Resource Group: mysql
>>      fs_mysql (ocf::heartbeat:Filesystem):    Started elisa
>>      ip_mysql (ocf::heartbeat:IPaddr2):       Started elisa
>>      mysqld   (lsb:mysql):    Started elisa
>>  Master/Slave Set: ms_drbd_mysql
>>      Masters: [ elisa ]
>>      Slaves: [ eldon ]
>>  Master/Slave Set: ms_drbd_www
>>      Masters: [ eldon ]
>>      Slaves: [ elisa ]
>>  Resource Group: www
>>      fs_www   (ocf::heartbeat:Filesystem):    Started eldon
>>      ip_www   (ocf::heartbeat:IPaddr2):       Started eldon
>>      apache   (lsb:apache2):  Started eldon
>>
>> Failed actions:
>>     drbd_mysql_monitor_0 (node=elisa, call=2, rc=6, status=complete): not configured
>>     drbd_mysql_monitor_0 (node=eldon, call=2, rc=6, status=complete): not configured
>>     fs_mysql_start_0 (node=eldon, call=8, rc=5, status=complete): not installed
>>
>> I've also uploaded the syslogs of the restart event here (they're
>> rather large and I don't wish to spam the mailing list further than
>> necessary):
>>
>>   eldon: http://pastebin.com/raw.php?i=p6Kmct9f
>>   elisa: http://pastebin.com/raw.php?i=mwddDxKi

Could you use hb_report to create a report and file a bug for it please?
http://bugs.clusterlabs.org
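
Before filing, it may be worth confirming that the forced restarts really
do line up with cluster-recheck-interval. A rough sketch of how the
check_rsc_parameters messages quoted above could be pulled out of syslog
is below. The function names, the log path in the usage comment, and the
hard-coded year are my own assumptions (syslog timestamps carry no year),
and it expects each message on a single unwrapped line:

```python
import re
from datetime import datetime

# Matches pengine lines of the form quoted in this thread, e.g.
#   Nov 19 13:44:02 elisa pengine: [1460]: notice: check_rsc_parameters:
#   Forcing restart of www on eldon, type changed: Filesystem -> <null>
# (one line in the real log; wrapped here only for readability).
PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+pengine:.*"
    r"check_rsc_parameters:\s+Forcing restart of (?P<rsc>\S+) on (?P<node>\S+),"
    r"\s+\w+ changed"
)


def forced_restarts(lines, year=2011):
    """Return sorted, de-duplicated (timestamp, resource, node) tuples.

    The type/class/provider messages logged in the same second for the
    same resource collapse into one event.
    """
    events = set()
    for line in lines:
        m = PATTERN.search(line)
        if m is None:
            continue
        stamp = datetime.strptime(
            f"{year} {m.group('stamp')}", "%Y %b %d %H:%M:%S"
        )
        events.add((stamp, m.group("rsc"), m.group("node")))
    return sorted(events)


def gaps_in_minutes(events):
    """Minutes between consecutive distinct restart timestamps."""
    stamps = sorted({e[0] for e in events})
    return [(b - a).total_seconds() / 60 for a, b in zip(stamps, stamps[1:])]


# Possible usage (the path is an assumption; adjust for your distro):
#   with open("/var/log/syslog") as fh:
#       events = forced_restarts(fh)
#   print(gaps_in_minutes(events))
```

If the gaps come out close to 15 minutes each, that supports the
recheck-interval theory and would be useful to mention in the bug.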