[Pacemaker] detecting resource failures after maintenance
Jeffrey Lewis
jlewis at 42lines.net
Fri May 10 15:53:00 UTC 2013
It seems pacemaker is not properly detecting resource failures after
maintenance. An example follows.
Pacemaker is managing two IPaddr2 resources. Both resources are
online, and all is well.
jlewis@qa3db22:~$ sudo crm resource show
shard0_ip (ocf::heartbeat:IPaddr2) Started
shard1_ip (ocf::heartbeat:IPaddr2) Started
I decide to do some maintenance and set is-managed-default=false.
This way, pacemaker will continue monitoring all resources, but will
not take action should a resource fail.
jlewis at qa3db22:~$ sudo crm configure property is-managed-default=false
jlewis at qa3db23:~$ sudo crm resource show
shard0_ip (ocf::heartbeat:IPaddr2) Started (unmanaged)
shard1_ip (ocf::heartbeat:IPaddr2) Started (unmanaged)
I then take resource 'shard1_ip' offline using ifconfig. Pacemaker
correctly notices that this resource has failed.
jlewis@qa3db23:~$ sudo ifconfig eth0:shard1 down
jlewis@qa3db23:~$ sudo crm resource show
shard0_ip (ocf::heartbeat:IPaddr2) Started (unmanaged)
shard1_ip (ocf::heartbeat:IPaddr2) Started (unmanaged) FAILED
However, when I set is-managed-default=true again, pacemaker incorrectly
thinks resource 'shard1_ip' is OK, even though the IP address is still down.
jlewis@qa3db23:~$ sudo crm configure property is-managed-default=true
jlewis@qa3db23:~$ sudo crm resource show
shard0_ip (ocf::heartbeat:IPaddr2) Started
shard1_ip (ocf::heartbeat:IPaddr2) Started
I don't necessarily expect pacemaker to start this IP, since it was
stopped when pacemaker was not managing this resource, but I do expect
pacemaker to correctly report current status.
Any hints?
Thanks,
Jeffrey