[Pacemaker] Failed-over incomplete

Thu Dec 4 06:52:42 UTC 2014

Dear Andrei,

The failover never completed, so the resources were not all failed
over to the other node.

I think this happened because res.vBKN went into the unmanaged state.

But why? No configuration was changed.

--teenigma

On Thu, Dec 4, 2014 at 1:41 PM, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> On Thu, Dec 4, 2014 at 4:56 AM, Teerapatr Kittiratanachai
> <maillist.tk at gmail.com> wrote:
>> Dear List,
>>
>> We are using Pacemaker and Corosync with CMAN as our HA software, in
>> the versions below.
>>
>>     OS:              CentOS release 6.5 (Final) 64-bit
>>     Pacemaker:       pacemaker.x86_64         1.1.10-14.el6_5.3
>>     Corosync:        corosync.x86_64          1.4.1-17.el6_5.1
>>     CMAN:            cman.x86_64              3.0.12.1-59.el6_5.2
>>     Resource-Agent:  resource-agents.x86_64   3.9.5-3.12
>>
>>     Topology:        2 nodes in an Active/Standby model
>>                      (MySQL is Active/Active via a clone)
>>
>> All packages were installed from the official CentOS repository,
>> except Resource-Agent, which was installed from the openSUSE repository
>> (http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/).
>>
>> The system worked normally for a few months until yesterday morning,
>> around 03:35 UTC+0700, when we found that one of the resources had gone
>> into the UNMANAGED state without any configuration change. After another
>> resource failed, Pacemaker tried to fail the resources over to the
>> other node, but the failover stopped incomplete when it reached this
>> resource.
>>
>> The configuration of some of the resources is below, and the log from
>> the event is in the attached file.
>>
>
> The log covers only the resource monitor failure and the stopping of
> resources. It does not contain any event related to starting resources
> on the other node.
>
> You would need to collect a crm_report with a start time before the
> resource failed and a stop time after the resources were started on the
> other node.
>
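For reference, a report covering such a window could be collected roughly as sketched below. The timestamps and the destination path are assumptions for illustration only (the start is placed shortly before the 03:35 event, in node-local time; the end time is a guess) and must be adjusted so that they really bracket both the failure and the start on the other node.

```shell
#!/bin/sh
# Hypothetical time window in node-local time; adjust to the real event.
REPORT_FROM="2014-12-03 03:00:00"   # assumed: before the first monitor failure
REPORT_TO="2014-12-03 05:00:00"     # assumed: after resources started on the peer
REPORT_DEST="/tmp/failover-report"  # hypothetical output location

if command -v crm_report >/dev/null 2>&1; then
    # -f and -t bound the period for which logs and cluster state are collected
    crm_report -f "$REPORT_FROM" -t "$REPORT_TO" "$REPORT_DEST"
else
    echo "crm_report not found; run this on a cluster node" >&2
fi
```

The guard around the call only keeps the snippet harmless on a machine without Pacemaker installed; on a cluster node the crm_report line alone is enough.
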
>> primitive res.vBKN6 IPv6addr \
>>         params ipv6addr="2001:db8:0:f::61a" cidr_netmask=64 nic=eth0 \
>>         op monitor interval=10s
>>
>> primitive res.vDMZ6 IPv6addr \
>>         params ipv6addr="2001:db8:0:9::61a" cidr_netmask=64 nic=eth1 \
>>         op monitor interval=10s
>>
>> group gr.mainService res.vDMZ4 res.vDMZ6 res.vBKN4 res.vBKN6 res.http res.ftp
>>
>> rsc_defaults rsc_defaults-options: \
>>         migration-threshold=1
>>
>> Please help me to solve this problem.
>>
>> --teenigma
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>