[Pacemaker] Cannot fail-over Master/Slave resource collocated with ping resource at the HDD crash
Andrew Beekhof
andrew at beekhof.net
Mon Feb 23 02:17:12 UTC 2015
> On 16 Feb 2015, at 8:15 pm, NAKAHIRA Kazutomo <nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
>
> Hi all,
>
> I ran into a problem: a Master/Slave resource colocated
> with a ping resource cannot fail over after an HDD crash.
>
> After the HDD crash, the stop operation of the ping resource loops,
> and so does the notify operation of the Master/Slave resource.
I'm guessing there is no fencing installed?
>
> I configured "op stop on-fail=ignore" on the ping resource, but
> the ping resource returns OCF_ERR_INSTALLED(5), which is not ignored.
>
> How do I configure the resource to ignore the operation error
> even when OCF_ERR_INSTALLED is returned?
> (Or to fence only when OCF_ERR_INSTALLED is returned?)
>
> Of course, on-fail=fence makes fail-over possible.
> But I do not want to fence when OCF_ERR_GENERIC(1) is
> returned by the stop operation of the ping resource.
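
To make the request above concrete, here is a minimal sketch of the per-return-code policy being asked for. It is not Pacemaker code and does not correspond to any existing configuration option; the function name desired_stop_failure_action and the action strings are hypothetical. Only the numeric OCF return codes come from the OCF resource agent convention, and treating every other non-zero code like OCF_ERR_GENERIC is an assumption.

```python
from enum import IntEnum


class OcfRc(IntEnum):
    """Subset of the standard OCF resource agent return codes."""
    OCF_SUCCESS = 0
    OCF_ERR_GENERIC = 1
    OCF_ERR_INSTALLED = 5


def desired_stop_failure_action(rc: int) -> str:
    """Hypothetical policy for a failed stop of the ping resource.

    Mirrors the behavior requested in the message above:
    fence only when the agent reports OCF_ERR_INSTALLED,
    and ignore softer failures such as OCF_ERR_GENERIC.
    """
    if rc == OcfRc.OCF_SUCCESS:
        return "none"      # stop succeeded, nothing to recover
    if rc == OcfRc.OCF_ERR_INSTALLED:
        return "fence"     # agent or its tools are unusable; retrying cannot help
    return "ignore"        # assumption: all other errors are treated like OCF_ERR_GENERIC


if __name__ == "__main__":
    for code in (0, 1, 5):
        print(code, desired_stop_failure_action(code))
```

The point of the sketch is only that the desired decision depends on the return code, whereas the on-fail setting discussed in this thread is chosen once per operation.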
>
>
> By the way, this feature was introduced in Pacemaker 1.1.11.
>
> https://github.com/ClusterLabs/pacemaker/commit/767213e4e47e122d3ae89c06bc7b5b670aa26f4d
>
> It seems that retrying the operation is pointless in this case,
> because it will fail every time.
Can we get a crm_report for this scenario?
>
> If we treat a missing agent (rc-code="5" op-status="5")
> as a hard error, then fencing is a better option than retrying.
>
> Best regards,
> Kazutomo NAKAHIRA