[ClusterLabs] Antw: Re: crmsh configure delete for constraints
Ferenc Wágner
wferi at niif.hu
Wed Feb 10 10:56:53 UTC 2016
Vladislav Bogdanov <bubble at hoster-ok.com> writes:
> If pacemaker has got an error on start, it will run stop with the same
> set of parameters anyways. And will get error again if that one was
> from validation and RA does not differentiate validation for start and
> stop. And then circular fencing over the whole cluster is triggered
> for no reason.
>
> Of course, for safety, RA could save its state if start was successful
> and skip validation on stop only if that state is not found. Otherwise
> removed binary or config file would result in resource running on
> several nodes.
What would happen if we made the start operation return OCF_NOT_RUNNING
when validation fails? Or, more broadly, whenever the start operation
knows that the resource is not running, so that a stop operation would
do no good?
From Pacemaker Explained B.4: "The cluster will not attempt to stop a
resource that returns this for any action." The probes could still
return OCF_ERR_CONFIGURED, putting real info into the logs, and a stop
failure could still lead to fencing, protecting data integrity, but
circular fencing would not happen. I hope.
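
A minimal sketch of the idea, not taken from any existing agent: the
start action maps a validation failure to OCF_NOT_RUNNING, while probes
and stop keep reporting the real error. The return code values are the
standard OCF ones; the "config" and "statefile" parameters, the state
file approach and all function names are hypothetical, and whether the
cluster really skips the stop in this case is exactly the open question
above.

```python
#!/usr/bin/env python3
"""Sketch: start reports "not running" when validation fails."""
import os
import tempfile

# Standard OCF return codes.
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_ERR_CONFIGURED = 6
OCF_NOT_RUNNING = 7


def state_path(env):
    """Hypothetical state file that marks a successful start."""
    default = os.path.join(tempfile.gettempdir(), "sketch-ra.state")
    return env.get("OCF_RESKEY_statefile") or default


def validate(env):
    """Hypothetical check: the 'config' parameter must name an existing file."""
    config = env.get("OCF_RESKEY_config", "")
    if config and os.path.isfile(config):
        return OCF_SUCCESS
    return OCF_ERR_CONFIGURED


def run_action(action, env):
    """Return the OCF exit code for one action."""
    valid = validate(env)
    running = os.path.exists(state_path(env))

    if action == "validate-all":
        return valid
    if action == "monitor":
        # Probes still report the real error, so it shows up in the logs.
        if valid != OCF_SUCCESS:
            return valid
        return OCF_SUCCESS if running else OCF_NOT_RUNNING
    if action == "start":
        if valid != OCF_SUCCESS:
            # Nothing was started, so say so instead of returning the
            # validation error.
            return OCF_NOT_RUNNING
        open(state_path(env), "w").close()
        return OCF_SUCCESS
    if action == "stop":
        # Stop validates like any other action here, so it can still fail.
        if valid != OCF_SUCCESS:
            return valid
        if running:
            os.remove(state_path(env))
        return OCF_SUCCESS
    return OCF_ERR_UNIMPLEMENTED
```

A real agent would end with something like
sys.exit(run_action(sys.argv[1], os.environ)) and would also implement
meta-data. With a missing config file, start returns 7 in this sketch
while monitor and stop both return 6.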
By the way, what are the reasons to run stop after a failed start? To
clean up halfway-started resources? Besides OCF_ERR_GENERIC, the other
error codes pretty much guarantee that the resource cannot be active.
--
Regards,
Feri.