[Pacemaker] monitor on disabled nodes
Lars Marowsky-Bree
lmb at suse.com
Thu Sep 19 07:17:43 UTC 2013
On 2013-09-18T12:20:08, Radoslaw Garbacz <radoslaw.garbacz at xtremedatainc.com> wrote:
> Sorry for not being specific.
>
> The agent is meant to run only on a specific node (the head), and by
> constraints is disabled on all other nodes.
>
> 'pcs constraint' reports:
> Location Constraints:
> Resource: dbx_nfs_head
> Enabled on: ip-10-138-14-225
> Disabled on: ip-10-151-14-34 ip-10-238-146-54
Ah, I wasn't aware that pcs had introduced the enabled/disabled
terminology for location constraints. That may be misleading, because
location constraints don't actually "disable" an agent from running
somewhere; they only ban the resource from being hosted there.
That means the cluster will still probe for it (the "monitor" call with
interval=0 that you see) to make sure that, if the resource is found
active there, the cluster can stop it and bring the system into
compliance with the configuration.
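To illustrate how an agent can tell such a probe apart from a recurring
monitor, here is a minimal Python sketch. It assumes, as Pacemaker does,
that the operation interval is passed in the OCF_RESKEY_CRM_meta_interval
environment variable (in milliseconds), with 0 or an absent value meaning
a one-off probe; this is roughly what the ocf_is_probe helper in the
resource-agents shell library checks. The name is_probe is only for
illustration. The point is not to skip the check on a probe, but to know
which kind of call you are answering.

```python
from typing import Mapping


def is_probe(action: str, env: Mapping[str, str]) -> bool:
    """Return True if this call is a one-off probe (monitor with interval 0)."""
    if action != "monitor":
        return False
    # Pacemaker passes the operation interval in milliseconds;
    # a missing or empty value is treated the same as "0" (a probe).
    interval = env.get("OCF_RESKEY_CRM_meta_interval", "0") or "0"
    return int(interval) == 0
```

An agent would typically call it as is_probe(sys.argv[1], os.environ).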
> 'pcs status' reports:
> Failed actions:
> dbx_nfs_head_monitor_0 (node=ip-10-238-146-54, call=1127, rc=6,
> status=complete): not configured
> dbx_nfs_head_monitor_0 (node=ip-10-151-14-34, call=1127, rc=6,
> status=complete): not configured
Returning "not configured" is probably wrong here.
It should check if the service is active and then either return
OCF_SUCCESS if it's healthy or OCF_ERR_GENERIC if it is in a failed
state; or OCF_NOT_RUNNING if the agent can verify that it is indeed
cleanly stopped.
If it's not found (and thus can't be running), OCF_ERR_INSTALLED is the
more appropriate result for a probe: it signals a problem with the local
node (such as missing binaries).
OCF_ERR_CONFIGURED means "the cluster definition of this service is
wrong" and implies the service can't be started anywhere.
Please refer to the resource agent developer's guide:
http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html
> keep it away from all other nodes, but as far as I understand, the
> pacemaker needs to check if it is running, so I would like to recognize
> this situation and skip the check on all nodes except the head.
No. Just skipping it is most likely wrong as well; you just need to
return the correct state.
Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde