[Pacemaker] 2-node cluster doesn't move resources away from a failed node
Andreas Kurz
andreas at hastexo.com
Sun Jul 8 00:12:40 CEST 2012
On 07/05/2012 04:12 PM, David Guyot wrote:
> Hello, everybody.
>
> As the title suggests, I'm configuring a 2-node cluster, but I've got a
> strange issue here: when I put a node in standby mode using "crm node
> standby", its resources are correctly moved to the second node, and they
> stay there even after the first node is back online. I assume this is the
> behavior preferred by the designers of such systems, to avoid running
> resources on a potentially unstable node. Nevertheless, when I simulate a
> failure of the node running the resources with "/etc/init.d/corosync stop",
> the other node correctly fences the failed node by electrically resetting
> it, but it does not start the resources on itself. Instead, it waits for
> the failed node to come back online and then renegotiates resource
> placement, which inevitably leads to the failed node restarting the
> resources. I suppose this is a consequence of the resource stickiness
> still recorded by the surviving node: because this node still assumes that
> the resources are running on the failed node, it assumes they prefer to
> stay on the first node, even though it has failed.
>
> When the first node, Vindemiatrix, has shut down Corosync, the second,
> Malastare, reports this:
>
> root@Malastare:/home/david# crm_mon --one-shot -VrA
> ============
> Last updated: Thu Jul 5 15:27:01 2012
> Last change: Thu Jul 5 15:26:37 2012 via cibadmin on Malastare
> Stack: openais
> Current DC: Malastare - partition WITHOUT quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 2 Nodes configured, 2 expected votes
> 17 Resources configured.
> ============
>
> Node Vindemiatrix: UNCLEAN (offline)
Pacemaker thinks fencing was not successful and will not recover
resources until STONITH succeeds ... or until the node returns and it is
possible to probe the resource states.
> Online: [ Malastare ]
>
> Full list of resources:
>
> soapi-fencing-malastare (stonith:external/ovh): Started Vindemiatrix
> soapi-fencing-vindemiatrix (stonith:external/ovh): Started Malastare
> Master/Slave Set: ms_drbd_svn [drbd_svn]
> Masters: [ Vindemiatrix ]
> Slaves: [ Malastare ]
> Master/Slave Set: ms_drbd_pgsql [drbd_pgsql]
> Masters: [ Vindemiatrix ]
> Slaves: [ Malastare ]
> Master/Slave Set: ms_drbd_backupvi [drbd_backupvi]
> Masters: [ Vindemiatrix ]
> Slaves: [ Malastare ]
> Master/Slave Set: ms_drbd_www [drbd_www]
> Masters: [ Vindemiatrix ]
> Slaves: [ Malastare ]
> fs_www (ocf::heartbeat:Filesystem): Started Vindemiatrix
> fs_pgsql (ocf::heartbeat:Filesystem): Started Vindemiatrix
> fs_svn (ocf::heartbeat:Filesystem): Started Vindemiatrix
> fs_backupvi (ocf::heartbeat:Filesystem): Started Vindemiatrix
> VirtualIP (ocf::heartbeat:IPaddr2): Started Vindemiatrix
> OVHvIP (ocf::pacemaker:OVHvIP): Started Vindemiatrix
> ProFTPd (ocf::heartbeat:proftpd): Started Vindemiatrix
>
> Node Attributes:
> * Node Malastare:
> + master-drbd_backupvi:0 : 10000
> + master-drbd_pgsql:0 : 10000
> + master-drbd_svn:0 : 10000
> + master-drbd_www:0 : 10000
>
> As you can see, the node failure is detected. The log produced in this
> state is attached.
>
> Note that both ocf::pacemaker:OVHvIP and stonith:external/ovh are custom
> resources which use my server provider's SOAP API to provide the intended
> services. The STONITH agent does nothing but return exit status 0 when
> the start, stop, on or off actions are requested, returns the 2 node
> names when the hostlist or gethosts actions are requested and, when the
> reset action is requested, actually resets the faulty node using the
> provider API. As this API doesn't provide a reliable means of knowing the
> exact moment of the reset, the STONITH agent pings the faulty node every
> 5 seconds until the ping fails, then forks a process which pings the
> faulty node every 5 seconds until it answers again. Because the provider
> has not yet installed the external VPN, I'm forced to emulate it with
> OpenVPN (which seems unable to re-establish a connection lost minutes
> ago, leading to a split-brain situation), so the forked process then
> restarts OpenVPN to re-establish the connection, and then restarts
> Corosync and Pacemaker.
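A minimal sketch of the reset logic described above, written as POSIX shell functions. It is not the actual external/ovh agent: provider_reset is a hypothetical placeholder for the provider's SOAP call, the node names are taken from this thread, and MAX_POLLS is an assumed cap so that the agent reports failure itself instead of running past Pacemaker's stonith-timeout. The forked OpenVPN/Corosync restart is deliberately left out.

```shell
#!/bin/sh
# Hypothetical skeleton of an external STONITH plugin shaped like the agent
# described in the mail. Not the real external/ovh agent.

NODES="Malastare Vindemiatrix"
POLL_INTERVAL=${POLL_INTERVAL:-5}   # seconds between pings, as in the mail
MAX_POLLS=${MAX_POLLS:-10}          # assumption: give up well before stonith-timeout

# Hypothetical placeholder for the provider API call; always "succeeds" here.
provider_reset() {
    echo "requesting reset of $1" >&2
    return 0
}

# Returns 0 while the node still answers a single ping (Linux iputils flags).
node_answers_ping() {
    ping -c 1 -W 1 "$1" >/dev/null 2>&1
}

# Wait until the target stops answering; fail after MAX_POLLS attempts.
wait_until_down() {
    polls=0
    while node_answers_ping "$1"; do
        polls=$((polls + 1))
        if [ "$polls" -ge "$MAX_POLLS" ]; then
            return 1
        fi
        sleep "$POLL_INTERVAL"
    done
    return 0
}

# Dispatch on the action name; the exit status is what stonithd sees.
stonith_agent() {
    action=$1
    target=$2
    case "$action" in
        gethosts|hostlist)
            for n in $NODES; do echo "$n"; done
            ;;
        start|stop|on|off|status)
            return 0
            ;;
        reset)
            provider_reset "$target" || return 1
            wait_until_down "$target"
            ;;
        *)
            return 1
            ;;
    esac
}
```

As a standalone plugin the script would end with a call such as stonith_agent "$@". Capping the wait makes the worst-case reset time predictable (here roughly MAX_POLLS × POLL_INTERVAL plus the API call), which is the figure stonith-timeout has to exceed.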
>
> Aside from the VPN issue, whose performance and stability problems I'm
> fully aware of, I thought that Pacemaker would start the resources on the
> remaining node as soon as the STONITH agent returns exit status 0, but it
> doesn't. Instead, it seems that the STONITH reset action takes too long
> to report a successful reset, and this delay reaches some internal
> timeout, which in turn leads Pacemaker to assume that the STONITH agent
> failed. Pacemaker then keeps trying to reset the node forever (which only
> makes the API return an error, because the last reset request was less
> than 5 minutes ago, which is forbidden) and stops actions without
> restarting the resources on the remaining node. I searched the Internet
> for this parameter, but the only related thing I found is this page,
> http://lists.linux-ha.org/pipermail/linux-ha/2010-March/039761.html, a
> Linux-HA mailing list archive, which mentions a stonith-timeout
> property. However, I've gone through the Pacemaker documentation without
> finding any occurrence of it, and I got an error when I tried to get its
> value:
man stonithd
>
> root@Vindemiatrix:/home/david# crm_attribute --name stonith-timeout --query
> scope=crm_config name=stonith-timeout value=(null)
> Error performing operation: The object/attribute does not exist
stonith-timeout defaults to 60s. Use "crm configure property
stonith-timeout=XY" to increase it cluster-wide, or add an individual
value as a resource attribute to your stonith resources.
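The rule of thumb is that stonith-timeout must be longer than the worst-case time your agent needs to confirm a reset. A small sketch of that arithmetic follows. suggest_stonith_timeout is a hypothetical helper, not part of Pacemaker, and the 30-second default headroom is an assumption; it only prints the cluster-wide command quoted above with a concrete value.

```shell
# Hypothetical helper, not part of Pacemaker: given the worst-case time in
# seconds that the STONITH agent needs to confirm a reset, print the crm
# command that sets stonith-timeout above it with some headroom.
suggest_stonith_timeout() {
    worst_case=$1
    margin=${2:-30}   # assumption: 30 s of headroom by default
    echo "crm configure property stonith-timeout=$((worst_case + margin))s"
}
```

For an agent that may poll every 5 seconds for up to two minutes, suggest_stonith_timeout 120 prints crm configure property stonith-timeout=150s. After applying it, the crm_attribute --name stonith-timeout --query from the mail should report that value instead of (null).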
Regards,
Andreas
--
Need help with Pacemaker?
http://www.hastexo.com/now
>
> So what did I miss? Must I use this property, which I could not find in
> the documentation? Or should I rewrite my STONITH agent to return exit
> status 0 as soon as the API has accepted the reset request (contrary to
> what the Linux-HA page http://linux-ha.org/wiki/STONITH states is
> necessary)? Or is there something else I missed?
>
> Thank you for reading this whole mail, and thanks in advance for your help.
>
> Kind regards.
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>