[Pacemaker] resource stickiness and preventing stonith on failback
Andrew Beekhof
andrew at beekhof.net
Tue Sep 20 03:02:26 UTC 2011
On Wed, Aug 24, 2011 at 6:56 AM, Brian J. Murrell <brian at interlinx.bc.ca> wrote:
> Hi All,
>
> I am trying to configure pacemaker (1.0.10) to make a single filesystem
> highly available across two nodes (please don't be distracted by the dangers
> of multiply mounted filesystems and clustering filesystems, etc., as I
> am absolutely clear about that -- consider that I am using a filesystem
> resource as just an example if you wish). Here is my filesystem
> resource description:
>
> node foo1
> node foo2 \
>     attributes standby="off"
> primitive OST1 ocf:heartbeat:Filesystem \
>     meta target-role="Started" \
>     operations $id="BAR1-operations" \
>     op monitor interval="120" timeout="60" \
>     op start interval="0" timeout="300" \
>     op stop interval="0" timeout="300" \
>     params device="/dev/disk/by-uuid/8c500092-5de6-43d7-b59a-ef91fa9667b9" directory="/mnt/bar1" fstype="ext3"
> primitive st-pm stonith:external/powerman \
>     params serverhost="192.168.122.1:10101" poweroff="0"
> clone fencing st-pm
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="1" \
>     no-quorum-policy="ignore" \
>     last-lrm-refresh="1306783242" \
>     default-resource-stickiness="1000"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="100"
>
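A side note on the config above: stickiness is being set in two places, the
deprecated default-resource-stickiness cluster property (1000) and
rsc_defaults (100), and off-hand I don't remember which of the two wins.
I'd keep only the rsc_defaults one so there's no doubt which value applies.
Untested, but roughly:

    # drop the deprecated cluster property, keep a single value in rsc_defaults
    crm_attribute --type crm_config --name default-resource-stickiness --delete
    crm configure rsc_defaults resource-stickiness=1000
    crm configure show rsc-options
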
> The two problems I have run into are:
>
> 1. preventing the resource from failing back to the node it was
> previously on after it has failed over and the previous node has
> been restored. Basically what's documented at
>
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch05s03s02.html
>
> 2. preventing the active node from being STONITHed when the resource
> is moved back to its failed-and-restored node after a failover.
> IOW: BAR1 is available on foo1, which fails and the resource is moved
> to foo2. foo1 returns and the resource is failed back to foo1, but
> in doing that foo2 is STONITHed.
>
> For #1, as you can see, I tried setting the default resource stickiness
> to 100. That didn't seem to work. When I stopped corosync on the
> active node, the service failed over but it promptly failed back when I
> started corosync again, contrary to the example on the referenced URL.
>
> Subsequently I (think I) tried adding the specific resource stickiness
> of 1000. That didn't seem to help either.
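
For #1, it might be worth asking the policy engine what scores it is actually
working with; stickiness only keeps the resource where it is if that score
outweighs whatever is pulling it back. Untested, but something along these
lines should print the allocation scores from the live CIB, and set
stickiness on just this one resource if you want to rule out the defaults:

    # print the allocation scores the policy engine computes (live CIB)
    ptest -L -s
    # set stickiness directly on OST1 as a meta attribute
    crm_resource --resource OST1 --meta --set-parameter resource-stickiness --parameter-value 1000
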
>
> As for #2, the issue with STONITHing foo2 when failing back to foo1 is
> that foo1 and foo2 are an active/active pair of servers. STONITHing
> foo2 just to restore foo1's services takes foo2's services down as well.
>
> I do want a node that is believed to be dead to be STONITHed before its
> resource(s) are failed over, though.
That's a great way to ensure your data gets trashed.
If the "node that is believed to be dead" isn't /actually/ dead,
you'll have two nodes running the same resources and writing to the
same files.
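
As for the fencing you're seeing on failback: when a resource is moved, the
cluster first has to stop it on the node it is currently running on, and if
that stop fails or times out while stonith is enabled, the node gets fenced
so the cluster can be certain the resource really is stopped. So I'd check
whether the Filesystem stop (the umount) on foo2 is failing or blowing its
timeout at the moment foo1 comes back. Untested, and assuming corosync is
logging to syslog on your boxes, something like:

    # one-shot cluster status including failed operations and fail counts
    crm_mon -1 -f
    # look for a failed or timed-out stop of the resource around the failback
    grep -i "OST1.*stop" /var/log/messages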
>
> Any hints on what I am doing wrong?
>
> Thanx and cheers,
> b.
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>