[Pacemaker] Creating a safe cluster-node shutdown script (for when UPS goes OnBattery+LowBattery)

Wed Jul 9 12:28:16 UTC 2014

On Tue, Jul 8, 2014, at 02:59, Andrew Beekhof wrote:
> 
> On 4 Jul 2014, at 3:16 pm, Giuseppe Ragusa <giuseppe.ragusa at hotmail.com> wrote:
> 
> > Hi all,
> > I'm trying to create a script as per subject (on CentOS 6.5, CMAN+Pacemaker, only DRBD+KVM active/passive resources; SNMP-UPS monitored by NUT).
> > 
> > Ideally I think that each node should stop (disable) all locally-running VirtualDomain resources (doing so cleanly demotes then downs the DRBD resources underneath), then put itself in standby and finally shut down.
> 
> Since the end goal is shutdown, why not just run 'pcs cluster stop' ?

I thought that this action would interrupt cluster communication (since Corosync would no longer be responding to the peer) and so cause the other node to stonith us. I know that ideally the other node should also perform "pcs cluster stop" shortly afterwards, since the same UPS powers both, but I worry about timing issues (and races) in UPS monitoring, since it is a large enterprise UPS monitored via SNMP.

Furthermore, I do not know what happens to running resources at "pcs cluster stop": I infer from your suggestion that resources are brought down and not migrated to the other node, correct?

> Possibly with 'pcs cluster standby' first if you're worried that stopping the resources might take too long.

I thought that "pcs cluster standby" would usually migrate the resources to the other node (I actually tried it and confirmed the expected behaviour). This would risk a race with the other node going into standby itself, which is why I took the trouble of explicitly stopping all locally-running resources, in order, in my script BEFORE putting the local node in standby.
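
To make the "stop everything locally BEFORE standby" step less racy, the TODO in my script (wait for "no more resources active" while avoiding endless loops) could be handled with a bounded polling loop. The following is only a minimal sketch: the helper names count_local_vms and wait_for_local_vms, the 120-second default timeout and the 5-second polling interval are all assumptions of mine, and the matching relies on the same "pcs status" line format that the original grep expects (resource line ending with the node name).

```shell
#!/bin/bash
# Sketch only: bounded wait for locally-running VirtualDomain resources to stop.
# count_local_vms and wait_for_local_vms are hypothetical helper names.

# Print how many VirtualDomain resources "pcs status" still lists on node $1.
# Assumes each resource line ends with the node name (same assumption as the
# grep in the original script); the node name is used as a regex as-is.
count_local_vms() {
    local node="$1"
    pcs status 2>/dev/null \
        | grep -c "ocf::heartbeat:VirtualDomain.*${node}[[:space:]]*\$"
}

# Poll until no VirtualDomain resource is listed on node $1, or until
# $2 seconds have elapsed (default 120, an arbitrary value). $3 is the polling
# interval in seconds (default 5). Returns 0 when clear, 1 on timeout.
wait_for_local_vms() {
    local node="$1" timeout="${2:-120}" interval="${3:-5}" waited=0
    while [ "$(count_local_vms "${node}")" -gt 0 ]; do
        if [ "${waited}" -ge "${timeout}" ]; then
            return 1    # give up: never loop forever while on battery
        fi
        sleep "${interval}"
        waited=$((waited + interval))
    done
    return 0
}
```

In the script this would be called right after the "pcs resource disable" loop, e.g. wait_for_local_vms "${local_node}" 120, and the script would carry on with standby and shutdown regardless of the return value, since the battery will not wait.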

> Pacemaker will stop everything in the required order and stop the node when done... problem solved?

I thought that after a "pcs cluster standby" a regular "shutdown -h" of the operating system would cleanly bring down the cluster too, without the need for a "pcs cluster stop", given that both Pacemaker and CMAN are correctly configured for automatic startup/shutdown as operating system services (SysV initscripts controlled by CentOS 6.5 Upstart, in my case).
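
For comparison, the sequence suggested in the quoted reply (standby first, then "pcs cluster stop", then the operating system shutdown) could be sketched as below. This is not a recommendation: whether standby migrates resources to the peer, and whether the explicit stop is needed before "shutdown -h", are exactly the open questions above. The names run and ups_shutdown and the DRY_RUN variable are hypothetical, added only so the sequence can be printed without touching a live cluster.

```shell
#!/bin/bash
# Sketch only: standby -> cluster stop -> OS shutdown, with a dry-run switch.
# run, ups_shutdown and DRY_RUN are hypothetical names, not part of pcs.

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 is the local node name as known to the cluster.
ups_shutdown() {
    local node="$1"
    # A failure in either cluster step must not prevent the final shutdown.
    run pcs cluster standby "${node}" || echo "warning: standby failed" >&2
    run pcs cluster stop || echo "warning: cluster stop failed" >&2
    run /sbin/shutdown -h +0
}
```

Running it as DRY_RUN=1 with ups_shutdown "node1" (node1 being a placeholder name) only prints the three commands in order, which makes it easy to review the sequence before wiring it into the NUT shutdown hook.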

Many thanks again for your always thought-provoking and informative answers!

Regards,
Giuseppe

> > 
> > On the next startup, manual intervention would be required to unstandby all nodes and enable resources (nodes already in standby and resources already disabled before the blackout would have to be distinguished manually).
> > 
> > Is this strategy conceptually safe?
> > 
> > Unfortunately, various searches have turned up no "prior art" :)
> > 
> > This is my tentative script (consider it in the public domain):
> > 
> > ------------------------------------------------------------------------------------------------------------------------------------
> > #!/bin/bash
> > 
> > # Note: "pcs cluster status" still has a small bug vs. CMAN-controlled Corosync and would always return != 0
> > pcs status > /dev/null 2>&1
> > STATUS=$?
> > 
> > # Detect if cluster is running at all on local node
> > # TODO: detect node already in standby and bypass this
> > if [ "${STATUS}" = 0 ]; then
> >     local_node="$(cman_tool status | grep -i 'Node[[:space:]]*name:' | sed -e 's/^.*Node\s*name:\s*\([^[:space:]]*\).*$/\1/i')"
> >     for local_resource in $(pcs status 2>/dev/null | grep "ocf::heartbeat:VirtualDomain.*${local_node}\\s*\$" | awk '{print $1}'); do
> >         pcs resource disable "${local_resource}"
> >     done
> >     # TODO: each resource disabling above may return without waiting for complete stop - wait here for "no more resources active"? (but avoid endless loops)
> >     pcs cluster standby "${local_node}"
> > fi
> > 
> > # Shut down gracefully anyway at the end
> > /sbin/shutdown -h +0
> > 
> > ------------------------------------------------------------------------------------------------------------------------------------
> > 
> > Comments/suggestions/improvements are more than welcome.
> > 
> > Many thanks in advance.
> > 
> > Regards,
> > Giuseppe
> > 
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > 
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> 
-- 
  Giuseppe Ragusa
  giuseppe.ragusa at fastmail.fm