[Pacemaker] some questions about STONITH

Lars Marowsky-Bree lmb at suse.com
Tue Nov 19 13:19:49 EST 2013


On 2013-11-19T22:10:29, Andrey Groshev <greenx at yandex.ru> wrote:

First, as digimer wrote, stonith-by-ssh is clearly useless for
production: it cannot fence a node that is hung or unreachable over the
network, which is exactly when fencing is needed. For testing, though,
it's worth a try.

Note that cluster-glue actually does include an external/ssh script.
You're reinventing the wheel ;-)

> Next test:
> # stonith_admin --reboot=dev-cluster2-node2
> The node reboots, but resources don't start.
> crm_mon shows: Node dev-cluster2-node2 (172793105): pending.
> And it stays hung like that.

That is *probably* a race: either the node reboots too fast, or it still
communicates for a bit after the fence has supposedly completed (which
can happen if the agent runs a plain reboot rather than reboot -nf). We
have had problems here in the past.
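
To illustrate the difference between the two reboot variants mentioned
above, here is a minimal test-only sketch. It only prints the command an
ssh-based fence action might run; it does not execute anything. The
helper name ssh_fence_cmd, the root login and the ssh options are my
assumptions for illustration, not what external/ssh or your script
actually does.

```shell
#!/bin/sh
# Test-only sketch: print the command an ssh-based fence action might run.
# ssh_fence_cmd is a hypothetical helper, not part of cluster-glue or Pacemaker.
ssh_fence_cmd() {
    node=$1
    mode=${2:-hard}
    case "$mode" in
        # -n: do not sync, -f: do not go through an orderly shutdown,
        # so the node should stop talking to the cluster almost at once.
        hard) remote='reboot -nf' ;;
        # Orderly shutdown: the node may keep communicating while
        # services are being stopped.
        soft) remote='reboot' ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    printf 'ssh -o BatchMode=yes -o ConnectTimeout=5 root@%s %s\n' "$node" "$remote"
}
```

Calling "ssh_fence_cmd dev-cluster2-node2" prints the hard variant, and
"ssh_fence_cmd dev-cluster2-node2 soft" prints the plain reboot. If the
pending state only shows up with the soft variant, that would support
the race theory.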

You may want to file a proper bug report with crm_report included, and
preferably corosync/pacemaker debugging enabled.
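
As a rough sketch of what collecting that report could look like: the
helper below only prints a crm_report command line for a time window
around the test, using crm_report's -f (from) and -t (to) options. The
helper name report_cmd, the times and the destination are placeholders I
made up; substitute the window in which you ran the fence test.

```shell
#!/bin/sh
# Sketch: print a crm_report invocation covering a time window.
# report_cmd is a hypothetical helper; it does not run crm_report itself.
report_cmd() {
    from=$1   # start of the window, "YYYY-MM-DD HH:MM:SS"
    to=$2     # end of the window, same format
    dest=$3   # destination name for the resulting report
    printf 'crm_report -f "%s" -t "%s" %s\n' "$from" "$to" "$dest"
}

# Example with placeholder values (not taken from your logs):
report_cmd "2013-11-19 21:00:00" "2013-11-19 22:30:00" /tmp/stonith-pending
```

Run the printed command on one of the cluster nodes and attach the
resulting archive to the bug.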

> 2.
> There is a slight discrepancy between Pacemaker Explained and stonith_admin --help:
> stonith_admin --reboot nodename.
> One uses an equals sign, the other does not.
> Not very important, since both work.

Yeah, like you said, both work. So it's not actually a problem.


Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




