[Pacemaker] crm_mon shows nothing about stonith 'reset' failure
Andrew Beekhof
beekhof at gmail.com
Tue Sep 16 08:05:59 UTC 2008
On Tue, Sep 16, 2008 at 03:11, Takenaka Kazuhiro
<takenaka.kazuhiro at oss.ntt.co.jp> wrote:
> Hi All,
>
> I ran a test to see what would happen when stonith 'reset' failed.
> Before the test, I thought 'crm_mon' should show something about the
> failure.
Nope.
Fencing failures are not stored anywhere, because there is no source
(like the lrmd for resource operations) from which they could be
reconstructed when rebuilding the status section.
And if your stonith resources are failing, a) you have bigger
problems, and b) you'll get nice big ERROR messages in the logs.
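A minimal sketch of pulling those messages out of the log, assuming the
log file is /var/log/ha-log (as in the report below) and that the relevant
lines contain both "ERROR" and "stonith". The function name and the match
patterns are assumptions, not a documented log format, so adjust them to
what your logs actually show:

```shell
#!/bin/sh
# Hypothetical helper: print log lines that mention both ERROR and stonith.
# The default path and the patterns are assumptions for illustration only.
stonith_errors() {
    logfile=${1:-/var/log/ha-log}
    grep 'ERROR' "$logfile" | grep -i 'stonith'
}
```

For example, "stonith_errors /var/log/ha-log | tail -n 20" would show the
most recent matches.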
> But 'crm_mon' didn't show anything.
>
> What I did is the following.
>
> 1. I started a stonith-enabled two-node cluster. The names of
> the nodes were 'node01' and 'node02'. See the configuration files
> in the attached 'hb_reports.tgz' for more details.
>
> I made a few modifications to 'ssh' for the test and renamed it
> to 'sshTEST'. I also attached 'sshTEST'. The differences are
> described in it.
>
> 2. I ran the following command.
>
> # iptables -A INPUT -i eth3 -p tcp --dport 22 -j REJECT
>
> 'eth3' is connected to the network for 'sshTEST'.
>
> 3. I deleted the state file of 'dummy' on 'node01'.
>
> # rm -f /var/run/heartbeat/rsctmp/Dummy-dummy.state
>
> Soon the failure of 'dummy' was logged in /var/log/ha-log
> and 'crm_mon' also displayed it.
>
> After a while the failure of the 'reset' performed by 'sshTEST'
> was also logged, but 'crm_mon' didn't display it.
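
Steps 2 and 3 above can be sketched as a small script. This is only an
illustration: the function names and the RUN dry-run switch are
hypothetical, and unblock_ssh is a cleanup step that is not part of the
original report. By default the commands are printed rather than executed,
because they need root and change a live node:

```shell
#!/bin/sh
# Dry-run by default: RUN=echo prints each command instead of running it.
# Set RUN to an empty string to really execute (needs root).
RUN=${RUN-echo}

# Step 2: reject SSH traffic arriving on the interface used by 'sshTEST'.
block_ssh()   { $RUN iptables -A INPUT -i eth3 -p tcp --dport 22 -j REJECT; }

# Cleanup (not in the original report): delete the same rule again.
unblock_ssh() { $RUN iptables -D INPUT -i eth3 -p tcp --dport 22 -j REJECT; }

# Step 3: remove the state file so the 'dummy' resource is seen as failed.
fail_dummy()  { $RUN rm -f /var/run/heartbeat/rsctmp/Dummy-dummy.state; }
```

Calling block_ssh and then fail_dummy on 'node01' follows the order used
in the test.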
>
> Did I misconfigure something or make a mistake in the procedure
> that made 'crm_mon' work incorrectly?
>
> Or does 'crm_mon' really not show anything about a stonith 'reset'
> failure?
>
> I used Heartbeat(e8154a602bf4) + Pacemaker(d4a14f276c28)
> for this test.
>
> Best regards,
> --
> Takenaka Kazuhiro <takenaka.kazuhiro at oss.ntt.co.jp>
>
>
>
> #!/bin/sh
>
> # 'sshTEST' is almost the same as 'ssh'.
> # The differences are:
> # * 'sshTEST' logs its arguments and exit code on each execution.
> # * 'sshTEST' qualifies the target nodename to reset using the 'extension'
> # parameter given in cib.xml.
>
>
> #
> # External STONITH module for ssh.
> #
> # Copyright (c) 2004 SUSE LINUX AG - Lars Marowsky-Bree <lmb at suse.de>
> #
> # This program is free software; you can redistribute it and/or modify
> # it under the terms of version 2 of the GNU General Public License as
> # published by the Free Software Foundation.
> #
> # This program is distributed in the hope that it would be useful, but
> # WITHOUT ANY WARRANTY; without even the implied warranty of
> # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> #
> # Further, this software is distributed without any warranty that it is
> # free of the rightful claim of any third person regarding infringement
> # or the like. Any license provided herein, whether implied or
> # otherwise, applies only to this software file. Patent licenses, if
> # any, provided herein do not apply to combinations of this program with
> # other software, or any other product whatsoever.
> #
> # You should have received a copy of the GNU General Public License
> # along with this program; if not, write the Free Software Foundation,
> # Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
> #
>
> SSH_COMMAND="/usr/bin/ssh -q -x -o PasswordAuthentication=no -o StrictHostKeyChecking=no -n -l root"
> #SSH_COMMAND="/usr/bin/ssh -q -x -n -l root"
>
> REBOOT_COMMAND="echo 'sleep 2; /sbin/reboot -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"
>
> # Warning: If you select this poweroff command, it'll physically
> # power-off the machine, and quite a number of systems won't be remotely
> # revivable.
> # TODO: Probably should touch a file on the server instead to just
> # prevent heartbeat et al from being started after the reboot.
> # POWEROFF_COMMAND="echo 'sleep 2; /sbin/poweroff -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"
> POWEROFF_COMMAND="echo 'sleep 2; /sbin/reboot -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"
>
> # Rewrite the hostlist to accept "," as a delimiter for hostnames too.
> hostlist=`echo $hostlist | tr ',' ' '`
>
> is_host_up() {
>     for j in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
>     do
>         if
>             ping -w1 -c1 "$1" >/dev/null 2>&1
>         then
>             sleep 1
>         else
>             return 1
>         fi
>     done
>     return 0
> }
>
> savelog() { echo $(date '+%Y%m%d-%H%M%S') "$@" >> /var/log/ext-ssh.log;}
> EXIT() { savelog EXIT $subcmd "$@"; exit "$@";}
>
> savelog "ARGS" "$@"
> subcmd=$1
>
> case $1 in
> gethosts)
>     for h in $hostlist ; do
>         echo $h
>     done
>     EXIT 0
>     ;;
> on)
>     # Can't really be implemented because ssh cannot power on a system
>     # when it is powered off.
>     EXIT 1
>     ;;
> off)
>     # Shouldn't really be implemented because if ssh cannot power on a
>     # system, it shouldn't be allowed to power it off.
>     EXIT 1
>     ;;
> reset)
>     for h in $hostlist
>     do
>         if
>             [ "$h" != "$2" ]
>         then
>             continue
>         fi
>         if
>             case ${livedangerously} in
>             [Yy]*) is_host_up $h;;
>             *) true;;
>             esac
>         then
>             $SSH_COMMAND "$2$extension" "$REBOOT_COMMAND"
>             # Good thing this is only for testing...
>             if
>                 is_host_up $h
>             then
>                 EXIT 1
>             else
>                 EXIT 0
>             fi
>         else
>             # well... Let's call it successful, after all this is only for testing...
>             EXIT 0
>         fi
>     done
>     EXIT 1
>     ;;
> status)
>     if
>         [ -z "$hostlist" ]
>     then
>         EXIT 1
>     fi
>     for h in $hostlist
>     do
>         if
>             ping -w1 -c1 "$h" 2>&1 | grep "unknown host"
>         then
>             EXIT 1
>         fi
>     done
>     EXIT 0
>     ;;
> getconfignames)
>     echo "hostlist"
>     EXIT 0
>     ;;
> getinfo-devid)
>     echo "ssh STONITH device"
>     EXIT 0
>     ;;
> getinfo-devname)
>     echo "ssh STONITH external device"
>     EXIT 0
>     ;;
> getinfo-devdescr)
>     echo "ssh-based Linux host reset"
>     echo "Fine for testing, but not suitable for production!"
>     EXIT 0
>     ;;
> getinfo-devurl)
>     echo "http://openssh.org"
>     EXIT 0
>     ;;
> getinfo-xml)
>     cat << SSHXML
> <parameters>
> <parameter name="hostlist" unique="1" required="1">
> <content type="string" />
> <shortdesc lang="en">
> Hostlist
> </shortdesc>
> <longdesc lang="en">
> The list of hosts that the STONITH device controls
> </longdesc>
> </parameter>
>
> <parameter name="livedangerously" unique="0" required="0">
> <content type="enum" />
> <shortdesc lang="en">
> Live Dangerously!!
> </shortdesc>
> <longdesc lang="en">
> Set to "yes" if you want to risk your system's integrity.
> Of course, since this plugin isn't for production, using it
> in production at all is a bad idea. On the other hand,
> setting this parameter to yes makes it an even worse idea.
> Viva la Vida Loca!
> </longdesc>
> </parameter>
>
> </parameters>
> SSHXML
>     EXIT 0
>     ;;
> *)
>     EXIT 1
>     ;;
> esac
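
The 'reset' branch of the script above acts only when the requested target
appears in 'hostlist', after commas in the list have been turned into
spaces, and returns 1 otherwise. A minimal sketch of that check in
isolation, using hypothetical function names that do not appear in the
script itself:

```shell
#!/bin/sh
# Same normalization as in the script: commas become spaces.
normalize_hostlist() { echo "$1" | tr ',' ' '; }

# Hypothetical helper: succeed only if target $1 appears in hostlist $2.
in_hostlist() {
    for h in $(normalize_hostlist "$2"); do
        [ "$h" = "$1" ] && return 0
    done
    return 1
}
```

With a hostlist of "node01,node02", in_hostlist succeeds for node02 and
fails for any name that is not listed.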
>