[Pacemaker] Query regarding configuring STONITH device

sachin garg sachingarg2k1 at gmail.com
Mon Jul 2 13:12:54 EDT 2012


Hi,


On Mon, Jul 02, 2012 at 05:49:38PM +0530, sachin garg wrote:
> Hi,
>
> I am using IPMI plugin for configuring STONITH with heartbeat cluster.
> If a resource fails on one node then the other node STONITHs that node. But
> when the failed node comes back after the reboot, the STONITH device itself
> fails on the node which has started again. Logs indicate that IPMI start
> operation returned 1 (i.e. unknown error).

>> Isn't there more in the logs, i.e. a specific reason?
No, there is just a one-liner in the traces. I went through the IPMI script
to understand in which scenario it may return 1. There is just one flow
(see below), which indicates that the execution of ipmitool fails at start.
But this doesn't happen if I start heartbeat manually; it only happens upon
reboot (I have a strict requirement to start the heartbeat stack upon restart).

# Yet another convenience wrapper that invokes run_ipmitool, captures
# its output, logs the output, returns either 0 (on success) or 1 (on
# any error)
do_ipmi() {
    if outp=`run_ipmitool $*`; then
        ha_log.sh debug "ipmitool output: `echo $outp`"
        return 0
    else
        ha_log.sh err "error executing ipmitool: `echo $outp`"
        return 1
    fi

}
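
One way to check the suspicion that ipmitool fails only because the network is not ready right after boot would be to retry the same call a few times before giving up. The following is a minimal sketch under that assumption; retry_cmd is a hypothetical helper, not part of the external/ipmi plugin, and the BMC_* variables in the commented usage line are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of the external/ipmi plugin).
# Usage: retry_cmd MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Note: plain sh has no local variables, so these names are global.
retry_cmd() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Success on any attempt ends the loop immediately.
        if "$@"; then
            return 0
        fi
        # Give up once the attempt budget is used.
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example (placeholders, adjust to the real BMC address and credentials):
# retry_cmd 10 3 ipmitool -I lan -H "$BMC_ADDR" -U "$BMC_USER" -P "$BMC_PASS" chassis power status
```

If the call succeeds only after a few attempts on a freshly booted node, that would support the network initialization theory; if it fails on every attempt, the cause is probably elsewhere.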


> I suspect that this may be due
> to some initialization delays at network level. But I am not sure about
> this. What could be the best way to overcome this issue? I am considering
> adding a start delay to the stonith device but can't say if that is the
> right approach.

>> Happens only once after boot? Afterwards works fine? Strange.
>> Well, it's arguably good practice not to start the cluster stack
>> automatically on boot.
I have a strict requirement to start the heartbeat stack upon restart. Would
adding a start delay help? I ask even though I have reasons to believe that
it doesn't.


> Moreover, how should one configure start/monitor operation failure for a
> STONITH device? I have currently configured pacemaker to fence the node if
> start/monitor operation fails for STONITH device. Is this the right
> configuration?

>> No. Nothing special needs to be configured.
Let me rephrase my question: all my resources have been configured for
fencing upon monitor failure. So, should I configure fencing or restart for
the STONITH device? I ask because the fencing action is carried out by the
STONITH device itself. Moreover, if I configure "fence" for STONITH device
start failure, I get one extra reboot, but eventually the system recovers
and there are no more failures.


> And what should be the monitoring frequency for STONITH device?
>> Take a look here http://clusterlabs.org/doc/crm_fencing.html
Thanks for directing me to the article. The article says that monitoring
should happen only 2-3 times per hour. But I have an SRS with the customer
which says that any required failover must happen within 30 seconds. So, in
an extreme scenario where the fencing device itself fails, I won't be able
to fulfill the terms of the SRS. Please advise.
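
For reference, 2-3 checks per hour corresponds to a monitor interval of roughly 20-30 minutes. A minimal crm shell sketch of such a configuration could look like the following. The resource and node names, the address, and the credentials are placeholders I made up for illustration, and on-fail is left at its default, in line with the reply that nothing special needs to be configured.

```shell
# Sketch only: st-node1, node1, 192.0.2.10, admin and secret are placeholders.
# The monitor runs every 1800s, i.e. twice per hour.
crm configure primitive st-node1 stonith:external/ipmi \
    params hostname=node1 ipaddr=192.0.2.10 userid=admin passwd=secret interface=lan \
    op monitor interval=1800s timeout=60s

# Keep the device off the node it is meant to fence.
crm configure location l-st-node1 st-node1 -inf: node1
```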

>> Thanks,
>>
>> Dejan

> Regards


