[Pacemaker] stonith - using multiple fencing devices for one node to fence device with redundant power sources
Dejan Muhamedagic
dejanmm at fastmail.fm
Fri Oct 4 09:43:20 UTC 2013
Hi,
On Fri, Oct 04, 2013 at 10:43:56AM +0200, Nikola Ciprich wrote:
> Hi Guys,
>
> thanks a lot for the tip, fencing_topology seems to be exactly what I
> need! However, there seems to be a problem, and I'm not sure whether
> it's me, pacemaker or the stonith agent.
>
> I've set up 4 stonith primitives, as per the document:
>
> primitive stonith-vbox3-1-off stonith:fence_netio \
> params ipaddr="10.76.6.13" login="admin" passwd="admin" port="1" pcmk_host_list="vbox4" verbose="1" debug="/tmp/stonith-1-off.log" power_wait="1" action="off"
> primitive stonith-vbox3-1-on stonith:fence_netio \
> params ipaddr="10.76.6.13" login="admin" passwd="admin" port="1" pcmk_host_list="vbox4" verbose="1" debug="/tmp/stonith-1-on.log" power_wait="1" action="on"
> primitive stonith-vbox3-2-off stonith:fence_netio \
> params ipaddr="10.76.6.12" login="admin" passwd="admin" port="2" pcmk_host_list="vbox4" verbose="1" debug="/tmp/stonith-2-off.log" power_wait="1" action="off"
> primitive stonith-vbox3-2-on stonith:fence_netio \
> params ipaddr="10.76.6.12" login="admin" passwd="admin" port="2" pcmk_host_list="vbox4" verbose="1" debug="/tmp/stonith-2-on.log" power_wait="1" action="on"
>
> one ON and one OFF for each PDU.
Good luck with that.
> then I set fencing topology as follows:
>
> fencing_topology vbox4: stonith-vbox3-1-off,stonith-vbox3-2-off,stonith-vbox3-1-on,stonith-vbox3-2-on
>
> (btw my crmsh complains about the syntax here, but it seems to put it into the CIB anyway,
Which version of crmsh do you run? If it's 1.2.6, please open a
bug report. If not, please upgrade :)
Thanks,
Dejan
> so more on this maybe later)
>
> then when I kill the node, thus triggering the node fencing operation, the following happens:
>
> (log snippet)
>
> Oct 4 10:11:53 vbox3 stonith-ng[4170]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for vbox4: 0c54accc-6c1c-49dc-be0f-7bbe00fdb917 (0)
> Oct 4 10:11:56 vbox3 stonith-ng[4170]: notice: log_operation: Operation 'reboot' [18469] (call 0 from crmd.4174) for host 'vbox4' with device 'stonith-vbox3-1-off' returned: 0 (OK)
> Oct 4 10:11:56 vbox3 stonith-ng[4170]: notice: process_remote_stonith_exec: Call to stonith-vbox3-1-off for vbox4 on behalf of crmd.4174 at vbox3: passed (0)
> Oct 4 10:11:58 vbox3 stonith-ng[4170]: notice: log_operation: Operation 'reboot' [18527] (call 0 from crmd.4174) for host 'vbox4' with device 'stonith-vbox3-2-off' returned: 0 (OK)
> Oct 4 10:11:58 vbox3 stonith-ng[4170]: notice: process_remote_stonith_exec: Call to stonith-vbox3-2-off for vbox4 on behalf of crmd.4174 at vbox3: passed (0)
> Oct 4 10:12:01 vbox3 stonith-ng[4170]: notice: log_operation: Operation 'reboot' [18533] (call 0 from crmd.4174) for host 'vbox4' with device 'stonith-vbox3-1-on' returned: 0 (OK)
> Oct 4 10:12:01 vbox3 stonith-ng[4170]: notice: process_remote_stonith_exec: Call to stonith-vbox3-1-on for vbox4 on behalf of crmd.4174 at vbox3: passed (0)
> Oct 4 10:12:04 vbox3 stonith-ng[4170]: notice: log_operation: Operation 'reboot' [18539] (call 0 from crmd.4174) for host 'vbox4' with device 'stonith-vbox3-2-on' returned: 0 (OK)
> Oct 4 10:12:04 vbox3 stonith-ng[4170]: notice: process_remote_stonith_exec: Call to stonith-vbox3-2-on for vbox4 on behalf of crmd.4174 at vbox3: passed (0)
>
> the order looks good, but according to the stonith agent debug logs, the reboot operation is always executed
> (exactly P1: off,on P2: off,on, P1: off,on P2: off,on)
> instead of P1: off P2: off P1: on P2: on!
>
> before I try digging deeper into this, does any of you have an idea where the problem could be?
>
> Does "Operation 'reboot' [18539] (call 0 from crmd.4174) for host 'vbox4' with device 'stonith-vbox3-2-on'"
> mean that the reboot action is always executed? That would certainly be the problem, but why is that,
> if I have different actions defined? Is this a bug, or my mistake?
>
> thanks a lot in advance!
>
> with best regards
>
> nik
>
>
> On Fri, Oct 04, 2013 at 10:18:56AM +0200, Lars Marowsky-Bree wrote:
> > On 2013-10-03T23:50:15, Digimer <lists at alteeve.ca> wrote:
> >
> > > > digimer's hack works, but it makes my eyes bleed. ;-)
> > > meanie!
> >
> > That's not because of what you diligently debugged and described,
> > though, but because it's necessary. In my opinion, 90%+ of all setups
> > that actually need to use more than one device per level will need your
> > document, and that is quite complex to force on users.
> >
> > (Just imagine that for multiple nodes that share power switches!)
> >
> >
> > Regards,
> > Lars
> >
> > --
> > Architect Storage/HA
> > SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
> > "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
> >
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> >
>
> --
> -------------------------------------
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
>
> tel.: +420 591 166 214
> fax: +420 596 621 273
> mobil: +420 777 093 799
> www.linuxbox.cz
>
> mobil servis: +420 737 238 656
> email servis: servis at linuxbox.cz
> -------------------------------------
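As an aside to the quoted log snippet: one way to check which operation
stonith-ng requested from each device is to pull the operation name and
device out of the log_operation lines. The following is only a minimal
sketch under stated assumptions. The names LOG_OP, requested_operations and
SAMPLE are made up for illustration, and the pattern is written against the
line format shown in the quoted log, which may differ in other Pacemaker
versions. It reports what stonith-ng asked for, not what the agent actually
did; for that, the agent's own debug logs are still the place to look.

```python
import re

# Hypothetical pattern, written against the stonith-ng "log_operation"
# lines exactly as they appear in the quoted log snippet.
LOG_OP = re.compile(
    r"log_operation: Operation '(?P<op>[^']+)' \[\d+\] .* "
    r"for host '(?P<host>[^']+)' with device '(?P<device>[^']+)' "
    r"returned: (?P<rc>-?\d+)"
)


def requested_operations(lines):
    """Return (device, operation, return code) per log_operation line."""
    found = []
    for line in lines:
        match = LOG_OP.search(line)
        if match:
            found.append(
                (match.group("device"), match.group("op"), int(match.group("rc")))
            )
    return found


# Lines copied from the quoted log; the second one is not a log_operation
# line and is skipped by the pattern.
SAMPLE = [
    "Oct 4 10:11:56 vbox3 stonith-ng[4170]: notice: log_operation: Operation 'reboot' [18469] (call 0 from crmd.4174) for host 'vbox4' with device 'stonith-vbox3-1-off' returned: 0 (OK)",
    "Oct 4 10:11:56 vbox3 stonith-ng[4170]: notice: process_remote_stonith_exec: Call to stonith-vbox3-1-off for vbox4 on behalf of crmd.4174 at vbox3: passed (0)",
    "Oct 4 10:11:58 vbox3 stonith-ng[4170]: notice: log_operation: Operation 'reboot' [18527] (call 0 from crmd.4174) for host 'vbox4' with device 'stonith-vbox3-2-off' returned: 0 (OK)",
    "Oct 4 10:12:01 vbox3 stonith-ng[4170]: notice: log_operation: Operation 'reboot' [18533] (call 0 from crmd.4174) for host 'vbox4' with device 'stonith-vbox3-1-on' returned: 0 (OK)",
    "Oct 4 10:12:04 vbox3 stonith-ng[4170]: notice: log_operation: Operation 'reboot' [18539] (call 0 from crmd.4174) for host 'vbox4' with device 'stonith-vbox3-2-on' returned: 0 (OK)",
]

for device, op, rc in requested_operations(SAMPLE):
    print(f"{device}: requested operation={op}, rc={rc}")
```

Run against the quoted lines, this lists the four devices in topology order
together with the operation name that stonith-ng logged for each of them.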