[ClusterLabs] How to cancel a fencing request?

Jehan-Guillaume de Rorthais jgdr at dalibo.com
Tue Apr 3 15:47:02 EDT 2018


On Tue, 3 Apr 2018 07:36:31 +0200
Klaus Wenninger <kwenning at redhat.com> wrote:

> On 04/02/2018 04:02 PM, Ken Gaillot wrote:
> > On Mon, 2018-04-02 at 10:54 +0200, Jehan-Guillaume de Rorthais wrote:  
> >> On Sun, 1 Apr 2018 09:01:15 +0300
> >> Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> >>  
> >>> 31.03.2018 23:29, Jehan-Guillaume de Rorthais wrote:
> >>>> Hi all,
> >>>>
> >>>> I experienced a problem in a two-node cluster. It has one fencing
> >>>> agent (FA) per node, and location constraints so that each FA
> >>>> avoids the node it is supposed to fence.
> >>> If you mean a stonith resource - as far as I know, location does
> >>> not affect stonith operations; it only changes where the monitoring
> >>> action is performed.
> >> Sure.
> >>  
> >>> You can create two stonith resources and declare that each can
> >>> fence only a single node, but that is not a location constraint, it
> >>> is resource configuration. Showing your configuration would be
> >>> helpful to avoid guessing.
> >> True, I should have done that. A conf is worth a thousand words :)
> >>
> >>   crm conf <<EOC
> >>
> >>   primitive fence_vm_srv1 stonith:fence_virsh                   \
> >>     params pcmk_host_check="static-list" pcmk_host_list="srv1"  \
> >>            ipaddr="192.168.2.1" login="<user>"                  \
> >>            identity_file="/root/.ssh/id_rsa"                    \
> >>            port="srv1-d8" action="off"                          \
> >>     op monitor interval=10s
> >>
> >>   location fence_vm_srv1-avoids-srv1 fence_vm_srv1 -inf: srv1
> >>
> >>   primitive fence_vm_srv2 stonith:fence_virsh                   \
> >>     params pcmk_host_check="static-list" pcmk_host_list="srv2"  \
> >>            ipaddr="192.168.2.1" login="<user>"                  \
> >>            identity_file="/root/.ssh/id_rsa"                    \
> >>            port="srv2-d8" action="off"                          \
> >>     op monitor interval=10s
> >>
> >>   location fence_vm_srv2-avoids-srv2 fence_vm_srv2 -inf: srv2
> >>   
> >>   EOC
> >>  
> 
> -inf constraints like that should effectively prevent
> stonith actions from being executed on those nodes.
> Though there are a few issues with location constraints
> and stonith devices.

Not sure I understand; I don't want to prevent stonith actions on those
nodes. So here is a quick clarification of what I had in mind with this:

  * fence_vm_srv2 is supposed to be able to fence srv2
  * should fence_vm_srv2 fence srv2, it must be able to reply and then
    confirm the stonith action
  * so fence_vm_srv2 must not start on srv2

Repeat the same for fence_vm_srv1.

So stonith actions can run, but only:

  * fence_vm_srv2 from srv1 to kill srv2
  * and fence_vm_srv1 from srv2 to kill srv1.
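
For the record, a quick way to double-check that mapping is to ask the
fencer which device claims each target. A minimal sketch, assuming
Pacemaker's stonith_admin is available on a cluster node:

  # list the devices able to fence srv2; with the config above, this
  # should report fence_vm_srv2 only
  stonith_admin --list srv2

  # and the same for srv1, expecting fence_vm_srv1
  stonith_admin --list srv1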

[...]
> >> In other words, is it:
> >>
> >>   delay -> fence query -> fencing action
> >>
> >> or 
> >>
> >>   fence query -> delay -> fence action
> >>
> >> ?
> >>
> >> The first definition would solve this issue, but not the second. As I
> >> understand it, as soon as the fence query has been sent, the node
> >> status is
> >> "UNCLEAN (online)".  
> > The latter -- you're correct, the node is already unclean by that time.
> > Since the stop did not succeed, the node must be fenced to continue
> > safely.  
> 
> Well, pcmk_delay_base/max are made for the case
> where both nodes in a 2-node cluster lose contact
> and each sees the other as unclean.
> If the loser gets fenced, its view of the partner
> node becomes irrelevant.

IIRC, the surviving node was the DC and was seeing itself as "UNCLEAN
(online)", as this was the only way to stop the failing resource. There was
just no fencing resource available to kill it.
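
For the archives: since the delay happens after the fence query, the usual
mitigation for the mutual-fencing race Klaus describes is a static
per-device delay. A minimal sketch with crmsh (the 5s value is purely
illustrative):

  # favour srv1 in a split-brain: fencing of srv2 fires immediately,
  # while the device that would kill srv1 waits 5s
  crm resource param fence_vm_srv2 set pcmk_delay_base 0
  crm resource param fence_vm_srv1 set pcmk_delay_base 5s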


