[Pacemaker] Problems with fence_ipmilan

Andreas Mock andreas.mock at web.de
Tue Sep 17 11:17:27 UTC 2013


Hi Digimer,

your hint concerning acpid was very valuable.
I didn't know about that recommendation.
After disabling acpid I could stonith instantly,
as I wanted to.

The video has no technical context. It was just meant to make
this dry stuff a little bit funny. IMHO it's worth watching
anyway.

Thank you!

Best regards
Andreas Mock


-----Original Message-----
From: Digimer [mailto:lists at alteeve.ca]
Sent: Tuesday, 17 September 2013 06:37
To: The Pacemaker cluster resource manager
Cc: Andreas Mock
Subject: Re: [Pacemaker] Problems with fence_ipmilan

On 16/09/13 16:53, Andreas Mock wrote:
> Hi all,
> 
> I'm using (want to use) RHEL 6.4 fence_ipmilan for our IBM x3650 M4 (IMM).
> My problem is the following. Contrary to the documented behaviour,
> a 'chassis power off' or a 'chassis power reset' does a soft reset, as if
> you had pressed the server's power button. That means the
> shutdown process is initiated.
> 
> As you can imagine, stonithing this way is like this:
> http://www.youtube.com/watch?v=fVJiwuk75Ig#t=1m23s
> It is especially bad when processes are stuck in 'D' state on a SAN volume.
> 
> What I want is a hard reset. It seems that the only solution
> at the moment is to send a 'chassis power reset'.
> fence_ipmilan doesn't currently support that IPMI command.
> 
> Does anybody have experience with similar (bad) behaviour, or a workaround?
> 
> Best regards
> Andreas Mock
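For reference, the direct 'chassis power reset' Andreas proposes as a workaround can be sent by hand with ipmitool. The sketch below only builds the command line; it assumes IPMI-over-LAN (lanplus) access to the IMM, the BMC address and user name are hypothetical, and whether the IMM really treats this as a hard reset is exactly what the thread is unsure about. It is not what fence_ipmilan does internally.

```python
import shlex


def ipmi_hard_reset_cmd(host: str, user: str) -> list[str]:
    """Build an ipmitool argv that sends 'chassis power reset' to a BMC.

    -E makes ipmitool read the password from the IPMI_PASSWORD
    environment variable, so it never shows up in the process list.
    """
    return [
        "ipmitool",
        "-I", "lanplus",  # IPMI v2.0 RMCP+ LAN interface
        "-H", host,       # BMC/IMM address (hypothetical in the example)
        "-U", user,
        "-E",             # password comes from $IPMI_PASSWORD
        "chassis", "power", "reset",
    ]


if __name__ == "__main__":
    # Hypothetical IMM address and user; pass the list to
    # subprocess.run(cmd, check=True) to actually send the command.
    cmd = ipmi_hard_reset_cmd("10.20.0.11", "admin")
    print(shlex.join(cmd))
```

Passing the argument list straight to subprocess.run avoids shell quoting problems; shlex.join is only used here to print it readably.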

I can't watch the video (yay hotel internet \o/), so if there is context
there, I am missing it.

The FenceAgentAPI says that "reset" should be "off -> verify -> try on
but don't care if that fails". This is because "reset" doesn't have a
verifiable "off" state.
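The "reset" contract quoted above can be sketched as follows. The device class is a hypothetical in-memory stand-in for a real BMC or power switch; the point is that success is decided by the verified "off" state alone, and a failure to power back on is ignored.

```python
from dataclasses import dataclass, field


class FenceError(Exception):
    pass


@dataclass
class FakeDevice:
    """Hypothetical stand-in for a power device; a real agent would talk IPMI."""
    state: str = "on"
    off_works: bool = True   # simulate a device that ignores 'off'
    on_fails: bool = False   # simulate a node that will not power back on
    log: list[str] = field(default_factory=list)

    def power_off(self) -> None:
        self.log.append("off")
        if self.off_works:
            self.state = "off"

    def power_on(self) -> None:
        self.log.append("on")
        if self.on_fails:
            raise FenceError("power on failed")
        self.state = "on"

    def status(self) -> str:
        return self.state


def reset(dev: FakeDevice) -> bool:
    """'reset' as off -> verify -> try on, ignoring a failed 'on'."""
    dev.power_off()
    if dev.status() != "off":
        return False  # cannot confirm the node is down: the fence failed
    try:
        dev.power_on()
    except FenceError:
        pass  # node is confirmed off, so the fence still succeeded
    return True
```

A plain "power reset" command would skip the verify step entirely, which is why it cannot prove the node was ever down.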

Next, you probably have acpid enabled. With acpid running, the OS
handles the ACPI power event as a request for a clean shutdown; most
(all?) systems will turn off instantly if acpid is disabled. For this
reason, Red Hat actually recommends disabling acpid to help avoid this issue.

Third, with IPMI-type fence devices there is no way to prevent one
fence from starting after another one has started, because the devices
are independent. To help deal with this, it's a good idea to set
'delay="15"' on one node's fence method. This way, if there is a
break and both nodes try to fence each other, the node with the delay
will not be fenced immediately. Say you set the delay against node 1.
Then there is a break and both nodes start a fence. Node 2 sees that node
1 has a delay of fifteen seconds and pauses. Node 1 sees no delay
against node 2, so it fences immediately. Node 2 will be long dead
before its timer expires, so you avoid the dual fence. Had node 1
really crashed, node 2 would wait 15 seconds, then proceed with the fence.
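The delay scenario can be checked with a toy model. Assumptions (not from the thread): every fence operation takes the same fixed time (2 seconds here, an arbitrary figure); a node that has already been powered off cannot start a new fence; and once a request has been sent, the independent IPMI device completes it even if the sender dies. The function and node names are hypothetical.

```python
def fence_race(requests: list[tuple[str, str]],
               delays: dict[str, float],
               fence_time: float = 2.0) -> dict[str, float]:
    """Return {node: time it was powered off} for a set of fence requests.

    requests: (issuer, target) pairs, all raised at t=0.
    delays:   per-target delay, like delay="15" on that node's fence method.
    """
    # Handle requests in the order they would fire. Because fence_time is
    # constant, fences also complete in this order, so dead_at is up to
    # date for every issuer by the time its request comes up.
    pending = sorted((delays.get(target, 0.0), issuer, target)
                     for issuer, target in requests)
    dead_at: dict[str, float] = {}
    for fire_at, issuer, target in pending:
        # A node that is already off can no longer fence anyone.
        if issuer in dead_at and dead_at[issuer] <= fire_at:
            continue
        # The IPMI device acts on its own once the request is sent,
        # so this completes even if the issuer dies in the meantime.
        done = fire_at + fence_time
        dead_at[target] = min(done, dead_at.get(target, done))
    return dead_at


if __name__ == "__main__":
    both_ways = [("node1", "node2"), ("node2", "node1")]
    print(fence_race(both_ways, {}))               # {'node2': 2.0, 'node1': 2.0}
    print(fence_race(both_ways, {"node1": 15.0}))  # {'node2': 2.0}
    print(fence_race([("node2", "node1")], {"node1": 15.0}))  # {'node1': 17.0}
```

Without a delay both nodes are shut off (the dual fence); with the delay against node 1 only node 2 goes down; and if node 1 has really crashed, node 2 fences it after the 15-second wait.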

digimer

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?




