[Pacemaker] DRBD Split-brain (recovered), but still showing "Failed Actions"

Wed Apr 11 00:12:10 CEST 2012

On 04/10/2012 05:43 PM, Reid, Mike wrote:
> Thank you for the suggestion, Andreas. Unfortunately, that does not appear
> to have cleaned up the Failed Actions either:
> 
>> crm resource cleanup msDRBD
> 
> Cleaning up resDRBD:0 on hostname2
> Cleaning up resDRBD:1 on hostname2
> Cleaning up resDRBD:0 on hostname1
> Cleaning up resDRBD:1 on hostname1
> 
>> crm_mon -1
> 
> [...]
> Failed actions:
>     resDRBD:1_promote_0 (node=hostname2, call=530, rc=-2, status=Timed Out): unknown exec error
> 
> 
> Are there any other options that do not involve a failover + restart?

If you switch your cluster into maintenance mode ...

crm configure property maintenance-mode=true

... you can stop Pacemaker and even Corosync without interrupting your
services. Don't forget to disable maintenance mode again after the restart.
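
A minimal sketch of that sequence follows. It assumes two things this thread does not confirm: that Pacemaker 1.0.9 runs as a plugin inside Corosync (as is usual on the Ubuntu 10.10 / openais stack), so stopping Corosync also stops Pacemaker; and that the stack is controlled by an /etc/init.d/corosync init script. Presumably the restarted node re-probes its resources, so the stale promote failure no longer appears in its operation history.

# 1. Stop Pacemaker from managing resources; DRBD and the web services keep running.
crm configure property maintenance-mode=true

# 2. On hostname2, restart the cluster stack (init script path is an assumption).
/etc/init.d/corosync stop
/etc/init.d/corosync start

# 3. Wait for the node to rejoin, then check that resources show their real state.
crm_mon -1

# 4. Hand resource management back to Pacemaker.
crm configure property maintenance-mode=false

Only leave maintenance mode once crm_mon shows the DRBD master/slave set in the roles you expect. Otherwise Pacemaker may act on whatever state it detects as soon as it resumes management.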

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

>>
>> ------------------------------
>>
>> Message: 2
>> Date: Tue, 10 Apr 2012 11:23:41 +0200
>> From: Andreas Kurz <andreas at hastexo.com>
>> To: pacemaker at oss.clusterlabs.org
>> Subject: Re: [Pacemaker] DRBD Split-brain (recovered), but still
>> 	showing "Failed Actions"
>> Message-ID: <4F83FC1D.9010908 at hastexo.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> On 04/09/2012 07:07 PM, Reid, Mike wrote:
>>> We recently had a DRBD (v8.3.12) split-brain scenario on our two-node
>>> Active/Passive web cluster. The failover worked as expected, and we were
>>> able to resolve the split brain manually without issue.
>>>
>>> However, crm_mon is still showing the following "Failed Actions", which I
>>> would like to clean up:
>>>
>>> * resDRBD:1_promote_0 (node=hostname2, call=530, rc=-2, status=Timed Out): unknown exec error
>>>
>>> Unfortunately, "crm resource cleanup resDRBD" and "crm resource cleanup
>>> resDRBD hostname2" do not seem to do the trick, even though the
>>> commands appear to run correctly. Is there any other way to clear the
>>> failed actions message short of failing over to the original node
>>> and rebooting "hostname2"?
>>
>> Clean up the MS (master/slave) resource to clean up all of its instances.
>>
>> Regards,
>> Andreas
>>
>>>
>>> Running Ubuntu 10.10
>>> Stack: openais
>>> Version: 1.0.9
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
