[Pacemaker] Question about the error when fencing failed

Andrew Beekhof andrew at beekhof.net
Wed Apr 17 06:02:41 EDT 2013


On 17/04/2013, at 6:52 PM, Kazunori INOUE <inouekazu at intellilink.co.jp> wrote:

> Hi Andrew,
> 
> I confirmed that this problem was fixed.

Excellent

> Thanks!

And thank you for bringing it to my attention :)

> 
> 
>> -----Original Message-----
>> From: Andrew Beekhof [mailto:andrew at beekhof.net]
>> Sent: Wednesday, April 17, 2013 2:04 PM
>> To: The Pacemaker cluster resource manager
>> Cc: shimazakik at intellilink.co.jp
>> Subject: Re: [Pacemaker] Question about the error when fencing failed
>> 
>> This should solve your issue:
>> 
>> 	https://github.com/beekhof/pacemaker/commit/dbbb6a6
>> 
>> On 11/04/2013, at 7:23 PM, Kazunori INOUE <inouekazu at intellilink.co.jp> wrote:
>> 
>>> Hi Andrew,
>>> 
>>> (13.04.08 11:04), Andrew Beekhof wrote:
>>>> 
>>>> On 05/04/2013, at 3:21 PM, Kazunori INOUE <inouekazu at intellilink.co.jp> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> When fencing fails (*1) under the following conditions, an error occurs
>>>>> in stonith_perform_callback():
>>>>> 
>>>>> - fencing-topology is in use (*2)
>>>>> - the node being fenced is the DC ($ crm node fence dev2)
>>>>> 
>>>>> Apr  3 17:04:47 dev2 stonith-ng[2278]:   notice: handle_request: Client crmd.2282.b9e69280 wants to fence (reboot) 'dev2' with device '(any)'
>>>>> Apr  3 17:04:47 dev2 stonith-ng[2278]:   notice: handle_request: Forwarding complex self fencing request to peer dev1
>>>>> Apr  3 17:04:47 dev2 stonith-ng[2278]:     info: stonith_command: Processed st_fence from crmd.2282: Operation now in progress (-115)
>>>>> Apr  3 17:04:47 dev2 pengine[2281]:  warning: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-warn-0.bz2
>>>>> Apr  3 17:04:47 dev2 stonith-ng[2278]:     info: stonith_command: Processed st_query from dev1: OK (0)
>>>>> Apr  3 17:04:47 dev2 stonith-ng[2278]:     info: stonith_action_create: Initiating action list for agent fence_legacy (target=(null))
>>>>> Apr  3 17:04:47 dev2 stonith-ng[2278]:     info: stonith_command: Processed st_timeout_update from dev1: OK (0)
>>>>> Apr  3 17:04:47 dev2 stonith-ng[2278]:     info: dynamic_list_search_cb: Refreshing port list for f-dev1
>>>>> Apr  3 17:04:48 dev2 stonith-ng[2278]:   notice: remote_op_done: Operation reboot of dev2 by dev1 for crmd.2282@dev1.4494ed41: Generic Pacemaker error
>>>>> Apr  3 17:04:48 dev2 stonith-ng[2278]:     info: stonith_command: Processed st_notify reply from dev1: OK (0)
>>>>> Apr  3 17:04:48 dev2 crmd[2282]:    error: crm_abort: stonith_perform_callback: Triggered assert at st_client.c:1894 : call_id > 0
>>>>> Apr  3 17:04:48 dev2 crmd[2282]:    error: stonith_perform_callback: Bad result   <st-reply st_origin="stonith_construct_reply" t="stonith-ng" st_rc="-201" st_op="st_query" st_callid="0" st_clientid="b9e69280-e557-478e-aa94-fd7ca6a533b1" st_clientname="crmd.2282" st_remote_op="4494ed41-2306-4707-8406-fa066b7f3ef0" st_callopt="0" st_delegate="dev1">
>>>>> Apr  3 17:04:48 dev2 crmd[2282]:    error: stonith_perform_callback: Bad result     <st_calldata>
>>>>> Apr  3 17:04:48 dev2 crmd[2282]:    error: stonith_perform_callback: Bad result       <st-reply t="st_notify" subt="broadcast" st_op="reboot" count="1" src="dev1" state="4" st_target="dev2">
>>>>> Apr  3 17:04:48 dev2 crmd[2282]:    error: stonith_perform_callback: Bad result         <st_calldata>
>>>>> Apr  3 17:04:48 dev2 crmd[2282]:    error: stonith_perform_callback: Bad result           <st_notify_fence state="4" st_rc="-201" st_target="dev2" st_device_action="reboot" st_delegate="dev1" st_remote_op="4494ed41-2306-4707-8406-fa066b7f3ef0" st_origin="dev1" st_clientid="b9e69280-e557-478e-aa94-fd7ca6a533b1" st_clientname="crmd.2282"/>
>>>>> Apr  3 17:04:48 dev2 crmd[2282]:    error: stonith_perform_callback: Bad result         </st_calldata>
>>>>> Apr  3 17:04:48 dev2 crmd[2282]:    error: stonith_perform_callback: Bad result       </st-reply>
>>>>> Apr  3 17:04:48 dev2 crmd[2282]:    error: stonith_perform_callback: Bad result     </st_calldata>
>>>>> Apr  3 17:04:48 dev2 crmd[2282]:    error: stonith_perform_callback: Bad result   </st-reply>
>>>>> Apr  3 17:04:48 dev2 crmd[2282]:  warning: stonith_perform_callback: STONITH command failed: Generic Pacemaker error
>>>>> Apr  3 17:04:48 dev2 crmd[2282]:   notice: tengine_stonith_notify: Peer dev2 was not terminated (st_notify_fence) by dev1 for dev1: Generic Pacemaker error (ref=4494ed41-2306-4707-8406-fa066b7f3ef0) by client crmd.2282
>>>>> Apr  3 17:07:11 dev2 crmd[2282]:    error: stonith_async_timeout_handler: Async call 2 timed out after 144000ms
>>>>> 
>>>>> Is this the intended behavior?
>>>> 
>>>> Definitely not :-(
>>>> Is this the first fencing operation that has been initiated by the cluster?
>>> 
>>> Yes.
>>> I attached crm_report.
>>> 
>>>> Or has the cluster been running for some time?
>>>> 
>>> 
>>> ----
>>> Best Regards,
>>> Kazunori INOUE
>>> 
>>>>> 
>>>>> *1: I added "exit 1" to reset() of stonith-plugin in order to make
>>>>>   fencing fail.
>>>>> 
>>>>> $ diff -u libvirt.ORG libvirt
>>>>> --- libvirt.ORG 2012-12-17 09:56:37.000000000 +0900
>>>>> +++ libvirt     2013-04-03 16:33:08.118157947 +0900
>>>>> @@ -240,6 +240,7 @@
>>>>>      ;;
>>>>> 
>>>>>      reset)
>>>>> +    exit 1
>>>>>      libvirt_check_config
>>>>>      libvirt_set_domain_id $2
>>>>> 
>>>>> *2:
>>>>> node $id="3232261523" dev2
>>>>> node $id="3232261525" dev1
>>>>> primitive f-dev1 stonith:external/libvirt \
>>>>>     params pcmk_reboot_retries="1" hostlist="dev1" \
>>>>>     hypervisor_uri="qemu+ssh://bl460g1n5/system"
>>>>> primitive f-dev2 stonith:external/libvirt \
>>>>>     params pcmk_reboot_retries="1" hostlist="dev2" \
>>>>>     hypervisor_uri="qemu+ssh://bl460g1n6/system"
>>>>> location rsc_location-f-dev1 f-dev1 \
>>>>>     rule $id="rsc_location-f-dev1-rule" -inf: #uname eq dev1
>>>>> location rsc_location-f-dev2 f-dev2 \
>>>>>     rule $id="rsc_location-f-dev2-rule" -inf: #uname eq dev2
>>>>> fencing_topology \
>>>>>     dev1: f-dev1 \
>>>>>     dev2: f-dev2
>>>>> property $id="cib-bootstrap-options" \
>>>>>     dc-version="1.1.10-1.el6-132019b" \
>>>>>     cluster-infrastructure="corosync" \
>>>>>     no-quorum-policy="ignore" \
>>>>>     stonith-timeout="70s"
>>>>> 
>>>>> Best Regards,
>>>>> Kazunori INOUE
>>>>> 
>>>>> _______________________________________________
>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>> 
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org
>>>> 
>>>> 
>>> 
>>> <unexplained-crmd-error.tar.bz2>

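For anyone reading the quoted log later: the assertion that fired in crmd is "call_id > 0", and the outer st-reply it printed carries st_callid="0". The following is only a minimal Python sketch of that check, not Pacemaker code. The helper names are hypothetical, and the reply string is abridged from the log above (st_clientid, st_remote_op and st_callopt are left out).

```python
import re

# Outer <st-reply> start tag from the quoted crmd log, abridged for brevity.
REPLY = ('<st-reply st_origin="stonith_construct_reply" t="stonith-ng" '
         'st_rc="-201" st_op="st_query" st_callid="0" '
         'st_clientname="crmd.2282" st_delegate="dev1">')

# Matches name="value" pairs inside a logged XML start tag.
ATTR = re.compile(r'([\w-]+)="([^"]*)"')


def reply_attrs(fragment):
    """Return the attributes of a logged XML start tag as a dict."""
    return dict(ATTR.findall(fragment))


def has_valid_call_id(attrs):
    """Hypothetical helper mirroring the logged assertion 'call_id > 0'."""
    return int(attrs.get("st_callid", "0")) > 0


attrs = reply_attrs(REPLY)
print(attrs["st_op"], attrs["st_rc"], has_valid_call_id(attrs))
```

Run against the abridged reply, this prints "st_query -201 False": the reply reached the callback with a call ID of 0, which is the condition the assert complains about.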