[Pacemaker] Question about the error when fencing failed
Kazunori INOUE
inouekazu at intellilink.co.jp
Thu Apr 11 09:23:43 UTC 2013
Hi Andrew,
(13.04.08 11:04), Andrew Beekhof wrote:
>
> On 05/04/2013, at 3:21 PM, Kazunori INOUE <inouekazu at intellilink.co.jp> wrote:
>
>> Hi,
>>
>> When fencing fails (*1) under the following conditions, an error occurs
>> in stonith_perform_callback().
>>
>> - fencing_topology is in use. (*2)
>> - The DC node is the fencing target. ($ crm node fence dev2)
>>
>> Apr 3 17:04:47 dev2 stonith-ng[2278]: notice: handle_request: Client crmd.2282.b9e69280 wants to fence (reboot) 'dev2' with device '(any)'
>> Apr 3 17:04:47 dev2 stonith-ng[2278]: notice: handle_request: Forwarding complex self fencing request to peer dev1
>> Apr 3 17:04:47 dev2 stonith-ng[2278]: info: stonith_command: Processed st_fence from crmd.2282: Operation now in progress (-115)
>> Apr 3 17:04:47 dev2 pengine[2281]: warning: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-warn-0.bz2
>> Apr 3 17:04:47 dev2 stonith-ng[2278]: info: stonith_command: Processed st_query from dev1: OK (0)
>> Apr 3 17:04:47 dev2 stonith-ng[2278]: info: stonith_action_create: Initiating action list for agent fence_legacy (target=(null))
>> Apr 3 17:04:47 dev2 stonith-ng[2278]: info: stonith_command: Processed st_timeout_update from dev1: OK (0)
>> Apr 3 17:04:47 dev2 stonith-ng[2278]: info: dynamic_list_search_cb: Refreshing port list for f-dev1
>> Apr 3 17:04:48 dev2 stonith-ng[2278]: notice: remote_op_done: Operation reboot of dev2 by dev1 for crmd.2282 at dev1.4494ed41: Generic Pacemaker error
>> Apr 3 17:04:48 dev2 stonith-ng[2278]: info: stonith_command: Processed st_notify reply from dev1: OK (0)
>> Apr 3 17:04:48 dev2 crmd[2282]: error: crm_abort: stonith_perform_callback: Triggered assert at st_client.c:1894 : call_id > 0
>> Apr 3 17:04:48 dev2 crmd[2282]: error: stonith_perform_callback: Bad result <st-reply st_origin="stonith_construct_reply" t="stonith-ng" st_rc="-201" st_op="st_query" st_callid="0" st_clientid="b9e69280-e557-478e-aa94-fd7ca6a533b1" st_clientname="crmd.2282" st_remote_op="4494ed41-2306-4707-8406-fa066b7f3ef0" st_callopt="0" st_delegate="dev1">
>> Apr 3 17:04:48 dev2 crmd[2282]: error: stonith_perform_callback: Bad result <st_calldata>
>> Apr 3 17:04:48 dev2 crmd[2282]: error: stonith_perform_callback: Bad result <st-reply t="st_notify" subt="broadcast" st_op="reboot" count="1" src="dev1" state="4" st_target="dev2">
>> Apr 3 17:04:48 dev2 crmd[2282]: error: stonith_perform_callback: Bad result <st_calldata>
>> Apr 3 17:04:48 dev2 crmd[2282]: error: stonith_perform_callback: Bad result <st_notify_fence state="4" st_rc="-201" st_target="dev2" st_device_action="reboot" st_delegate="dev1" st_remote_op="4494ed41-2306-4707-8406-fa066b7f3ef0" st_origin="dev1" st_clientid="b9e69280-e557-478e-aa94-fd7ca6a533b1" st_clientname="crmd.2282"/>
>> Apr 3 17:04:48 dev2 crmd[2282]: error: stonith_perform_callback: Bad result </st_calldata>
>> Apr 3 17:04:48 dev2 crmd[2282]: error: stonith_perform_callback: Bad result </st-reply>
>> Apr 3 17:04:48 dev2 crmd[2282]: error: stonith_perform_callback: Bad result </st_calldata>
>> Apr 3 17:04:48 dev2 crmd[2282]: error: stonith_perform_callback: Bad result </st-reply>
>> Apr 3 17:04:48 dev2 crmd[2282]: warning: stonith_perform_callback: STONITH command failed: Generic Pacemaker error
>> Apr 3 17:04:48 dev2 crmd[2282]: notice: tengine_stonith_notify: Peer dev2 was not terminated (st_notify_fence) by dev1 for dev1: Generic Pacemaker error (ref=4494ed41-2306-4707-8406-fa066b7f3ef0) by client crmd.2282
>> Apr 3 17:07:11 dev2 crmd[2282]: error: stonith_async_timeout_handler: Async call 2 timed out after 144000ms
>>
>> Is this the intended behavior?
>
> Definitely not :-(
> Is this the first fencing operation that has been initiated by the cluster?
Yes.
I have attached the crm_report output.
> Or has the cluster been running for some time?
>
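A note on reading the quoted log: the assert reports the condition "call_id > 0", while the st-reply printed right after it carries st_callid="0" and st_rc="-201". The last message (stonith_async_timeout_handler) is logged 144 seconds after the fencing request, which matches the 144000ms it reports. The following is only a minimal Python sketch for pulling those values out of the log lines; it is not Pacemaker code, the helper names xml_attrs and to_seconds are made up for illustration, and the reply line is abridged.

```python
import re

# Lines copied from the quoted log (the reply line is abridged to the
# attributes of interest).
ASSERT_LINE = (
    "Apr 3 17:04:48 dev2 crmd[2282]: error: crm_abort: stonith_perform_callback: "
    "Triggered assert at st_client.c:1894 : call_id > 0"
)
REPLY_LINE = (
    "Apr 3 17:04:48 dev2 crmd[2282]: error: stonith_perform_callback: Bad result "
    '<st-reply st_origin="stonith_construct_reply" t="stonith-ng" st_rc="-201" '
    'st_op="st_query" st_callid="0" st_clientname="crmd.2282" st_delegate="dev1">'
)


def xml_attrs(line):
    """Collect name="value" pairs from an XML fragment embedded in a log line."""
    return dict(re.findall(r'([\w-]+)="([^"]*)"', line))


def to_seconds(hhmmss):
    """Convert an 'HH:MM:SS' syslog time to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


attrs = xml_attrs(REPLY_LINE)
call_id = int(attrs["st_callid"])
# The asserted condition is the text after the last " : " in the crm_abort line.
assert_condition = ASSERT_LINE.rsplit(" : ", 1)[1]
# Request logged at 17:04:47, async timeout logged at 17:07:11 (same day).
elapsed = to_seconds("17:07:11") - to_seconds("17:04:47")

print(f"asserted condition: {assert_condition}")
print(f"st_callid in reply: {call_id} (st_rc={attrs['st_rc']})")
print(f"request to async timeout: {elapsed}s")
```

The sketch only restates what the log already shows; it does not explain why the reply was delivered with a call id of 0.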
----
Best Regards,
Kazunori INOUE
>>
>> *1: To make fencing fail, I added "exit 1" to the reset action of the
>> stonith plugin (external/libvirt).
>>
>> $ diff -u libvirt.ORG libvirt
>> --- libvirt.ORG 2012-12-17 09:56:37.000000000 +0900
>> +++ libvirt 2013-04-03 16:33:08.118157947 +0900
>> @@ -240,6 +240,7 @@
>> ;;
>>
>> reset)
>> + exit 1
>> libvirt_check_config
>> libvirt_set_domain_id $2
>>
>> *2:
>> node $id="3232261523" dev2
>> node $id="3232261525" dev1
>> primitive f-dev1 stonith:external/libvirt \
>>     params pcmk_reboot_retries="1" hostlist="dev1" \
>>     hypervisor_uri="qemu+ssh://bl460g1n5/system"
>> primitive f-dev2 stonith:external/libvirt \
>>     params pcmk_reboot_retries="1" hostlist="dev2" \
>>     hypervisor_uri="qemu+ssh://bl460g1n6/system"
>> location rsc_location-f-dev1 f-dev1 \
>>     rule $id="rsc_location-f-dev1-rule" -inf: #uname eq dev1
>> location rsc_location-f-dev2 f-dev2 \
>>     rule $id="rsc_location-f-dev2-rule" -inf: #uname eq dev2
>> fencing_topology \
>>     dev1: f-dev1 \
>>     dev2: f-dev2
>> property $id="cib-bootstrap-options" \
>>     dc-version="1.1.10-1.el6-132019b" \
>>     cluster-infrastructure="corosync" \
>>     no-quorum-policy="ignore" \
>>     stonith-timeout="70s"
>>
>> Best Regards,
>> Kazunori INOUE
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: unexplained-crmd-error.tar.bz2
Type: application/octet-stream
Size: 88035 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20130411/9fb4f143/attachment-0004.obj>