[Pacemaker] question about stonith:external/libvirt

Matthew O'Connor matt at ecsorl.com
Mon May 21 19:35:22 UTC 2012



On 05/21/2012 02:26 PM, Florian Haas wrote:
> On Mon, May 21, 2012 at 8:14 PM, Matthew O'Connor <matt at ecsorl.com> wrote:
>> On 05/21/2012 05:43 AM, Florian Haas wrote:
>>> Does it have "fencing resource-and-stonith" in the DRBD configuration,
>>> and stonith_admin-fence-peer.sh as its fence-peer handler?
>> That was the problem.  Totally forgot to update my DRBD configuration.
> I actually wasn't saying that that was the root cause of your problem.
> :) But it's worth looking into, anyhow.

Ah - well, for the sake of barking up the right tree, here is a snippet of
the logs of l2 after l3 was halted, and before making any changes to the
DRBD configuration:

May 19 23:00:13 l2 stonith-ng: [1554]: info: initiate_remote_stonith_op:
Initiating remote operation reboot for l3:
b1374d19-458b-4520-9cbf-e2e5812e6639
May 19 23:00:13 l2 stonith-ng: [1554]: info: can_fence_host_with_device:
p_fence-l3 can fence l3: none
May 19 23:00:13 l2 stonith-ng: [1554]: info: call_remote_stonith:
Requesting that l2 perform op reboot l3
May 19 23:00:13 l2 stonith-ng: [1554]: info: stonith_fence: Exec
<stonith_command t="stonith-ng"
st_async_id="b1374d19-458b-4520-9cbf-e2e5812e6639" st_op="st_fence"
st_callid="0" st_callopt="0"
st_remote_op="b1374d19-458b-4520-9cbf-e2e5812e6639" st_target="l3"
st_device_action="reboot" st_timeout="54000" src="l2" seq="10" />
May 19 23:00:13 l2 stonith-ng: [1554]: info: can_fence_host_with_device:
p_fence-l3 can fence l3: none
May 19 23:00:13 l2 stonith-ng: [1554]: info: stonith_fence: Found 1
matching devices for 'l3'
May 19 23:00:13 l2 stonith-ng: [1554]: info: stonith_command: Processed
st_fence from l2: rc=-1
May 19 23:00:13 l2 stonith-ng: [1554]: info: make_args: reboot-ing node
'l3' as 'port=l3'
May 19 23:00:14 l2 stonith-ng: [1554]: info: stonith_command: Processed
st_execute from lrmd: rc=-1
May 19 23:00:19 l2 stonith-ng: [1554]: info: log_operation: Operation
'reboot' [7042] (call 0 from (null)) for host 'l3' with device
'p_fence-l3' returned: 0
May 19 23:00:19 l2 stonith-ng: [1554]: info: log_operation: p_fence-l3:
Performing: stonith -t external/libvirt -T reset l3
May 19 23:00:19 l2 stonith-ng: [1554]: info: log_operation: p_fence-l3:
success: l3 0
May 19 23:00:19 l2 stonith-ng: [1554]: info:
process_remote_stonith_exec: ExecResult <st-reply
st_origin="stonith_construct_async_reply" t="stonith-ng"
st_op="st_notify" st_remote_op="b1374d19-458b-4520-9cbf-e2e5812e6639"
st_callid="0" st_callopt="0" st_rc="0" st_output="Performing: stonith -t
external/libvirt -T reset l3#012success: l3 0#012" src="l2" seq="11" />
May 19 23:00:19 l2 stonith-ng: [1554]: info: remote_op_done: Notifing
clients of b1374d19-458b-4520-9cbf-e2e5812e6639 (reboot of l3 from
9f36c78b-06c8-4b62-bc84-6cb87b30351b by l2): 2, rc=0
May 19 23:00:19 l2 crmd: [1559]: info: tengine_stonith_callback:
StonithOp <st-reply st_origin="stonith_construct_async_reply"
t="stonith-ng" st_op="reboot"
st_remote_op="b1374d19-458b-4520-9cbf-e2e5812e6639" st_callid="0"
st_callopt="0" st_rc="0" st_output="Performing: stonith -t
external/libvirt -T reset l3#012success: l3 0#012" src="l2" seq="11"
state="2" st_target="l3" />
May 19 23:00:19 l2 stonith-ng: [1554]: info: stonith_notify_client:
Sending st_fence-notification to client
1559/b09a62f6-b077-4181-98da-91f43f40bc9a
May 19 23:00:19 l2 crmd: [1559]: info: tengine_stonith_callback:
StonithOp <st-reply st_origin="stonith_construct_async_reply"
t="stonith-ng" st_op="reboot"
st_remote_op="b1374d19-458b-4520-9cbf-e2e5812e6639" st_callid="0"
st_callopt="0" st_rc="0" st_output="Performing: stonith -t
external/libvirt -T reset l3#012success: l3 0#012" src="l2" seq="11"
state="2" st_target="l3" />
May 19 23:00:19 l2 crmd: [1559]: info: tengine_stonith_callback: Stonith
operation 4/82:118:0:b92bcccd-5765-469c-b56e-392cc065b65c: OK (0)
May 19 23:00:19 l2 crmd: [1559]: info: tengine_stonith_callback: Stonith
of l3 passed
May 19 23:00:19 l2 crmd: [1559]: info: send_stonith_update: Sending
fencing update 358 for l3
May 19 23:00:19 l2 stonith-ng: [1554]: info: stonith_notify_client:
Sending st_fence-notification to client
1559/b09a62f6-b077-4181-98da-91f43f40bc9a
May 19 23:00:19 l2 crmd: [1559]: info: tengine_stonith_notify: Peer l3
was terminated (reboot) by l2 for l2
(ref=b1374d19-458b-4520-9cbf-e2e5812e6639): OK
May 19 23:00:19 l2 crmd: [1559]: notice: tengine_stonith_notify:
Notified CMAN that 'l3' is now fenced
May 19 23:00:19 l2 crmd: [1559]: notice: tengine_stonith_notify:
Confirmed CMAN fencing event for 'l3'


And here is a log snippet from after the DRBD configuration was updated:

May 21 14:36:02 l2 stonith-ng: [1618]: info: initiate_remote_stonith_op:
Initiating remote operation reboot for l3:
9c19ba05-363c-48b4-ade3-d9dac5087866
May 21 14:36:02 l2 stonith-ng: [1618]: info: can_fence_host_with_device:
p_fence-l3 can fence l3: none
May 21 14:36:02 l2 stonith-ng: [1618]: info: call_remote_stonith:
Requesting that l2 perform op reboot l3
May 21 14:36:02 l2 stonith-ng: [1618]: info: stonith_fence: Exec
<stonith_command t="stonith-ng"
st_async_id="9c19ba05-363c-48b4-ade3-d9dac5087866" st_op="st_fence"
st_callid="0" st_callopt="0"
st_remote_op="9c19ba05-363c-48b4-ade3-d9dac5087866" st_target="l3"
st_device_action="reboot" st_timeout="54000" src="l2" seq="20" />
May 21 14:36:02 l2 stonith-ng: [1618]: info: can_fence_host_with_device:
p_fence-l3 can fence l3: none
May 21 14:36:02 l2 stonith-ng: [1618]: info: stonith_fence: Found 1
matching devices for 'l3'
May 21 14:36:02 l2 stonith-ng: [1618]: info: stonith_command: Processed
st_fence from l2: rc=-1
May 21 14:36:02 l2 stonith-ng: [1618]: info: make_args: reboot-ing node
'l3' as 'port=l3'
May 21 14:36:08 l2 stonith-ng: [1618]: info: log_operation: Operation
'reboot' [341] (call 0 from (null)) for host 'l3' with device
'p_fence-l3' returned: 0
May 21 14:36:08 l2 stonith-ng: [1618]: info: log_operation: p_fence-l3:
Performing: stonith -t external/libvirt -T reset l3
May 21 14:36:08 l2 stonith-ng: [1618]: info: log_operation: p_fence-l3:
success: l3 0
May 21 14:36:08 l2 stonith-ng: [1618]: info:
process_remote_stonith_exec: ExecResult <st-reply
st_origin="stonith_construct_async_reply" t="stonith-ng"
st_op="st_notify" st_remote_op="9c19ba05-363c-48b4-ade3-d9dac5087866"
st_callid="0" st_callopt="0" st_rc="0" st_output="Performing: stonith -t
external/libvirt -T reset l3#012success: l3 0#012" src="l2" seq="21" />
May 21 14:36:08 l2 stonith-ng: [1618]: info: remote_op_done: Notifing
clients of 9c19ba05-363c-48b4-ade3-d9dac5087866 (reboot of l3 from
f782c9f8-71e1-4ec2-8f45-93a4b2f7f795 by l2): 2, rc=0
May 21 14:36:08 l2 crmd: [1623]: info: tengine_stonith_callback:
StonithOp <st-reply st_origin="stonith_construct_async_reply"
t="stonith-ng" st_op="reboot"
st_remote_op="9c19ba05-363c-48b4-ade3-d9dac5087866" st_callid="0"
st_callopt="0" st_rc="0" st_output="Performing: stonith -t
external/libvirt -T reset l3#012success: l3 0#012" src="l2" seq="21"
state="2" st_target="l3" />
May 21 14:36:08 l2 crmd: [1623]: info: tengine_stonith_callback: Stonith
operation 5/81:56:0:e647e4db-cb29-4db4-a0bc-b631fc35f5ec: OK (0)
May 21 14:36:08 l2 crmd: [1623]: info: tengine_stonith_callback: Stonith
of l3 passed
May 21 14:36:08 l2 crmd: [1623]: info: send_stonith_update: Sending
fencing update 276 for l3
May 21 14:36:08 l2 stonith-ng: [1618]: info: stonith_notify_client:
Sending st_fence-notification to client
1623/ffe204e9-3d5d-4a11-b605-084d3f61980d
May 21 14:36:08 l2 crmd: [1623]: info: tengine_stonith_notify: Peer l3
was terminated (reboot) by l2 for l2
(ref=9c19ba05-363c-48b4-ade3-d9dac5087866): OK
May 21 14:36:08 l2 stonith-ng: [1618]: info: stonith_device_execute:
Nothing to do for p_fence-l3
May 21 14:36:08 l2 crmd: [1623]: notice: tengine_stonith_notify:
Notified CMAN that 'l3' is now fenced
May 21 14:36:08 l2 crmd: [1623]: notice: tengine_stonith_notify:
Confirmed CMAN fencing event for 'l3'

I am not sure this reveals much, but chances are you will see something
I don't! ;-)
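
For anyone who wants to compare the two excerpts mechanically rather than
by eye, here is a minimal sketch in Python. It assumes the syslog layout
seen above ("Mon DD HH:MM:SS host daemon: [pid]: level: function: ...")
and that continuation lines are only an artifact of mail wrapping. The
helper names (join_wrapped, functions_logged) are made up for this
example and are not part of Pacemaker.

```python
import re

# Start of one syslog record, in the layout used by the excerpts above, e.g.
# "May 19 23:00:13 l2 stonith-ng: [1554]: info: stonith_fence: Found 1"
RECORD = re.compile(
    r"^\w{3}\s+\d+ \d\d:\d\d:\d\d (?P<host>\S+) (?P<daemon>[\w-]+): "
    r"\[(?P<pid>\d+)\]: (?P<level>\w+):\s*(?P<rest>.*)$"
)


def join_wrapped(lines):
    """Rejoin records that the mail client wrapped across several lines."""
    records = []
    for line in lines:
        if RECORD.match(line):
            # A new record starts here.
            records.append(line.rstrip())
        elif records and line.strip():
            # Continuation of the previous record.
            records[-1] += " " + line.strip()
    return records


def functions_logged(text):
    """Return the set of (daemon, function) pairs seen in a log excerpt."""
    seen = set()
    for record in join_wrapped(text.splitlines()):
        match = RECORD.match(record)
        # The function name is the first colon-terminated token of the message.
        function = match.group("rest").split(":", 1)[0].strip()
        seen.add((match.group("daemon"), function))
    return seen
```

With the two excerpts pasted into strings `before` and `after`,
`functions_logged(after) - functions_logged(before)` lists the functions
that logged only after the DRBD change, and swapping the operands lists
those that logged only before it. This only compares which functions
logged, not what they said.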

>> For sake of testing, I used the "crm-fence-peer.sh" script - it seemed
>> to do the trick, although I strongly suspect this is the wrong script
>> for the job.
> It is. No good for dual-Primary, really, as it doesn't prevent split
> brain in that sort of configuration.
Yes, that is perfectly sensible. 

Perhaps my (still-in-testing) production cluster's problem will be a bit
simpler, then?  The DRBD resource there is actually operated in
single-primary mode on a two-node cluster, because it is served up over
iSCSI to another cluster of machines.  DLM/OCFS2 do not operate on the
DRBD/iSCSI host cluster, only on the iSCSI client cluster.  So, in this
case, would crm-fence-peer.sh be sufficient for the DRBD cluster
nodes?


>
>> Do I need to write my own script to call stonith_admin?
> No, stonith_admin-fence-peer.sh ships with recent DRBD releases.
Sadness... not found on Ubuntu 12.04.  They are providing DRBD v8.3.11.  I
will check with them...
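
A quick way to see which fence-peer handlers a node actually has is
sketched below in Python. The default directory /usr/lib/drbd is an
assumption about where the DRBD package puts its handler scripts, so
adjust it for your distribution. The function name find_fence_handlers is
made up for this example.

```python
import glob
import os


def find_fence_handlers(directory="/usr/lib/drbd"):
    """List the *-fence-peer.sh handler scripts present in `directory`.

    The default directory is an assumption about the package layout.
    """
    pattern = os.path.join(directory, "*-fence-peer.sh")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


if __name__ == "__main__":
    handlers = find_fence_handlers()
    print("fence-peer handlers found:", handlers or "none")
    if "stonith_admin-fence-peer.sh" not in handlers:
        print("stonith_admin-fence-peer.sh is not installed here")
```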

Thanks!!
-- Matthew

>
> Cheers,
> Florian
>



