[Pacemaker] questions about expected behaviour stonith:meatware

imnotpc imnotpc at rock3d.net
Wed Jun 15 23:50:49 CET 2011


On Wednesday, June 15, 2011 17:14:49 Dejan Muhamedagic wrote:
> Hi,
> 
> On Wed, Jun 15, 2011 at 10:29:15PM +0200, Jelle de Jong wrote:
> > Hello everybody,
> > 
> > I was doing some testing/experiments with stonith:meatware using the
> > following configuration: http://paste.debian.net/119991/
> > 
> > question 1: does somebody know if I should add a pingd location to the
> > meatware stonith (see configuration)
> 
> No.
> 
> > question 2: I had my resources running on node hennessy, pulled out all
> > network cables to see what happens next...
> > 
> > The viktoriya node detects hennessy is gone and sends out the following
> > distress call:
> > 
> > info: client tengine [pid: 1776] requests a STONITH operation RESET on
> > node hennessy
> > info: stonith_operate_locally::2713: sending fencing op RESET for
> > hennessy to stonith_hennessy (meatware) (pid=8029)
> > CRIT: OPERATOR INTERVENTION REQUIRED to reset hennessy.
> > CRIT: Run "meatclient -c hennessy" AFTER power-cycling the machine.
> > 
> > So I am waiting and waiting until viktoriya starts its kvm guests but
> > nothing happens it just sits there... http://paste.debian.net/119984/
> > 
> > _until_ I executed meatclient -c hennessy (y) and the kvm guests started
> > 
> > Is this what stonith:meatware is expected to do? I was hoping viktoriya
> 
> Yes. meatware is software run by meat (aka operator). It says so
> above:
> 
> 	OPERATOR INTERVENTION REQUIRED to reset hennessy.
> 
> Funny but it looks fairly unequivocal to me.

Yes and no. The message is clear, but unless you have someone sitting at a 
console 24/7 running tail on the log file, it has little value. According to 
the ClusterLabs stonith docs (which I just realized you wrote, haha):

"Whenever invoked, meatware logs a CRIT severity message which should show up 
on the node’s console."

While meatware seems to provide the fencing action as expected, I've never 
seen any console messages. There are at least 2 possible reasons I've found:

1) You need to check /etc/rsyslog.conf (or equivalent) to see that .crit 
messages are indeed logged to the console. On Fedora 15 at least, there is no 
special routing for .crit messages and they are sent to /var/log/messages by 
default. You might try inserting a line "*.crit	/dev/console", although it 
didn't help me.

2) Even though fencing works I still get lots of warn and error messages in 
the logs. Here's a sample:

[...]
Jun 15 18:12:34 JeffDesk stonith-ng: [560]: info: remote_op_done: Notifing 
clients of be940d4a-27d0-43a5-9d31-21b1fa8e761f (reboot of Server2.LAN from 
8a883947-7022-4cad-9dba-d42110819bf9 by JeffDesk.LAN): 0, rc=1
Jun 15 18:12:34 JeffDesk stonith-ng: [560]: info: stonith_notify_client: 
Sending st_fence-notification to client 565/4ab08c7b-b795-404c-b1d6-
dc2215a60248
Jun 15 18:12:34 JeffDesk crmd: [565]: ERROR: stonith_error2string: Unknown 
Stonith error code: 1
Jun 15 18:12:34 JeffDesk crmd: [565]: ERROR: tengine_stonith_notify: Peer 
Server2.LAN could not be terminated (reboot) by JeffDesk.LAN for Server4.LAN 
(ref=be940d4a-27d0-43a5-9d31-21b1fa8e761f): <unknown error>
Jun 15 18:12:40 JeffDesk stonith-ng: [560]: info: stonith_queryQuery 
<stonith_command t="stonith-ng" st_async_id="62604f54-1777-45d5-
b88e-9a8e53209e35" st_op="st_query" st_callid="0" st_callopt="0" 
st_remote_op="62604f54-1777-45d5-b88e-9a8e53209e35" st_target="Server2.LAN" 
st_device_action="reboot" st_clientid="8a883947-7022-4cad-9dba-d42110819bf9" 
st_timeout="6000" src="Server4.LAN" seq="80" />
Jun 15 18:12:40 JeffDesk stonith-ng: [560]: info: can_fence_host_with_device: 
Meatware-Fence:0 can fence Server2.LAN: dynamic-list
Jun 15 18:12:40 JeffDesk stonith-ng: [560]: info: stonith_query: Found 1 
matching devices for 'Server2.LAN'
Jun 15 18:12:40 JeffDesk stonith-ng: [560]: info: stonith_fenceExec 
<stonith_command t="stonith-ng" st_async_id="62604f54-1777-45d5-
b88e-9a8e53209e35" st_op="st_fence" st_callid="0" st_callopt="0" 
st_remote_op="62604f54-1777-45d5-b88e-9a8e53209e35" st_target="Server2.LAN" 
st_device_action="reboot" st_timeout="54000" src="Server4.LAN" seq="82" />
Jun 15 18:12:40 JeffDesk stonith-ng: [560]: info: can_fence_host_with_device: 
Meatware-Fence:0 can fence Server2.LAN: dynamic-list
Jun 15 18:12:40 JeffDesk stonith-ng: [560]: info: stonith_fence: Found 1 
matching devices for 'Server2.LAN'
Jun 15 18:12:51 JeffDesk stonith: meatware device OK.
Jun 15 18:13:21 JeffDesk stonith: meatware device OK.
Jun 15 18:13:34 JeffDesk stonith-ng: [560]: WARN: 62604f54-1777-45d5-
b88e-9a8e53209e35 process (PID 31288) timed out (try 1).  Killing with signal 
SIGTERM (15).
Jun 15 18:13:34 JeffDesk stonith-ng: [560]: ERROR: log_operation: Operation 
'reboot' [31288] for host 'Server2.LAN' with device 'Meatware-Fence:0' 
returned: 1 (call 0 from (null))
Jun 15 18:13:34 JeffDesk stonith-ng: [560]: info: 
process_remote_stonith_execExecResult <st-reply 
st_origin="stonith_construct_async_reply" t="stonith-ng" st_op="st_notify" 
st_remote_op="62604f54-1777-45d5-b88e-9a8e53209e35" st_callid="0" 
st_callopt="0" st_rc="1" st_output="Performing: stonith -t meatware -T reset 
Server2.LAN#012** INFO: parse config info info=JeffDesk.LAN Server2.LAN 
Server4.LAN#012#012** (process:31289): CRITICAL **: OPERATOR INTERVENTION 
REQUIRED to reset server2.lan.#012#012** (process:31289): CRITICAL **: Run 
"meatclient -c server2.lan" AFTER power-cycling the machine.#012failed: 
Server2.LAN 0.05859375#012" src="JeffDesk.LAN" seq="3" />
Jun 15 18:13:34 JeffDesk stonith-ng: [560]: info: remote_op_done: Notifing 
clients of 62604f54-1777-45d5-b88e-9a8e53209e35 (reboot of Server2.LAN from 
8a883947-7022-4cad-9dba-d42110819bf9 by JeffDesk.LAN): 0, rc=1
Jun 15 18:13:34 JeffDesk stonith-ng: [560]: info: stonith_notify_client: 
Sending st_fence-notification to client 565/4ab08c7b-b795-404c-b1d6-
dc2215a60248
Jun 15 18:13:34 JeffDesk crmd: [565]: ERROR: stonith_error2string: Unknown 
Stonith error code: 1
Jun 15 18:13:34 JeffDesk crmd: [565]: ERROR: tengine_stonith_notify: Peer 
Server2.LAN could not be terminated (reboot) by JeffDesk.LAN for Server4.LAN 
(ref=62604f54-1777-45d5-b88e-9a8e53209e35): <unknown error>
Jun 15 18:13:40 JeffDesk stonith-ng: [560]: info: stonith_queryQuery 
<stonith_command t="stonith-ng" st_async_id="fe6c7bcf-
b444-408b-88d1-72abd3f13b21" st_op="st_query" st_callid="0" st_callopt="0" 
st_remote_op="fe6c7bcf-b444-408b-88d1-72abd3f13b21" st_target="Server2.LAN" 
st_device_action="reboot" st_clientid="8a883947-7022-4cad-9dba-d42110819bf9" 
st_timeout="6000" src="Server4.LAN" seq="83" />
Jun 15 18:13:40 JeffDesk stonith-ng: [560]: info: can_fence_host_with_device: 
Refreshing port list for Meatware-Fence:0
Jun 15 18:13:40 JeffDesk stonith-ng: [560]: WARN: parse_host_line: Could not 
parse (0 2): ** INFO: parse config info info=JeffDesk.LAN Server2.LAN 
Server4.LAN
Jun 15 18:13:40 JeffDesk stonith-ng: [560]: WARN: parse_host_line: Could not 
parse (0 0): 
Jun 15 18:13:40 JeffDesk stonith-ng: [560]: info: can_fence_host_with_device: 
Meatware-Fence:0 can fence Server2.LAN: dynamic-list
Jun 15 18:13:40 JeffDesk stonith-ng: [560]: info: stonith_query: Found 1 
matching devices for 'Server2.LAN'
Jun 15 18:13:40 JeffDesk stonith-ng: [560]: info: stonith_fenceExec 
<stonith_command t="stonith-ng" st_async_id="fe6c7bcf-
b444-408b-88d1-72abd3f13b21" st_op="st_fence" st_callid="0" st_callopt="0" 
st_remote_op="fe6c7bcf-b444-408b-88d1-72abd3f13b21" st_target="Server2.LAN" 
st_device_action="reboot" st_timeout="54000" src="Server4.LAN" seq="85" />
Jun 15 18:13:40 JeffDesk stonith-ng: [560]: info: can_fence_host_with_device: 
Meatware-Fence:0 can fence Server2.LAN: dynamic-list
Jun 15 18:13:40 JeffDesk stonith-ng: [560]: info: stonith_fence: Found 1 
matching devices for 'Server2.LAN'
Jun 15 18:13:51 JeffDesk stonith: meatware device OK.
Jun 15 18:14:21 JeffDesk stonith: meatware device OK.
Jun 15 18:14:34 JeffDesk stonith-ng: [560]: WARN: fe6c7bcf-
b444-408b-88d1-72abd3f13b21 process (PID 31469) timed out (try 1).  Killing 
with signal SIGTERM (15).
Jun 15 18:14:34 JeffDesk stonith-ng: [560]: ERROR: log_operation: Operation 
'reboot' [31469] for host 'Server2.LAN' with device 'Meatware-Fence:0' 
returned: 1 (call 0 from (null))
Jun 15 18:14:34 JeffDesk stonith-ng: [560]: info: 
process_remote_stonith_execExecResult <st-reply 
st_origin="stonith_construct_async_reply" t="stonith-ng" st_op="st_notify" 
st_remote_op="fe6c7bcf-b444-408b-88d1-72abd3f13b21" st_callid="0" 
st_callopt="0" st_rc="1" st_output="Performing: stonith -t meatware -T reset 
Server2.LAN#012** INFO: parse config info info=JeffDesk.LAN Server2.LAN 
Server4.LAN#012#012** (process:31470): CRITICAL **: OPERATOR INTERVENTION 
REQUIRED to reset server2.lan.#012#012** (process:31470): CRITICAL **: Run 
"meatclient -c server2.lan" AFTER power-cycling the machine.#012failed: 
Server2.LAN 0.05859375#012" src="JeffDesk.LAN" seq="5" />
[...]

These messages repeat until I release the fence with meatclient.
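
Since nobody watches the console around the clock, one workaround is to scan 
the log for the meatware request and hand the host name to whatever alerting 
you already have. What follows is only a rough sketch, not anything shipped 
with Pacemaker or cluster-glue: the function name hosts_awaiting_operator and 
the regular expression are my own assumptions, based solely on the two message 
formats quoted in this thread.

```python
import re

# Assumption: the host name follows "to reset" and ends with a period, which is
# then followed by whitespace, a syslog-escaped newline ("#012"), or end of line.
# This is derived only from the messages quoted in this thread.
PATTERN = re.compile(
    r"OPERATOR INTERVENTION REQUIRED to reset\s+([A-Za-z0-9.-]+?)\.(?:#012|\s|$)"
)


def hosts_awaiting_operator(lines):
    """Return host names, in first-seen order, that meatware wants reset."""
    seen = []
    for line in lines:
        for match in PATTERN.finditer(line):
            # Lower-case so "Server2.LAN" and "server2.lan" count as one host.
            host = match.group(1).lower()
            if host not in seen:
                seen.append(host)
    return seen


if __name__ == "__main__":
    sample = [
        "CRIT: OPERATOR INTERVENTION REQUIRED to reset hennessy.",
        'st_output="** (process:31289): CRITICAL **: OPERATOR INTERVENTION '
        'REQUIRED to reset server2.lan.#012#012**"',
        "Jun 15 18:12:51 JeffDesk stonith: meatware device OK.",
    ]
    for host in hosts_awaiting_operator(sample):
        print("operator needed for", host)
```

Feeding it the lines of /var/log/messages (or wherever your syslog sends 
.crit) and mailing or paging the result is left to the reader. The operator 
still has to power-cycle the node and run meatclient -c by hand.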

Thanks, Jeff