[Pacemaker] Does "stonith_admin --confirm" work?
Староверов Никита Александрович
nastaroverov at kola.so-ups.ru
Fri May 17 08:22:36 UTC 2013
Hello, pacemaker users and developers.
First, many thanks to clusterlabs.org for their software. Pacemaker helps us very much!
I am testing a cluster configuration based on Pacemaker+CMAN. I configured fencing as described in the Pacemaker documentation for CMAN-based clusters, and it works.
Maybe I have misunderstood something, but I can't acknowledge node fencing manually.
I use fence_ipmilan as the fence device, and when I unplug the power cable from a server, stonith fails. I expected this, of course, but I don't know how to acknowledge the fencing manually.
When I try stonith_admin -C node_name, it does nothing.
I see this in the logs:
May 17 11:46:52 NODE1 stonith-ng[5434]: notice: stonith_manual_ack: Injecting manual confirmation that NODE2 is safely off/down
May 17 11:46:52 NODE1 stonith-ng[5434]: notice: log_operation: Operation 'off' [0] (call 2 from stonith_admin.10959) for host 'NODE2' with device 'manual_ack' returned: 0 (OK)
May 17 11:46:52 NODE1 stonith-ng[5434]: error: crm_abort: do_local_reply: Triggered assert at main.c:241 : client_obj->request_id
May 17 11:46:52 NODE1 stonith-ng[5434]: error: crm_abort: crm_ipcs_sendv: Triggered assert at ipc.c:575 : header->qb.id != 0
May 17 11:47:35 NODE1 stonith_admin[11162]: notice: crm_log_args: Invoked: stonith_admin -C NODE2
May 17 11:47:35 NODE1 stonith-ng[5434]: notice: merge_duplicates: Merging stonith action off for node NODE2 originating from client stonith_admin.11162.b42172b1 with identical request from stonith_admin.10959 at NODE1.f2048550 (0s)
May 17 11:47:35 NODE1 stonith-ng[5434]: notice: stonith_manual_ack: Injecting manual confirmation that NODE2 is safely off/down
May 17 11:47:35 NODE1 stonith-ng[5434]: notice: log_operation: Operation 'off' [0] (call 2 from stonith_admin.11162) for host 'NODE2' with device 'manual_ack' returned: 0 (OK)
May 17 11:47:35 NODE1 stonith-ng[5434]: error: crm_abort: do_local_reply: Triggered assert at main.c:241 : client_obj->request_id
May 17 11:47:35 NODE1 stonith-ng[5434]: error: crm_abort: crm_ipcs_sendv: Triggered assert at ipc.c:575 : header->qb.id != 0
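The lines that stand out in this log are the two crm_abort asserts that follow each manual confirmation. As a rough sketch, they can be pulled out of a larger log with a small filter like the one below. The function name show_asserts is made up for this example, and it assumes only that the log lines have the same format as the ones quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and print each distinct
# assert reported through crm_abort, without the timestamp/host prefix.
show_asserts() {
    grep 'crm_abort:.*Triggered assert' \
        | sed 's/.*crm_abort: //' \
        | sort -u
}
```

It could be fed from the system log, for example show_asserts < /var/log/messages (the log path is an assumption and depends on the syslog configuration).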
Nothing happened after stonith_admin -C.
fenced keeps trying fence_pcmk, and I see lots of "Timer expired" messages from stonith-ng and failed fence_ipmilan operations.
Yes, I can run fence_ack_manual on the CMAN master node and then clean up the node state with cibadmin, but that is a very slow way.
If I lose many servers in the cluster, for example when a rack with two or more servers loses power, I need a way to get services running again on the remaining nodes as quickly as possible.
I think manual fencing acknowledgement must be fast and simple, and I assume that stonith_admin --confirm is supposed to do that.
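To illustrate the rack-failure case, a bulk acknowledgement could look roughly like the shell sketch below. It is only a sketch: the function name confirm_nodes and the DRY_RUN variable are invented for this example, and it assumes that stonith_admin -C really does confirm a node as safely down, which is exactly the behaviour in question here.

```shell
#!/bin/sh
# Hypothetical helper: acknowledge that several nodes are really powered off.
# With DRY_RUN=1 (the default) the commands are only printed;
# set DRY_RUN=0 to actually run stonith_admin.
confirm_nodes() {
    for node in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "stonith_admin -C $node"
        else
            # Assumption: -C (--confirm) acknowledges the node as safely down.
            stonith_admin -C "$node" || echo "confirm failed for $node" >&2
        fi
    done
}
```

Usage would be something like confirm_nodes NODE2 NODE3 to preview the commands, then DRY_RUN=0 confirm_nodes NODE2 NODE3 once the nodes have been checked and are known to be without power.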