[Pacemaker] CMAN integration questions

Wed Mar 23 12:56:24 UTC 2011

Hi Andrew,

23.12.2010 14:14, Andrew Beekhof wrote:
...
>> Especially I need to understand how pacemaker integrates with cman's
>> fencing/dlm subsystem:
>> *) Do I need to configure fencing in both cman and pacemaker?
> 
> No.  Just in Pacemaker.
> fenced spins waiting for Pacemaker to make an API call that tells it
> that fencing completed, at which point the dlm can continue.

It doesn't seem to be enough, even with c6a01b02950b.
When I run killall -9 corosync on one node (vd01-b, cman id 2), which
by chance was the DC, I get the following in the log on the node that
becomes the new DC (vd01-d), which, again by chance, runs the stonith
resource for vd01-b (only the relevant log lines are shown):
============
Mar 23 10:08:49 vd01-d corosync[1630]:   [TOTEM ] A processor failed, forming new configuration.
Mar 23 10:09:01 vd01-d kernel: dlm: closing connection to node 2
Mar 23 10:09:01 vd01-d crmd: [1875]: info: cman_event_callback: Membership 1582268: quorum retained
Mar 23 10:09:01 vd01-d crmd: [1875]: info: ais_status_callback: status: vd01-b is now lost (was member)
Mar 23 10:09:01 vd01-d crmd: [1875]: info: crm_update_peer: Node vd01-b: id=2 state=lost (new) addr=(null) votes=0 born=1582212 seen=1582264 proc=00000000000000000000000000111312
Mar 23 10:09:01 vd01-d corosync[1630]:   [CLM   ] Members Left:
Mar 23 10:09:01 vd01-d crmd: [1875]: WARN: check_dead_member: Our DC node (vd01-b) left the cluster
Mar 23 10:09:01 vd01-d corosync[1630]:   [CLM   ] #011r(0) ip(10.5.4.65)
Mar 23 10:09:01 vd01-d crmd: [1875]: info: send_ais_text: Peer overloaded or membership in flux: Re-sending message (Attempt 1 of 20)
Mar 23 10:09:01 vd01-d corosync[1630]:   [QUORUM] Members[15]: 1 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Mar 23 10:09:02 vd01-d corosync[1630]:   [MAIN  ] Completed service synchronization, ready to provide service.
Mar 23 10:09:02 vd01-d fenced[1688]: fencing deferred to vd01-a
Mar 23 10:09:02 vd01-d crmd: [1875]: info: update_dc: Unset DC vd01-b
============

At this point fenced on vd01-a (cman id 1, which is the fence domain
master) tries to fence that node but fails:
============
Mar 23 10:09:02 vd01-a fenced[1748]: fencing node vd01-b
Mar 23 10:09:02 vd01-a fenced[1748]: fence vd01-b dev 0.0 agent none result: error no method
Mar 23 10:09:02 vd01-a fenced[1748]: fence vd01-b failed
Mar 23 10:09:05 vd01-a fenced[1748]: fencing node vd01-b
Mar 23 10:09:05 vd01-a fenced[1748]: fence vd01-b dev 0.0 agent none result: error no method
Mar 23 10:09:05 vd01-a fenced[1748]: fence vd01-b failed
Mar 23 10:09:08 vd01-a fenced[1748]: fencing node vd01-b
Mar 23 10:09:08 vd01-a fenced[1748]: fence vd01-b dev 0.0 agent none result: error no method
Mar 23 10:09:08 vd01-a fenced[1748]: fence vd01-b failed
============
All DLM-related stuff is blocked.

After about one minute, vd01-d takes over the DC role:
============
Mar 23 10:10:03 vd01-d crmd: [1875]: info: update_dc: Set DC to vd01-d (3.0.5)
============
After that, all monitor operations on resources that depend on the DLM
(LVM, GFS) time out, and all dependent resources are then stopped, so
the cluster is no longer highly available.

Only almost a minute later does Pacemaker decide to stonith vd01-b:
============
Mar 23 10:10:54 vd01-d crmd: [1875]: WARN: match_down_event: No match for shutdown action on vd01-b
Mar 23 10:10:54 vd01-d crmd: [1875]: info: te_update_diff: Stonith/shutdown of vd01-b not matched
Mar 23 10:10:55 vd01-d pengine: [1874]: WARN: pe_fence_node: Node vd01-b will be fenced because it is un-expectedly down
Mar 23 10:10:55 vd01-d pengine: [1874]: WARN: determine_online_status: Node vd01-b is unclean
============

And roughly a minute and a half later, vd01-b is finally fenced:
============
Mar 23 10:12:17 vd01-a crmd: [1935]: info: tengine_stonith_notify: Peer vd01-b was terminated (reboot) by vd01-d for vd01-d (ref=05cd139e-585d-452e-a22d-0ef188a64d81): OK
Mar 23 10:12:17 vd01-a crmd: [1935]: notice: tengine_stonith_notify: Notified CMAN that 'vd01-b' is now fenced
Mar 23 10:12:17 vd01-a crmd: [1935]: notice: tengine_stonith_notify: Confirmed CMAN fencing event for 'vd01-b'
Mar 23 10:12:17 vd01-a fenced[1748]: fence vd01-b overridden by administrator intervention
============
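For reference, the phases can be tallied from the log timestamps. This
is a minimal Python sketch: the selection of events and their labels are
my own, the timestamps are copied from the excerpts above, and since
everything happened on the same day only the time of day is parsed.

```python
from datetime import datetime

# (label, time of day) pairs copied from the log excerpts above.
# The choice of events and the labels are illustrative, not log text.
EVENTS = [
    ("TOTEM: a processor failed (vd01-d)", "10:08:49"),
    ("dlm closes connection to node 2 (vd01-d)", "10:09:01"),
    ("fenced first fails to fence vd01-b (vd01-a)", "10:09:02"),
    ("vd01-d becomes the new DC", "10:10:03"),
    ("pengine decides to fence vd01-b (vd01-d)", "10:10:55"),
    ("stonith OK, CMAN notified (vd01-a)", "10:12:17"),
]


def parse(ts):
    """Parse an HH:MM:SS time of day (all events are on the same date)."""
    return datetime.strptime(ts, "%H:%M:%S")


def timeline(events):
    """Return (label, seconds since first event, seconds since previous event)."""
    start = parse(events[0][1])
    prev = start
    rows = []
    for label, ts in events:
        t = parse(ts)
        rows.append((label,
                     int((t - start).total_seconds()),
                     int((t - prev).total_seconds())))
        prev = t
    return rows


if __name__ == "__main__":
    for label, total, delta in timeline(EVENTS):
        print(f"+{total:4d}s (+{delta:3d}s)  {label}")
```

The total comes to 208 s (3 min 28 s). The longest gaps are between
the stonith decision and its confirmation (82 s) and the wait for the
new DC (61 s).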

Overall it took about three and a half minutes (10:08:49 to 10:12:17)
to fence the failed node.
So, for this kind of failure (a corosync crash), it could be much safer
to configure fencing in both cman and Pacemaker, because that would take
only 15-20 seconds to do the same. I'll check that a bit later: I need
to configure fencing in cman, and also to test the case where the fence
domain master fails.
An alternative would be for fenced to ask Pacemaker to fence the failed
node (is it already done this way?), but that will not help much if the
DC fails (my case), because electing a new DC also takes some time and
(I assume) Pacemaker will refuse to fence without a DC. That time is
enough for monitor operations to fail (yes, I can configure longer
timeouts, but in general I want the cluster to be as smart as possible).

Would you please comment on this?

Best,
Vladislav