[Pacemaker] stonith q

Alex Samad - Yieldbroker Alex.Samad at yieldbroker.com
Tue Nov 4 22:39:47 UTC 2014



> -----Original Message-----
> From: Digimer [mailto:lists at alteeve.ca]
> Sent: Wednesday, 5 November 2014 8:54 AM
> To: Alex Samad - Yieldbroker; Andrei Borzenkov
> Cc: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] stonith q
> 
> On 04/11/14 02:45 PM, Alex Samad - Yieldbroker wrote:
> > {snip}
> >>>> Any pointers to a framework somewhere?
> >>>
> >>> I do not think there is any formal stonith agent developers' guide;
> >>> take any existing agent like external/ipmi and modify it to suit
> >>> your needs.
> >>>
> >>>> Does fenced have any handlers? I notice it logs a message in syslog
> >>>> and the cluster log; is there a chance to capture the event there?
> >>>
> >>> I do not have experience with RH CMAN, sorry. But from what I
> >>> understand fenced and stonithd agents are compatible.
> >>
> >> https://fedorahosted.org/cluster/wiki/FenceAgentAPI
> >
> >
> > Thanks
> >
> >>
> >> Note the return codes. Also, not listed there, is the requirement
> >> that an agent print its XML validation data. You can see an example
> >> of what this looks like by calling 'fence_ipmilan -o metadata' (or
> >> any other fence_* agent).
> >>
> >> For the record, I think this is a bad idea.
> >
> > So lots of people have said this is a bad idea, and maybe I am
> > misunderstanding something.
> >
> > From my observation of my 2-node cluster, when inter-cluster comms has
> > an issue, one node kills the other node.
> > Let's say A + B.
> > A is currently running the resources; B gets elected to die.
> 
> Nothing is "selected". Both nodes will initiate a fence, but if you set
> 'delay="15"' for node A's fence method, node B will pause for 15
> seconds before acting on the fence request. If node A sees no delay on
> node B, it will immediately proceed with the fence action. In this way,
> node A will always be faster than node B, so node B will always lose a
> fence race like this.
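
For reference, a rough sketch of what that per-node delay might look like in
cluster.conf (the device and method names here are made up; only the
delay="15" attribute on node A's fence device is the point):

  <clusternodes>
    <clusternode name="node-a" nodeid="1">
      <fence>
        <method name="fence-node-a">
          <!-- any node fencing node-a waits 15 seconds first, so
               node-a wins a fence race against node-b -->
          <device name="fence_dev_a" action="reboot" delay="15"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node-b" nodeid="2">
      <fence>
        <method name="fence-node-b">
          <!-- no delay: node-a fences node-b immediately -->
          <device name="fence_dev_b" action="reboot"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>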

Okay, maybe I am reading this wrong. Here is an example of what happened last night:

demorp1
=======
Nov  4 23:21:34 demorp1 corosync[23415]:   [TOTEM ] A processor failed, forming new configuration.
Nov  4 23:21:36 demorp1 corosync[23415]:   [CMAN  ] quorum lost, blocking activity
Nov  4 23:21:36 demorp1 corosync[23415]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
Nov  4 23:21:36 demorp1 corosync[23415]:   [QUORUM] Members[1]: 1
Nov  4 23:21:36 demorp1 corosync[23415]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov  4 23:21:36 demorp1 corosync[23415]:   [CPG   ] chosen downlist: sender r(0) ip(10.172.218.51) ; members(old:2 left:1)
Nov  4 23:21:36 demorp1 corosync[23415]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov  4 23:21:37 demorp1 corosync[23415]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov  4 23:21:37 demorp1 corosync[23415]:   [CPG   ] chosen downlist: sender r(0) ip(10.172.218.51) ; members(old:1 left:0)
Nov  4 23:21:37 demorp1 corosync[23415]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov  4 23:21:37 demorp1 kernel: dlm: closing connection to node 2
Nov  4 23:21:37 demorp1 corosync[23415]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov  4 23:21:37 demorp1 corosync[23415]:   [CMAN  ] quorum regained, resuming activity
Nov  4 23:21:37 demorp1 corosync[23415]:   [QUORUM] This node is within the primary component and will provide service.
Nov  4 23:21:37 demorp1 corosync[23415]:   [QUORUM] Members[2]: 1 2
Nov  4 23:21:37 demorp1 corosync[23415]:   [QUORUM] Members[2]: 1 2
Nov  4 23:21:37 demorp1 corosync[23415]:   [CPG   ] chosen downlist: sender r(0) ip(10.172.218.51) ; members(old:1 left:0)
Nov  4 23:21:37 demorp1 corosync[23415]:   [MAIN  ] Completed service synchronization, ready to provide service.

>>>>> I read this to mean that demorp2 killed this node >>> Nov  4 23:21:37 demorp1 corosync[23415]: cman killed by node 2 because we were killed by cman_tool or other application

Nov  4 23:21:37 demorp1 pacemakerd[24093]:    error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Nov  4 23:21:37 demorp1 pacemakerd[24093]:    error: mcp_cpg_destroy: Connection destroyed
Nov  4 23:21:37 demorp1 stonith-ng[24100]:    error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Nov  4 23:21:37 demorp1 dlm_controld[23497]: cluster is down, exiting
Nov  4 23:21:37 demorp1 fenced[23483]: cluster is down, exiting

>>> This is what I would like to capture and do something with


Nov  4 23:21:37 demorp1 fenced[23483]: daemon cpg_dispatch error 2
Nov  4 23:21:37 demorp1 fenced[23483]: cpg_dispatch error 2
Nov  4 23:21:37 demorp1 gfs_controld[23559]: cluster is down, exiting
Nov  4 23:21:37 demorp1 gfs_controld[23559]: daemon cpg_dispatch error 2
Nov  4 23:21:37 demorp1 stonith-ng[24100]:    error: stonith_peer_cs_destroy: Corosync connection terminated
Nov  4 23:21:37 demorp1 attrd[24101]:    error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Nov  4 23:21:37 demorp1 attrd[24101]:     crit: attrd_cs_destroy: Lost connection to Corosync service!
Nov  4 23:21:37 demorp1 attrd[24101]:   notice: main: Exiting...
Nov  4 23:21:37 demorp1 attrd[24101]:   notice: main: Disconnecting client 0x14ab240, pid=24102...
Nov  4 23:21:37 demorp1 crmd[24102]:   notice: peer_update_callback: Our peer on the DC is dead
Nov  4 23:21:37 demorp1 crmd[24102]:    error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Nov  4 23:21:37 demorp1 crmd[24102]:    error: crmd_cs_destroy: connection terminated
Nov  4 23:21:37 demorp1 cib[24099]:  warning: qb_ipcs_event_sendv: new_event_notification (24099-24100-12): Broken pipe (32)
Nov  4 23:21:37 demorp1 cib[24099]:  warning: cib_notify_send_one: Notification of client crmd/0a81732f-ee8e-4e97-bd8e-a45e2f360a0f failed
Nov  4 23:21:37 demorp1 cib[24099]:    error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Nov  4 23:21:37 demorp1 cib[24099]:    error: cib_cs_destroy: Corosync connection lost!  Exiting.
Nov  4 23:21:37 demorp1 crmd[24102]:   notice: crmd_exit: Forcing immediate exit: Link has been severed (67)
Nov  4 23:21:37 demorp1 attrd[24101]:    error: attrd_cib_connection_destroy: Connection to the CIB terminated...
Nov  4 23:21:38 demorp1 lrmd[2434]:  warning: qb_ipcs_event_sendv: new_event_notification (2434-24102-6): Bad file descriptor (9)
Nov  4 23:21:38 demorp1 lrmd[2434]:  warning: send_client_notify: Notification of client crmd/3651ccf7-018a-4b0d-a6dc-f2513bd7bbe9 failed
Nov  4 23:21:38 demorp1 lrmd[2434]:  warning: send_client_notify: Notification of client crmd/3651ccf7-018a-4b0d-a6dc-f2513bd7bbe9 failed
Nov  4 23:21:39 demorp1 kernel: dlm: closing connection to node 2
Nov  4 23:21:39 demorp1 kernel: dlm: closing connection to node 1


demorp2
=======
Nov  4 23:21:37 demorp2 corosync[1734]:   [MAIN  ] Corosync main process was not scheduled for 12117.8027 ms (threshold is 8000.0000 ms). Consider token timeout increase.
Nov  4 23:21:37 demorp2 corosync[1734]:   [CMAN  ] quorum lost, blocking activity
Nov  4 23:21:37 demorp2 corosync[1734]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
Nov  4 23:21:37 demorp2 corosync[1734]:   [QUORUM] Members[1]: 2
Nov  4 23:21:37 demorp2 corosync[1734]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov  4 23:21:37 demorp2 corosync[1734]:   [CPG   ] chosen downlist: sender r(0) ip(10.172.218.52) ; members(old:2 left:1)
Nov  4 23:21:37 demorp2 corosync[1734]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov  4 23:21:37 demorp2 corosync[1734]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov  4 23:21:37 demorp2 corosync[1734]:   [CMAN  ] quorum regained, resuming activity
Nov  4 23:21:37 demorp2 corosync[1734]:   [QUORUM] This node is within the primary component and will provide service.
Nov  4 23:21:37 demorp2 corosync[1734]:   [QUORUM] Members[2]: 1 2
Nov  4 23:21:37 demorp2 corosync[1734]:   [QUORUM] Members[2]: 1 2
Nov  4 23:21:37 demorp2 corosync[1734]:   [CPG   ] chosen downlist: sender r(0) ip(10.172.218.51) ; members(old:1 left:0)
Nov  4 23:21:37 demorp2 corosync[1734]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov  4 23:21:37 demorp2 crmd[2492]:  warning: match_down_event: No match for shutdown action on demorp1
Nov  4 23:21:37 demorp2 crmd[2492]:   notice: peer_update_callback: Stonith/shutdown of demorp1 not matched
Nov  4 23:21:37 demorp2 crmd[2492]:   notice: cman_event_callback: Membership 400: quorum lost
Nov  4 23:21:37 demorp2 crmd[2492]:   notice: cman_event_callback: Membership 400: quorum acquired
Nov  4 23:21:37 demorp2 crmd[2492]:   notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=peer_update_callback ]
Nov  4 23:21:37 demorp2 kernel: dlm: closing connection to node 1

>>>> This is what I believe is node 2 telling cman to kill node 1 >>>>>>> Nov  4 23:21:37 demorp2 fenced[1833]: telling cman to remove nodeid 1 from cluster

Nov  4 23:21:37 demorp2 fenced[1833]: receive_start 1:3 add node with started_count 1
Nov  4 23:21:51 demorp2 corosync[1734]:   [MAIN  ] Corosync main process was not scheduled for 10987.4082 ms (threshold is 8000.0000 ms). Consider token timeout increase.
Nov  4 23:21:51 demorp2 corosync[1734]:   [TOTEM ] A processor failed, forming new configuration.
Nov  4 23:21:51 demorp2 kernel: IN=eth0 OUT= MAC=00:50:56:a6:0f:15:00:00:00:00:00:00:08:00 SRC=10.0.0.0 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 PROTO=2 
Nov  4 23:21:53 demorp2 corosync[1734]:   [CMAN  ] quorum lost, blocking activity
Nov  4 23:21:53 demorp2 corosync[1734]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
Nov  4 23:21:53 demorp2 corosync[1734]:   [QUORUM] Members[1]: 2
Nov  4 23:21:53 demorp2 corosync[1734]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov  4 23:21:53 demorp2 crmd[2492]:   notice: cman_event_callback: Membership 404: quorum lost
Nov  4 23:21:53 demorp2 kernel: dlm: closing connection to node 1
Nov  4 23:21:53 demorp2 corosync[1734]:   [CPG   ] chosen downlist: sender r(0) ip(10.172.218.52) ; members(old:2 left:1)
Nov  4 23:21:53 demorp2 corosync[1734]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov  4 23:21:53 demorp2 crmd[2492]:   notice: crm_update_peer_state: cman_event_callback: Node demorp1[1] - state is now lost (was member)
Nov  4 23:21:53 demorp2 crmd[2492]:  warning: match_down_event: No match for shutdown action on demorp1
Nov  4 23:21:53 demorp2 crmd[2492]:   notice: peer_update_callback: Stonith/shutdown of demorp1 not matched
Nov  4 23:21:53 demorp2 crmd[2492]:  warning: match_down_event: No match for shutdown action on demorp1
Nov  4 23:21:53 demorp2 crmd[2492]:   notice: peer_update_callback: Stonith/shutdown of demorp1 not matched
Nov  4 23:21:53 demorp2 attrd[2490]:   notice: attrd_local_callback: Sending full refresh (origin=crmd)
Nov  4 23:21:53 demorp2 attrd[2490]:   notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-ybrpstat (142)
Nov  4 23:21:53 demorp2 pengine[2491]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Nov  4 23:21:53 demorp2 pengine[2491]:   notice: LogActions: Start   ybrpip#011(demorp2)
Nov  4 23:21:53 demorp2 pengine[2491]:   notice: process_pe_message: Calculated Transition 99: /var/lib/pacemaker/pengine/pe-input-3255.bz2
Nov  4 23:21:53 demorp2 crmd[2492]:   notice: te_rsc_command: Initiating action 5: start ybrpip_start_0 on demorp2 (local)
Nov  4 23:21:53 demorp2 attrd[2490]:   notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-ybrpstat (1414871697)
Nov  4 23:21:53 demorp2 attrd[2490]:   notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Nov  4 23:21:54 demorp2 IPaddr2(ybrpip)[25809]: INFO: Adding inet address 10.172.218.50/24 with broadcast address 10.172.218.255 to device eth0
Nov  4 23:21:54 demorp2 IPaddr2(ybrpip)[25809]: INFO: Bringing device eth0 up
Nov  4 23:21:54 demorp2 IPaddr2(ybrpip)[25809]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.172.218.50 eth0 10.172.218.50 auto not_used not_used



> 
> > A signal is sent cman -> PK -> stonithd
> 
> Correct (basically).
> 
> > From the logs on server B I see fenced trying to kill server B, but I
> > don't use any cman/stonith agents. I would like to capture that event
> > and use an OS reboot.
> 
> Then use a fabric fence method. These are ones where the network
> connection(s) to the target node is(are) severed. Thus, node B will sit
> there perpetually trying to fence node A, but failing because it can't
> talk to its fence device (network switch, etc). Then a human can come
> in, examine the system, reboot the node and unfence it once it has
> rebooted, restoring network connections.
> 
> I created a proof of concept fence agent doing this with D-Link switches:
> 
> https://github.com/digimer/fence_dlink_snmp
> 
> It should be easy enough to adapt to, say, call the hypervisor/host and
> use brctl to detach the virtual interfaces from the VM.
Nice, but I am in a virtualised world; the VMs share a LUN.

My preference is to not allow each node access to VMware to shut down or isolate the other node.
There are issues with user IDs and passwords, and basically security.

> 
> Or, more easily, stick with power fencing and use an external log server.
> 
> > So the problem I perceive is if server B is in a state where it can't
> > run (OS locked up or crashed). I believe VMware will look after that;
> > from experience I have seen it deal with that.
> 
> I'm not sure I understand... I don't use VMWare, so maybe I am missing
> something. If the node stops all processing, then it's possible the node will be
> detected as faulty and will be rebooted. However, there are many ways that
> nodes can fail. Secondly, unless something tells pacemaker that the node is
> dead, it won't know and is not allowed to assume.

What I am trying to say is there are a few states a node can be in:
1) okay
2) not cluster ok, but OS is okay
3) not cluster ok, not OS okay, but server still ticking over
4) server is locked up

So for 2, if I have an agent that reboots the node, it will work.
For 3, this is the issue: the niche case where the OS reboot will potentially fail.
For 4, VMware has a way to detect this and will restart the VM.

I am willing to live with 3 as it is for now
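
For case 2, the sort of agent I have in mind would look roughly like the
following. This is only a sketch following the FenceAgentAPI conventions
linked above (options read as name=value pairs on stdin, metadata printed on
request); the agent name, the option names and the ssh-based reboot are my
assumptions, not a tested or recommended implementation:

#!/usr/bin/env python
# fence_osreboot - sketch of a fence agent that asks the peer's OS to reboot.
# Reads name=value options on stdin per the FenceAgentAPI; option names and
# the ssh call are assumptions.
import subprocess
import sys

METADATA = """<?xml version="1.0" ?>
<resource-agent name="fence_osreboot" shortdesc="Reboot peer node via ssh">
  <parameters>
    <parameter name="ipaddr">
      <content type="string"/>
      <shortdesc lang="en">Address of the node to reboot</shortdesc>
    </parameter>
    <parameter name="action">
      <content type="string" default="reboot"/>
      <shortdesc lang="en">Fencing action</shortdesc>
    </parameter>
  </parameters>
  <actions>
    <action name="reboot"/>
    <action name="metadata"/>
  </actions>
</resource-agent>"""

def read_options():
    # FenceAgentAPI style: one name=value pair per line on stdin.
    opts = {}
    for line in sys.stdin:
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
    return opts

def main():
    # Handle the 'fence_osreboot -o metadata' style of invocation.
    if "-o" in sys.argv and "metadata" in sys.argv:
        print(METADATA)
        return 0

    opts = read_options()
    if opts.get("action", "reboot") == "metadata":
        print(METADATA)
        return 0

    target = opts.get("ipaddr")
    if not target:
        return 1

    # Case 2 only: the cluster stack is broken but the OS still answers,
    # so ask it to reboot. This cannot help with case 3 or 4.
    rc = subprocess.call(["ssh", "-o", "ConnectTimeout=10",
                          "root@%s" % target, "reboot"])
    return 0 if rc == 0 else 1

if __name__ == "__main__":
    sys.exit(main())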



> 
> > The issue is if B is running enough that the VIP (one of the
> > resources that PK looks after) is still on B, and A and B can't or
> > will not shut down via the OS. I understand that, but I would still
> > like to attempt a reboot at that time.
> 
> Your mistake here is assuming that the node will be operating in a defined
> state. The whole idea of fencing is to put a node that is in an unknown state
> into a known state. To do that, you must be able to fence totally outside the
> node itself. If you depend on the node behaving at all, your approach is
> flawed.
Yes and no. I am willing to accept some states, as outlined above.

> 
> > I have found a simpler solution: I actively poll to check if the cluster
> > is okay. I would prefer to fire a script on an event, but ..
> >
> > I'm also looking into why there is a comms problem, as it's 2 VMs on the
> > same host on the same network. I think it's starvation of CPU cycles, as
> > it's a dev setup.
> 
> Why things went wrong is entirely secondary to fencing.

True, but it might help me to deprioritise finding my fencing solution :)
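
(For completeness, the "actively poll" workaround I mentioned above is
essentially just the script below, run from cron every minute. This is a
rough sketch only; the commands checked, the state file and the three-strikes
threshold are my assumptions:)

#!/usr/bin/env python
# Poll the cluster stack; if it looks dead several runs in a row, reboot.
import os
import subprocess
import sys

STATE_FILE = "/var/run/cluster-poll.failcount"
MAX_FAILURES = 3

def cluster_ok():
    # Treat a non-zero exit from either command as "cluster not okay".
    devnull = open(os.devnull, "w")
    for cmd in (["cman_tool", "status"], ["crm_mon", "-1"]):
        if subprocess.call(cmd, stdout=devnull, stderr=subprocess.STDOUT) != 0:
            return False
    return True

def main():
    failures = 0
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            failures = int(f.read().strip() or "0")

    failures = 0 if cluster_ok() else failures + 1

    with open(STATE_FILE, "w") as f:
        f.write(str(failures))

    # Only reboot after several consecutive failures, to ride out blips.
    if failures >= MAX_FAILURES:
        subprocess.call(["reboot"])
    return 0

if __name__ == "__main__":
    sys.exit(main())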

> 
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?

