[Pacemaker] stonith q
Alex Samad - Yieldbroker
Alex.Samad at yieldbroker.com
Tue Nov 4 20:45:41 CET 2014
{snip}
> >> Any pointers to a framework somewhere?
> >
> > I do not think there is any formal stonith agent developers guide;
> > take a look at any existing agent like external/ipmi and modify it to
> > suit your needs.
> >
> >> Does fenced have any handlers? I notice it logs a message in syslog and
> >> the cluster log; is there a chance to capture the event there?
> >
> > I do not have experience with RH CMAN, sorry. But from what I
> > understand fenced and stonithd agents are compatible.
>
> https://fedorahosted.org/cluster/wiki/FenceAgentAPI
Thanks
>
> Note the return codes. Also, not listed there, is the requirement that an
> agent print its XML validation data. You can see an example of what this
> looks like by calling 'fence_ipmilan -o metadata' (or any other
> fence_* agent).
>
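As a rough illustration of the points quoted above (the return codes and the metadata action), here is a minimal Python sketch of a fence agent skeleton. It is not taken from the FenceAgentAPI page. It assumes, as I understand that page, that fenced passes options on stdin as name=value lines; the agent name fence_osreboot, the do_fence() helper and the exact XML are made up for illustration, and the XML should be compared with real 'fence_ipmilan -o metadata' output. The sketch returns non-zero unless fencing is actually confirmed.

```python
import sys

# Hypothetical metadata (assumption): compare with the output of
# `fence_ipmilan -o metadata` before relying on this layout.
METADATA = """<?xml version="1.0" ?>
<resource-agent name="fence_osreboot" shortdesc="Hypothetical OS-reboot agent">
<longdesc>Illustrative skeleton only.</longdesc>
<parameters>
  <parameter name="action" unique="0" required="1">
    <content type="string" default="reboot" />
    <shortdesc lang="en">Fencing action</shortdesc>
  </parameter>
  <parameter name="port" unique="0" required="1">
    <content type="string" />
    <shortdesc lang="en">Name of the node to fence</shortdesc>
  </parameter>
</parameters>
<actions>
  <action name="reboot" />
  <action name="off" />
  <action name="metadata" />
</actions>
</resource-agent>
"""


def parse_options(lines):
    """Parse name=value lines, skipping blank lines and '#' comments."""
    opts = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            opts[name.strip()] = value.strip()
    return opts


def do_fence(port):
    """Placeholder (assumption): power-cycle `port` and confirm it is down.

    Returns False here so the skeleton never claims a fence it did not do.
    """
    return False


def run(opts, out=sys.stdout):
    """Return 0 on success and 1 on failure, like a fence agent exit code."""
    action = opts.get("action", "reboot")  # default of "reboot" is an assumption
    if action == "metadata":
        out.write(METADATA)
        return 0
    if action in ("reboot", "off"):
        port = opts.get("port")
        return 0 if port and do_fence(port) else 1
    return 1  # unknown action
```

A real agent would finish with something like sys.exit(run(parse_options(sys.stdin))), and would support the remaining actions the API page lists.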
> For the record, I think this is a bad idea.
So lots of people have said this is a bad idea, and maybe I am misunderstanding something.
From my observation of my 2-node cluster, when inter-cluster comms has an issue, one node kills the other node.
Let's say the nodes are A and B.
A is currently running the resources; B gets elected to die.
A signal is sent cman -> PK -> stonithd
From the logs on server B I see fenced trying to kill server B, but I don't use any cman/stonith agents. I would like to capture that event and use an OS reboot.
So the problem I perceive is when server B is in a state where it can't run anything: the OS has locked up or crashed. I believe VMware will look after that; from experience I have seen it deal with that.
The issue is when B is running enough that a VIP (one of the resources that PK looks after) is still on B as well as on A, and B can't or will not shut down via the OS. I understand that, but I would still like to attempt a reboot at that point.
I have found a simpler solution: I actively poll to check if the cluster is okay. I would prefer to fire a script on an event, but ..
I'm also looking into why there is a comms problem, as it's 2 VMs on the same host on the same network. I think it's starvation of CPU cycles, as it's a dev setup.
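The polling approach mentioned above could be sketched like this in Python. It assumes crm_mon is on the PATH and that 'crm_mon -1' prints a line of the form 'Online: [ nodeA nodeB ]'. The node names, the threshold of 3 failed polls and the on_failure callback are made up for illustration. What on_failure does (for example an OS reboot) is left to the caller, and this is not a substitute for real fencing.

```python
import re
import subprocess
import time


def online_nodes(crm_mon_text):
    """Collect node names from 'Online: [ ... ]' lines (assumed output format)."""
    nodes = set()
    for line in crm_mon_text.splitlines():
        match = re.match(r"\s*Online:\s*\[(.*)\]", line)
        if match:
            nodes.update(match.group(1).split())
    return nodes


def cluster_ok(crm_mon_text, expected_nodes):
    """True only if every expected node is reported online."""
    return set(expected_nodes) <= online_nodes(crm_mon_text)


def next_failure_count(ok, failures):
    """Reset the counter on a good poll, otherwise count one more failure."""
    return 0 if ok else failures + 1


def poll(expected_nodes, on_failure, interval=10, threshold=3):
    """Poll crm_mon; call on_failure() after `threshold` bad polls in a row."""
    failures = 0
    while True:
        try:
            text = subprocess.check_output(["crm_mon", "-1"]).decode()
            ok = cluster_ok(text, expected_nodes)
        except (OSError, subprocess.CalledProcessError):
            ok = False  # treat an unusable crm_mon as a failed poll
        failures = next_failure_count(ok, failures)
        if failures >= threshold:
            on_failure()
            return
        time.sleep(interval)
```

Requiring several consecutive failures is there so that one slow poll on a CPU-starved dev VM does not trigger the action on its own.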
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is
> trapped in the mind of a person without access to education?