[ClusterLabs] Sudden stop of pacemaker functions

Wed Feb 17 13:48:46 UTC 2016

On 17/02/16 15:15 +0200, Klechomir wrote:
> Here is the output from your command:
> 
> attrd: 609413
> cib: 609409
> corosync: 608778
> crmd: 609415
> lrmd: 609412
> pengine: 609414
> pacemakerd: 609407
> stonithd: 609411

This may mean that you are triggering this nasty bug in libqb:
https://github.com/ClusterLabs/libqb/pull/162
(fixed in libqb-0.17.2)
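
As a rough way to check whether a given node already has that fix, the
sketch below compares a libqb version string against 0.17.2. The helper
names (parse_version, has_fix) are made up for illustration, and the
version string is assumed to come from your package manager. Note that
a distribution may backport the fix into a package with a lower version
number, so a negative result is only a hint.

```python
import re

# Version named in the message above: "fixed in libqb-0.17.2".
FIXED_IN = (0, 17, 2)


def parse_version(text):
    """Return the first dotted numeric version in text as a tuple of ints.

    Accepts plain versions ("0.17.1") as well as package-style strings
    ("libqb-0.17.2-3.el7"); anything after the dotted part is ignored.
    """
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError("no dotted version found in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def has_fix(installed):
    """True if the installed version is at least FIXED_IN (hypothetical helper)."""
    return parse_version(installed) >= FIXED_IN


if __name__ == "__main__":
    # Example strings only; substitute what your package manager reports.
    for candidate in ("0.17.1", "libqb-0.17.2-3.el7"):
        print(candidate, has_fix(candidate))
```

Tuples are compared element by element, so "0.17.10" correctly sorts
after "0.17.2", which a plain string comparison would get wrong.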

> Regarding using a newer version, that's what I've been thinking about, but
> I've been using this combination of corosync/pacemaker for many years on
> different hardware and never had a similar problem.
> The main difference is that I have stonith enabled only on the problematic
> cluster, but I also suspect that the node which causes this problem may
> have some hardware issues.

Stonith/fencing should be configured on every cluster so that it fully
serves the purpose HA clusters exist for, full stop.

> BTW my last few tests with the newest corosync/pacemaker gave me a very
> annoying delay when committing configuration changes (maybe it's a known
> problem?).

I cannot comment on this, but it is definitely good to be aware of
possible performance regressions.

-- 
Jan (Poki)