[ClusterLabs] DLM hanging when corosync is OK causes cluster to hang
Jan Pokorný
jpokorny at redhat.com
Wed Jan 20 01:04:24 UTC 2016
On 11/01/16 11:59 -0500, Digimer wrote:
> We hit a strange problem where a RAID controller on a node failed,
> causing DLM (gfs2/clvmd) to hang, but the node was never fenced. I
> assume this was because corosync was still working.
>
> Is there a way in rhel6/cman/rgmanager to have a node suicide or get
> fenced in a condition like this?
Something like this in the crontab (granted, cron and the other components
involved then become the SPOF, and an I/O spike or a DoS would finish the
apocalypse)?
*/1 * * * * timeout 30s touch <file on respective fs> || fence_node <myself>
More sophisticated handling within the components you mentioned might be
preferable, though.
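
For readability, the same idea can be spelled out as a small script. This is
only a sketch under stated assumptions: the probe file path, the environment
variable names, the "--run" switch and the guess that the cluster node name
equals the hostname are all made up for illustration, and it only helps if the
blocked touch can actually be interrupted by the signal that timeout sends.

```shell
#!/bin/sh
# Sketch of the cron one-liner as a script. All paths and defaults below are
# assumptions for illustration, not values from the original post.

PROBE_FILE="${PROBE_FILE:-/mnt/gfs2/.fs-probe}"  # hypothetical file on the clustered fs
PROBE_TIMEOUT="${PROBE_TIMEOUT:-30}"             # seconds, as in the one-liner
FENCE_CMD="${FENCE_CMD:-fence_node}"             # fencing command from the one-liner
NODE_NAME="${NODE_NAME:-$(uname -n)}"            # assumes node name == hostname

# Return 0 if the probe file can be touched within the time limit.
# Assumes a blocked touch can be interrupted by the signal timeout sends.
probe_fs() {
    timeout "${PROBE_TIMEOUT}s" touch "$PROBE_FILE"
}

# Probe once; on failure, report to stderr and ask for this node to be fenced.
check_and_fence() {
    if probe_fs; then
        return 0
    fi
    echo "fs-probe: touch of $PROBE_FILE failed or timed out; fencing $NODE_NAME" >&2
    $FENCE_CMD "$NODE_NAME"
}

# Act only when invoked with --run (a hypothetical switch), so that the
# functions can also be sourced and exercised without fencing anything.
if [ "${1:-}" = "--run" ]; then
    check_and_fence
fi
```

With the script saved somewhere outside the monitored filesystem (say, a
hypothetical /usr/local/sbin/fs-probe.sh), the crontab entry would shrink to
"* * * * * /usr/local/sbin/fs-probe.sh --run". The SPOF caveat above applies
unchanged.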
--
Jan (Poki)