[Pacemaker] lvm ra timeouts and vgdisplay hang
Dejan Muhamedagic
dejanmm at fastmail.fm
Mon Oct 22 16:28:44 UTC 2012
Hi,
On Wed, Oct 17, 2012 at 11:35:43AM +0000, James Harper wrote:
> I've been having a problem with the lvm RA when it is used together with clvm and a node dies (e.g. when I destroy the VM to test this particular scenario).
>
> clvm reorganises itself just fine and recovers well within the lvm RA timeout I set (60 seconds), but if the "vgdisplay -v vg-drbd" command is executed by the lvm RA monitor op while clvm is still learning that the node has dropped out, it hangs forever and the RA monitor times out.
>
> I worked around this with the following in the RA's monitor function:
>
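> rc=124
> limit=10
> # retry while vgdisplay timed out (124 = killed by TERM, 137 = by KILL)
> while [ "$limit" -ge 0 ] && { [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; }
> do
> limit=$((limit - 1))
> # capture vgdisplay's own status; after a pipeline, $? would be grep's
> out=$(timeout --kill-after=5s 5s vgdisplay -v "$1" 2>&1)
> rc=$?
> done
> # succeed only if vgdisplay completed and reports something as available
> [ "$rc" -eq 0 ] && printf '%s\n' "$out" | grep -qi 'Status[[:space:]]*available'
> return $?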
>
> which kills the hung vgdisplay if it runs for more than 5 seconds (it never should) and retries the operation a few times. It seems to work: I can now kill a node without the cluster falling to pieces and going on a STONITH frenzy (actually it sometimes still does, but not for that reason).
>
> Maybe someone will find this useful? Or can tell me a better way to do it (other than fixing the bug in vgdisplay :)?
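A note on loops like this one: in a shell pipeline, `$?` holds the exit status of the last command, so if it is read right after `timeout ... vgdisplay ... | grep ...`, the value compared against 124 is grep's, not timeout's, and the retry never triggers. The short script below, which assumes GNU coreutils `timeout`, shows the three cases side by side; it is only an illustration, not part of the resource agent.

```shell
#!/bin/sh
# Shows which exit status $? reports for a timed-out command,
# with and without a pipeline. Assumes GNU coreutils `timeout`.

# On its own, a timed-out command makes timeout exit with 124.
timeout 1s sleep 5
direct=$?

# In a pipeline, $? is the status of the LAST command (grep here),
# so timeout's 124 is lost; grep -q sees no input and returns 1.
timeout 1s sleep 5 | grep -q available
piped=$?

# Capturing the output with command substitution keeps timeout's status.
out=$(timeout 1s sleep 5)
captured=$?

echo "direct=$direct piped=$piped captured=$captured"
```

Run as is, it prints `direct=124 piped=1 captured=124` after about three seconds. In bash, `${PIPESTATUS[0]}` also exposes the first command's status; `set -o pipefail` does not help here, because it reports the rightmost non-zero status, which is grep's.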
All calls to LVM tools have been removed from the monitor action
path. I think v3.9.3 includes that change.
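For a monitor that avoids LVM tools entirely, one possible approach is to look only at device nodes. The helper below is a hypothetical sketch, not the code from the LVM RA: the function name `vg_status_no_lvm` is made up, and it assumes that udev creates an entry under `/dev/<vg>/` for each active logical volume. It returns the standard OCF codes 0 (`OCF_SUCCESS`) and 7 (`OCF_NOT_RUNNING`).

```shell
#!/bin/sh
# Hypothetical monitor helper: decide whether a volume group looks active
# without running any LVM command (vgdisplay, vgs, lvs, ...), so it cannot
# block on clvmd while cluster membership is changing.
# Assumption: for each active LV in VG "$1", udev has created a device node
# or symlink under /dev/<vg>/. The optional second argument overrides /dev.

OCF_SUCCESS=0
OCF_NOT_RUNNING=7

vg_status_no_lvm() {
    vg=$1
    devroot=${2:-/dev}
    dir="$devroot/$vg"

    # No directory at all: nothing from this VG is active.
    [ -d "$dir" ] || return $OCF_NOT_RUNNING

    # Any existing entry counts as an active LV. A dangling symlink fails
    # the -e test, and an unmatched glob stays literal and fails it too.
    for lv in "$dir"/*; do
        [ -e "$lv" ] && return $OCF_SUCCESS
    done
    return $OCF_NOT_RUNNING
}
```

For example, `vg_status_no_lvm vg-drbd` returns 0 if `/dev/vg-drbd/` contains at least one live entry. The check is cheap and cannot hang on clvmd, but it only shows that device nodes exist, not that the LVs behind them are healthy.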
Thanks,
Dejan
> Thanks
>
> James
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org