[Pacemaker] Cluster Volume Group is stuck

Thu May 12 07:51:21 UTC 2011

Hi David,

startup-fencing is true
stonith is enabled
stonith-timeout is 60s
stonith-action is reboot

We use a Fibre Channel SAN with a multipath driver as the shared
device for the Volume Groups.

I use SBD as the STONITH device.
--------------- These are the SBD settings: -----------------------

multix244:~ # sbd -d /dev/disk/by-id/scsi-3600a0b8000420d5a00001cf14dc3a9a2-part1 dump
Header version     : 2
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 60
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 120
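
One way to cross-check these values against the cluster options listed at
the top is sketched below. This is not a diagnosis from this thread. It
assumes the commonly cited rule of thumb that stonith-timeout should be at
least msgwait plus roughly 20%; the 1.2 margin and the helper names
parse_sbd_timeouts and stonith_timeout_covers_msgwait are hypothetical and
chosen only for illustration.

```python
import re

# Output of `sbd -d <device> dump` as quoted in this message.
SBD_DUMP = """\
Header version     : 2
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 60
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 120
"""


def parse_sbd_timeouts(dump_text):
    """Collect the 'Timeout (name) : value' lines into a dict of ints."""
    pattern = re.compile(r"^Timeout \((\w+)\)\s*:\s*(\d+)[ \t]*$", re.MULTILINE)
    return {name: int(value) for name, value in pattern.findall(dump_text)}


def stonith_timeout_covers_msgwait(stonith_timeout_s, timeouts, margin=1.2):
    """Assumed rule of thumb (not from this thread):
    stonith-timeout >= msgwait * margin."""
    return stonith_timeout_s >= timeouts["msgwait"] * margin


timeouts = parse_sbd_timeouts(SBD_DUMP)
print(timeouts)

# stonith-timeout is 60s in the configuration listed above.
print(stonith_timeout_covers_msgwait(60, timeouts))
```

With the values shown here (stonith-timeout 60s, msgwait 120s) the check
returns False, so under that assumption the relation between the two
timeouts may be worth a look.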

On a similar cluster with an iSCSI device and no multipath driver
there is no problem.

karl

Quoting David Coulson <david at davidcoulson.net>:

>
>
> On 5/11/11 8:07 AM, Karl Rößmann wrote:
>> we have a three node cluster with a Cluster Volume Group vgsmet.
>>
>>
>> After powering off one Node, the Volume Group is stuck.
>> One of the ERROR messages is:
>> May 11 10:50:32 multix244 crmd: [8086]: ERROR: process_lrm_event: LRM operation vgsmet:0_monitor_60000 (38) Timed Out (timeout=60000ms)
>>
>>
>> If we power on the Node again the cluster recovers.
>
> Usually this is a fencing problem. How does your cluster manager
> (openais) have fencing configured?
>
> David
>

-- 
Karl Rößmann				Tel. +49-711-689-1657
Max-Planck-Institut FKF       		Fax. +49-711-689-1632
Postfach 800 665
70506 Stuttgart				email K.Roessmann at fkf.mpg.de