No subject
Sun Apr 3 06:52:37 UTC 2011
> sbd -d /dev/disk/by-id/scsi-3600a0b8000420d5a00001cf14dc3a9a2-part1 list
> 0 multix244 clear
> 1 multix245 clear
> 2 multix246 reset multix245
suggests that multix246 was actually sent the reset request and thus
should be considered 'fenced' by the remaining cluster.
Looking back in your mails further:
>> /dev/disk/by-id/scsi-3600a0b8000420d5a00001cf14dc3a9a2-part1 dump
>> Header version : 2
>> Number of slots : 255
>> Sector size : 512
>> Timeout (watchdog) : 60
>> Timeout (allocate) : 2
>> Timeout (loop) : 1
>> Timeout (msgwait) : 120
You've set extremely long timeouts for the watchdog, and in particular
for the msgwait - this means that sbd will only consider a fence
completed after 120s. At the same time, you've set stonith-timeout to
60s, so if the fence takes longer than that, it'll be considered failed.
You've set up your cluster so that it can never complete a successful
fence - congratulations! ;-)
If you've got a legitimate reason for setting the msgwait timeout to
120s, you need to set the stonith-timeout to more than 120s; 140s, for
example.
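As a rough illustration of the rule above, here is a minimal shell sketch that compares the two timeouts. The function name check_fence_timeouts is hypothetical (it is not part of sbd or Pacemaker), and it assumes both values are given in whole seconds. The crm command in the final comment assumes the crm shell is installed.

```shell
#!/bin/sh
# Sketch only: check_fence_timeouts is a hypothetical helper, not part of
# sbd or Pacemaker. Arguments: msgwait and stonith-timeout, in whole seconds.
check_fence_timeouts() {
    msgwait=$1
    stonith_timeout=$2
    # The fence is only reported complete after msgwait, so the cluster
    # must be willing to wait strictly longer than that.
    if [ "$stonith_timeout" -gt "$msgwait" ]; then
        echo "ok: stonith-timeout (${stonith_timeout}s) exceeds msgwait (${msgwait}s)"
        return 0
    fi
    echo "bad: stonith-timeout (${stonith_timeout}s) must exceed msgwait (${msgwait}s)"
    return 1
}

# Values from this thread: msgwait=120, stonith-timeout=60.
check_fence_timeouts 120 60 || true
# With the value suggested above:
check_fence_timeouts 120 140

# To apply the new value with the crm shell (assuming it is installed):
#   crm configure property stonith-timeout=140s
```

The comparison is strict on purpose: a stonith-timeout equal to msgwait leaves no margin for the fence to be reported back.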
Regards,
Lars
--
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde