[Pacemaker] Problems with SBD

Oriol Mula-Valls omv.lists at gmail.com
Mon Jan 12 10:21:21 EST 2015


Thanks a lot Lars. I took advantage of a crash last week to add the -P
parameter.

I'll read the sbd man page more carefully to work out how to increase the
IO timeout.

Kind regards,
Oriol

On Wed, Jan 7, 2015 at 12:09 PM, Lars Marowsky-Bree <lmb at suse.com> wrote:

> On 2015-01-04T19:49:58, Oriol Mula-Valls <omv.lists at gmail.com> wrote:
>
> > I have a two-node system with SLES 11 SP3 (pacemaker-1.1.9-0.19.102,
> > corosync-1.4.5-0.18.15, sbd-1.1-0.13.153). Since December we have had
> > several reboots of the system due to SBD: on the 22nd, 24th and 26th.
> > The last reboot happened yesterday, January 3rd. The messages are the
> > same every time:
> > /var/log/messages:Jan  3 11:55:08 kernighan sbd: [7879]: info: Cancelling IO request due to timeout (rw=0)
> > /var/log/messages:Jan  3 11:55:08 kernighan sbd: [7879]: ERROR: mbox read failed in servant.
> > /var/log/messages:Jan  3 11:55:08 kernighan sbd: [7878]: WARN: Servant for /dev/sdc1 (pid: 7879) has terminated
> > /var/log/messages:Jan  3 11:55:08 kernighan sbd: [7878]: WARN: Servant for /dev/sdc1 outdated (age: 4)
> > /var/log/messages:Jan  3 11:55:08 kernighan sbd: [8183]: info: Servant starting for device /dev/sdc1
> > /var/log/messages:Jan  3 11:55:11 kernighan sbd: [8183]: info: Cancelling IO request due to timeout (rw=0)
> > /var/log/messages:Jan  3 11:55:11 kernighan sbd: [8183]: ERROR: Unable to read header from device 5
> > /var/log/messages:Jan  3 11:55:11 kernighan sbd: [8183]: ERROR: Not a valid header on /dev/sdc1
> > /var/log/messages:Jan  3 11:55:11 kernighan sbd: [7878]: WARN: Servant for /dev/sdc1 (pid: 8183) has terminated
> > /var/log/messages:Jan  3 11:55:11 kernighan sbd: [7878]: WARN: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
> >
> > The SBD device is an iSCSI disk exported by a Synology box.
> >
> > Could anyone give me some guidance on what's happening, please?
>
> Those are pretty clearly IO errors due to high latency. You may need to
> increase the IO timeout, and/or figure out why the IO to your Synology
> box sometimes stalls for multiple seconds. See the manpage for this; you
> can add the required flag to /etc/sysconfig/sbd -> SBD_OPTS.
>
> You also should use a stable name (/dev/disk/by-id/...) rather than
> /dev/sdc1 - note that /dev/sdX may not be stable over reboots or iSCSI
> restarts.
>
> Further, you can avoid the reboots by enabling the pacemaker
> integration (-P). See the manpage for details on what that flag does.
> That will be the default in later sbd versions for releases after SLE HA
> 11.
>
>
>
> Regards,
>     Lars
>
> --
> Architect Storage/HA
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Jennifer Guild,
> Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
>
>
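
For the stable-name advice above, here is a minimal sketch of how one might
look up which /dev/disk/by-id entries currently point at a given device
node. The function name stable_names is made up for illustration, and the
script assumes a udev-managed system where /dev/disk/by-id contains
symlinks to the kernel device nodes. It only reads symlinks; it does not
touch the sbd configuration.

```python
import os


def stable_names(device, by_id_dir="/dev/disk/by-id"):
    """Return the sorted symlinks in by_id_dir that resolve to `device`.

    `stable_names` is a hypothetical helper, not part of sbd or udev.
    """
    # Nothing to report if the directory is absent (e.g. no udev).
    if not os.path.isdir(by_id_dir):
        return []

    # Resolve the device itself in case it is also a symlink.
    target = os.path.realpath(device)

    matches = []
    for name in os.listdir(by_id_dir):
        link = os.path.join(by_id_dir, name)
        # Keep only symlinks whose final target is the requested device.
        if os.path.islink(link) and os.path.realpath(link) == target:
            matches.append(link)
    return sorted(matches)


if __name__ == "__main__":
    # /dev/sdc1 is the device named in the log excerpt; adjust as needed.
    for path in stable_names("/dev/sdc1"):
        print(path)
```

Any path it prints could then replace /dev/sdc1 wherever the sbd device is
configured, so that the name survives reboots and iSCSI restarts.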