[Pacemaker] Bug? Resources running with realtime priority - possibly causing monitor timeouts

Thu Oct 3 23:41:00 UTC 2013

On Oct 1, 2013, at 2:41 PM, pacemaker-request at oss.clusterlabs.org wrote:
> Date: Tue, 1 Oct 2013 19:22:12 +0200
> From: Dejan Muhamedagic <dejanmm at fastmail.fm>
> To: pacemaker at oss.clusterlabs.org
> Subject: Re: [Pacemaker] Bug? Resources running with realtime priority
> 	- possibly causing monitor timeouts
> Message-ID: <20131001172212.GC6892 at walrus.homenet>
> 
> Hi,
> 
> On Tue, Oct 01, 2013 at 11:07:35AM +0200, Joschi Brauchle wrote:
>> Hello everyone,
>> 
>> on two (recently upgraded) SLES11SP3 machines, we are running an
>> active/passive NFS fileserver and several other high availability
>> services using corosync + pacemaker (see version numbers below).
>> 
>> We are having severe problems with resource monitors timing out
>> during our system backup at night, where the active machine is under
>> high IO load. These problems did not exist under SLES11SP1, from
>> which we just upgraded some days ago.
>> 
>> After some diagnosis, it turns out that actually all cluster
>> resources which are started by pacemaker are running with realtime
>> priority, which includes our backup service. This seems not to be
>> correct!
>> 
> Oops. Looks like neither corosync nor lrmd reset the priority and
> scheduler for their children.
> 
>> As far as we remember from SLES11SP1, the resources were not running
>> in realtime priority there. Hence, this looks like a bug in the more
>> recent pacemaker/corosync version?!?
> 
> Looks like it. Can you please open a support call?

Dejan,

Any idea if SP2 is also affected?

Fortunately, it shouldn't affect me, since I'm just managing VMs (and mounting filesystems) with pacemaker, and not spawning a bunch of long-running processes.
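
For anyone who wants to check whether a node is affected, here is a rough sketch (not from the original report) that lists processes whose scheduling policy is SCHED_FIFO or SCHED_RR. It assumes Linux with /proc mounted and Python 3.3 or later; the helper names is_realtime and realtime_pids are made up for illustration, and it only looks at each process's main thread.

```python
import os

# Policies that count as "realtime" on Linux.
REALTIME_POLICIES = {os.SCHED_FIFO, os.SCHED_RR}

# The kernel may OR this flag into the reported policy; default to 0 if absent.
RESET_ON_FORK = getattr(os, "SCHED_RESET_ON_FORK", 0)


def is_realtime(policy):
    """Return True if a value from os.sched_getscheduler() is a realtime policy."""
    return (policy & ~RESET_ON_FORK) in REALTIME_POLICIES


def realtime_pids():
    """Return the PIDs under /proc whose main thread has a realtime policy."""
    found = []
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue  # not a process directory
        pid = int(name)
        try:
            policy = os.sched_getscheduler(pid)
        except OSError:
            continue  # process exited or is not accessible; skip it
        if is_realtime(policy):
            found.append(pid)
    return found


if __name__ == "__main__":
    for pid in realtime_pids():
        print(pid)
```

If resource agents or their children show up in that list, the node is presumably seeing the same behaviour Joschi described.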

Joschi,

As a workaround (and arguably a best practice anyway), try setting elevator=deadline in the kernel boot parameters, which selects the deadline I/O scheduler.  This will give better responsiveness under heavy I/O load.  I'm not sure how effective it will be with everything running at realtime priority, but assuming you're I/O-bound rather than CPU-bound, it should help.  I now set it on all cluster members.
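
To confirm which I/O scheduler is actually in effect after a reboot, something like the following sketch could be used. It assumes the usual sysfs layout, where /sys/block/<dev>/queue/scheduler lists the available schedulers with the active one in square brackets (for example "noop deadline [cfq]"). The function names active_scheduler and schedulers_by_device are only illustrative.

```python
import glob


def active_scheduler(text):
    """Return the bracketed scheduler name from a sysfs scheduler line, or None."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None  # no bracketed entry, e.g. a device reporting just "none"


def schedulers_by_device():
    """Map each block device name to its active I/O scheduler (or None)."""
    result = {}
    for path in glob.glob("/sys/block/*/queue/scheduler"):
        dev = path.split("/")[3]  # /sys/block/<dev>/queue/scheduler
        try:
            with open(path) as f:
                result[dev] = active_scheduler(f.read())
        except OSError:
            continue  # device vanished or file unreadable; skip it
    return result


if __name__ == "__main__":
    for dev, sched in sorted(schedulers_by_device().items()):
        print(dev, sched)
```

With elevator=deadline in effect, I would expect the disks to report deadline here, though device-mapper and other stacked devices may not offer a selectable scheduler at all.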

Before I set this, periods of high I/O on the SAN (such as migrating several VMs at once during 'rcopenais stop' on one node) would occasionally cause monitor operations to time out, and pacemaker would needlessly stop and start unrelated VMs, thinking they had failed.  Since the change, I have had no more problems.

Andrew Daugherity
Systems Analyst
Division of Research, Texas A&M University