[Pacemaker] "probe" operations always use cluster default operation timeout
Ron Kerry
rkerry at sgi.com
Thu Nov 18 12:54:33 UTC 2010
Hi Tim,
On 11/17/2010 9:33 PM, Tim Serong wrote:
> Hi Ron,
>
> On 11/18/2010 at 11:26 AM, Ron Kerry <rkerry at sgi.com> wrote:
> > I have noted a problem that exists in both SLE11-HAE and SLE11-HAE-SP1
> > distributions with the "probe" operation that takes place when openais
> > is first started on a node to determine whether a resource is actively
> > running or not.
> >
> > Nov 17 17:47:07 gto2 lrmd: [13475]: debug: on_msg_perform_op: add an operation operation monitor[2] on ocf::cxfs::CXFS for client 13478, its parameters: crm_feature_set=[3.0.2] volnames=[dmfhome,dmfjrnls,dmfspool,dmftmp,diskmsp,data] CRM_meta_timeout=[20000] to the operation list.
> > Nov 17 17:47:07 gto2 corosync[13452]: [TOTEM ] mcasted message added to pending queue
> > Nov 17 17:47:07 gto2 crmd: [13478]: info: te_rsc_command: Initiating action 12: monitor CXFS_monitor_0 on gto3
> > Nov 17 17:47:07 gto2 lrmd: [13475]: info: rsc:CXFS:2: probe
> >
> > Note that the timeout for this operation is 20s (20000ms). Note also
> > that it is the monitor operation for the resource that is actually
> > called. The monitor operation timeout for this resource is set to 60s.
> > Even manually defining a "probe" operation for the resource with a
> > longer timeout is not effective. The timeout that is being used for
> > this operation is the cluster default operation timeout.
>
> A probe is a special case of the monitor op, with an interval of 0.
> Try configuring it like this:
>
> primitive CXFS ocf:sgi:cxfs \
>     op monitor interval="60s" timeout="60s" \
>     op start timeout="600s" \
>     op stop timeout="600s" \
>     op monitor interval="0" timeout="600s"
>
> The timeout of 600s on the monitor op with the interval of zero should
> thus be used when doing the probe. The timeout of 60s should be used
> on the recurring monitor op with the 60s interval.
>
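
My reading of that rule, as a rough Python sketch. The function names and the dictionary layout are made up for illustration and are not Pacemaker code; unit handling is simplified to "ms", "s", and bare numbers taken as seconds, and a monitor op with no interval is assumed to mean interval 0.

```python
def to_ms(value):
    """Convert '600s', '500ms', or a bare number of seconds to milliseconds.

    Simplified on purpose: only these three forms are handled.
    """
    text = str(value).strip()
    if text.endswith("ms"):
        return int(text[:-2])
    if text.endswith("s"):
        return int(text[:-1]) * 1000
    return int(text) * 1000


def select_probe_timeout_ms(ops, cluster_default="20s"):
    """Return the timeout a probe would get under the rule described above.

    A probe is treated as a monitor op with interval 0. If such an op is
    defined with a timeout, that timeout is used; otherwise the cluster
    default operation timeout applies.
    """
    for op in ops:
        is_probe = op["name"] == "monitor" and to_ms(op.get("interval", "0")) == 0
        if is_probe and "timeout" in op:
            return to_ms(op["timeout"])
    return to_ms(cluster_default)


# The ops from the suggested primitive definition (hypothetical dict layout).
suggested_ops = [
    {"name": "monitor", "interval": "60s", "timeout": "60s"},
    {"name": "start", "timeout": "600s"},
    {"name": "stop", "timeout": "600s"},
    {"name": "monitor", "interval": "0", "timeout": "600s"},
]

with_probe_op = select_probe_timeout_ms(suggested_ops)
without_probe_op = select_probe_timeout_ms(suggested_ops[:3])
```

With the interval="0" monitor op present, the sketch picks its 600s timeout; without it, the probe falls back to the 20s cluster default, which matches what I was seeing before.
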
This works like a charm!

Nov 18 06:27:36 prod lrmd: [4565]: debug: on_msg_perform_op: add an operation operation monitor[2] on ocf::cxfs::CXFS for client 4568, its parameters: CRM_meta_op_target_rc=[7] CRM_meta_start_delay=[0] volnames=[lun3s0,lun3s1,lun3s2,lun3s3,lun3s4,lun0s0,lun0s1,lun2s0] CRM_meta_timeout=[600000] crm_feature_set=[3.0.1] CRM_meta_name=[monitor] to the operation list.

The probe operation timeout is 600s even though my cluster default operation timeout is set to 20s.
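
For anyone who wants to check the same thing in their own logs, here is a minimal Python sketch that pulls CRM_meta_timeout out of an lrmd debug line like the one above. The helper name is hypothetical and not part of any Pacemaker tool, and it assumes the value is logged in milliseconds in the `CRM_meta_timeout=[N]` form shown in these logs.

```python
import re

# Matches the "CRM_meta_timeout=[600000]" parameter as it appears in the
# lrmd on_msg_perform_op debug lines quoted in this thread.
TIMEOUT_RE = re.compile(r"CRM_meta_timeout=\[(\d+)\]")


def timeout_from_log_ms(log_line):
    """Return CRM_meta_timeout in milliseconds, or None if the line has none."""
    match = TIMEOUT_RE.search(log_line)
    return int(match.group(1)) if match else None


# The probe log line from this message, joined onto one line.
probe_line = (
    "Nov 18 06:27:36 prod lrmd: [4565]: debug: on_msg_perform_op: add an "
    "operation operation monitor[2] on ocf::cxfs::CXFS for client 4568, its "
    "parameters: CRM_meta_op_target_rc=[7] CRM_meta_start_delay=[0] "
    "volnames=[lun3s0,lun3s1,lun3s2,lun3s3,lun3s4,lun0s0,lun0s1,lun2s0] "
    "CRM_meta_timeout=[600000] crm_feature_set=[3.0.1] "
    "CRM_meta_name=[monitor] to the operation list."
)

probe_timeout_ms = timeout_from_log_ms(probe_line)
probe_timeout_s = probe_timeout_ms // 1000  # 600000 ms is 600 s
```
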
Thanks again! - Ron
--
Ron Kerry rkerry at sgi.com