[Pacemaker] primitive resource start timeout ignored by monitor-operation
Rainer Maier
rainer.maier at thalesgroup.com
Tue Apr 17 12:41:55 CEST 2012
Hi,
this is my first post to this list, so please be lenient with me.
My problem is that I configured a primitive resource like this:
primitive p_fuseesb_cellx ocf:thales:fuseesb \
params instance="cell1" fuseesb_home="/usr/lib/fuseesb" \
javahome="/usr/lib/jdk1.6.0_31" \
op monitor interval="60s" timeout="45s" \
op start interval="0" timeout="45s" \
op stop interval="0" timeout="20s"
Now when I start the resource from crm, it starts and is then immediately
stopped and restarted. This repeats in a cycle every 1-2 seconds.
In the corosync log I get the following output:
Apr 17 10:48:46 c6 lrmd: [28224]: info: operation start[1538] on p_fuseesb_cellx
for client 28227: pid 27751 exited with return code 0
Apr 17 10:48:46 c6 crmd: [28227]: info: process_lrm_event: LRM operation
p_fuseesb_cellx_start_0 (call=1538, rc=0, cib-update=1633, confirmed=true) ok
Apr 17 10:48:46 c6 crmd: [28227]: info: do_lrm_rsc_op: Performing
key=1:1017:0:084c0a4a-562e-46b2-bd13-df30802c2bd5
op=p_fuseesb_cellx_monitor_60000 )
Apr 17 10:48:46 c6 lrmd: [28224]: info: rsc:p_fuseesb_cellx monitor[1539]
(pid 27830)
Apr 17 10:48:46 c6 lrmd: [28224]: info: operation monitor[1539] on
p_fuseesb_cellx for client 28227: pid 27830 exited with return code 7
Apr 17 10:48:46 c6 crmd: [28227]: info: process_lrm_event: LRM operation
p_fuseesb_cellx_monitor_60000 (call=1539, rc=7, cib-update=1634,
confirmed=false)
not running
Apr 17 10:48:46 c6 attrd: [28225]: info: attrd_ais_dispatch: Update
relayed from c7
Apr 17 10:48:46 c6 attrd: [28225]: info: attrd_local_callback: Expanded
fail-count-p_fuseesb_cellx=value++ to 225
Apr 17 10:48:46 c6 attrd: [28225]: info: attrd_trigger_update: Sending flush
op to all hosts for: fail-count-p_fuseesb_cellx (225)
Apr 17 10:48:46 c6 attrd: [28225]: info: attrd_perform_update: Sent update
2420: fail-count-p_fuseesb_cellx=225
Apr 17 10:48:46 c6 attrd: [28225]: info: attrd_ais_dispatch: Update relayed
from c7
Apr 17 10:48:46 c6 attrd: [28225]: info: attrd_trigger_update: Sending flush
op to all hosts for: last-failure-p_fuseesb_cellx (1334652551)
Apr 17 10:48:46 c6 attrd: [28225]: info: attrd_perform_update: Sent update
2422: last-failure-p_fuseesb_cellx=1334652551
Apr 17 10:48:46 c6 lrmd: [28224]: info: cancel_op: operation monitor[1539]
on p_fuseesb_cellx for client 28227, its parameters: CRM_meta_name=[monitor]
crm_feature_set=[3.0.1] fuseesb_home=[/usr/lib/fuseesb]
CRM_meta_timeout=[45000] CRM_meta_interval=[60000]
javahome=[/usr/lib/jdk1.6.0_31] instance=[cell1]
cancelled
Apr 17 10:48:46 c6 crmd: [28227]: info: do_lrm_rsc_op: Performing
key=2:1019:0:084c0a4a-562e-46b2-bd13-df30802c2bd5 op=p_fuseesb_cellx_stop_0 )
Apr 17 10:48:46 c6 lrmd: [28224]: info: rsc:p_fuseesb_cellx stop[1540]
(pid 27897)
Apr 17 10:48:46 c6 crmd: [28227]: info: process_lrm_event: LRM operation
p_fuseesb_cellx_monitor_60000 (call=1539, status=1, cib-update=0,
confirmed=true)
Cancelled
Apr 17 10:48:46 c6 lrmd: [28224]: info: RA output:
(p_fuseesb_cellx:stop:stdout) Stop FUSE ESB: fuse-esb
From what I can see, the monitor operation is started immediately after the
start operation. As the start operation is apparently not finished yet, the
monitor detects that the resource is not running, so the resource is
immediately stopped and restarted, and the cycle starts again from the beginning.
What I don't understand is: why does Pacemaker ignore the timeouts I defined?
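
One observation on the log above: the start operation exits with return code 0
within the same second, so from Pacemaker's point of view the start has already
completed successfully when the first monitor runs. An operation timeout only
limits how long that operation may take before it is treated as failed; it is
not a delay before the next operation. A common pattern in OCF resource agents
is therefore to let the start action block until the monitor action reports
success. The following is only a minimal sketch of that pattern. The function
names fuseesb_launch, fuseesb_monitor and fuseesb_start, and the marker file
standing in for the real service, are hypothetical and not taken from the
actual ocf:thales:fuseesb agent.

```shell
#!/bin/sh
# Sketch only: all function names and the marker file are hypothetical,
# they are NOT taken from the real ocf:thales:fuseesb agent.

# Standard OCF return codes used by this sketch.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Stand-in for the service: a marker file that appears once the
# simulated daemon has finished booting.
STATE_FILE="${STATE_FILE:-/tmp/fuseesb_sketch.$$}"

fuseesb_launch() {
    # Simulated slow startup: the service is "up" after one second.
    ( sleep 1; : > "$STATE_FILE" ) &
}

fuseesb_monitor() {
    if [ -e "$STATE_FILE" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

fuseesb_start() {
    fuseesb_launch
    # Do not report success before monitor agrees that the service is up.
    # No upper bound is coded here: the cluster aborts the operation once
    # the configured start timeout (45s in the config above) has expired.
    while ! fuseesb_monitor; do
        sleep 1
    done
    return $OCF_SUCCESS
}
```

With a start action written this way, the first monitor only runs once the
service really answers, and the start timeout of 45s becomes the upper limit
for the whole startup.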
regards
Rainer