[Pacemaker] monitor operation cancel question
Phil Armstrong
pma at sgi.com
Tue Sep 21 18:58:19 UTC 2010
Hi,
This is my first post to this list, so if I'm doing this wrong, please be
patient. I am using pacemaker-1.1.2-0.2.1 on SLES 11 SP1. Thanks in
advance for any help anyone can give me.
I have configured an HA group with a number of primitives. As part of adding
the primitives, we do a simple manual relocation of the group from one
node to the other and back again after each primitive has been added to
the group, to verify the config as we go. I had done this for 7 primitives.
After I added this last primitive, I did the relocation:
crm_resource -r dmfGroup -M -H node2
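(For reference, the full cycle I run after adding each primitive is roughly the
following; the node names are placeholders, and moving back is just the same
command pointed at the original node:)

# move the group to the other node
crm_resource -r dmfGroup -M -H node2
# verify everything came up, then move it back again
crm_resource -r dmfGroup -M -H node1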
This results in the primitive that was added last (tmf), which gets
stopped first, failing because the periodic monitor operation for that
resource does not get cancelled the way the other primitives' monitor
ops do. I have included a snippet from the log below. Notice that the VirtualIP
and dskmspfs primitives get their monitor ops cancelled, but tmf's does
not. There is a process_pe_message error during what appears
to be the processing of the tmf primitive, but crm_verify reports no errors,
no matter how many -V's I use, that would indicate a problem with
tmf's config. The tmf resource works fine until it is time to relocate.
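For what it's worth, the crm_verify check I have been running is basically the
following (the number of -V's is arbitrary; I have tried several levels):

crm_verify -L -VVV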
Does anyone have any clues about where to look next?
Here is tmf's XML:
<primitive class="ocf" id="tmf" provider="sgi" type="tmf">
  <operations id="tmf-operations">
    <op id="tmf-op-monitor-30s" interval="30s" name="monitor"
        on-fail="restart" start-delay="480s" timeout="30s"/>
    <op id="tmf-op-start-0" interval="0" name="start"
        on-fail="restart" requires="fencing" timeout="480s"/>
    <op id="tmf-op-stop-0" interval="0" name="stop"
        on-fail="fence" timeout="480s"/>
  </operations>
  <instance_attributes id="tmf-instance_attributes">
    <nvpair id="tmf-instance_attributes-devgrpnames"
            name="devgrpnames" value="drives"/>
    <nvpair id="tmf-instance_attributes-mindevsup"
            name="mindevsup" value="1"/>
    <nvpair id="tmf-instance_attributes-devtimeout"
            name="devtimeout" value="240"/>
    <nvpair id="tmf-instance_attributes-loader_names"
            name="loader_names" value="sl3000"/>
    <nvpair id="tmf-instance_attributes-loader_hosts"
            name="loader_hosts" value="acsls1"/>
    <nvpair id="tmf-instance_attributes-loader_users"
            name="loader_users" value="root,acssa"/>
    <nvpair id="tmf-instance_attributes-loader_passwords"
            name="loader_passwords" value="weasel!,dog3ear"/>
  </instance_attributes>
  <meta_attributes id="tmf-meta_attributes">
    <nvpair id="tmf-meta_attributes-resource-stickiness"
            name="resource-stickiness" value="1"/>
    <nvpair id="tmf-meta_attributes-migration-threshold"
            name="migration-threshold" value="1"/>
  </meta_attributes>
</primitive>
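One detail that may or may not be related: tmf's stop op above sets
on-fail="fence", and the log below has repeated PE errors about on_fail=fence
with stonith-enabled=false (they appear for the other resources too, not just
tmf). In case it matters, I believe the property can be queried with something
like the following; treat the exact invocation as a sketch:

# query the stonith-enabled cluster property
crm_attribute -t crm_config -n stonith-enabled -G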
Here's the log:
Sep 21 10:35:45 pry crm_resource: [14746]: info: Invoked: crm_resource
-r dmfGroup -M -H punch
Sep 21 10:35:45 pry cib: [5597]: info: cib_process_request: Operation
complete: op cib_delete for section constraints
(origin=local/crm_resource/3, version=0.85.17): ok (rc=0)
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: -
<cib admin_epoch="0" epoch="85" num_updates="17" />
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: +
<cib admin_epoch="0" epoch="86" num_updates="1" >
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: +
<configuration >
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: +
<constraints >
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff:
+ <rsc_location id="cli-prefer-dmfGroup" rsc="dmfGroup"
__crm_diff_marker__="added:top" >
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff:
+ <rule id="cli-prefer-rule-dmfGroup" score="INFINITY"
boolean-op="and" >
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff:
+ <expression id="cli-prefer-expr-dmfGroup" attribute="#uname"
operation="eq" value="punch" type="string" />
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff:
+ </rule>
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff:
+ </rsc_location>
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: +
</constraints>
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: +
</configuration>
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + </cib>
Sep 21 10:35:45 pry cib: [5597]: info: cib_process_request: Operation
complete: op cib_modify for section constraints
(origin=local/crm_resource/4, version=0.86.1): ok (rc=0)
Sep 21 10:35:45 pry crmd: [5601]: info: abort_transition_graph:
need_abort:59 - Triggered transition abort (complete=1) : Non-status change
Sep 21 10:35:46 pry crmd: [5601]: info: need_abort: Aborting on change
to admin_epoch
Sep 21 10:35:46 pry crmd: [5601]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Sep 21 10:35:46 pry crmd: [5601]: info: do_state_transition: All 2
cluster nodes are eligible to run resources.
Sep 21 10:35:46 pry crmd: [5601]: info: do_pe_invoke: Query 258:
Requesting the current CIB: S_POLICY_ENGINE
Sep 21 10:35:46 pry kernel: cib(5597): unaligned access to
0x2000000001801b6d, ip=0x200000000042e830
Sep 21 10:35:46 pry kernel: cib(5597): unaligned access to
0x2000000001801b7d, ip=0x2000000000430dd0
Sep 21 10:35:46 pry kernel: cib(5597): unaligned access to
0x2000000001801b6d, ip=0x2000000000431021
Sep 21 10:35:46 pry kernel: cib(5597): unaligned access to
0x2000000001801b6d, ip=0x2000000000431241
Sep 21 10:35:46 pry kernel: cib(5597): unaligned access to
0x2000000001801b6d, ip=0x2000000000431401
Sep 21 10:35:46 pry crmd: [5601]: info: do_pe_invoke_callback: Invoking
the PE: query=258, ref=pe_calc-dc-1285083346-211, seq=156, quorate=1
Sep 21 10:35:46 pry pengine: [5600]: info: unpack_config: Startup
probes: enabled
Sep 21 10:35:46 pry pengine: [5600]: notice: unpack_config: On loss of
CCM Quorum: Ignore
Sep 21 10:35:46 pry pengine: [5600]: info: unpack_config: Node scores:
'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Sep 21 10:35:46 pry pengine: [5600]: info: unpack_domains: Unpacking domains
Sep 21 10:35:46 pry pengine: [5600]: info: determine_online_status: Node
pry is online
Sep 21 10:35:46 pry pengine: [5600]: info: determine_online_status: Node
punch is online
Sep 21 10:35:46 pry pengine: [5600]: notice: group_print: Resource
Group: dmfGroup
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print:
local_xvm (ocf::sgi:lxvm): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print:
dmfusr1fs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print:
dmfusr2fs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: homefs
(ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry cib: [14747]: info: write_cib_contents: Archived
previous version as /var/lib/heartbeat/crm/cib-86.raw
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print:
journalfs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: spoolfs
(ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: tmpfs
(ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: movefs
(ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print:
dskmspfs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print:
VirtualIP (ocf::heartbeat:IPaddr2): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: tmf
(ocf::sgi:tmf): Started pry
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying
on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start
recurring monitor (30s) for local_xvm on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying
on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start
recurring monitor (20s) for dmfusr1fs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying
on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start
recurring monitor (20s) for dmfusr2fs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying
on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start
recurring monitor (20s) for homefs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying
on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start
recurring monitor (20s) for journalfs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying
on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start
recurring monitor (20s) for spoolfs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying
on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start
recurring monitor (20s) for tmpfs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying
on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start
recurring monitor (20s) for movefs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying
on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start
recurring monitor (20s) for dskmspfs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying
on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start
recurring monitor (10s) for VirtualIP on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying
on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start
recurring monitor (30s) for tmf on punch
Sep 21 10:35:46 pry cib: [14747]: info: write_cib_contents: Wrote
version 0.86.0 of the CIB to disk (digest: 58aef84a6b3b4a14ca7ed17f167b6fcd)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource
local_xvm (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource
dmfusr1fs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource
dmfusr2fs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource
homefs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource
journalfs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource
spoolfs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource
tmpfs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource
movefs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource
dskmspfs (Started pry -> punch)
Sep 21 10:35:46 pry cib: [14747]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.3aJ5F5 (digest:
/var/lib/heartbeat/crm/cib.TlYNDx)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource
VirtualIP (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource
tmf (Started pry -> punch)
Sep 21 10:35:46 pry crmd: [5601]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Sep 21 10:35:46 pry crmd: [5601]: info: unpack_graph: Unpacked
transition 52: 38 actions in 38 synapses
Sep 21 10:35:46 pry crmd: [5601]: info: do_te_invoke: Processing graph
52 (ref=pe_calc-dc-1285083346-211) derived from
/var/lib/pengine/pe-input-110.bz2
Sep 21 10:35:46 pry crmd: [5601]: info: te_pseudo_action: Pseudo action
51 fired and confirmed
Sep 21 10:35:46 pry crmd: [5601]: info: te_rsc_command: Initiating
action 46: stop tmf_stop_0 on pry (local)
Sep 21 10:35:46 pry crmd: [5601]: info: do_lrm_rsc_op: Performing
key=46:52:0:bb6ff40d-ddaf-4b6f-9d79-e06944f1dac5 op=tmf_stop_0 )
Sep 21 10:35:46 pry lrmd: [5598]: info: rsc:tmf:74: stop
Sep 21 10:35:46 pry pengine: [5600]: info: process_pe_message:
Transition 52: PEngine Input stored in: /var/lib/pengine/pe-input-110.bz2
Sep 21 10:35:46 pry pengine: [5600]: info: process_pe_message:
Configuration ERRORs found during PE processing. Please run "crm_verify
-L" to identify issues.
Sep 21 10:35:46 pry TapeReleased[8042]: |$(413)
pci0002:00:01.1/fc/500104f000acc66d-500104f000acc66e/lun0, pid=14775,
device released
Sep 21 10:35:46 pry tmf[14748]: INFO: stop_all_tmf_resources():
successfully stopped all devices
Sep 21 10:35:46 pry tmf[14748]: INFO: stop_tmf(): tmf successfully stopped
Sep 21 10:35:46 pry crmd: [5601]: info: process_lrm_event: LRM operation
tmf_stop_0 (call=74, rc=0, cib-update=259, confirmed=true) ok
Sep 21 10:35:46 pry crmd: [5601]: info: match_graph_event: Action
tmf_stop_0 (46) confirmed on pry (rc=0)
Sep 21 10:35:46 pry crmd: [5601]: info: te_rsc_command: Initiating
action 43: stop VirtualIP_stop_0 on pry (local)
Sep 21 10:35:46 pry lrmd: [5598]: info: cancel_op: operation monitor[68]
on ocf::IPaddr2::VirtualIP for client 5601, its parameters:
CRM_meta_interval=[10000] cidr_netmask=[24] nic=[eth0]
broadcast=[128.162.246.255] crm_feature_set=[3.0.2] ip=[128.162.246.8]
CRM_meta_on_fail=[restart] CRM_meta_name=[monitor]
CRM_meta_start_delay=[90000] CRM_meta_timeout=[110000]
CRM_meta_requires=[fencing] cancelled
Sep 21 10:35:46 pry crmd: [5601]: info: do_lrm_rsc_op: Performing
key=43:52:0:bb6ff40d-ddaf-4b6f-9d79-e06944f1dac5 op=VirtualIP_stop_0 )
Sep 21 10:35:46 pry lrmd: [5598]: info: rsc:VirtualIP:75: stop
Sep 21 10:35:46 pry crmd: [5601]: info: process_lrm_event: LRM operation
VirtualIP_monitor_10000 (call=68, status=1, cib-update=0,
confirmed=true) Cancelled
Sep 21 10:35:47 pry IPaddr2[14796]: INFO: IP status = ok, IP_CIP=
Sep 21 10:35:47 pry IPaddr2[14796]: INFO: ip -f inet addr delete
128.162.246.8/24 dev eth0
Sep 21 10:35:47 pry crmd: [5601]: info: process_lrm_event: LRM operation
VirtualIP_stop_0 (call=75, rc=0, cib-update=260, confirmed=true) ok
Sep 21 10:35:47 pry crmd: [5601]: info: match_graph_event: Action
VirtualIP_stop_0 (43) confirmed on pry (rc=0)
Sep 21 10:35:47 pry crmd: [5601]: info: te_rsc_command: Initiating
action 40: stop dskmspfs_stop_0 on pry (local)
Sep 21 10:35:47 pry lrmd: [5598]: info: cancel_op: operation monitor[66]
on ocf::Filesystem::dskmspfs for client 5601, its parameters:
CRM_meta_requires=[fencing] fstype=[xfs]
device=[/dev/lxvm/tp9400_11_12_dmfbases5] crm_feature_set=[3.0.2]
options=[rw,dmapi,mtpt=/dmf/dskmsp_store] directory=[/dmf/dskmsp_store]
CRM_meta_on_fail=[restart] CRM_meta_name=[monitor]
CRM_meta_start_delay=[60000] CRM_meta_interval=[20000]
CRM_meta_timeout=[100000] cancelled
Sep 21 10:35:47 pry crmd: [5601]: info: do_lrm_rsc_op: Performing
key=40:52:0:bb6ff40d-ddaf-4b6f-9d79-e06944f1dac5 op=dskmspfs_stop_0 )
Sep 21 10:35:47 pry lrmd: [5598]: info: rsc:dskmspfs:76: stop
Sep 21 10:35:47 pry crmd: [5601]: info: process_lrm_event: LRM operation
dskmspfs_monitor_20000 (call=66, status=1, cib-update=0, confirmed=true)
Cancelled
--
Phil Armstrong pma at sgi.com
Phone: 651-683-5561 VNET 233-5561