[Pacemaker] Exec Failure issues.
James Horsfall (CTR)
jameshorsfall at stratosgsi.com
Tue Oct 18 18:17:13 UTC 2011
Quick update to this:
Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+ <resources >
Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+ <group id="IPS" >
Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+ <primitive id="ETH2" >
Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+ <meta_attributes id="ETH2-meta_attributes" >
Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+ <nvpair id="ETH2-meta_attributes-is-managed"
name="is-managed" value="true" __crm_diff_marker__="added:top" />
Oct 18 18:11:20 localhost crmd: [2642]: info: abort_transition_graph:
need_abort:59 - Triggered transition abort (complete=1) : Non-status
change
Oct 18 18:11:20 localhost crmd: [2642]: info: need_abort: Aborting on
change to admin_epoch
Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+ </meta_attributes>
Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+ </primitive>
Oct 18 18:11:20 localhost crmd: [2642]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 18 18:11:20 localhost crmd: [2642]: info: do_state_transition: All 2
cluster nodes are eligible to run resources.
Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+ </group>
Oct 18 18:11:20 localhost crmd: [2642]: info: do_pe_invoke: Query 80:
Requesting the current CIB: S_POLICY_ENGINE
Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+ </resources>
Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+ </configuration>
Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+ </cib>
Oct 18 18:11:20 localhost cib: [2638]: info: cib_process_request:
Operation complete: op cib_replace for section resources
(origin=local/cibadmin/2, version=0.19.1): ok (rc=0)
Oct 18 18:11:20 localhost crmd: [2642]: info: do_pe_invoke_callback:
Invoking the PE: query=80, ref=pe_calc-dc-1318961480-51, seq=976,
quorate=1
Oct 18 18:11:20 localhost pengine: [2641]: info: unpack_config: Startup
probes: enabled
Oct 18 18:11:20 localhost pengine: [2641]: notice: unpack_config: On
loss of CCM Quorum: Ignore
Oct 18 18:11:20 localhost pengine: [2641]: info: unpack_config: Node
scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Oct 18 18:11:20 localhost pengine: [2641]: info: unpack_domains:
Unpacking domains
Oct 18 18:11:20 localhost pengine: [2641]: info:
determine_online_status: Node sgn-pau-hub1 is online
Oct 18 18:11:20 localhost pengine: [2641]: ERROR: unpack_rsc_op: Hard
error - ETH2_stop_0 failed with rc=3: Preventing ETH2 from re-starting
on sgn-pau-hub1
Oct 18 18:11:20 localhost pengine: [2641]: WARN: unpack_rsc_op:
Processing failed op ETH2_stop_0 on sgn-pau-hub1: unimplemented feature
(3)
Oct 18 18:11:20 localhost pengine: [2641]: info: native_add_running:
resource ETH2 isnt managed
Oct 18 18:11:20 localhost pengine: [2641]: WARN: unpack_rsc_op:
Processing failed op ETH3_stop_0 on sgn-pau-hub1: unknown exec error
(-2)
Oct 18 18:11:20 localhost pengine: [2641]: info: native_add_running:
resource ETH3 isnt managed
Oct 18 18:11:20 localhost pengine: [2641]: info:
determine_online_status: Node sgn-pau-hub0 is online
Oct 18 18:11:20 localhost pengine: [2641]: notice: group_print:
Resource Group: IPS
Oct 18 18:11:20 localhost pengine: [2641]: notice: native_print:
ETH2#011(ocf::heartbeat:IPaddr):#011Started sgn-pau-hub1 (unmanaged)
FAILED
Oct 18 18:11:20 localhost pengine: [2641]: notice: native_print:
ETH3#011(ocf::heartbeat:IPaddr):#011Started sgn-pau-hub1 (unmanaged)
FAILED
Oct 18 18:11:20 localhost pengine: [2641]: notice: clone_print: Clone
Set: ping-On-both
Oct 18 18:11:20 localhost pengine: [2641]: notice: short_print:
Started: [ sgn-pau-hub1 sgn-pau-hub0 ]
Oct 18 18:11:20 localhost pengine: [2641]: info: get_failcount: ETH2 has
failed INFINITY times on sgn-pau-hub1
Oct 18 18:11:20 localhost pengine: [2641]: WARN:
common_apply_stickiness: Forcing ETH2 away from sgn-pau-hub1 after
1000000 failures (max=1000000)
Oct 18 18:11:20 localhost pengine: [2641]: info: get_failcount: ETH3 has
failed INFINITY times on sgn-pau-hub1
Oct 18 18:11:20 localhost pengine: [2641]: WARN:
common_apply_stickiness: Forcing ETH3 away from sgn-pau-hub1 after
1000000 failures (max=1000000)
Oct 18 18:11:20 localhost pengine: [2641]: info: native_color: Unmanaged
resource ETH2 allocated to 'nowhere': failed
Oct 18 18:11:20 localhost pengine: [2641]: info: native_color: Unmanaged
resource ETH3 allocated to 'nowhere': failed
Oct 18 18:11:20 localhost pengine: [2641]: notice: LogActions: Leave
resource ETH2#011(Started unmanaged)
Oct 18 18:11:20 localhost pengine: [2641]: notice: LogActions: Leave
resource ETH3#011(Started unmanaged)
Oct 18 18:11:20 localhost pengine: [2641]: notice: LogActions: Leave
resource peth2:0#011(Started sgn-pau-hub1)
Oct 18 18:11:20 localhost pengine: [2641]: notice: LogActions: Leave
resource peth2:1#011(Started sgn-pau-hub0)
Oct 18 18:11:20 localhost pengine: [2641]: WARN: should_dump_input:
Ignoring requirement that ETH2_stop_0 comeplete before IPS_stopped_0:
unmanaged failed resources cannot prevent shutdown
Oct 18 18:11:20 localhost pengine: [2641]: WARN: should_dump_input:
Ignoring requirement that ETH3_stop_0 comeplete before IPS_stopped_0:
unmanaged failed resources cannot prevent shutdown
Oct 18 18:11:20 localhost pengine: [2641]: WARN: should_dump_input:
Ignoring requirement that ETH3_stop_0 comeplete before IPS_stopped_0:
unmanaged failed resources cannot prevent shutdown
Oct 18 18:11:20 localhost crmd: [2642]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Oct 18 18:11:20 localhost crmd: [2642]: info: unpack_graph: Unpacked
transition 15: 2 actions in 2 synapses
Oct 18 18:11:20 localhost crmd: [2642]: info: do_te_invoke: Processing
graph 15 (ref=pe_calc-dc-1318961480-51) derived from
/var/lib/pengine/pe-input-73.bz2
Oct 18 18:11:20 localhost crmd: [2642]: info: te_pseudo_action: Pseudo
action 15 fired and confirmed
Oct 18 18:11:20 localhost crmd: [2642]: info: te_pseudo_action: Pseudo
action 16 fired and confirmed
Oct 18 18:11:20 localhost crmd: [2642]: info: run_graph:
====================================================
Oct 18 18:11:20 localhost crmd: [2642]: notice: run_graph: Transition 15
(Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-73.bz2): Complete
Oct 18 18:11:20 localhost crmd: [2642]: info: te_graph_trigger:
Transition 15 is now complete
Oct 18 18:11:20 localhost crmd: [2642]: info: notify_crmd: Transition 15
status: done - <null>
Oct 18 18:11:20 localhost crmd: [2642]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 18 18:11:20 localhost crmd: [2642]: info: do_state_transition:
Starting PEngine Recheck Timer
Oct 18 18:11:20 localhost pengine: [2641]: info: process_pe_message:
Transition 15: PEngine Input stored in: /var/lib/pengine/pe-input-73.bz2
Oct 18 18:12:02 localhost cib: [2638]: info: cib_stats: Processed 147
operations (5102.00us average, 0% utilization) in the last 10min
From: James Horsfall (CTR) [mailto:jameshorsfall at stratosgsi.com]
Sent: Tuesday, October 18, 2011 1:39 PM
To: pacemaker at oss.clusterlabs.org
Subject: [Pacemaker] Exec Failure issues.
Hello all, I'm having some problems getting resources to fail over
properly. I need the IPs to switch to a different node when the current
node cannot ping. We're doing a "shut" on the respective interfaces to
simulate cables being unplugged, but I keep getting exec timeouts and
unknown errors.
crm_mon -fortA
============
Last updated: Tue Oct 18 17:32:10 2011
Stack: openais
Current DC: sgn-pau-hub0 - partition with quorum
Version: 1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ sgn-pau-hub0 sgn-pau-hub1 ]
Full list of resources:
Resource Group: IPS
ETH2 (ocf::heartbeat:IPaddr): Started sgn-pau-hub0
(unmanaged) FAILED
ETH3 (ocf::heartbeat:IPaddr): Started sgn-pau-hub0
(unmanaged) FAILED
Clone Set: ping-On-both
peth2:1 (ocf::pacemaker:ping): Started sgn-pau-hub0 FAILED
Started: [ sgn-pau-hub1 ]
Node Attributes:
* Node sgn-pau-hub0: #sometimes this says :1000 (degraded)
* Node sgn-pau-hub1:
+ pingd : 2000
Operations:
* Node sgn-pau-hub0:
ETH2: migration-threshold=1000000
+ (5) start: last-rc-change='Tue Oct 18 17:28:11 2011' last-run='Tue
Oct 18 17:28:11 2011' exec-time=100ms queue-time=0ms rc=0 (ok)
+ (7) monitor: interval=30000ms last-rc-change='Tue Oct 18 17:28:11
2011' last-run='Tue Oct 18 17:28:41 2011' exec-time=30ms queue-time=0ms
rc=0 (ok)
+ (15) stop: last-rc-change='Tue Oct 18 17:30:29 2011' last-run='Tue
Oct 18 17:30:09 2011' exec-time=20000ms queue-time=0ms rc=-2 (unknown
exec error)
ETH3: migration-threshold=1000000
+ (8) start: last-rc-change='Tue Oct 18 17:28:11 2011' last-run='Tue
Oct 18 17:28:11 2011' exec-time=80ms queue-time=0ms rc=0 (ok)
+ (9) monitor: interval=30000ms last-rc-change='Tue Oct 18 17:28:11
2011' last-run='Tue Oct 18 17:28:41 2011' exec-time=30ms queue-time=0ms
rc=0 (ok)
+ (12) stop: last-rc-change='Tue Oct 18 17:29:45 2011' last-run='Tue
Oct 18 17:29:25 2011' exec-time=20000ms queue-time=0ms rc=-2 (unknown
exec error)
peth2:1: migration-threshold=1000000
+ (24) stop: last-rc-change='Tue Oct 18 17:33:10 2011' last-run='Tue
Oct 18 17:33:10 2011' exec-time=10020ms queue-time=0ms rc=0 (ok)
+ (25) start: last-rc-change='Tue Oct 18 17:33:20 2011'
last-run='Tue Oct 18 17:33:20 2011' exec-time=19030ms queue-time=0ms
rc=1 (unknown error)
* Node sgn-pau-hub1:
peth2:0: migration-threshold=1000000
+ (5) start: last-rc-change='Tue Oct 18 17:26:36 2011' last-run='Tue
Oct 18 17:26:36 2011' exec-time=8070ms queue-time=0ms rc=0 (ok)
+ (6) monitor: interval=10000ms last-rc-change='Tue Oct 18 17:26:45
2011' last-run='Tue Oct 18 17:27:21 2011' exec-time=8030ms
queue-time=0ms rc=0 (ok)
Failed actions:
ETH2_stop_0 (node=sgn-pau-hub0, call=15, rc=-2, status=Timed Out):
unknown exec error
ETH3_stop_0 (node=sgn-pau-hub0, call=12, rc=-2, status=Timed Out):
unknown exec error
peth2:1_start_0 (node=sgn-pau-hub0, call=25, rc=1, status=complete):
unknown error
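
The failed-actions block above is regular enough to parse when comparing runs. Below is a minimal Python sketch, assuming the `crm_mon` line layout shown in this post (operation key, then `node=`, `call=`, `rc=`, `status=` in parentheses). The name `parse_failed_actions` and the `SAMPLE` text are illustrative only and are not part of Pacemaker.

```python
import re

# Matches lines such as:
#   ETH2_stop_0 (node=sgn-pau-hub0, call=15, rc=-2, status=Timed Out):
# The layout is an assumption taken from the crm_mon 1.1.2 output in this post.
FAILED_RE = re.compile(
    r"^(?P<op>\S+) \(node=(?P<node>[^,]+), call=(?P<call>\d+), "
    r"rc=(?P<rc>-?\d+), status=(?P<status>[^)]+)\):"
)


def parse_failed_actions(text):
    """Return one dict per failed action found in crm_mon output."""
    failures = []
    for line in text.splitlines():
        match = FAILED_RE.match(line.strip())
        if match:
            record = match.groupdict()
            record["call"] = int(record["call"])
            record["rc"] = int(record["rc"])
            failures.append(record)
    return failures


# Sample copied from the "Failed actions" section of this post.
SAMPLE = """\
ETH2_stop_0 (node=sgn-pau-hub0, call=15, rc=-2, status=Timed Out):
unknown exec error
ETH3_stop_0 (node=sgn-pau-hub0, call=12, rc=-2, status=Timed Out):
unknown exec error
peth2:1_start_0 (node=sgn-pau-hub0, call=25, rc=1, status=complete):
unknown error
"""

if __name__ == "__main__":
    for failure in parse_failed_actions(SAMPLE):
        print(failure["op"], failure["node"], failure["rc"], failure["status"])
```

For this output, both stop failures come back with status `Timed Out`, which lines up with the exec-time=20000ms reported for the ETH2 and ETH3 stop operations.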
------------------------------ CRM configuration ------------------------------
node sgn-pau-hub0
node sgn-pau-hub1
primitive ETH2 ocf:heartbeat:IPaddr \
params ip="10.151.9.42" cidr_netmask="255.255.255.248" nic="eth2" \
op monitor interval="30s" timeout="60" \
meta target-role="Started" allow-migrate="true"
primitive ETH3 ocf:heartbeat:IPaddr \
params ip="10.151.9.49" cidr_netmask="255.255.255.248" nic="eth3" \
op monitor interval="30s" timeout="60" \
meta target-role="Started" allow-migrate="true"
primitive peth2 ocf:pacemaker:ping \
params multiplier="1000" host_list="10.151.9.41 10.151.9.50" \
operations $id="peth2-operations" \
op monitor interval="10" timeout="20"
group IPS ETH2 ETH3 \
meta target-role="Started"
clone ping-On-both peth2 \
meta target-role="Started"
location UPchk IPS \
rule $id="UPchk-rule" pingd: defined pingd
property $id="cib-bootstrap-options" \
dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
cluster-infrastructure="openais" \
stonith-enabled="false" \
default-resource-stickiness="100" \
no-quorum-policy="ignore" \
last-lrm-refresh="1318948973" \
expected-quorum-votes="2"
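
For reference, the `pingd` values in the `crm_mon` output (2000, and sometimes 1000) are consistent with this configuration, assuming the `ocf:pacemaker:ping` agent sets the attribute to `multiplier` times the number of reachable hosts in `host_list`. The `UPchk` rule then uses that value as the location score for `IPS`. Below is a small Python sketch of that arithmetic. The helper name `pingd_score` is made up for illustration, and the reachable sets are hypothetical.

```python
def pingd_score(multiplier, host_list, reachable):
    """Assumed attribute value: multiplier * number of reachable ping targets."""
    hosts = host_list.split()
    return multiplier * sum(1 for host in hosts if host in reachable)


HOST_LIST = "10.151.9.41 10.151.9.50"  # host_list from the peth2 primitive
MULTIPLIER = 1000                      # multiplier from the peth2 primitive

# Both ping targets answer: matches "pingd : 2000" on sgn-pau-hub1.
print(pingd_score(MULTIPLIER, HOST_LIST, {"10.151.9.41", "10.151.9.50"}))  # 2000

# One interface shut, so only one target answers: the "1000 (degraded)" case.
print(pingd_score(MULTIPLIER, HOST_LIST, {"10.151.9.50"}))  # 1000

# No target answers.
print(pingd_score(MULTIPLIER, HOST_LIST, set()))  # 0
```

Under `rule pingd: defined pingd`, a node whose attribute is 1000 still gets a positive score from this rule, so other things being equal it is preferred over a node with no `pingd` attribute at all.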
------------------------------ cib.xml ------------------------------
<?xml version="1.0" ?>
<cib admin_epoch="0" crm_feature_set="3.0.2" dc-uuid="sgn-pau-hub0"
epoch="10" have-quorum="1" num_updates="5"
validate-with="pacemaker-1.2">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure"
name="cluster-infrastructure" value="openais"/>
<nvpair id="cib-bootstrap-options-stonith-enabled"
name="stonith-enabled" value="false"/>
<nvpair id="cib-bootstrap-options-default-resource-stickiness"
name="default-resource-stickiness" value="100"/>
<nvpair id="cib-bootstrap-options-no-quorum-policy"
name="no-quorum-policy" value="ignore"/>
<nvpair id="cib-bootstrap-options-last-lrm-refresh"
name="last-lrm-refresh" value="1318948973"/>
<nvpair id="cib-bootstrap-options-expected-quorum-votes"
name="expected-quorum-votes" value="2"/>
</cluster_property_set>
</crm_config>
<rsc_defaults/>
<op_defaults/>
<nodes>
<node id="sgn-pau-hub1" type="normal" uname="sgn-pau-hub1"/>
<node id="sgn-pau-hub0" type="normal" uname="sgn-pau-hub0"/>
</nodes>
<resources>
<clone id="ping-On-both">
<meta_attributes id="ping-On-both-meta_attributes">
<nvpair id="ping-On-both-meta_attributes-target-role"
name="target-role" value="Started"/>
</meta_attributes>
<primitive class="ocf" id="peth2" provider="pacemaker"
type="ping">
<instance_attributes id="peth2-instance_attributes">
<nvpair id="peth2-instance_attributes-multiplier"
name="multiplier" value="1000"/>
<nvpair id="peth2-instance_attributes-host_list"
name="host_list" value="10.151.9.41 10.151.9.50"/>
</instance_attributes>
<operations id="peth2-operations">
<op id="peth2-monitor-10" interval="10" name="monitor"
timeout="20"/>
</operations>
</primitive>
</clone>
<group id="IPS">
<meta_attributes id="IPS-meta_attributes">
<nvpair id="IPS-meta_attributes-target-role"
name="target-role" value="Started"/>
</meta_attributes>
<primitive class="ocf" id="ETH2" provider="heartbeat"
type="IPaddr">
<instance_attributes id="ETH2-instance_attributes">
<nvpair id="ETH2-instance_attributes-ip" name="ip"
value="10.151.9.42"/>
<nvpair id="ETH2-instance_attributes-cidr_netmask"
name="cidr_netmask" value="255.255.255.248"/>
<nvpair id="ETH2-instance_attributes-nic" name="nic"
value="eth2"/>
</instance_attributes>
<operations>
<op id="ETH2-monitor-30s" interval="30s" name="monitor"
timeout="60"/>
</operations>
<meta_attributes id="ETH2-meta_attributes">
<nvpair id="ETH2-meta_attributes-target-role"
name="target-role" value="Started"/>
<nvpair id="ETH2-meta_attributes-allow-migrate"
name="allow-migrate" value="true"/>
</meta_attributes>
</primitive>
<primitive class="ocf" id="ETH3" provider="heartbeat"
type="IPaddr">
<instance_attributes id="ETH3-instance_attributes">
<nvpair id="ETH3-instance_attributes-ip" name="ip"
value="10.151.9.49"/>
<nvpair id="ETH3-instance_attributes-cidr_netmask"
name="cidr_netmask" value="255.255.255.248"/>
<nvpair id="ETH3-instance_attributes-nic" name="nic"
value="eth3"/>
</instance_attributes>
<operations>
<op id="ETH3-monitor-30s" interval="30s" name="monitor"
timeout="60"/>
</operations>
<meta_attributes id="ETH3-meta_attributes">
<nvpair id="ETH3-meta_attributes-target-role"
name="target-role" value="Started"/>
<nvpair id="ETH3-meta_attributes-allow-migrate"
name="allow-migrate" value="true"/>
</meta_attributes>
</primitive>
</group>
</resources>
<constraints>
<rsc_location id="UPchk" rsc="IPS">
<rule id="UPchk-rule" score-attribute="pingd">
<expression attribute="pingd" id="UPchk-expression"
operation="defined"/>
</rule>
</rsc_location>
</constraints>
</configuration>
</cib>