[Pacemaker] Stuck in a STONITH cycle
David Parker
dparker at utica.edu
Fri Oct 12 22:58:11 CEST 2012
Hello,
I have two nodes set up in a cluster to provide a MySQL server (mysqld)
in HA on a virtual IP address. This was working fine until I had to
reboot the servers. All I did was change the interface each node uses
for its primary IP address (changed from eth1 to eth0 on each node).
Now I'm stuck in a cycle. Let's say node 1 has the virtual IP and is
running mysqld, and node 2 is down. When node 2 boots up, it will
STONITH node 1 for no apparent reason and take over the resources, which
shouldn't happen. When node 1 boots up again, it will STONITH node 2
and take over the resources, which again shouldn't happen.
Obviously, the goal is to have both nodes running, and whichever one
boots up second should see that the first one has the resources and then
stand by. I'll include the relevant log entries and my crm config at
the end of this message. In the logs, the only thing I see that seems
unusual (other than the fact that a STONITH is executed against a
running node) is this:
Oct 12 16:27:22 ha1 crmd: [1176]: info: populate_cib_nodes_ha:
Requesting the list of configured nodes
Oct 12 16:27:23 ha1 crmd: [1176]: WARN: get_uuid: Could not calculate
UUID for ha2
Oct 12 16:27:23 ha1 crmd: [1176]: WARN: populate_cib_nodes_ha: Node ha2:
no uuid found
Oct 12 16:27:23 ha1 crmd: [1176]: info: do_state_transition: All 1
cluster nodes are eligible to run resources.
The exact opposite shows up on the node "ha2" (it says ha1 has no
uuid). Did the UUID of each node change because the physical interface
changed? Any other ideas? Thanks in advance.
- Dave
Here are the log lines:
Oct 12 16:26:13 ha1 heartbeat: [1176]: info: Starting
"/usr/lib/heartbeat/crmd" as uid 103 gid 105 (pid 1176)
Oct 12 16:26:13 ha1 crmd: [1176]: info: crm_log_init: Changed active
directory to /var/lib/heartbeat/cores/hacluster
Oct 12 16:26:13 ha1 crmd: [1176]: info: Invoked: /usr/lib/heartbeat/crmd
Oct 12 16:26:13 ha1 crmd: [1176]: info: main: CRM Hg Version:
74392a28b7f31d7ddc86689598bd23114f58978b
Oct 12 16:26:13 ha1 crmd: [1176]: info: crmd_init: Starting crmd
Oct 12 16:26:13 ha1 crmd: [1176]: info: G_main_add_SignalHandler: Added
signal handler for signal 17
Oct 12 16:26:14 ha1 crmd: [1176]: info: do_cib_control: CIB connection
established
Oct 12 16:26:14 ha1 crmd: [1176]: info: crm_cluster_connect: Connecting
to Heartbeat
Oct 12 16:26:15 ha1 crmd: [1176]: info: register_heartbeat_conn:
Hostname: ha1
Oct 12 16:26:15 ha1 crmd: [1176]: info: register_heartbeat_conn: UUID:
1b48f410-44d1-4e89-8b52-ff23b32db1bc
Oct 12 16:26:15 ha1 crmd: [1176]: info: do_ha_control: Connected to the
cluster
Oct 12 16:26:15 ha1 crmd: [1176]: info: do_ccm_control: CCM connection
established... waiting for first callback
Oct 12 16:26:15 ha1 crmd: [1176]: info: do_started: Delaying start, CCM
(0000000000100000) not connected
Oct 12 16:26:15 ha1 crmd: [1176]: info: crmd_init: Starting crmd's mainloop
Oct 12 16:26:15 ha1 crmd: [1176]: info: config_query_callback: Checking
for expired actions every 900000ms
Oct 12 16:26:15 ha1 crmd: [1176]: notice: crmd_client_status_callback:
Status update: Client ha1/crmd now has status [online] (DC=false)
Oct 12 16:26:15 ha1 crmd: [1176]: info: crm_new_peer: Node 0 is now
known as ha1
Oct 12 16:26:15 ha1 crmd: [1176]: info: crm_update_peer_proc: ha1.crmd
is now online
Oct 12 16:26:15 ha1 crmd: [1176]: info: crmd_client_status_callback: Not
the DC
Oct 12 16:26:15 ha1 crmd: [1176]: notice: crmd_client_status_callback:
Status update: Client ha1/crmd now has status [online] (DC=false)
Oct 12 16:26:16 ha1 crmd: [1176]: info: crmd_client_status_callback: Not
the DC
Oct 12 16:26:16 ha1 crmd: [1176]: info: mem_handle_event: Got an event
OC_EV_MS_NEW_MEMBERSHIP from ccm
Oct 12 16:26:16 ha1 crmd: [1176]: info: mem_handle_event: instance=1,
nodes=1, new=1, lost=0, n_idx=0, new_idx=0, old_idx=3
Oct 12 16:26:16 ha1 crmd: [1176]: info: crmd_ccm_msg_callback: Quorum
(re)attained after event=NEW MEMBERSHIP (id=1)
Oct 12 16:26:16 ha1 crmd: [1176]: info: ccm_event_detail: NEW
MEMBERSHIP: trans=1, nodes=1, new=1, lost=0 n_idx=0, new_idx=0, old_idx=3
Oct 12 16:26:16 ha1 crmd: [1176]: info: ccm_event_detail: #011CURRENT:
ha1 [nodeid=0, born=1]
Oct 12 16:26:16 ha1 crmd: [1176]: info: ccm_event_detail: #011NEW:
ha1 [nodeid=0, born=1]
Oct 12 16:26:16 ha1 crmd: [1176]: info: crm_update_peer: Node ha1: id=0
state=member (new) addr=(null) votes=-1 born=1 seen=1
proc=00000000000000000000000000000200
Oct 12 16:26:16 ha1 crmd: [1176]: info: crm_update_peer_proc: ha1.ais is
now online
Oct 12 16:26:16 ha1 crmd: [1176]: info: do_started: The local CRM is
operational
Oct 12 16:26:16 ha1 crmd: [1176]: info: do_state_transition: State
transition S_STARTING -> S_PENDING [ input=I_PENDING
cause=C_FSA_INTERNAL origin=do_started ]
Oct 12 16:27:17 ha1 crmd: [1176]: info: crm_timer_popped: Election
Trigger (I_DC_TIMEOUT) just popped!
Oct 12 16:27:17 ha1 crmd: [1176]: WARN: do_log: FSA: Input I_DC_TIMEOUT
from crm_timer_popped() received in state S_PENDING
Oct 12 16:27:17 ha1 crmd: [1176]: info: do_state_transition: State
transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT
cause=C_TIMER_POPPED origin=crm_timer_popped ]
Oct 12 16:27:17 ha1 crmd: [1176]: info: do_state_transition: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Oct 12 16:27:17 ha1 crmd: [1176]: info: do_te_control: Registering TE
UUID: 4f352feb-6be6-4af1-bb9e-af8b7ca28f58
Oct 12 16:27:17 ha1 crmd: [1176]: WARN: cib_client_add_notify_callback:
Callback already present
Oct 12 16:27:17 ha1 crmd: [1176]: info: set_graph_functions: Setting
custom graph functions
Oct 12 16:27:17 ha1 crmd: [1176]: info: unpack_graph: Unpacked
transition -1: 0 actions in 0 synapses
Oct 12 16:27:17 ha1 crmd: [1176]: info: start_subsystem: Starting
sub-system "pengine"
Oct 12 16:27:20 ha1 crmd: [1176]: info: do_dc_takeover: Taking over DC
status for this partition
Oct 12 16:27:20 ha1 crmd: [1176]: info: join_make_offer: Making join
offers based on membership 1
Oct 12 16:27:20 ha1 crmd: [1176]: info: do_dc_join_offer_all: join-1:
Waiting on 1 outstanding join acks
Oct 12 16:27:20 ha1 crmd: [1176]: info: te_connect_stonith: Attempting
connection to fencing daemon...
Oct 12 16:27:21 ha1 crmd: [1176]: info: te_connect_stonith: Connected
Oct 12 16:27:21 ha1 crmd: [1176]: info: config_query_callback: Checking
for expired actions every 900000ms
Oct 12 16:27:21 ha1 crmd: [1176]: info: update_dc: Set DC to ha1 (3.0.1)
Oct 12 16:27:21 ha1 crmd: [1176]: info: do_state_transition: State
transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
cause=C_FSA_INTERNAL origin=check_join_state ]
Oct 12 16:27:21 ha1 crmd: [1176]: info: do_state_transition: All 1
cluster nodes responded to the join offer.
Oct 12 16:27:21 ha1 crmd: [1176]: info: do_dc_join_finalize: join-1:
Syncing the CIB from ha1 to the rest of the cluster
Oct 12 16:27:22 ha1 crmd: [1176]: info: update_attrd: Connecting to attrd...
Oct 12 16:27:22 ha1 crmd: [1176]: info: erase_xpath_callback: Deletion
of "//node_state[@uname='ha1']/transient_attributes": ok (rc=0)
Oct 12 16:27:22 ha1 crmd: [1176]: info: do_dc_join_ack: join-1: Updating
node state to member for ha1
Oct 12 16:27:22 ha1 crmd: [1176]: info: erase_xpath_callback: Deletion
of "//node_state[@uname='ha1']/lrm": ok (rc=0)
Oct 12 16:27:22 ha1 crmd: [1176]: info: do_state_transition: State
transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED
cause=C_FSA_INTERNAL origin=check_join_state ]
Oct 12 16:27:22 ha1 crmd: [1176]: info: populate_cib_nodes_ha:
Requesting the list of configured nodes
Oct 12 16:27:23 ha1 crmd: [1176]: WARN: get_uuid: Could not calculate
UUID for ha2
Oct 12 16:27:23 ha1 crmd: [1176]: WARN: populate_cib_nodes_ha: Node ha2:
no uuid found
Oct 12 16:27:23 ha1 crmd: [1176]: info: do_state_transition: All 1
cluster nodes are eligible to run resources.
Oct 12 16:27:23 ha1 crmd: [1176]: info: do_dc_join_final: Ensuring DC,
quorum and node attributes are up-to-date
Oct 12 16:27:23 ha1 crmd: [1176]: info: crm_update_quorum: Updating
quorum status to true (call=21)
Oct 12 16:27:23 ha1 crmd: [1176]: info: abort_transition_graph:
do_te_invoke:191 - Triggered transition abort (complete=1) : Peer Cancelled
Oct 12 16:27:23 ha1 crmd: [1176]: info: do_pe_invoke: Query 22:
Requesting the current CIB: S_POLICY_ENGINE
Oct 12 16:27:23 ha1 crmd: [1176]: info: abort_transition_graph:
need_abort:59 - Triggered transition abort (complete=1) : Non-status change
Oct 12 16:27:23 ha1 crmd: [1176]: info: need_abort: Aborting on change
to admin_epoch
Oct 12 16:27:23 ha1 crmd: [1176]: info: do_pe_invoke: Query 23:
Requesting the current CIB: S_POLICY_ENGINE
Oct 12 16:27:23 ha1 crmd: [1176]: info: do_pe_invoke_callback: Invoking
the PE: query=23, ref=pe_calc-dc-1350073643-7, seq=1, quorate=1
Oct 12 16:27:23 ha1 crmd: [1176]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Oct 12 16:27:23 ha1 crmd: [1176]: info: unpack_graph: Unpacked
transition 0: 20 actions in 20 synapses
Oct 12 16:27:23 ha1 crmd: [1176]: info: do_te_invoke: Processing graph 0
(ref=pe_calc-dc-1350073643-7) derived from /var/lib/pengine/pe-warn-1370.bz2
Oct 12 16:27:23 ha1 crmd: [1176]: info: te_rsc_command: Initiating
action 4: monitor stonith-ha1_monitor_0 on ha1 (local)
Oct 12 16:27:23 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing
key=4:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=stonith-ha1_monitor_0 )
Oct 12 16:27:23 ha1 crmd: [1176]: info: te_rsc_command: Initiating
action 5: monitor stonith-ha2_monitor_0 on ha1 (local)
Oct 12 16:27:23 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing
key=5:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=stonith-ha2_monitor_0 )
Oct 12 16:27:23 ha1 crmd: [1176]: info: te_rsc_command: Initiating
action 6: monitor MysqlIP_monitor_0 on ha1 (local)
Oct 12 16:27:23 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing
key=6:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=MysqlIP_monitor_0 )
Oct 12 16:27:23 ha1 crmd: [1176]: info: te_rsc_command: Initiating
action 7: monitor mysqld_monitor_0 on ha1 (local)
Oct 12 16:27:23 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing
key=7:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=mysqld_monitor_0 )
Oct 12 16:27:23 ha1 crmd: [1176]: info: process_lrm_event: LRM operation
stonith-ha1_monitor_0 (call=2, rc=7, cib-update=24, confirmed=true) not
running
Oct 12 16:27:23 ha1 crmd: [1176]: info: process_lrm_event: LRM operation
stonith-ha2_monitor_0 (call=3, rc=7, cib-update=25, confirmed=true) not
running
Oct 12 16:27:23 ha1 crmd: [1176]: info: match_graph_event: Action
stonith-ha1_monitor_0 (4) confirmed on ha1 (rc=0)
Oct 12 16:27:23 ha1 crmd: [1176]: info: match_graph_event: Action
stonith-ha2_monitor_0 (5) confirmed on ha1 (rc=0)
Oct 12 16:27:23 ha1 crmd: [1176]: info: process_lrm_event: LRM operation
mysqld_monitor_0 (call=5, rc=7, cib-update=26, confirmed=true) not running
Oct 12 16:27:23 ha1 crmd: [1176]: info: match_graph_event: Action
mysqld_monitor_0 (7) confirmed on ha1 (rc=0)
Oct 12 16:27:23 ha1 crmd: [1176]: info: process_lrm_event: LRM operation
MysqlIP_monitor_0 (call=4, rc=7, cib-update=27, confirmed=true) not running
Oct 12 16:27:23 ha1 crmd: [1176]: info: match_graph_event: Action
MysqlIP_monitor_0 (6) confirmed on ha1 (rc=0)
Oct 12 16:27:23 ha1 crmd: [1176]: info: te_rsc_command: Initiating
action 3: probe_complete probe_complete on ha1 (local) - no waiting
Oct 12 16:27:23 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action
2 fired and confirmed
Oct 12 16:27:23 ha1 crmd: [1176]: info: te_rsc_command: Initiating
action 8: start stonith-ha2_start_0 on ha1 (local)
Oct 12 16:27:23 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing
key=8:0:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=stonith-ha2_start_0 )
Oct 12 16:27:23 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action
14 fired and confirmed
Oct 12 16:27:23 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action
20 fired and confirmed
Oct 12 16:27:23 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action
19 fired and confirmed
Oct 12 16:27:23 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action
15 fired and confirmed
Oct 12 16:27:23 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action
12 fired and confirmed
Oct 12 16:27:23 ha1 crmd: [1176]: info: abort_transition_graph:
te_update_diff:146 - Triggered transition abort (complete=0,
tag=transient_attributes, id=1b48f410-44d1-4e89-8b52-ff23b32db1bc,
magic=NA, cib=0.61.6) : Transient attribute:
update
Oct 12 16:27:23 ha1 crmd: [1176]: info: update_abort_priority: Abort
priority upgraded from 0 to 1000000
Oct 12 16:27:23 ha1 crmd: [1176]: info: update_abort_priority: Abort
action done superceeded by restart
Oct 12 16:27:34 ha1 crmd: [1176]: info: process_lrm_event: LRM operation
stonith-ha2_start_0 (call=6, rc=0, cib-update=28, confirmed=true) ok
Oct 12 16:27:34 ha1 crmd: [1176]: info: match_graph_event: Action
stonith-ha2_start_0 (8) confirmed on ha1 (rc=0)
Oct 12 16:27:34 ha1 crmd: [1176]: info: run_graph:
====================================================
Oct 12 16:27:34 ha1 crmd: [1176]: notice: run_graph: Transition 0
(Complete=12, Pending=0, Fired=0, Skipped=8, Incomplete=0,
Source=/var/lib/pengine/pe-warn-1370.bz2): Stopped
Oct 12 16:27:34 ha1 crmd: [1176]: info: te_graph_trigger: Transition 0
is now complete
Oct 12 16:27:34 ha1 crmd: [1176]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 12 16:27:34 ha1 crmd: [1176]: info: do_state_transition: All 1
cluster nodes are eligible to run resources.
Oct 12 16:27:34 ha1 crmd: [1176]: info: do_pe_invoke: Query 29:
Requesting the current CIB: S_POLICY_ENGINE
Oct 12 16:27:34 ha1 crmd: [1176]: info: do_pe_invoke_callback: Invoking
the PE: query=29, ref=pe_calc-dc-1350073654-14, seq=1, quorate=1
Oct 12 16:27:34 ha1 crmd: [1176]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Oct 12 16:27:34 ha1 crmd: [1176]: info: unpack_graph: Unpacked
transition 1: 9 actions in 9 synapses
Oct 12 16:27:34 ha1 crmd: [1176]: info: do_te_invoke: Processing graph 1
(ref=pe_calc-dc-1350073654-14) derived from
/var/lib/pengine/pe-warn-1371.bz2
Oct 12 16:27:34 ha1 stonithd: [1174]: info: client tengine [pid: 1176]
requests a STONITH operation RESET on node ha2
Oct 12 16:27:34 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action
9 fired and confirmed
Oct 12 16:27:34 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action
13 fired and confirmed
Oct 12 16:27:34 ha1 crmd: [1176]: info: te_fence_node: Executing reboot
fencing operation (15) on ha2 (timeout=60000)
Oct 12 16:28:08 ha1 crmd: [1176]: info: tengine_stonith_callback:
call=1266, optype=1, node_name=ha2, result=0, node_list=ha1,
action=15:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58
Oct 12 16:28:08 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action
1 fired and confirmed
Oct 12 16:28:08 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action
14 fired and confirmed
Oct 12 16:28:08 ha1 crmd: [1176]: info: te_rsc_command: Initiating
action 6: start MysqlIP_start_0 on ha1 (local)
Oct 12 16:28:08 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing
key=6:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=MysqlIP_start_0 )
Oct 12 16:28:08 ha1 crmd: [1176]: info: erase_xpath_callback: Deletion
of "//node_state[@uname='ha2']/lrm": ok (rc=0)
Oct 12 16:28:08 ha1 crmd: [1176]: info: erase_xpath_callback: Deletion
of "//node_state[@uname='ha2']/transient_attributes": ok (rc=0)
Oct 12 16:28:08 ha1 crmd: [1176]: info: process_lrm_event: LRM operation
MysqlIP_start_0 (call=7, rc=0, cib-update=33, confirmed=true) ok
Oct 12 16:28:08 ha1 crmd: [1176]: info: match_graph_event: Action
MysqlIP_start_0 (6) confirmed on ha1 (rc=0)
Oct 12 16:28:08 ha1 crmd: [1176]: info: te_rsc_command: Initiating
action 7: monitor MysqlIP_monitor_10000 on ha1 (local)
Oct 12 16:28:08 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing
key=7:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=MysqlIP_monitor_10000 )
Oct 12 16:28:08 ha1 crmd: [1176]: info: te_rsc_command: Initiating
action 8: start mysqld_start_0 on ha1 (local)
Oct 12 16:28:08 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing
key=8:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=mysqld_start_0 )
Oct 12 16:28:08 ha1 crmd: [1176]: info: process_lrm_event: LRM operation
mysqld_start_0 (call=9, rc=0, cib-update=34, confirmed=true) ok
Oct 12 16:28:08 ha1 crmd: [1176]: info: match_graph_event: Action
mysqld_start_0 (8) confirmed on ha1 (rc=0)
Oct 12 16:28:08 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action
10 fired and confirmed
Oct 12 16:28:08 ha1 crmd: [1176]: info: process_lrm_event: LRM operation
MysqlIP_monitor_10000 (call=8, rc=0, cib-update=35, confirmed=false) ok
Oct 12 16:28:08 ha1 crmd: [1176]: info: match_graph_event: Action
MysqlIP_monitor_10000 (7) confirmed on ha1 (rc=0)
Oct 12 16:28:08 ha1 crmd: [1176]: info: run_graph:
====================================================
Oct 12 16:28:08 ha1 crmd: [1176]: notice: run_graph: Transition 1
(Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-warn-1371.bz2): Complete
Oct 12 16:28:08 ha1 crmd: [1176]: info: te_graph_trigger: Transition 1
is now complete
Oct 12 16:28:08 ha1 crmd: [1176]: info: notify_crmd: Transition 1
status: done - <null>
Oct 12 16:28:08 ha1 crmd: [1176]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 12 16:28:08 ha1 crmd: [1176]: info: do_state_transition: Starting
PEngine Recheck Timer
Oct 12 16:43:08 ha1 crmd: [1176]: info: crm_timer_popped: PEngine
Recheck Timer (I_PE_CALC) just popped!
Oct 12 16:43:08 ha1 crmd: [1176]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_TIMER_POPPED origin=crm_timer_popped ]
Oct 12 16:43:08 ha1 crmd: [1176]: info: do_state_transition: Progressed
to state S_POLICY_ENGINE after C_TIMER_POPPED
Oct 12 16:43:08 ha1 crmd: [1176]: info: do_state_transition: All 1
cluster nodes are eligible to run resources.
Oct 12 16:43:08 ha1 crmd: [1176]: info: do_pe_invoke: Query 36:
Requesting the current CIB: S_POLICY_ENGINE
Oct 12 16:43:08 ha1 crmd: [1176]: info: do_pe_invoke_callback: Invoking
the PE: query=36, ref=pe_calc-dc-1350074588-18, seq=1, quorate=1
Oct 12 16:43:08 ha1 crmd: [1176]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Oct 12 16:43:08 ha1 crmd: [1176]: info: unpack_graph: Unpacked
transition 2: 0 actions in 0 synapses
Oct 12 16:43:08 ha1 crmd: [1176]: info: do_te_invoke: Processing graph 2
(ref=pe_calc-dc-1350074588-18) derived from
/var/lib/pengine/pe-input-4140.bz2
Oct 12 16:43:08 ha1 crmd: [1176]: info: run_graph:
====================================================
Oct 12 16:43:08 ha1 crmd: [1176]: notice: run_graph: Transition 2
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-4140.bz2): Complete
Oct 12 16:43:08 ha1 crmd: [1176]: info: te_graph_trigger: Transition 2
is now complete
Oct 12 16:43:08 ha1 crmd: [1176]: info: notify_crmd: Transition 2
status: done - <null>
Oct 12 16:43:08 ha1 crmd: [1176]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 12 16:43:08 ha1 crmd: [1176]: info: do_state_transition: Starting
PEngine Recheck Timer
Here's my config:
root at ha1:~# cibadmin --query
<cib validate-with="pacemaker-1.0" crm_feature_set="3.0.1"
have-quorum="1" admin_epoch="0" epoch="61" num_updates="11"
cib-last-written="Fri Oct 12 16:07:33 2012"
dc-uuid="1b48f410-44d1-4e89-8b52-ff23b32db1bc">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure"
name="cluster-infrastructure" value="Heartbeat"/>
<nvpair id="cib-bootstrap-options-stonith-enabled"
name="stonith-enabled" value="true"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="1b48f410-44d1-4e89-8b52-ff23b32db1bc" uname="ha1" type="normal"/>
<node id="9790fe6e-67b2-4817-abf4-966b5aa6948c" uname="ha2" type="normal"/>
</nodes>
<resources>
<primitive class="stonith" id="stonith-ha1" type="external/riloe">
<instance_attributes id="stonith-ha1-instance_attributes">
<nvpair id="stonith-ha1-instance_attributes-hostlist" name="hostlist"
value="ha1"/>
<nvpair id="stonith-ha1-instance_attributes-ilo_hostname"
name="ilo_hostname" value="10.0.1.111"/>
<nvpair id="stonith-ha1-instance_attributes-ilo_user" name="ilo_user"
value="Administrator"/>
<nvpair id="stonith-ha1-instance_attributes-ilo_password"
name="ilo_password" value="xxx"/>
<nvpair id="stonith-ha1-instance_attributes-ilo_can_reset"
name="ilo_can_reset" value="1"/>
<nvpair id="stonith-ha1-instance_attributes-ilo_protocol"
name="ilo_protocol" value="2.0"/>
<nvpair id="stonith-ha1-instance_attributes-ilo_powerdown_method"
name="ilo_powerdown_method" value="button"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="stonith-ha2" type="external/riloe">
<instance_attributes id="stonith-ha2-instance_attributes">
<nvpair id="stonith-ha2-instance_attributes-hostlist" name="hostlist"
value="ha2"/>
<nvpair id="stonith-ha2-instance_attributes-ilo_hostname"
name="ilo_hostname" value="10.0.1.112"/>
<nvpair id="stonith-ha2-instance_attributes-ilo_user" name="ilo_user"
value="Administrator"/>
<nvpair id="stonith-ha2-instance_attributes-ilo_password"
name="ilo_password" value="xxx"/>
<nvpair id="stonith-ha2-instance_attributes-ilo_can_reset"
name="ilo_can_reset" value="1"/>
<nvpair id="stonith-ha2-instance_attributes-ilo_protocol"
name="ilo_protocol" value="2.0"/>
<nvpair id="stonith-ha2-instance_attributes-ilo_powerdown_method"
name="ilo_powerdown_method" value="button"/>
</instance_attributes>
</primitive>
<group id="mysql-resources">
<primitive class="ocf" id="MysqlIP" provider="heartbeat" type="IPaddr2">
<instance_attributes id="MysqlIP-instance_attributes">
<nvpair id="MysqlIP-instance_attributes-ip" name="ip" value="192.168.25.9"/>
<nvpair id="MysqlIP-instance_attributes-cidr_netmask"
name="cidr_netmask" value="32"/>
</instance_attributes>
<operations>
<op id="MysqlIP-monitor-10s" interval="10s" name="monitor"/>
</operations>
</primitive>
<primitive id="mysqld" class="lsb" type="mysqld"/>
</group>
</resources>
<constraints>
<rsc_location id="loc-3" rsc="stonith-ha1" node="ha1" score="-INFINITY"/>
<rsc_location id="loc-4" rsc="stonith-ha2" node="ha2" score="-INFINITY"/>
<rsc_location id="loc-5" rsc="mysql-resources" node="ha1" score="100"/>
</constraints>
<rsc_defaults/>
<op_defaults/>
</configuration>
<status>
<node_state id="1b48f410-44d1-4e89-8b52-ff23b32db1bc" uname="ha1"
ha="active" in_ccm="true" crmd="online" join="member" expected="member"
crm-debug-origin="do_update_resource" shutdown="0">
<lrm id="1b48f410-44d1-4e89-8b52-ff23b32db1bc">
<lrm_resources>
<lrm_resource id="stonith-ha1" type="external/riloe" class="stonith">
<lrm_rsc_op id="stonith-ha1_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
transition-key="4:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58"
transition-magic="0:7;4:0:7:4f352feb-6be6-4
af1-bb9e-af8b7ca28f58" call-id="2" rc-code="7" op-status="0"
interval="0" last-run="1350073642" last-rc-change="1350073642"
exec-time="10" queue-time="0" op-digest="f7c04bf4ee3eb60cf4f9e841449a0fe6"/>
</lrm_resource>
<lrm_resource id="stonith-ha2" type="external/riloe" class="stonith">
<lrm_rsc_op id="stonith-ha2_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
transition-key="5:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58"
transition-magic="0:7;5:0:7:4f352feb-6be6-4
af1-bb9e-af8b7ca28f58" call-id="3" rc-code="7" op-status="0"
interval="0" last-run="1350073643" last-rc-change="1350073643"
exec-time="0" queue-time="0" op-digest="c8773e5caa7ae2a75f71a79adc582d54"/>
<lrm_rsc_op id="stonith-ha2_start_0" operation="start"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
transition-key="8:0:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58"
transition-magic="0:0;8:0:0:4f352feb-6be6-4af1-
bb9e-af8b7ca28f58" call-id="6" rc-code="0" op-status="0" interval="0"
last-run="1350073643" last-rc-change="1350073643" exec-time="10920"
queue-time="0" op-digest="c8773e5caa7ae2a75f71a79adc582d54"/>
</lrm_resource>
<lrm_resource id="mysqld" type="mysqld" class="lsb">
<lrm_rsc_op id="mysqld_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
transition-key="7:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58"
transition-magic="0:7;7:0:7:4f352feb-6be6-4af1-b
b9e-af8b7ca28f58" call-id="5" rc-code="7" op-status="0" interval="0"
last-run="1350073642" last-rc-change="1350073642" exec-time="40"
queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
<lrm_rsc_op id="mysqld_start_0" operation="start"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
transition-key="8:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58"
transition-magic="0:0;8:1:0:4f352feb-6be6-4af1-bb9e-
af8b7ca28f58" call-id="9" rc-code="0" op-status="0" interval="0"
last-run="1350073688" last-rc-change="1350073688" exec-time="0"
queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
</lrm_resource>
<lrm_resource id="MysqlIP" type="IPaddr2" class="ocf" provider="heartbeat">
<lrm_rsc_op id="MysqlIP_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
transition-key="6:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58"
transition-magic="0:7;6:0:7:4f352feb-6be6-4af1-
bb9e-af8b7ca28f58" call-id="4" rc-code="7" op-status="0" interval="0"
last-run="1350073642" last-rc-change="1350073642" exec-time="150"
queue-time="0" op-digest="9611b7026c2dc135fbd13d3537b42d16"/>
<lrm_rsc_op id="MysqlIP_start_0" operation="start"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
transition-key="6:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58"
transition-magic="0:0;6:1:0:4f352feb-6be6-4af1-bb9e
-af8b7ca28f58" call-id="7" rc-code="0" op-status="0" interval="0"
last-run="1350073687" last-rc-change="1350073687" exec-time="30"
queue-time="0" op-digest="9611b7026c2dc135fbd13d3537b42d16"/>
<lrm_rsc_op id="MysqlIP_monitor_10000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
transition-key="7:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58"
transition-magic="0:0;7:1:0:4f352feb-6be6-4
af1-bb9e-af8b7ca28f58" call-id="8" rc-code="0" op-status="0"
interval="10000" last-run="1350073687" last-rc-change="1350073687"
exec-time="10" queue-time="0" op-digest="caf3454e2233540200e7a6a5319c5362"/>
</lrm_resource>
</lrm_resources>
</lrm>
<transient_attributes id="1b48f410-44d1-4e89-8b52-ff23b32db1bc">
<instance_attributes id="status-1b48f410-44d1-4e89-8b52-ff23b32db1bc">
<nvpair id="status-1b48f410-44d1-4e89-8b52-ff23b32db1bc-probe_complete"
name="probe_complete" value="true"/>
</instance_attributes>
</transient_attributes>
</node_state>
<node_state id="9790fe6e-67b2-4817-abf4-966b5aa6948c" uname="ha2"
ha="dead" in_ccm="false" crmd="offline" join="down" expected="down"
crm-debug-origin="send_stonith_update"/>
</status>
</cib>
--
Dave Parker
Systems Administrator
Utica College
Integrated Information Technology Services
(315) 792-3229
Registered Linux User #408177
More information about the Pacemaker
mailing list