[Pacemaker] crmd - set dc unset dc loop
Philippe Carbonnier
Philippe.Carbonnier at vif.fr
Tue Jan 17 16:41:27 CET 2012
Hello,
The configuration:
Red Hat 5.5 (64-bit)
pacemaker-libs-1.0.10-1.4.el5.x86_64
pacemaker-1.0.10-1.4.el5.x86_64
corosync-1.2.7-1.1.el5.x86_64
corosynclib-1.2.7-1.1.el5.x86_64
When working:
[root@ujboss1 cluster]# crm_mon -1
============
Last updated: Tue Jan 17 16:27:33 2012
Stack: openais
Current DC: ujboss2 - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ ujboss1 ujboss2 ]
Resource Group: vifGroup
    clusterIP (ocf::heartbeat:IPaddr2): Started ujboss1
    routing-jboss (lsb:routing-jboss): Started ujboss1
Now, the problem: just after running crm_mode offline on ujboss1
(12:51:44), crmd seems to loop, always logging the same messages.
I restarted corosync on both nodes, and now it is working again.
Can you help me avoid this "loop"?
on ujboss2:
Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Set DC to
ujboss1 (3.0.1)
Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Unset DC ujboss1
Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Set DC to
ujboss1 (3.0.1)
Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Unset DC ujboss1
loop...
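To make the pattern easier to spot in a full syslog, here is a minimal sketch in Python. It is not part of Pacemaker: the function name count_dc_flips and the regular expression are assumptions based only on the update_dc lines quoted above. It counts how often crmd alternates between "Set DC" and "Unset DC" per second and per host.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above, e.g.
# "Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Unset DC ujboss1"
UPDATE_DC = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+crmd:.*"
    r"update_dc:\s+(?P<action>Set|Unset) DC"
)


def count_dc_flips(lines):
    """Count Set<->Unset DC alternations, keyed by (timestamp, host)."""
    flips = Counter()
    last_action = {}  # most recent action seen for each host
    for line in lines:
        m = UPDATE_DC.match(line)
        if not m:
            continue  # not a crmd update_dc line
        host, action = m.group("host"), m.group("action")
        # A flip is a change of action relative to the previous update_dc
        # line from the same host.
        if host in last_action and last_action[host] != action:
            flips[(m.group("ts"), host)] += 1
        last_action[host] = action
    return flips


if __name__ == "__main__":
    import sys

    # Usage (hypothetical log path): python dc_flips.py < /var/log/messages
    for (ts, host), n in sorted(count_dc_flips(sys.stdin).items()):
        print(f"{ts} {host}: {n} DC flips")
```

Many flips within the same second on one host would match the loop shown above. The sketch only summarises the log and does not explain the cause.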
and on ujboss1:
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: unpack_graph: Unpacked
transition 8776: 0 actions in 0 synapses
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: do_te_invoke: Processing
graph 8776 (ref=pe_calc-dc-1326800326-8977) derived from
/var/lib/pengine/pe-input-7829.bz2
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: run_graph:
====================================================
Jan 17 12:38:46 ujboss1 crmd: [28369]: notice: run_graph: Transition
8776 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-7829.bz2): Complete
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: te_graph_trigger:
Transition 8776 is now complete
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: notify_crmd: Transition
8776 status: done - <null>
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: do_state_transition:
Starting PEngine Recheck Timer
Jan 17 12:38:46 ujboss1 pengine: [28368]: info: process_pe_message:
Transition 8776: PEngine Input stored in: /var/lib/pengine/pe-input-7829.bz2
Jan 17 12:46:27 ujboss1 cib: [28365]: info: cib_stats: Processed 1
operations (0.00us average, 0% utilization) in the last 10min
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
- <cib admin_epoch="0" epoch="233" num_updates="5" >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
- <configuration >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
- <nodes >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
- <node id="ujboss1" >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
- <instance_attributes id="nodes-ujboss1" >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
- <nvpair value="off" id="nodes-ujboss1-standby" />
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
- </instance_attributes>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
- </node>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
- </nodes>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
- </configuration>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
- </cib>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
+ <cib admin_epoch="0" epoch="234" num_updates="1" >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
+ <configuration >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
+ <nodes >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
+ <node id="ujboss1" >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
+ <instance_attributes id="nodes-ujboss1" >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
+ <nvpair value="on" id="nodes-ujboss1-standby" />
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
+ </instance_attributes>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
+ </node>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
+ </nodes>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
+ </configuration>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
+ </cib>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: cib_process_request:
Operation complete: op cib_modify for section nodes
(origin=local/crm_attribute/4, version=0.234.1): ok (rc=0)
Jan 17 12:51:44 ujboss1 crmd: [28369]: info: abort_transition_graph:
need_abort:59 - Triggered transition abort (complete=1) : Non-status change
Jan 17 12:51:44 ujboss1 crmd: [28369]: info: need_abort: Aborting on
change to admin_epoch
Jan 17 12:51:44 ujboss1 crmd: [28369]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 17 12:51:44 ujboss1 crmd: [28369]: info: do_state_transition: All 2
cluster nodes are eligible to run resources.
Jan 17 12:51:44 ujboss1 crmd: [28369]: info: do_pe_invoke: Query 8981:
Requesting the current CIB: S_POLICY_ENGINE
Jan 17 12:51:46 ujboss1 cib: [28365]: ERROR: send_ais_text: Sending
message 251: FAILED (rc=2): Library error: Connection timed out (110)
Jan 17 12:51:46 ujboss1 crmd: [28369]: info: do_pe_invoke_callback:
Invoking the PE: query=8981, ref=pe_calc-dc-1326801106-8978, seq=560,
quorate=1
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Jan 17 12:51:47 ujboss1 pengine: [28368]: info: unpack_config: Node
scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Jan 17 12:51:47 ujboss1 pengine: [28368]: info: unpack_status: Node
ujboss1 is in standby-mode
Jan 17 12:51:47 ujboss1 pengine: [28368]: info: determine_online_status:
Node ujboss1 is standby
Jan 17 12:51:47 ujboss1 pengine: [28368]: info: determine_online_status:
Node ujboss2 is online
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: group_print: Resource
Group: vifGroup
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: native_print:
clusterIP (ocf::heartbeat:IPaddr2): Started ujboss1
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: native_print:
routing-jboss (lsb:routing-jboss): Started ujboss1
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: RecurringOp: Start
recurring monitor (30s) for clusterIP on ujboss2
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: RecurringOp: Start
recurring monitor (30s) for routing-jboss on ujboss2
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: LogActions: Move
resource clusterIP (Started ujboss1 -> ujboss2)
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: LogActions: Move
resource routing-jboss (Started ujboss1 -> ujboss2)
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: unpack_graph: Unpacked
transition 8777: 11 actions in 11 synapses
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_te_invoke: Processing
graph 8777 (ref=pe_calc-dc-1326801106-8978) derived from
/var/lib/pengine/pe-input-7830.bz2
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_pseudo_action: Pseudo
action 15 fired and confirmed
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_rsc_command: Initiating
action 10: stop routing-jboss_stop_0 on ujboss1 (local)
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: cancel_op: operation
monitor[79] on lsb::routing-jboss::routing-jboss for client 28369, its
parameters: CRM_meta_interval=[30000] CRM_meta_timeout=[20000]
crm_feature_set=[3.0.1] CRM_meta_name=[monitor] cancelled
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_lrm_rsc_op: Performing
key=10:8777:0:39671e48-9519-4b61-b781-2efcd379df7a op=routing-jboss_stop_0 )
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: rsc:routing-jboss:80: stop
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: process_lrm_event: LRM
operation routing-jboss_monitor_30000 (call=79, status=1, cib-update=0,
confirmed=true) Cancelled
Jan 17 12:51:48 ujboss1 lrmd: [5533]: WARN: For LSB init script, no
additional parameters are needed.
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output:
(routing-jboss:stop:stdout) Disabling traffic redirection from
128.1.13.9 to 128.1.13.7
Jan 17 12:51:48 ujboss1 pengine: [28368]: info: process_pe_message:
Transition 8777: PEngine Input stored in: /var/lib/pengine/pe-input-7830.bz2
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output:
(routing-jboss:stop:stdout) [
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output:
(routing-jboss:stop:stdout) OK
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output:
(routing-jboss:stop:stdout) ]
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output:
(routing-jboss:stop:stdout)
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output:
(routing-jboss:stop:stdout)
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: process_lrm_event: LRM
operation routing-jboss_stop_0 (call=80, rc=0, cib-update=8982,
confirmed=true) ok
Jan 17 12:51:48 ujboss1 cib: [28365]: ERROR: send_ais_message: Not
connected to AIS
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: match_graph_event: Action
routing-jboss_stop_0 (10) confirmed on ujboss1 (rc=0)
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_rsc_command: Initiating
action 7: stop clusterIP_stop_0 on ujboss1 (local)
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: cancel_op: operation
monitor[77] on ocf::IPaddr2::clusterIP for client 28369, its parameters:
CRM_meta_interval=[30000] ip=[128.1.13.9] cidr_netmask=[32]
CRM_meta_timeout=[20000] crm_feature_set=[3.0.1] CRM_meta_name=[monitor]
iflabel=[jbossfailover] cancelled
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_lrm_rsc_op: Performing
key=7:8777:0:39671e48-9519-4b61-b781-2efcd379df7a op=clusterIP_stop_0 )
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: rsc:clusterIP:81: stop
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: process_lrm_event: LRM
operation clusterIP_monitor_30000 (call=77, status=1, cib-update=0,
confirmed=true) Cancelled
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output:
(clusterIP:stop:stderr) logger: unknown facility name: none.
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output:
(clusterIP:stop:stderr) logger: unknown facility name: none.
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: process_lrm_event: LRM
operation clusterIP_stop_0 (call=81, rc=0, cib-update=8983,
confirmed=true) ok
Jan 17 12:51:48 ujboss1 cib: [28365]: ERROR: send_ais_message: Not
connected to AIS
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: match_graph_event: Action
clusterIP_stop_0 (7) confirmed on ujboss1 (rc=0)
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_pseudo_action: Pseudo
action 16 fired and confirmed
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_pseudo_action: Pseudo
action 3 fired and confirmed
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_pseudo_action: Pseudo
action 13 fired and confirmed
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_rsc_command: Initiating
action 8: start clusterIP_start_0 on ujboss2
Jan 17 12:51:48 corosync [pcmk ] notice: pcmk_peer_update: Transitional
membership event on ring 568: memb=1, new=0, lost=1
Jan 17 12:51:48 corosync [pcmk ] info: pcmk_peer_update: memb: ujboss1
34406784
Jan 17 12:51:48 corosync [pcmk ] info: pcmk_peer_update: lost: ujboss2
51184000
Jan 17 12:51:48 corosync [pcmk ] notice: pcmk_peer_update: Stable
membership event on ring 568: memb=2, new=1, lost=0
Jan 17 12:51:48 corosync [pcmk ] info: pcmk_peer_update: NEW: ujboss2
51184000
Jan 17 12:51:48 corosync [pcmk ] info: pcmk_peer_update: MEMB: ujboss1
34406784
Jan 17 12:51:48 corosync [pcmk ] info: pcmk_peer_update: MEMB: ujboss2
51184000
Jan 17 12:51:48 ujboss1 crmd: [28369]: ERROR: crmd_ha_msg_filter:
Another DC detected: ujboss2 (op=noop)
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_ELECTION [ input=I_ELECTION
cause=C_FSA_INTERNAL origin=crmd_ha_msg_filter ]
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: update_dc: Unset DC ujboss1
Jan 17 12:51:48 ujboss1 cib: [28365]: WARN: cib_process_diff: Diff
0.233.5 -> 0.233.6 not applied to 0.234.3: current "epoch" is greater
than required
Jan 17 12:51:48 ujboss1 cib: [28365]: WARN: cib_process_diff: Diff
0.233.6 -> 0.233.7 not applied to 0.234.3: current "epoch" is greater
than required
Jan 17 12:51:48 ujboss1 cib: [28365]: WARN: cib_process_diff: Diff
0.233.7 -> 0.234.1 not applied to 0.234.3: current "epoch" is greater
than required
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_state_transition: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_dc_takeover: Taking over
DC status for this partition
Jan 17 12:51:48 ujboss1 cib: [28365]: info: cib_process_readwrite: We
are now in R/O mode
Jan 17 12:51:48 ujboss1 cib: [28365]: info: cib_process_request:
Operation complete: op cib_slave_all for section 'all'
(origin=local/crmd/8984, version=0.234.3): ok (rc=0)
Jan 17 12:51:48 ujboss1 cib: [28365]: info: cib_process_readwrite: We
are now in R/W mode
Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request:
Operation complete: op cib_master for section 'all'
(origin=local/crmd/8985, version=0.234.3): ok (rc=0)
Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request:
Operation complete: op cib_modify for section cib
(origin=local/crmd/8986, version=0.234.3): ok (rc=0)
Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/8988, version=0.234.3): ok (rc=0)
Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/8990, version=0.234.3): ok (rc=0)
Jan 17 12:51:49 corosync [MAIN ] Completed service synchronization,
ready to provide service.
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_offer_all:
join-8: Waiting on 2 outstanding join acks
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: ais_dispatch: Membership
568: quorum retained
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: crm_ais_dispatch: Setting
expected votes to 2
Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/8993, version=0.234.3): ok (rc=0)
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: config_query_callback:
Checking for expired actions every 900000ms
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: config_query_callback:
Sending expected-votes=2 to corosync
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Set DC to
ujboss1 (3.0.1)
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: check_join_state:
do_dc_join_filter_offer: Membership changed since join started: 560 -> 568
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Unset DC ujboss1
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: join_make_offer: Making
join offers based on membership 568
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_offer_all:
join-9: Waiting on 2 outstanding join acks
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: ais_dispatch: Membership
568: quorum retained
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: crm_ais_dispatch: Setting
expected votes to 2
Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/8996, version=0.234.3): ok (rc=0)
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Set DC to
ujboss1 (3.0.1)
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State
transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
cause=C_FSA_INTERNAL origin=check_join_state ]
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: All 2
cluster nodes responded to the join offer.
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_finalize:
join-9: Syncing the CIB from ujboss1 to the rest of the cluster
Jan 17 12:51:49 ujboss1 cib: [28365]: ERROR: send_ais_message: Not
connected to AIS
Jan 17 12:51:49 ujboss1 cib: [28365]: WARN: cib_process_request:
Operation complete: op cib_sync for section 'all'
(origin=local/crmd/8998, version=0.234.3): not connected (rc=-3)
Jan 17 12:51:49 ujboss1 crmd: [28369]: ERROR: finalize_sync_callback:
Sync from ujboss1 resulted in an error: not connected
Jan 17 12:51:49 ujboss1 crmd: [28369]: WARN: do_log: FSA: Input
I_ELECTION_DC from finalize_sync_callback() received in state
S_FINALIZE_JOIN
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State
transition S_FINALIZE_JOIN -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=finalize_sync_callback ]
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Unset DC ujboss1
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_offer_all:
join-10: Waiting on 2 outstanding join acks
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Set DC to
ujboss1 (3.0.1)
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State
transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
cause=C_FSA_INTERNAL origin=check_join_state ]
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: All 2
cluster nodes responded to the join offer.
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_finalize:
join-10: Syncing the CIB from ujboss1 to the rest of the cluster
Jan 17 12:51:49 ujboss1 cib: [28365]: ERROR: send_ais_message: Not
connected to AIS
Jan 17 12:51:49 ujboss1 cib: [28365]: WARN: cib_process_request:
Operation complete: op cib_sync for section 'all'
(origin=local/crmd/9000, version=0.234.3): not connected (rc=-3)
Jan 17 12:51:49 ujboss1 crmd: [28369]: ERROR: finalize_sync_callback:
Sync from ujboss1 resulted in an error: not connected
Jan 17 12:51:49 ujboss1 crmd: [28369]: WARN: do_log: FSA: Input
I_ELECTION_DC from finalize_sync_callback() received in state
S_FINALIZE_JOIN
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State
transition S_FINALIZE_JOIN -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=finalize_sync_callback ]
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Unset DC ujboss1
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_offer_all:
join-11: Waiting on 2 outstanding join acks
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Set DC to
ujboss1 (3.0.1)
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State
transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
cause=C_FSA_INTERNAL origin=check_join_state ]
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: All 2
cluster nodes responded to the join offer.
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_finalize:
join-11: Syncing the CIB from ujboss1 to the rest of the cluster
Jan 17 12:51:49 ujboss1 cib: [28365]: ERROR: send_ais_message: Not
connected to AIS
Jan 17 12:51:49 ujboss1 cib: [28365]: WARN: cib_process_request:
Operation complete: op cib_sync for section 'all'
(origin=local/crmd/9002, version=0.234.3): not connected (rc=-3)
Jan 17 12:51:49 ujboss1 crmd: [28369]: ERROR: finalize_sync_callback:
Sync from ujboss1 resulted in an error: not connected
Jan 17 12:51:49 ujboss1 crmd: [28369]: WARN: do_log: FSA: Input
I_ELECTION_DC from finalize_sync_callback() received in state
S_FINALIZE_JOIN
... and this loops too.
After restarting corosync:
17/01/12 13:10: crm_mon -1
============
Last updated: Tue Jan 17 13:10:39 2012
Stack: openais
Current DC: ujboss1 - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ ujboss1 ujboss2 ]
Resource Group: vifGroup
    clusterIP (ocf::heartbeat:IPaddr2): Started ujboss2 FAILED
    routing-jboss (lsb:routing-jboss): Stopped
Failed actions:
    clusterIP_start_0 (node=ujboss2, call=-1, rc=1, status=Timed Out): unknown error
Both Linux servers were very busy, with crmd, cib and corosync using all the CPU.
Best regards,
Philippe