[Pacemaker] crmd - set dc unset dc loop

Philippe Carbonnier Philippe.Carbonnier at vif.fr
Tue Jan 17 16:41:27 CET 2012


Hello,

The configuration:
Red Hat 5.5, 64-bit
pacemaker-libs-1.0.10-1.4.el5.x86_64
pacemaker-1.0.10-1.4.el5.x86_64
corosync-1.2.7-1.1.el5.x86_64
corosynclib-1.2.7-1.1.el5.x86_64

When working:
[root@ujboss1 cluster]# crm_mon -1
============
Last updated: Tue Jan 17 16:27:33 2012
Stack: openais
Current DC: ujboss2 - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ ujboss1 ujboss2 ]

  Resource Group: vifGroup
      clusterIP  (ocf::heartbeat:IPaddr2):       Started ujboss1
      routing-jboss      (lsb:routing-jboss):    Started ujboss1


Now, the problem: just after running crm_mode offline on ujboss1
(12:51:44), crmd seems to loop, always logging the same messages.
I restarted corosync on both nodes, and now it is working,
but can you help me avoid this "loop"?

on ujboss2:
Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Set DC to 
ujboss1 (3.0.1)
Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Unset DC ujboss1
Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Set DC to 
ujboss1 (3.0.1)
Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Unset DC ujboss1
(these two messages repeat in a loop)
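
To put a number on how fast the DC flaps in a log like the one above, the update_dc lines can be counted per host and per second. This is only a minimal Python sketch for triage: the function name count_dc_flaps is hypothetical and not part of Pacemaker, and the regular expression assumes the syslog layout shown in this message.

```python
import re
from collections import Counter

# Matches the crmd update_dc lines quoted above. The node name may be wrapped
# onto the next line by the mailer, so only the start of the message is matched.
UPDATE_DC = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+crmd:.*"
    r"update_dc:\s+(?P<action>Set|Unset) DC"
)


def count_dc_flaps(lines):
    """Count Set<->Unset changes of the DC, keyed by (host, timestamp).

    Hypothetical helper for log triage; it only reads text and does not
    talk to the cluster.
    """
    flaps = Counter()
    last_action = {}  # host -> last action seen ("Set" or "Unset")
    for line in lines:
        m = UPDATE_DC.match(line)
        if not m:
            continue
        host, action = m["host"], m["action"]
        previous = last_action.get(host)
        if previous is not None and previous != action:
            flaps[(host, m["ts"])] += 1
        last_action[host] = action
    return flaps


if __name__ == "__main__":
    sample = [
        "Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Set DC to ",
        "ujboss1 (3.0.1)",
        "Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Unset DC ujboss1",
        "Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Set DC to ",
        "ujboss1 (3.0.1)",
        "Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Unset DC ujboss1",
    ]
    for (host, ts), n in sorted(count_dc_flaps(sample).items()):
        print(host, ts, n)
```

A healthy cluster should show only a handful of changes around an election, so a large count within a single second points at the loop described here.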

and on ujboss1:
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: do_state_transition: State 
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
cause=C_IPC_MESSAGE origin=handle_response ]
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: unpack_graph: Unpacked 
transition 8776: 0 actions in 0 synapses
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: do_te_invoke: Processing 
graph 8776 (ref=pe_calc-dc-1326800326-8977) derived from 
/var/lib/pengine/pe-input-7829.bz2
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: run_graph: 
====================================================
Jan 17 12:38:46 ujboss1 crmd: [28369]: notice: run_graph: Transition 
8776 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pengine/pe-input-7829.bz2): Complete
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: te_graph_trigger: 
Transition 8776 is now complete
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: notify_crmd: Transition 
8776 status: done - <null>
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: do_state_transition: 
Starting PEngine Recheck Timer
Jan 17 12:38:46 ujboss1 pengine: [28368]: info: process_pe_message: 
Transition 8776: PEngine Input stored in: /var/lib/pengine/pe-input-7829.bz2
Jan 17 12:46:27 ujboss1 cib: [28365]: info: cib_stats: Processed 1 
operations (0.00us average, 0% utilization) in the last 10min
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
- <cib admin_epoch="0" epoch="233" num_updates="5" >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
- <configuration >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
- <nodes >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
- <node id="ujboss1" >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
- <instance_attributes id="nodes-ujboss1" >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
- <nvpair value="off" id="nodes-ujboss1-standby" />
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
- </instance_attributes>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
- </node>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
- </nodes>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
- </configuration>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
- </cib>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
+ <cib admin_epoch="0" epoch="234" num_updates="1" >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
+ <configuration >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
+ <nodes >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
+ <node id="ujboss1" >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
+ <instance_attributes id="nodes-ujboss1" >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
+ <nvpair value="on" id="nodes-ujboss1-standby" />
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
+ </instance_attributes>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
+ </node>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
+ </nodes>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
+ </configuration>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: 
+ </cib>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: cib_process_request: 
Operation complete: op cib_modify for section nodes 
(origin=local/crm_attribute/4, version=0.234.1): ok (rc=0)
Jan 17 12:51:44 ujboss1 crmd: [28369]: info: abort_transition_graph: 
need_abort:59 - Triggered transition abort (complete=1) : Non-status change
Jan 17 12:51:44 ujboss1 crmd: [28369]: info: need_abort: Aborting on 
change to admin_epoch
Jan 17 12:51:44 ujboss1 crmd: [28369]: info: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC 
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 17 12:51:44 ujboss1 crmd: [28369]: info: do_state_transition: All 2 
cluster nodes are eligible to run resources.
Jan 17 12:51:44 ujboss1 crmd: [28369]: info: do_pe_invoke: Query 8981: 
Requesting the current CIB: S_POLICY_ENGINE
Jan 17 12:51:46 ujboss1 cib: [28365]: ERROR: send_ais_text: Sending 
message 251: FAILED (rc=2): Library error: Connection timed out (110)
Jan 17 12:51:46 ujboss1 crmd: [28369]: info: do_pe_invoke_callback: 
Invoking the PE: query=8981, ref=pe_calc-dc-1326801106-8978, seq=560, 
quorate=1
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: unpack_config: On loss 
of CCM Quorum: Ignore
Jan 17 12:51:47 ujboss1 pengine: [28368]: info: unpack_config: Node 
scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Jan 17 12:51:47 ujboss1 pengine: [28368]: info: unpack_status: Node 
ujboss1 is in standby-mode
Jan 17 12:51:47 ujboss1 pengine: [28368]: info: determine_online_status: 
Node ujboss1 is standby
Jan 17 12:51:47 ujboss1 pengine: [28368]: info: determine_online_status: 
Node ujboss2 is online
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: group_print:  Resource 
Group: vifGroup
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: native_print:      
clusterIP   (ocf::heartbeat:IPaddr2):       Started ujboss1
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: native_print:      
routing-jboss       (lsb:routing-jboss):    Started ujboss1
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: RecurringOp:  Start 
recurring monitor (30s) for clusterIP on ujboss2
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: RecurringOp:  Start 
recurring monitor (30s) for routing-jboss on ujboss2
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: LogActions: Move 
resource clusterIP    (Started ujboss1 -> ujboss2)
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: LogActions: Move 
resource routing-jboss        (Started ujboss1 -> ujboss2)
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_state_transition: State 
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
cause=C_IPC_MESSAGE origin=handle_response ]
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: unpack_graph: Unpacked 
transition 8777: 11 actions in 11 synapses
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_te_invoke: Processing 
graph 8777 (ref=pe_calc-dc-1326801106-8978) derived from 
/var/lib/pengine/pe-input-7830.bz2
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_pseudo_action: Pseudo 
action 15 fired and confirmed
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_rsc_command: Initiating 
action 10: stop routing-jboss_stop_0 on ujboss1 (local)
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: cancel_op: operation 
monitor[79] on lsb::routing-jboss::routing-jboss for client 28369, its 
parameters: CRM_meta_interval=[30000] CRM_meta_timeout=[20000] 
crm_feature_set=[3.0.1] CRM_meta_name=[monitor]  cancelled
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_lrm_rsc_op: Performing 
key=10:8777:0:39671e48-9519-4b61-b781-2efcd379df7a op=routing-jboss_stop_0 )
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: rsc:routing-jboss:80: stop
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: process_lrm_event: LRM 
operation routing-jboss_monitor_30000 (call=79, status=1, cib-update=0, 
confirmed=true) Cancelled
Jan 17 12:51:48 ujboss1 lrmd: [5533]: WARN: For LSB init script, no 
additional parameters are needed.
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output: 
(routing-jboss:stop:stdout) Disabling traffic redirection from 
128.1.13.9 to 128.1.13.7
Jan 17 12:51:48 ujboss1 pengine: [28368]: info: process_pe_message: 
Transition 8777: PEngine Input stored in: /var/lib/pengine/pe-input-7830.bz2
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output: 
(routing-jboss:stop:stdout) [
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output: 
(routing-jboss:stop:stdout)   OK
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output: 
(routing-jboss:stop:stdout) ]
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output: 
(routing-jboss:stop:stdout)
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output: 
(routing-jboss:stop:stdout)

Jan 17 12:51:48 ujboss1 crmd: [28369]: info: process_lrm_event: LRM 
operation routing-jboss_stop_0 (call=80, rc=0, cib-update=8982, 
confirmed=true) ok
Jan 17 12:51:48 ujboss1 cib: [28365]: ERROR: send_ais_message: Not 
connected to AIS
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: match_graph_event: Action 
routing-jboss_stop_0 (10) confirmed on ujboss1 (rc=0)
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_rsc_command: Initiating 
action 7: stop clusterIP_stop_0 on ujboss1 (local)
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: cancel_op: operation 
monitor[77] on ocf::IPaddr2::clusterIP for client 28369, its parameters: 
CRM_meta_interval=[30000] ip=[128.1.13.9] cidr_netmask=[32] 
CRM_meta_timeout=[20000] crm_feature_set=[3.0.1] CRM_meta_name=[monitor] 
iflabel=[jbossfailover]  cancelled
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_lrm_rsc_op: Performing 
key=7:8777:0:39671e48-9519-4b61-b781-2efcd379df7a op=clusterIP_stop_0 )
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: rsc:clusterIP:81: stop
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: process_lrm_event: LRM 
operation clusterIP_monitor_30000 (call=77, status=1, cib-update=0, 
confirmed=true) Cancelled
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output: 
(clusterIP:stop:stderr) logger: unknown facility name: none.

Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output: 
(clusterIP:stop:stderr) logger: unknown facility name: none.

Jan 17 12:51:48 ujboss1 crmd: [28369]: info: process_lrm_event: LRM 
operation clusterIP_stop_0 (call=81, rc=0, cib-update=8983, 
confirmed=true) ok
Jan 17 12:51:48 ujboss1 cib: [28365]: ERROR: send_ais_message: Not 
connected to AIS
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: match_graph_event: Action 
clusterIP_stop_0 (7) confirmed on ujboss1 (rc=0)
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_pseudo_action: Pseudo 
action 16 fired and confirmed
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_pseudo_action: Pseudo 
action 3 fired and confirmed
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_pseudo_action: Pseudo 
action 13 fired and confirmed
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_rsc_command: Initiating 
action 8: start clusterIP_start_0 on ujboss2
Jan 17 12:51:48 corosync [pcmk  ] notice: pcmk_peer_update: Transitional 
membership event on ring 568: memb=1, new=0, lost=1
Jan 17 12:51:48 corosync [pcmk  ] info: pcmk_peer_update: memb: ujboss1 
34406784
Jan 17 12:51:48 corosync [pcmk  ] info: pcmk_peer_update: lost: ujboss2 
51184000
Jan 17 12:51:48 corosync [pcmk  ] notice: pcmk_peer_update: Stable 
membership event on ring 568: memb=2, new=1, lost=0
Jan 17 12:51:48 corosync [pcmk  ] info: pcmk_peer_update: NEW:  ujboss2 
51184000
Jan 17 12:51:48 corosync [pcmk  ] info: pcmk_peer_update: MEMB: ujboss1 
34406784
Jan 17 12:51:48 corosync [pcmk  ] info: pcmk_peer_update: MEMB: ujboss2 
51184000
Jan 17 12:51:48 ujboss1 crmd: [28369]: ERROR: crmd_ha_msg_filter: 
Another DC detected: ujboss2 (op=noop)
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_ELECTION [ input=I_ELECTION 
cause=C_FSA_INTERNAL origin=crmd_ha_msg_filter ]
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: update_dc: Unset DC ujboss1
Jan 17 12:51:48 ujboss1 cib: [28365]: WARN: cib_process_diff: Diff 
0.233.5 -> 0.233.6 not applied to 0.234.3: current "epoch" is greater 
than required
Jan 17 12:51:48 ujboss1 cib: [28365]: WARN: cib_process_diff: Diff 
0.233.6 -> 0.233.7 not applied to 0.234.3: current "epoch" is greater 
than required
Jan 17 12:51:48 ujboss1 cib: [28365]: WARN: cib_process_diff: Diff 
0.233.7 -> 0.234.1 not applied to 0.234.3: current "epoch" is greater 
than required
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_state_transition: State 
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC 
cause=C_FSA_INTERNAL origin=do_election_check ]
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_dc_takeover: Taking over 
DC status for this partition
Jan 17 12:51:48 ujboss1 cib: [28365]: info: cib_process_readwrite: We 
are now in R/O mode
Jan 17 12:51:48 ujboss1 cib: [28365]: info: cib_process_request: 
Operation complete: op cib_slave_all for section 'all' 
(origin=local/crmd/8984, version=0.234.3): ok (rc=0)
Jan 17 12:51:48 ujboss1 cib: [28365]: info: cib_process_readwrite: We 
are now in R/W mode
Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request: 
Operation complete: op cib_master for section 'all' 
(origin=local/crmd/8985, version=0.234.3): ok (rc=0)
Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request: 
Operation complete: op cib_modify for section cib 
(origin=local/crmd/8986, version=0.234.3): ok (rc=0)
Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request: 
Operation complete: op cib_modify for section crm_config 
(origin=local/crmd/8988, version=0.234.3): ok (rc=0)
Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request: 
Operation complete: op cib_modify for section crm_config 
(origin=local/crmd/8990, version=0.234.3): ok (rc=0)
Jan 17 12:51:49 corosync [MAIN  ] Completed service synchronization, 
ready to provide service.
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_offer_all: 
join-8: Waiting on 2 outstanding join acks
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: ais_dispatch: Membership 
568: quorum retained
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: crm_ais_dispatch: Setting 
expected votes to 2
Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request: 
Operation complete: op cib_modify for section crm_config 
(origin=local/crmd/8993, version=0.234.3): ok (rc=0)
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: config_query_callback: 
Checking for expired actions every 900000ms
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: config_query_callback: 
Sending expected-votes=2 to corosync
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Set DC to 
ujboss1 (3.0.1)
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: check_join_state: 
do_dc_join_filter_offer: Membership changed since join started: 560 -> 568
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Unset DC ujboss1
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: join_make_offer: Making 
join offers based on membership 568
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_offer_all: 
join-9: Waiting on 2 outstanding join acks
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: ais_dispatch: Membership 
568: quorum retained
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: crm_ais_dispatch: Setting 
expected votes to 2
Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request: 
Operation complete: op cib_modify for section crm_config 
(origin=local/crmd/8996, version=0.234.3): ok (rc=0)
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Set DC to 
ujboss1 (3.0.1)
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State 
transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED 
cause=C_FSA_INTERNAL origin=check_join_state ]
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: All 2 
cluster nodes responded to the join offer.
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_finalize: 
join-9: Syncing the CIB from ujboss1 to the rest of the cluster
Jan 17 12:51:49 ujboss1 cib: [28365]: ERROR: send_ais_message: Not 
connected to AIS
Jan 17 12:51:49 ujboss1 cib: [28365]: WARN: cib_process_request: 
Operation complete: op cib_sync for section 'all' 
(origin=local/crmd/8998, version=0.234.3): not connected (rc=-3)
Jan 17 12:51:49 ujboss1 crmd: [28369]: ERROR: finalize_sync_callback: 
Sync from ujboss1 resulted in an error: not connected
Jan 17 12:51:49 ujboss1 crmd: [28369]: WARN: do_log: FSA: Input 
I_ELECTION_DC from finalize_sync_callback() received in state 
S_FINALIZE_JOIN
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State 
transition S_FINALIZE_JOIN -> S_INTEGRATION [ input=I_ELECTION_DC 
cause=C_FSA_INTERNAL origin=finalize_sync_callback ]
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Unset DC ujboss1
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_offer_all: 
join-10: Waiting on 2 outstanding join acks
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Set DC to 
ujboss1 (3.0.1)
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State 
transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED 
cause=C_FSA_INTERNAL origin=check_join_state ]
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: All 2 
cluster nodes responded to the join offer.
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_finalize: 
join-10: Syncing the CIB from ujboss1 to the rest of the cluster
Jan 17 12:51:49 ujboss1 cib: [28365]: ERROR: send_ais_message: Not 
connected to AIS
Jan 17 12:51:49 ujboss1 cib: [28365]: WARN: cib_process_request: 
Operation complete: op cib_sync for section 'all' 
(origin=local/crmd/9000, version=0.234.3): not connected (rc=-3)
Jan 17 12:51:49 ujboss1 crmd: [28369]: ERROR: finalize_sync_callback: 
Sync from ujboss1 resulted in an error: not connected
Jan 17 12:51:49 ujboss1 crmd: [28369]: WARN: do_log: FSA: Input 
I_ELECTION_DC from finalize_sync_callback() received in state 
S_FINALIZE_JOIN
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State 
transition S_FINALIZE_JOIN -> S_INTEGRATION [ input=I_ELECTION_DC 
cause=C_FSA_INTERNAL origin=finalize_sync_callback ]
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Unset DC ujboss1
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_offer_all: 
join-11: Waiting on 2 outstanding join acks
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Set DC to 
ujboss1 (3.0.1)
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State 
transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED 
cause=C_FSA_INTERNAL origin=check_join_state ]
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: All 2 
cluster nodes responded to the join offer.
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_finalize: 
join-11: Syncing the CIB from ujboss1 to the rest of the cluster
Jan 17 12:51:49 ujboss1 cib: [28365]: ERROR: send_ais_message: Not 
connected to AIS
Jan 17 12:51:49 ujboss1 cib: [28365]: WARN: cib_process_request: 
Operation complete: op cib_sync for section 'all' 
(origin=local/crmd/9002, version=0.234.3): not connected (rc=-3)
Jan 17 12:51:49 ujboss1 crmd: [28369]: ERROR: finalize_sync_callback: 
Sync from ujboss1 resulted in an error: not connected
Jan 17 12:51:49 ujboss1 crmd: [28369]: WARN: do_log: FSA: Input 
I_ELECTION_DC from finalize_sync_callback() received in state 
S_FINALIZE_JOIN
(the same sequence repeats in a loop here too)
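
The ujboss1 excerpt above cycles between S_INTEGRATION and S_FINALIZE_JOIN. Because the mailer wrapped the log entries, they have to be rejoined before the state transitions can be tallied. The following is a rough Python sketch under that assumption; unwrap and count_transitions are hypothetical helper names, and the timestamp pattern is taken from the lines quoted in this message.

```python
import re
from collections import Counter

# A new log entry starts with a syslog timestamp such as "Jan 17 12:51:49 ".
STARTS_ENTRY = re.compile(r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s")
# crmd FSA transition, e.g. "State transition S_INTEGRATION -> S_FINALIZE_JOIN".
TRANSITION = re.compile(r"State\s+transition\s+(S_\w+)\s+->\s+(S_\w+)")


def unwrap(lines):
    """Join mail-wrapped continuation lines onto the entry they belong to.

    Any non-blank line that does not start with a timestamp is treated as a
    continuation, so stray commentary between entries is appended as well.
    """
    entry = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if STARTS_ENTRY.match(raw):
            if entry is not None:
                yield entry
            entry = line
        elif entry is not None:
            entry = entry + " " + line
    if entry is not None:
        yield entry


def count_transitions(lines):
    """Return a Counter of (from_state, to_state) pairs found in the log."""
    counts = Counter()
    for entry in unwrap(lines):
        m = TRANSITION.search(entry)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_transitions.py < /var/log/messages
    for (src, dst), n in count_transitions(sys.stdin).most_common():
        print(f"{src} -> {dst}: {n}")
```

If the two pairs S_INTEGRATION -> S_FINALIZE_JOIN and S_FINALIZE_JOIN -> S_INTEGRATION dominate the output with nearly equal counts, the crmd is stuck in the join loop shown above rather than progressing to S_POLICY_ENGINE.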

After restarting corosync (17/01/12, 13:10), crm_mon -1 shows:

============
Last updated: Tue Jan 17 13:10:39 2012
Stack: openais
Current DC: ujboss1 - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ ujboss1 ujboss2 ]

  Resource Group: vifGroup
      clusterIP	(ocf::heartbeat:IPaddr2):	Started ujboss2 FAILED
      routing-jboss	(lsb:routing-jboss):	Stopped

Failed actions:
     clusterIP_start_0 (node=ujboss2, call=-1, rc=1, status=Timed Out): unknown error



Both Linux servers were very busy, with crmd, cib and corosync using all the CPU.
Best regards,
Philippe


