[Pacemaker] Timeout after nodejoin

Dan Frincu dfrincu at streamwide.ro
Thu Sep 23 08:19:12 UTC 2010


Hi,

Steven Dake wrote:
> On 09/22/2010 05:43 AM, Dan Frincu wrote:
>> Hi all,
>>
>> I have the following packages:
>>
>> # rpm -qa | grep -Ei "(openais|cluster|heartbeat|pacemaker|resource)"
>> openais-0.80.5-15.2
>> cluster-glue-1.0-12.2
>> pacemaker-1.0.5-4.2
>> cluster-glue-libs-1.0-12.2
>> resource-agents-1.0-31.5
>> pacemaker-libs-1.0.5-4.2
>> pacemaker-mgmt-1.99.2-7.2
>> libopenais2-0.80.5-15.2
>> heartbeat-3.0.0-33.3
>> pacemaker-mgmt-client-1.99.2-7.2
>>
>> When I start openais, I get nodejoin immediately, as seen in the logs
>> below. However, it takes some time before the nodes are visible in
>> crm_mon output. Any idea how to minimize this delay?
>>
>> Sep 22 15:27:24 bench1 openais[12935]: [crm ] info:
>> send_member_notification: Sending membership update 8 to 1 children
>> Sep 22 15:27:24 bench1 openais[12935]: [CLM ] got nodejoin message
>> 192.168.165.33
>> Sep 22 15:27:24 bench1 openais[12935]: [CLM ] got nodejoin message
>> 192.168.165.35
>> Sep 22 15:27:24 bench1 mgmtd: [12947]: info: Started.
>> Sep 22 15:27:24 bench1 openais[12935]: [crm ] WARN: route_ais_message:
>> Sending message to local.crmd failed: unknown (rc=-2)
>> Sep 22 15:27:24 bench1 openais[12935]: [crm ] WARN: route_ais_message:
>> Sending message to local.crmd failed: unknown (rc=-2)
>> Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: pcmk_ipc: Recorded
>> connection 0x174840d0 for crmd/12946
>> Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: pcmk_ipc: Sending
>> membership update 8 to crmd
>> Sep 22 15:27:24 bench1 openais[12935]: [crm ] info:
>> update_expected_votes: Expected quorum votes 1024 -> 2
>> Sep 22 15:27:25 bench1 crmd: [12946]: notice: ais_dispatch: Membership
>> 8: quorum aquired
>> Sep 22 15:28:15 bench1 crmd: [12946]: info: do_election_count_vote:
>> Election 2 (owner: bench2) pass: vote from bench2 (Host name)
>> Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State
>> transition S_PENDING -> S_ELECTION [ input=I_ELECTION
>> cause=C_FSA_INTERNAL origin=do_election_count_vote ]
>> Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State
>> transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
>> cause=C_FSA_INTERNAL origin=do_election_check ]
>> Sep 22 15:28:15 bench1 crmd: [12946]: info: do_te_control: Registering
>> TE UUID: 87c28ab8-ba93-4111-a26a-67e88dd927fb
>> Sep 22 15:28:15 bench1 crmd: [12946]: WARN:
>> cib_client_add_notify_callback: Callback already present
>> Sep 22 15:28:15 bench1 crmd: [12946]: info: set_graph_functions: Setting
>> custom graph functions
>> Sep 22 15:28:15 bench1 crmd: [12946]: info: unpack_graph: Unpacked
>> transition -1: 0 actions in 0 synapses
>> Sep 22 15:28:15 bench1 crmd: [12946]: info: do_dc_takeover: Taking over
>> DC status for this partition
>> Sep 22 15:28:15 bench1 cib: [12942]: info: cib_process_readwrite: We are
>> now in R/W mode
>>
>> Regards,
>>
>> Dan
>>
>
> Where did you get that version of openais?  openais 0.80.x is 
> deprecated in the community (and hence, no support).  We recommend 
> using corosync instead which has improved testing with pacemaker.
>
From the SUSE repositories for Red Hat, last year, when we began working 
with this cluster stack. I have also been pushing for corosync, for obvious 
reasons. However, for existing installations an upgrade will require 
some testing first, because the platforms cannot be taken offline.

Anyway, thank you all for your input. After some research and some 
fiddling, adjusting the dc-timeout parameter did the trick.
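
For anyone who wants to put a number on the delay before and after changing 
the timeout, here is a minimal sketch that measures the gap between two 
events in a syslog excerpt. The sample lines are taken from the log quoted 
above (unwrapped). The helper names are made up for illustration, the year 
is an assumption because syslog timestamps omit it, and month parsing 
assumes an English locale.

```python
from datetime import datetime

# Lines copied from the quoted log, with the mail wrapping undone.
LOG = """\
Sep 22 15:27:24 bench1 openais[12935]: [CLM ] got nodejoin message 192.168.165.33
Sep 22 15:27:25 bench1 crmd: [12946]: notice: ais_dispatch: Membership 8: quorum aquired
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_election_count_vote: Election 2 (owner: bench2) pass: vote from bench2 (Host name)
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_dc_takeover: Taking over DC status for this partition
"""


def parse_ts(line, year=2010):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp.

    The year is not in the log, so it is supplied by the caller (assumption).
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def first_match(lines, marker):
    """Return the timestamp of the first line containing `marker`."""
    for line in lines:
        if marker in line:
            return parse_ts(line)
    raise ValueError(f"no log line contains {marker!r}")


def gap_seconds(log_text, start_marker, end_marker):
    """Seconds between the first `start_marker` and first `end_marker` line."""
    lines = log_text.splitlines()
    start = first_match(lines, start_marker)
    end = first_match(lines, end_marker)
    return (end - start).total_seconds()


if __name__ == "__main__":
    join_to_dc = gap_seconds(LOG, "got nodejoin message", "do_dc_takeover")
    quorum_to_election = gap_seconds(LOG, "quorum aquired", "do_election_count_vote")
    print(f"nodejoin -> DC takeover: {join_to_dc:.0f}s")
    print(f"quorum   -> election:    {quorum_to_election:.0f}s")
```

Running the same comparison on a log captured after the change should show 
whether the gap between nodejoin and DC takeover actually shrank.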

Regards,

Dan  

-- 
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania



