[Pacemaker] Timeout after nodejoin

Dejan Muhamedagic dejanmm at fastmail.fm
Wed Sep 22 10:12:35 EDT 2010


Hi,

On Wed, Sep 22, 2010 at 04:48:42PM +0300, Dan Frincu wrote:
> Hi,
> 
> Raoul Bhatia [IPAX] wrote:
> >hi,
> >
> >On 09/22/2010 02:43 PM, Dan Frincu wrote:
> >>When I start openais, I get nodejoin immediately, as seen in the logs
> >>below. However, it takes some time before the nodes are visible in
> >>crm_mon output. Any idea how to minimize this delay?
> >>
> >>Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info:
> >>send_member_notification: Sending membership update 8 to 1 children
> >>Sep 22 15:27:24 bench1 openais[12935]: [CLM  ] got nodejoin message
> >>192.168.165.33
> >>Sep 22 15:27:24 bench1 openais[12935]: [CLM  ] got nodejoin message
> >>192.168.165.35
> >>Sep 22 15:27:24 bench1 mgmtd: [12947]: info: Started.
> >>Sep 22 15:27:24 bench1 openais[12935]: [crm  ] WARN: route_ais_message:
> >>Sending message to local.crmd failed: unknown (rc=-2)
> >>Sep 22 15:27:24 bench1 openais[12935]: [crm  ] WARN: route_ais_message:
> >>Sending message to local.crmd failed: unknown (rc=-2)
> >>Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info: pcmk_ipc: Recorded
> >>connection 0x174840d0 for crmd/12946
> >>Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info: pcmk_ipc: Sending
> >>membership update 8 to crmd
> >>Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info:
> >>update_expected_votes: Expected quorum votes 1024 -> 2
> >>Sep 22 15:27:25 bench1 crmd: [12946]: notice: ais_dispatch: Membership
> >>8: quorum aquired
> >>Sep 22 15:28:15 bench1 crmd: [12946]: info: do_election_count_vote:
> >>Election 2 (owner: bench2) pass: vote from bench2 (Host name)
> >>Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State
> >>transition S_PENDING -> S_ELECTION [ input=I_ELECTION
> >>cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> >>Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State
> >>transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
> >>cause=C_FSA_INTERNAL origin=do_election_check ]
> >>Sep 22 15:28:15 bench1 crmd: [12946]: info: do_te_control: Registering
> >>TE UUID: 87c28ab8-ba93-4111-a26a-67e88dd927fb
> >>Sep 22 15:28:15 bench1 crmd: [12946]: WARN:
> >>cib_client_add_notify_callback: Callback already present
> >>Sep 22 15:28:15 bench1 crmd: [12946]: info: set_graph_functions: Setting
> >>custom graph functions
> >>Sep 22 15:28:15 bench1 crmd: [12946]: info: unpack_graph: Unpacked
> >>transition -1: 0 actions in 0 synapses
> >>Sep 22 15:28:15 bench1 crmd: [12946]: info: do_dc_takeover: Taking over
> >>DC status for this partition
> >>Sep 22 15:28:15 bench1 cib: [12942]: info: cib_process_readwrite: We are
> >>now in R/W
> >>mode
> >
> >is the cluster up and running and you're only (re-)starting one node?
> >or is this after you start openais on both nodes?
> >
> >thanks,
> >raoul
> The second case: this is right after openais is started on both nodes.

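It's probably due to dc-deadtime, which defaults to 60s. Your log shows
about 50 seconds between quorum being acquired (15:27:25) and the first
election vote (15:28:15). From crm ra info crmd: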

dc-deadtime (time, [60s]): How long to wait for a response from other nodes during startup.
    The "correct" value will depend on the speed/load of your
    network and the type of switches used.
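
If dc-deadtime does turn out to be the cause, lowering it as a cluster
property should shorten the wait. A minimal sketch with the crm shell
follows; the 10s value is only an assumption for illustration, not a
recommendation, so pick one that fits your own network:

```shell
# Hypothetical value for illustration; tune it to your network and switches.
DC_DEADTIME=10s

# Show the crmd options, including the dc-deadtime description and default.
crm ra info crmd

# Set dc-deadtime cluster-wide, then check that it appears in the configuration.
crm configure property dc-deadtime="$DC_DEADTIME"
crm configure show | grep dc-deadtime
```

Since the option controls how long a node waits for a response from other
nodes during startup, setting it too low for a slow or loaded network may
mean a node stops waiting before its peers have answered.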

Thanks,

Dejan

> Regards,
> Dan
> 
> -- 
> Dan FRINCU
> Systems Engineer
> CCNA, RHCE
> Streamwide Romania
> 

> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




