[Pacemaker] Re: [PATCH] election trigger

Bernd Schubert bs at q-leap.de
Wed Nov 5 08:26:20 EST 2008


Hello Andrew,

sorry for my late response.

On Sunday 02 November 2008 20:32:14 Andrew Beekhof wrote:
> On Oct 30, 2008, at 6:08 PM, Bernd Schubert wrote:
> > Heartbeat calls crmd only if all nodes are already online.
>
> Not everyone uses it on heartbeat anymore ;-)

I grepped the sources of openais and corosync for "KEY_INITDEAD", but couldn't 
find anything. Are there any other stacks pacemaker supports?

>
> > So introducing another possibly huge deadtime here will at least delay the
> > DC selection, and so resource startup, by heartbeat's initial deadtime. If
> > one node, e.g. after a global power failure, doesn't come up at all, the DC
> > selection was even delayed by 2 x initial hb deadtime. Simply remove the
> > usage of heartbeat's initial deadtime and only use our own.
>
> I don't understand.
> The logic below is only triggered for people who haven't set a value
> for dc_deadtime... why not just set a value in the cib?

Well, firstly, the logs didn't tell me: 

"Look here, you didn't set dc_deadtime, so crm is going to use a huge useless 
timeout". 

Instead, on each startup of heartbeat I get hundreds of lines in syslog, 
and they don't look as if they are meant for the ordinary admin; IMHO 
99% of it is developer information. 

Then, after I found the code in pacemaker, I had already tested setting 
dc_deadtime, but during my initial test that didn't change anything. While we 
need a heartbeat deadtime > 10min for Lustre installations, I set it to 180s 
on my test systems. 
Now, after your suggestion, I tested it again with deadtime=20min and 
dc_deadtime=10s and, oddly, crm still needs about 3min to set the nodes 
online (syslog attached). With the code removed it takes only 10s.

Since openais doesn't seem to support the code below at all, and since it is 
wrong when used together with heartbeat, I still think removing these lines 
is right. Please correct me if I'm wrong.
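
To make the discussion concrete, here is a minimal standalone sketch of how I read the fallback in the hunk quoted below. It is not the crmd code: parse_msec() is a hypothetical stand-in for crm_get_msec(), effective_dc_deadtime_msec() is a name made up for this mail, and the built-in default is passed in as a parameter because I don't want to claim a particular value for it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for crm_get_msec(): accepts "500ms", "10s", "3min"
 * or a bare number (treated as seconds here). Returns -1 on bad input. */
long parse_msec(const char *text)
{
    char *end = NULL;
    long n;

    if (text == NULL) {
        return -1;
    }
    n = strtol(text, &end, 10);
    if (end == text || n < 0) {
        return -1;
    }
    if (*end == '\0' || strcmp(end, "s") == 0) {
        return n * 1000;
    }
    if (strcmp(end, "ms") == 0) {
        return n;
    }
    if (strcmp(end, "min") == 0 || strcmp(end, "m") == 0) {
        return n * 60 * 1000;
    }
    return -1;
}

/* configured: dc_deadtime as set in the CIB, or NULL if unset
 * initdead:   heartbeat's initdead from the environment, or NULL
 * fallback:   the built-in default (assumed value supplied by the caller) */
long effective_dc_deadtime_msec(const char *configured,
                                const char *initdead,
                                const char *fallback)
{
    if (configured != NULL) {
        /* An explicit setting skips the initdead branch entirely. */
        return parse_msec(configured);
    }
    if (initdead != NULL) {
        long from_env = parse_msec(initdead) / 2;
        long from_defaults = parse_msec(fallback);

        if (from_env > from_defaults) {
            /* As in the quoted hunk: half of initdead is compared,
             * but the full initdead value is what gets stored. */
            return parse_msec(initdead);
        }
    }
    return parse_msec(fallback);
}
```

Read this way, effective_dc_deadtime_msec(NULL, "20min", "60s") yields the full 20 minutes, while an explicitly set "10s" should bypass the branch, which is why the 3min I still see looks odd to me.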


Thanks,
Bernd


PS: Sorry, the attached syslog is still from heartbeat-2.1.4. If you think you 
have already fixed it in pacemaker, please point me to the commit.


>
> > Signed-off-by: Bernd Schubert <bs at q-leap.de>
> >
> > diff --git a/crmd/control.c b/crmd/control.c
> > --- a/crmd/control.c
> > +++ b/crmd/control.c
> > @@ -747,23 +747,6 @@ config_query_callback(xmlNode *msg, int
> > 		output, XML_CIB_TAG_PROPSET, NULL, config_hash,
> > 		CIB_OPTIONS_FIRST, FALSE, now);
> >
> > -	value = g_hash_table_lookup(config_hash, XML_CONFIG_ATTR_DC_DEADTIME);
> > -	if(value == NULL) {
> > -		/* apparently we're not allowed to free the result of getenv */
> > -		char *param_val = getenv(ENV_PREFIX "initdead");
> > -
> > -		value = crmd_pref(config_hash, XML_CONFIG_ATTR_DC_DEADTIME);
> > -		if(param_val != NULL) {
> > -			int from_env = crm_get_msec(param_val) / 2;
> > -			int from_defaults = crm_get_msec(value);
> > -			if(from_env > from_defaults) {
> > -				g_hash_table_replace(
> > -					config_hash, crm_strdup(XML_CONFIG_ATTR_DC_DEADTIME),
> > -					crm_strdup(param_val));
> > -			}
> > -		}
> > -	}
> > -
> > 	verify_crmd_options(config_hash);
> >
> > 	value = crmd_pref(config_hash, XML_CONFIG_ATTR_DC_DEADTIME);
> >
> >
> > --
> > Bernd Schubert
> > Q-Leap Networks GmbH
> >
> > _______________________________________________
> > Pacemaker mailing list
> > Pacemaker at clusterlabs.org
> > http://list.clusterlabs.org/mailman/listinfo/pacemaker



-- 
Bernd Schubert
Q-Leap Networks GmbH
-------------- next part --------------
A non-text attachment was scrubbed...
Name: syslog.bak.gz
Type: application/x-gzip
Size: 146568 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20081105/10ba2504/attachment-0001.bin>

