[Pacemaker] socket is incremented after running crm shell

Andrew Beekhof andrew at beekhof.net
Tue Apr 10 12:29:29 UTC 2012


On Thu, Apr 5, 2012 at 5:05 AM, David Vossel <dvossel at redhat.com> wrote:
> ----- Original Message -----
>> From: "Junko IKEDA" <tsukishima.ha at gmail.com>
>> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
>> Sent: Tuesday, April 3, 2012 9:54:42 PM
>> Subject: Re: [Pacemaker] socket is incremented after running crm shell
>>
>> Hi,
>>
>> This is what I found:
>> When "crm configure" or "cibadmin" is called,
>> it seems that an attempt is made to restart the pengine process.
>>
>> Apr  2 14:10:01 bl460g6b crmd: [7186]: info: start_subsystem:
>> Starting
>> sub-system "pengine"
>> Apr  2 14:10:01 bl460g6b crmd: [7186]: WARN: start_subsystem: Client
>> pengine already running as pid 7190
>> Apr  2 14:10:05 bl460g6b crmd: [7186]: info: do_dc_takeover: Taking
>> over DC status for this partition
>>
>> The process is already running, so the restart is canceled,
>> but a new IPC channel is still added.
>> That is why the number of open file descriptors also increases.
>> Is that correct?
>
> The patch isn't wrong, but I believe the patch below is a bit simpler.  There is a flag we can check to see if we are already connected to the PE or not.
>
> diff --git crmd/pengine.c crmd/pengine.c
> index 989601b..ae60a59 100644
> --- crmd/pengine.c
> +++ crmd/pengine.c
> @@ -181,7 +181,7 @@ do_pe_control(long long action,
>         }
>     }
>
> -    if (action & start_actions) {
> +    if ((action & start_actions) && (is_set(fsa_input_register, R_PE_CONNECTED) == FALSE)) {
>         if (cur_state != S_STOPPING) {
>             if (is_openais_cluster()) {
>                 set_bit_inplace(fsa_input_register, pe_subsystem->flag_required);
>
>
>> Please see the attached.
>>
>> By the way, during the status check of pengine, crmd calls sleep(4)
>
> That is strange.  I have no idea what the reasoning behind that is.  It only occurs when using the heartbeat stack though.  It looks like an attempt to let the child process launched in start_subsystem() initialize something before the parent process proceeds.  That kind of logic is never a good idea.

Agreed.  However, in the early days of heartbeat, the crmd launched
the PE and the PE connected back into the crmd.
Then for corosync we had the PE always be around and reversed the
direction of the connection.

Adding a 4s delay when heartbeat was used was the easiest way to fit
it into the new model.

Sometimes practicality wins.

>
>
> -- Vossel
>
>> in
>> do_pe_control().
>> I don't think it is reasonable to do the check on every "crm configure"
>> or "cibadmin" call.
>> It will delay the transition.
>>
>> Thanks,
>> Junko
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>