[Pacemaker] Multiple thread after rebooting server: the node doesn't go online

Tue Nov 17 17:07:05 UTC 2009

With syslog disabled, the problem disappears.

Thank you very much,
Giovanni

On Nov 16, 2009, at 4:51 PM, hj lee wrote:

> Hi,
>
> Please disable syslog in openais.conf and try again. It seems this
> issue is related to the fork() call and syslog().
>
> hj
>
> On Fri, Nov 13, 2009 at 1:08 PM, Giovanni Di Milia <gdimilia at cfa.harvard.edu> wrote:
> Thank you very much for your response.
>
> The only thing I really don't understand is why this problem
> doesn't appear in any of my simulations.
> I configured at least 7 pairs of virtual servers with VMware 2 and
> CentOS 5.3 and 5.4 (32- and 64-bit) and I never had this kind of
> problem!
>
> The only difference in the configuration is that I used private IPs  
> for the simulations and public IPs for the real servers, but I don't  
> think it is important.
>
> Thanks for your patience,
> Giovanni
>
>
>
> On Nov 13, 2009, at 1:36 PM, hj lee wrote:
>
>> Hi,
>>
>> I have the same problem on CentOS 5.3 with pacemaker-1.0.5 and
>> openais-0.80.5. This is an openais bug! There are two problems:
>> 1. Starting the openais service sometimes segfaults. This is more
>> likely to happen if the openais service is started before syslog.
>> 2. The segfault handler of openais calls syslog(). syslog() is an
>> UNSAFE function that must not be called from a signal handler
>> because it is non-reentrant.
>>
>> To fix this issue: get the openais source, find the sigsegv_handler
>> function in exec/main.c and comment out log_flush(), as shown below.
>> Then recompile and install it (make and make install). log_flush()
>> should be removed from all signal handlers in the openais code base. I
>> am still not sure where the segfault occurs, but commenting out
>> log_flush() prevents it.
>>
>>
>> -------------------------------------------------------------------------
>> static void sigsegv_handler (int num)
>> {
>>         signal (SIGSEGV, SIG_DFL);
>> //      log_flush ();   /* disabled: ends up calling syslog() from the handler */
>>         raise (SIGSEGV);
>> }
>>
>> Thanks
>> hj
>>
>> On Thu, Nov 12, 2009 at 4:21 PM, Giovanni Di Milia <gdimilia at cfa.harvard.edu> wrote:
>> I set up a cluster of two CentOS 5.4 x86_64 servers with pacemaker
>> 1.0.6 and corosync 1.1.2.
>>
>> I only installed the x86_64 packages (yum install pacemaker also tries
>> to install the 32-bit ones).
>>
>> I configured a shared cluster IP (a public IP) and a cluster
>> website.
>>
>> Everything works fine if I stop corosync on one of the two
>> servers (the services move from one machine to the other without
>> problems), but if I reboot one server, it cannot go online in the
>> cluster when it comes back up.
>> I also noticed that there are several corosync threads, and if I
>> kill all of them and then start corosync again, everything works
>> fine again.
>>
>> I don't know what is happening and I'm not able to reproduce the
>> same situation on virtual servers!
>>
>> Thanks,
>> Giovanni
>>
>>
>>
>> The corosync configuration is the following:
>>
>> ##############################################
>> # Please read the corosync.conf.5 manual page
>> compatibility: whitetank
>>
>> aisexec {
>>        # Run as root - this is necessary to be able to manage resources with Pacemaker
>>        user:   root
>>        group:  root
>> }
>>
>> service {
>>        # Load the Pacemaker Cluster Resource Manager
>>        ver:       0
>>        name:      pacemaker
>>        use_mgmtd: yes
>>        use_logd:  yes
>> }
>>
>> totem {
>>        version: 2
>>
>>        # How long before declaring a token lost (ms)
>>        token:          5000
>>
>>        # How many token retransmits before forming a new configuration
>>        token_retransmits_before_loss_const: 10
>>
>>        # How long to wait for join messages in the membership protocol (ms)
>>        join:           1000
>>
>>        # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
>>        consensus:      2500
>>
>>        # Turn off the virtual synchrony filter
>>        vsftype:        none
>>
>>        # Number of messages that may be sent by one processor on receipt of the token
>>        max_messages:   20
>>
>>        # Stagger sending the node join messages by 1..send_join ms
>>        send_join: 45
>>
>>        # Limit generated nodeids to 31-bits (positive signed integers)
>>        clear_node_high_bit: yes
>>
>>        # Disable encryption
>>        secauth:        off
>>
>>        # How many threads to use for encryption/decryption
>>        threads:        0
>>
>>        # Optionally assign a fixed node id (integer)
>>        # nodeid:         1234
>>
>>        interface {
>>                ringnumber: 0
>>
>>                # The following values need to be set based on your environment
>>                # (here I put the right IP for my configuration)
>>                bindnetaddr: XXX.XXX.XXX.0
>>                mcastaddr: 226.94.1.1
>>                mcastport: 4000
>>        }
>> }
>>
>> logging {
>>        fileline: off
>>        to_stderr: yes
>>        to_logfile: yes
>>        to_syslog: yes
>>        logfile: /tmp/corosync.log
>>        debug: off
>>        timestamp: on
>>        logger_subsys {
>>                subsys: AMF
>>                debug: off
>>        }
>> }
>>
>> amf {
>>        mode: disabled
>> }
>>
>> ##################################################
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list
>> Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>>
>>
>> -- 
>> Dream with longterm vision!
>> kerdosa
>
>
>
>
>
>
> -- 
> Dream with longterm vision!
> kerdosa
