[Pacemaker] Multiple thread after rebooting server: the node doesn't go online
hj lee
kerdosa at gmail.com
Mon Nov 16 21:51:41 UTC 2009
Hi,
Please disable syslog in openais.conf and try again. This issue seems to be
related to the fork() call and syslog().
hj
On Fri, Nov 13, 2009 at 1:08 PM, Giovanni Di Milia <gdimilia at cfa.harvard.edu> wrote:
> Thank you very much for your response.
>
> The only thing I really don't understand is why this problem doesn't
> appear in any of my simulations.
> I configured at least 7 pairs of virtual servers with VMware 2 and CentOS
> 5.3 and 5.4 (32 and 64 bit) and I never had this kind of problem!
>
> The only difference in the configuration is that I used private IPs for the
> simulations and public IPs for the real servers, but I don't think it is
> important.
>
> Thanks for your patience,
> Giovanni
>
>
>
> On Nov 13, 2009, at 1:36 PM, hj lee wrote:
>
> Hi,
>
> I have the same problem on CentOS 5.3 with pacemaker-1.0.5 and
> openais-0.80.5. This is an openais bug! There are two problems.
> 1. Starting the openais service sometimes causes a seg fault. This is more
> likely to happen if the openais service is started before syslog.
> 2. The seg fault handler of openais calls syslog(). syslog() is one of the
> unsafe functions that must not be called from a signal handler, because it
> is a non-reentrant function.
>
> To fix this issue: get the openais source, find the sigsegv_handler function
> in exec/main.c and just comment out log_flush(), as shown below. Then
> recompile and install it (make and make install). log_flush() should be
> removed from all signal handlers in the openais code base. I am still not
> sure where the seg fault occurs, but commenting out log_flush() prevents it.
>
>
> -------------------------------------------------------------------------
> static void sigsegv_handler (int num)
> {
>     signal (SIGSEGV, SIG_DFL);
>     // log_flush ();
>     raise (SIGSEGV);
> }
>
> Thanks
> hj
>
> On Thu, Nov 12, 2009 at 4:21 PM, Giovanni Di Milia <gdimilia at cfa.harvard.edu> wrote:
>
>> I set up a cluster of two CentOS 5.4 x86_64 servers with pacemaker 1.06
>> and corosync 1.1.2.
>>
>> I only installed the x86_64 packages (yum install pacemaker also tries to
>> install the 32-bit ones).
>>
>> I configured a shared cluster IP (it's a public ip) and a cluster website.
>>
>> Everything works fine if I stop corosync on one of the two servers
>> (the services move from one machine to the other without problems), but if I
>> reboot one server, it cannot go online in the cluster when it comes back up.
>> I also noticed that there are several corosync threads, and if I kill all
>> of them and then start corosync again, everything works fine again.
>>
>> I don't know what is happening and I'm not able to reproduce the same
>> situation on some virtual servers!
>>
>> Thanks,
>> Giovanni
>>
>>
>>
>> the configuration of corosync is the following:
>>
>> ##############################################
>> # Please read the corosync.conf.5 manual page
>> compatibility: whitetank
>>
>> aisexec {
>> # Run as root - this is necessary to be able to manage resources with Pacemaker
>> user: root
>> group: root
>> }
>>
>> service {
>> # Load the Pacemaker Cluster Resource Manager
>> ver: 0
>> name: pacemaker
>> use_mgmtd: yes
>> use_logd: yes
>> }
>>
>> totem {
>> version: 2
>>
>> # How long before declaring a token lost (ms)
>> token: 5000
>>
>> # How many token retransmits before forming a new configuration
>> token_retransmits_before_loss_const: 10
>>
>> # How long to wait for join messages in the membership protocol (ms)
>> join: 1000
>>
>> # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
>> consensus: 2500
>>
>> # Turn off the virtual synchrony filter
>> vsftype: none
>>
>> # Number of messages that may be sent by one processor on receipt of the token
>> max_messages: 20
>>
>> # Stagger sending the node join messages by 1..send_join ms
>> send_join: 45
>>
>> # Limit generated nodeids to 31-bits (positive signed integers)
>> clear_node_high_bit: yes
>>
>> # Disable encryption
>> secauth: off
>>
>> # How many threads to use for encryption/decryption
>> threads: 0
>>
>> # Optionally assign a fixed node id (integer)
>> # nodeid: 1234
>>
>> interface {
>> ringnumber: 0
>>
>> # The following values need to be set based on your environment
>> bindnetaddr: XXX.XXX.XXX.0 #here I put the right ip for my configuration
>> mcastaddr: 226.94.1.1
>> mcastport: 4000
>> }
>> }
>>
>> logging {
>> fileline: off
>> to_stderr: yes
>> to_logfile: yes
>> to_syslog: yes
>> logfile: /tmp/corosync.log
>> debug: off
>> timestamp: on
>> logger_subsys {
>> subsys: AMF
>> debug: off
>> }
>> }
>>
>> amf {
>> mode: disabled
>> }
>>
>> ##################################################
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list
>> Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
--
Dream with longterm vision!
kerdosa