[Pacemaker] RES: Reboot of cluster members with heavy load on filesystem.
Andrew Beekhof
andrew at beekhof.net
Mon Feb 11 10:21:24 UTC 2013
On Mon, Feb 11, 2013 at 12:41 PM, Carlos Xavier
<cbastos at connection.com.br> wrote:
> Hi Andrew,
>
> Thank you very much for your hints.
>
>> > Hi.
>> >
>> > We are running two clusters, each composed of two machines. We are using DRBD + OCFS2 to provide the common filesystem.
>
> [snip]
>
>> >
>> > The clusters run nicely under normal load, except when doing backups of files or optimizing the
>> > databases. At those times we get a huge burst of data coming from mysqldump to the backup resource,
>> > or from the resource mounted on /export.
>> > Sometimes when performing the backup or optimizing the database (done
>> > just on the mysql cluster), Pacemaker declares a node dead (but
>> > it's not)
>>
>> Well you know that, but it doesn't :)
>> It just knows it can't talk to its peer anymore - which it has to treat as a failure.
>>
>> > and starts the recovery process. When that happens we end up with the two
>> > machines getting restarted, and most of the time with a crashed database
>> > :-(
>> >
>> > As you can see below, the problem happens just about 30 seconds after the dump starts on diana.
>> > ----------------------------------------------------------------
>
> [snip]
>
>> > Feb 6 04:27:31 diana lrmd: [2919]: info: RA output: (httpd:1:monitor:stderr) redirecting to systemctl
>> > Feb 6 04:28:31 diana lrmd: [2919]: info: RA output: (httpd:1:monitor:stderr) redirecting to systemctl
>> > Feb 6 04:29:31 diana lrmd: [2919]: info: RA output: (httpd:1:monitor:stderr) redirecting to systemctl
>> > Feb 6 04:30:01 diana /USR/SBIN/CRON[1257]: (root) CMD (/root/scripts/bkp_database_diario.sh)
>> > Feb 6 04:30:31 diana lrmd: [2919]: info: RA output: (httpd:1:monitor:stderr) redirecting to systemctl
>> > Feb 6 04:31:31 diana lrmd: [2919]: info: RA output: (httpd:1:monitor:stderr) redirecting to systemctl
>> > Feb 6 04:31:42 diana lrmd: [2919]: WARN: ip_intranet:0:monitor process (PID 1902) timed out (try 1). Killing with signal SIGTERM (15).
>>
>> I'd increase the timeout here. Or put pacemaker into maintenance mode (where it will not act on
>> failures) while you do the backups - but that's more dangerous.
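For what it's worth, here is a rough sketch of both options using the crm shell
(the resource id and the 120s timeout are only illustrative - adjust them to
your actual configuration, and the cron job would need to be changed to call a
wrapper like this):

  # Option A: give the monitor operation more headroom, e.g. edit the primitive
  # so that its op line reads something like:
  #     op monitor interval="30s" timeout="120s"
  crm configure edit ip_intranet

  # Option B: tell pacemaker not to act on failures during the backup window
  crm configure property maintenance-mode=true
  /root/scripts/bkp_database_diario.sh
  crm configure property maintenance-mode=false

Keep in mind that while maintenance-mode is true, nothing gets recovered if a
real failure happens.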
>>
>> > Feb 6 04:31:47 diana corosync[2902]: [CLM ] CLM CONFIGURATION CHANGE
>> > Feb 6 04:31:47 diana corosync[2902]: [CLM ] New Configuration:
>> > Feb 6 04:31:47 diana corosync[2902]: [CLM ] #011r(0) ip(10.10.1.2) r(1) ip(10.10.10.9)
>> > Feb 6 04:31:47 diana corosync[2902]: [CLM ] Members Left:
>> > Feb 6 04:31:47 diana corosync[2902]: [CLM ] #011r(0) ip(10.10.1.1) r(1) ip(10.10.10.8)
>> > Feb 6 04:31:47 diana corosync[2902]: [CLM ] Members Joined:
>> >
>>
>> This appears to be the (almost) root of your problem.
>> The load is starving corosync of CPU (or possibly network bandwidth) and it can no longer talk to its
>> peer.
>> Corosync then informs pacemaker, which initiates recovery.
>>
>> I'd start by tuning some of your timeout values in corosync.conf
>>
>
> It is probably the CPU, because I can see it going to 100% usage on the cacti graph.
> Also, we have two rings for corosync: one affected by the data flow at backup time and another with free bandwidth.
>
> This is the totem session of my configuration.
>
> totem {
>         version: 2
>         token: 5000
>         token_retransmits_before_loss_const: 10
>         join: 60
>         consensus: 6000
>         vsftype: none
>         max_messages: 20
>         clear_node_high_bit: yes
>         secauth: off
>         threads: 0
>         rrp_mode: active
>         interface {
>                 ringnumber: 0
>                 bindnetaddr: 10.10.1.0
>                 mcastaddr: 226.94.1.1
>                 mcastport: 5406
>                 ttl: 1
>         }
>         interface {
>                 ringnumber: 1
>                 bindnetaddr: 10.10.10.0
>                 mcastaddr: 226.94.1.1
>                 mcastport: 5406
>                 ttl: 1
>         }
> }
>
> Can you kindly point out which timers/counters I should play with?
I would start by making these higher, perhaps double them and see what
effect it has.
token: 5000
token_retransmits_before_loss_const: 10
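As a first attempt, something along these lines (the doubled numbers are just a
starting point, not tested values, and corosync wants consensus to be at least
1.2 * token, so it has to go up as well - the rest of the totem section stays
as it is):

totem {
        token: 10000
        token_retransmits_before_loss_const: 20
        consensus: 12000
}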
> What are the reasonable values for them? I got scared by this warning: "It is not recommended to alter this value without guidance
> from the corosync community."
> Are there any benefits to changing the rrp_mode from active to passive?
Not something I've played with, sorry.
> Should it be done on both hosts?
It should be the same on both hosts, I would imagine.
>
>> > ----------------------------------------------------------------
>> >
>> > Feb 6 04:30:32 apolo lrmd: [2855]: info: RA output: (httpd:0:monitor:stderr) redirecting to systemctl
>> > Feb 6 04:31:32 apolo lrmd: [2855]: info: RA output: (httpd:0:monitor:stderr) redirecting to systemctl
>> > Feb 6 04:31:41 apolo corosync[2848]: [TOTEM ] A processor failed, forming new configuration.
>> > Feb 6 04:31:47 apolo corosync[2848]: [CLM ] CLM CONFIGURATION CHANGE
>> > Feb 6 04:31:47 apolo corosync[2848]: [CLM ] New Configuration:
>> > Feb 6 04:31:47 apolo corosync[2848]: [CLM ] #011r(0) ip(10.10.1.1) r(1) ip(10.10.10.8)
>> > Feb 6 04:31:47 apolo corosync[2848]: [CLM ] Members Left:
>> > Feb 6 04:31:47 apolo corosync[2848]: [CLM ] #011r(0) ip(10.10.1.2) r(1) ip(10.10.10.9)
>> > Feb 6 04:31:47 apolo corosync[2848]: [CLM ] Members Joined:
>> > Feb 6 04:31:47 apolo corosync[2848]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 304: memb=1, new=0, lost=1
>
> [snip]
>
>> >
>> > After lots of log messages, apolo asks diana to reboot, and some time after that it gets rebooted too.
>> > We had an old cluster with Heartbeat, and DRBD used to cause this on that system, but now it looks like
>> > Pacemaker is the culprit.
>> >
>> > Here is my Pacemaker and DRBD configuration
>> > http://www2.connection.com.br/cbastos/pacemaker/crm_config
>> > http://www2.connection.com.br/cbastos/pacemaker/drbd_conf/global_common.setup
>> > http://www2.connection.com.br/cbastos/pacemaker/drbd_conf/backup.res
>> > http://www2.connection.com.br/cbastos/pacemaker/drbd_conf/export.res
>> >
>> > And more detailed logs
>> > http://www2.connection.com.br/cbastos/pacemaker/reboot_apolo
>> > http://www2.connection.com.br/cbastos/pacemaker/reboot_diana
>> >
>
> Best regards,
> Carlos.
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org