[Pacemaker] crm_mon/pacemaker split brain
Andrew Beekhof
beekhof at gmail.com
Fri Nov 28 11:32:09 UTC 2008
There are a few things wrong here...
For starters, the stonith resources appear to be badly configured.
This means that stonithd fails when we try to shoot the node, which we
do because the extip_ftp resource can't be stopped.
At that point the cluster can't do anything.
Moving on, you're using underscores instead of dashes in a 1.0 configuration.
So all the meta options are being ignored and it's causing the cluster
to explode.
My guess is you loaded an XML fragment from a 0.6 cluster into a blank
1.0 configuration, instead of leaving it in place when you upgraded
and letting cibadmin do the conversion (which would have fixed the
underscores).
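
If it helps, here is a rough sketch of how one might spot the affected options before loading a fragment. It is not part of Pacemaker: the helper name underscored_meta_options and the sample XML (including the IPaddr2 type given to extip_ftp) are made up for illustration, and the dashed name it prints is only a suggestion, not a check against the list of valid 1.0 option names.

```python
import xml.etree.ElementTree as ET


def underscored_meta_options(cib_xml):
    """Return (owner id, option name, suggested dashed name) for every
    meta option whose name contains an underscore.

    Hypothetical helper: only nvpairs under a meta_attributes block are
    inspected, since instance attributes may legitimately use underscores.
    """
    root = ET.fromstring(cib_xml)
    findings = []
    # Any element (primitive, group, clone, ...) may own meta_attributes.
    for owner in root.iter():
        for meta in owner.findall("meta_attributes"):
            # iter() also reaches nvpairs nested in an <attributes> wrapper.
            for nvpair in meta.iter("nvpair"):
                name = nvpair.get("name", "")
                if "_" in name:
                    findings.append(
                        (owner.get("id"), name, name.replace("_", "-"))
                    )
    return findings


# Illustrative fragment only; not taken from the hb_report.
SAMPLE = """
<resources>
  <primitive id="extip_ftp" class="ocf" provider="heartbeat" type="IPaddr2">
    <meta_attributes id="extip_ftp-meta">
      <nvpair id="extip_ftp-meta-1" name="target_role" value="started"/>
      <nvpair id="extip_ftp-meta-2" name="is-managed" value="true"/>
    </meta_attributes>
  </primitive>
</resources>
"""

for rsc, old, new in underscored_meta_options(SAMPLE):
    print(f"{rsc}: {old} -> {new}")
```

Anything this reports is a meta option that a 1.0 cluster would most likely ignore under its old name.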
On Wed, Nov 19, 2008 at 15:09, Raoul Bhatia [IPAX] <r.bhatia at ipax.at> wrote:
> hi,
>
> crm_mon shows me a kind of split-brain view of my cluster:
>
> common lines on my two nodes:
>
>> ============
>> Last updated: Wed Nov 19 15:02:22 2008
>> Current DC: wc02 (f36760d8-d84a-46b2-b452-4c8cac8b3396)
>> 2 Nodes configured.
>> 9 Resources configured.
>> ============
>>
>> Node: wc01 (31de4ab3-2d05-476e-8f9a-627ad6cd94ca): standby
>> Node: wc02 (f36760d8-d84a-46b2-b452-4c8cac8b3396): online
>> ...
>
> wc01's view:
>> Clone Set: clone_nfs-common
>> Resource Group: group_nfs-common:0
>> nfs-common:0 (lsb:nfs-common): Started wc01
>> Resource Group: group_nfs-common:1
>> nfs-common:1 (lsb:nfs-common) Started [ wc01 wc02 ]
>
> wc02's view:
>> Clone Set: clone_nfs-common
>> Resource Group: group_nfs-common:0
>> nfs-common:0 (lsb:nfs-common) Started [ wc01 wc02 ]
>> Resource Group: group_nfs-common:1
>> nfs-common:1 (lsb:nfs-common): Started wc01
>
> the information basically is the same, but the two instances of the
> clone "group_nfs-common:0" and "group_nfs-common:1" are swapped.
>
> the configuration is:
> wc01: pacemaker 1.0.1; heartbeat 2.99.2
> wc02: pacemaker 1.0.0; heartbeat 2.99.1
>
> hb_report available at [1]
>
> cheers,
> raoul
>
> ps: regarding the logfiles, please note that i had different system
> times and just updated the clocks:
>> wc01: 19 Nov 15:03:18 ntpdate[3455]: adjust time server 81.223.14.147 offset 0.002208 sec
>> wc02: 19 Nov 15:03:22 ntpdate[22517]: step time server 81.223.14.147 offset -4.483140 sec
>
> [1] http://ip52.ipax.at/~raoul/cluster/hb_report_splitbrain.tar.bz2
> --
> ____________________________________________________________________
> DI (FH) Raoul Bhatia M.Sc. email. r.bhatia at ipax.at
> Technischer Leiter
>
> IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
> Barawitzkagasse 10/2/2/11 email. office at ipax.at
> 1190 Wien tel. +43 1 3670030
> FN 277995t HG Wien fax. +43 1 3670030 15
> ____________________________________________________________________
>