[Pacemaker] UPDATE...2 node cluster with clvm, configuration help needed...

Patrik.Rapposch at knapp.com Patrik.Rapposch at knapp.com
Tue Jun 15 05:09:15 EDT 2010






Hi guys,

my colleague gave me a tip: while node 2 is offline, the stonith resource
on node 1 won't work because it ends up in a failed state (it can't reach
the asm module of node 2), and so the other resources (vg, lv) can't start.
Based on this, I modified the ibmrsa plugin as follows:

I changed the return value in line 44 of
"/usr/lib64/stonith/plugins/external/ibmrsa" to 0, so that the stonith
device no longer reports a failed state and the remaining node (node 1)
can start the resources.

So this problem is fixed for our needs.

The other question concerning the storage is still open.

I also said earlier that I have no problem when a node loses the
connection to the gateway (ping resource), but I do have a problem with it
after all: when the connection comes back up, the ring stays faulty and
does not recover, not even when I manually try to reset it with
"corosync-cfgtool -r". I have also opened a call with Novell about this
problem.
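
To see which ring is still marked faulty before and after a reset, the ring status can be checked with "corosync-cfgtool -s". The following is only a minimal sketch of how that output could be scanned for faulty rings. The sample text and the function name faulty_rings are assumptions for illustration; the exact wording of the status lines may differ between corosync versions.

```python
import re

# Hypothetical sample of `corosync-cfgtool -s` output. The addresses and
# the exact status wording are assumptions, not taken from this cluster.
SAMPLE = """\
Printing ring status.
Local node ID 1
RING ID 0
\tid\t= 192.168.1.1
\tstatus\t= ring 0 active with no faults
RING ID 1
\tid\t= 10.0.0.1
\tstatus\t= Marking ringid 1 interface 10.0.0.1 FAULTY
"""


def faulty_rings(status_text):
    """Return the ring IDs whose status line contains the word FAULTY."""
    faulty = []
    current = None
    for line in status_text.splitlines():
        header = re.match(r"\s*RING ID (\d+)", line)
        if header:
            # Remember which ring the following id/status lines belong to.
            current = int(header.group(1))
        elif current is not None and "status" in line and "FAULTY" in line:
            faulty.append(current)
    return faulty


if __name__ == "__main__":
    print(faulty_rings(SAMPLE))
```

On a live node the text would come from the real command instead of SAMPLE, for example via subprocess.run(["corosync-cfgtool", "-s"], capture_output=True, text=True).stdout.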

The strace output from "corosync-cfgtool -r" can be found in the
attachment.

(See attached file: strace_output_corosync-cfgtool_-r.txt)

Thanks for your replies.

Kind regards,
Patrik



Mit freundlichen Grüßen / Best Regards

Patrik Rapposch
System Administration

KNAPP Systemintegration GmbH
Waltenbachstraße 9
8700 Leoben, Austria
Phone: +43 3842 805-915
Fax: +43 3842 82930-500
peter.wratitsch at knapp.com
www.KNAPP.com

Commercial register number: FN 138870x
Commercial register court: Leoben

The information in this e-mail (including any attachment) is confidential
and intended to be for the use of the addressee(s) only. If you have
received the e-mail by mistake, any disclosure, copy, distribution or use
of the contents of the e-mail is prohibited, and you must delete the e-mail
from your system. As e-mail can be changed electronically KNAPP assumes no
responsibility for any alteration to this e-mail or its attachments. KNAPP
has taken every reasonable precaution to ensure that any attachment to this
e-mail has been swept for virus. However, KNAPP does not accept any
liability for damage sustained as a result of such attachment being virus
infected and strongly recommend that you carry out your own virus check
before opening any attachment.


                                                                           
From: Patrik.Rapposch at knapp.com
Date: 15.06.2010 09:12
To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
Reply-To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
Subject: [Pacemaker] 2 node cluster with clvm, configuration help needed...

Hi,

as I told you, I am going to test the clvm cluster with the new service
packs for SLES11 and the HA extension.

The versions in there are as follows:
"pacemaker-1.1.2-0.2.1"
"corosync-1.2.1-0.5.1"
"openais-1.1.2-0.5.19".

The problem that the dlm supports only one ring is now gone, and I have it
running with 2 rings right now.

With a ping resource included, loss of the connection is also covered and
works fine.

The only problem I have is when I power off the node that holds the
volume group and logical volume resources: the resources on the cluster
go into an unclean state (stonith, vg, lv resources).
Failover of the resources then doesn't work until the node gets power
again. I think this may have something to do with my stonith resource,
because as soon as the asm module gets power again, failover of the
resources to the running node works. We already updated the asm module to
the newest version, but this didn't help.

Another question: is it possible for the cluster to check for loss of the
fiber channel connection to the storage? (We are connected to the storage
via FC switches and have 2 paths.) We tried pulling the fiber channel
connection and could see that the volume group we defined fails. The
group fails, but no failover happens, nor anything else.

I attach my configuration; maybe you can spot a configuration error. If
you need log files, please tell me.

Thanks for your replies.

Kind regards,
Patrik

(See attached file: cib_150610_0909.xml)



From: Patrik.Rapposch at knapp.com
Date: 07.06.2010 07:44
To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
Reply-To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
Subject: [Pacemaker] 2 node cluster with clvm, configuration help needed...

Hi,

thanks for your answers.
I tried modifying the crm file, but didn't get any new output. I wanted to
use the openSUSE packages because they were newer than the SLES11 packages
in the HAE.

Novell has finally made SP1 for SLES11 and the HAE available. I'll
download it and try it out in the next few hours; I hope it works with the
new versions.
We'll see; I'll let you know.

Thanks.

Kind regards,
Patrik ;)



From: Dejan Muhamedagic <dejanmm at fastmail.fm>
Date: 04.06.2010 13:14
To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
Reply-To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
Subject: Re: [Pacemaker] 2 node cluster with clvm, configuration help needed...

On Fri, Jun 04, 2010 at 10:03:09AM +0200, Dejan Muhamedagic wrote:
> On Thu, Jun 03, 2010 at 07:57:59AM +0200, Andrew Beekhof wrote:
> > On Wed, Jun 2, 2010 at 1:25 PM,  <Patrik.Rapposch at knapp.com> wrote:
> > >
> > >
> > >
> > >
> > >
> > > hy,
> > >
> > > thx for your reply.
> > >
> > > I installed python-curses and xml, but didn't help.
> >
> > Dejan?  Thoughts?
>
> For whatever reason "import crm.main" fails. Patrik, could you
> remove the try/except around it (in /usr/sbin/crm) and try again,
> perhaps it'll show a more specific error message.

Looking again at the code, most probably the package just can't
be used on SLES, i.e. the python paths for modules differ. You
can verify that with 'rpm -ql <package> | grep /crm/' and compare
the output to the paths from the error message.
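
The same comparison can be done from the Python side. This is only a minimal sketch, assuming the crm modules are installed as a directory named "crm" somewhere on disk; the helper name find_package_dirs is made up for illustration. It lists every entry of the interpreter's module search path that actually contains such a directory, which can then be compared with the paths the rpm installed.

```python
import os
import sys


def find_package_dirs(package, search_paths):
    """Return each <path>/<package> that exists as a directory."""
    hits = []
    for base in search_paths:
        candidate = os.path.join(base, package)
        if os.path.isdir(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # sys.path is the list the interpreter searches when resolving imports,
    # i.e. the same list that appears in the "couldn't find crm libraries"
    # error message.
    found = find_package_dirs("crm", sys.path)
    if found:
        print("crm package found in:", found)
    else:
        print("no crm package on the module search path:")
        for path in sys.path:
            print("  ", path)
```

If nothing is found, the package's files live outside the search path, which would match the suspicion that the module paths differ between the two distributions.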

Thanks,

Dejan

> Otherwise, why do you want to install opensuse 11.0 packages on
> SLES11? It probably won't work and anyway you definitely won't
> get any support for that.
>
> Thanks,
>
> Dejan
>
> > > Yeah first we used the hae extension, but as you told us, that the
> > > versions we use, are really old and this could be the problem, we
> > > tried to upgrade to newer versions to get it running.
> > >
> > > Is there maybe another way to get it running with newer versions?
> >
> > was there nothing newer from yum?
> > I'm pretty sure the packages have been updated since then.
> >
> > > or could
> > > you may please have a look on my config, which I had in the old
> > > running versions? I reattach it right now.
> > >
> > > thx.
> > >
> > > kr, patrik
> > >
> > > (See attached file: cib_aktuell.xml)
> > >
> > > From: Andrew Beekhof <andrew at beekhof.net>
> > > Date: 02.06.2010 12:53
> > > To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> > > Reply-To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> > > Subject: Re: [Pacemaker] Antwort: Re: Antwort: Re: 2 node cluster with clvm, configuration help needed...
> > >
> > > On Wed, Jun 2, 2010 at 7:50 AM,  <Patrik.Rapposch at knapp.com> wrote:
> > >>
> > >> hy,
> > >>
> > >> so I tried yesterday to update to a newer version. I am using sles11.
> > >> At least it worked with the opensuse 11.0 repo
> > >> (http://www.clusterlabs.org/rpm/opensuse-11.0/x86_64/) and one
> > >> additional library, which I got as rpm.
> > >>
> > >> The problem I have now is, that if I want to run the crm command, I
> > >> get following error:
> > >>
> > >> "abort: couldn't find crm libraries in [/usr/sbin
> > >> /usr/lib/python26.zip
> > >> /usr/lib64/python2.6 /usr/lib64/python2.6/plat-linux2
> > >> /usr/lib64/python2.6/lib-tk /usr/lib64/python2.6/lib-old
> > >> /usr/lib64/python2.6/lib-dynload /usr/lib64/python2.6/site-packages
> > >> /usr/lib64/python2.6/site-packages/Numeric
> > >> /usr/local/lib64/python2.6/site-packages
> > >> /usr/lib64/python2.6/site-packages/gtk-2.0]
> > >> (check your install and PYTHONPATH)"
> > >>
> > >> I don't know what libraries it is exactly searching for,
> > >
> > > you might be missing python-curses and python-xml
> > >
> > >> I tried
> > >> rearranging my PYTHONPATH to some directories, but had no access.
> > >> The next thing I saw was, that it now works with corosync (had to
> > >> configure it) instead of openais and that the gui totally
> > >> disappeared, so I have no commands like "crm_gui" or "hb_gui".
> > >
> > > Since you're on SLES, have you thought about using the HAE extension?
> > > It has all the above plus the gui.
> > >
> > >>
> > >> Do you maybe know how to fix this, or do you know a successfull way
> > >> to implement a newer version into sles11. Service pack for sles11
> > >> should be available today, but they didn't make it available right
> > >> now, so I dunno if there is also a hae sp1, which has newer versions
> > >> in it.
> > >>
> > >> Thx for your help.
> > >>
> > >> From: Andrew Beekhof <andrew at beekhof.net>
> > >> Date: 31.05.2010 08:46
> > >> To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> > >> Reply-To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> > >> Subject: Re: [Pacemaker] Antwort: Re: 2 node cluster with clvm, configuration help needed...
> > >>
> > >> On Mon, May 31, 2010 at 8:37 AM,  <Patrik.Rapposch at knapp.com> wrote:
> > >>>
> > >>> hy,
> > >>>
> > >>> thx for your reply.
> > >>> k, i'll try that in the next few hours.
> > >>>
> > >>> is there any other possibility, why there is such a strange
> > >>> behaviour in the cluster?
> > >>>
> > >>> I short redesribe the main problem:
> > >>>
> > >>> failover between the nodes works fine, the ressources get started
> > >>> on the remaining node (let it be node2). When node1 comes back
> > >>> online, the resources on node2 get stopped and started again on
> > >>> node2. ---> very strange. I already tried a lot, but didn't find a
> > >>> solution.
> > >>>
> > >>> so the failback has a bug.
> > >>
> > >> Or did have a year ago when 1.0.3 was out... you might have more
> > >> luck with something a little more recent.
> > >>
_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs:
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
-------------- next part --------------
A non-text attachment was scrubbed...
Name: strace_output_corosync-cfgtool_-r.txt
Type: application/octet-stream
Size: 8772 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100615/17f83d41/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cib_150610_0909.xml
Type: application/octet-stream
Size: 44046 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100615/17f83d41/attachment-0007.obj>

