[Pacemaker] SECOND UPDATE...2 node cluster with clvm, configuration help needed...
Andrew Beekhof
andrew at beekhof.net
Fri Jun 18 06:47:02 UTC 2010
On Wed, Jun 16, 2010 at 1:44 PM, <Patrik.Rapposch at knapp.com> wrote:
>
> Hi,
>
> The other problems are still open, but I found another one.
>
> We configured stonith so that it should power off a node. This didn't
> work and resulted in the following error message:
>
> "Jun 16 09:00:38 kxxxxkc1 stonith-ng: [3871]: ERROR: log_operation:
> Operation 'poweroff' [12614] for host 'kxxxxkc2' with device
> 'kill-kxxxxkc2-fire-from-kxxxxkc1' returned: 1 (call 0 from (null))
> Jun 16 09:00:38 kxxxxkc1 stonith-ng: [3871]: ERROR: stonith_command:
> Unknown st_fence reply from kxxxxkc1
> Jun 16 09:00:38 kxxxxkc1 stonith-ng: [3871]: WARN: log_data_element:
> stonith_command: UnknownOp <st-reply
> st_origin="stonith_construct_async_reply" t="stonith-ng" st_op="st_fence"
> st_remote_op="e814c6ce-41f3-4e8b-b5b6-301d8056f37b" st_callid="0"
> st_callopt="0" st_rc="1" st_output="failed: unrecognised action: poweroff "
> src="kxxxxkc1" seq="60" />"
>
> We experimented a little and added an echo to the external ibmrsa plugin,
> which shows us the value that ibmrsa receives from stonithd. We found
> that it doesn't get an "off" message as it should.
Hmmm. Could you include a hb_report for this please?
I'd need to see more than just those two log lines.
> But if you configure reboot (the default value) for stonith, the ibmrsa
> plugin gets the reset value, as it should, and reboots the faulty node.
>
> So I guess this is a bug in stonithd: it can't handle the poweroff
> command.
>
> For our needs we changed the condition in the plugin so that the reset
> value issues the mpcli command to power off the node.
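
The mismatch described above (the device is handed "poweroff" where an "off" was expected) can be pictured with a minimal Python sketch. This is not the ibmrsa plugin's code and not how stonithd works internally: the set of accepted actions and the alias table are assumptions made for illustration, and `normalise_action` is a hypothetical helper.

```python
# Hypothetical sketch, not taken from the ibmrsa plugin or stonith-ng.
# Assumption: the device understands only these action names.
SUPPORTED_ACTIONS = {"on", "off", "reset", "status"}

# Assumption: aliases a caller might send instead of the supported names.
ALIASES = {"poweroff": "off", "reboot": "reset"}


def normalise_action(requested):
    """Map a requested fencing action onto a supported one, or raise."""
    name = requested.strip().lower()
    action = ALIASES.get(name, name)
    if action not in SUPPORTED_ACTIONS:
        raise ValueError("unrecognised action: %s" % requested)
    return action


if __name__ == "__main__":
    for requested in ("reset", "poweroff"):
        print(requested, "->", normalise_action(requested))
```

With a mapping like this, a request for "poweroff" would reach the device as "off" instead of being rejected as in the log above. Whether such a translation belongs in the plugin or in stonithd is the open question in this thread.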
>
> Is this a known issue? We didn't find anything about it.
>
> Does anyone have a clue about our other problems? --> 1. loss of the fiber
> channel connection to the storage, 2. re-enabling a ring with
> corosync-cfgtool -r hangs.
>
> thx for replies.
>
> kr
>
>
> Mit freundlichen Grüßen / Best Regards
>
> Patrik Rapposch
> System Administration
>
> KNAPP Systemintegration GmbH
> Waltenbachstraße 9
> 8700 Leoben, Austria
> Phone: +43 3842 805-915
> Fax: +43 3842 82930-500
> peter.wratitsch at knapp.com
> www.KNAPP.com
>
> Commercial register number: FN 138870x
> Commercial register court: Leoben
>
> The information in this e-mail (including any attachment) is confidential
> and intended to be for the use of the addressee(s) only. If you have
> received the e-mail by mistake, any disclosure, copy, distribution or use
> of the contents of the e-mail is prohibited, and you must delete the e-mail
> from your system. As e-mail can be changed electronically KNAPP assumes no
> responsibility for any alteration to this e-mail or its attachments. KNAPP
> has taken every reasonable precaution to ensure that any attachment to this
> e-mail has been swept for virus. However, KNAPP does not accept any
> liability for damage sustained as a result of such attachment being virus
> infected and strongly recommend that you carry out your own virus check
> before opening any attachment.
>
>
>
> On 15.06.2010 11:11, Patrik.Rapposch at knapp.com wrote to The Pacemaker
> cluster resource manager <pacemaker at oss.clusterlabs.org>
> (subject: [Pacemaker] UPDATE...2 node cluster with clvm, configuration
> help needed...):
>
> Hi guys,
>
> my colleague gave me a tip: when node 2 is offline, the stonith resource
> on node 1 won't work because of a false state (it can't reach the asm
> module of node 2), and so the other resources (vg, lv) can't start.
> Based on this I modified the ibmrsa plugin in the following way:
>
> I changed the return value of "/usr/lib64/stonith/plugins/external/ibmrsa"
> in line 44 to 0, so that there is no false state for the stonith device and
> the remaining node (node 1) can start the resources.
>
> So this problem is fixed for our needs.
>
> The other question concerning the storage is still open.
>
> Further, I mentioned that I have no problem when a node loses the
> connection to the gateway (ping resource), but I do have a problem with it:
> when the connection is up again, the ring stays faulty and won't recover,
> not even when I manually try to make the ring clean again with
> "corosync-cfgtool -r". I also opened a call with Novell concerning this
> problem.
>
> The strace output from "corosync-cfgtool -r" can be found in the
> attachment.
>
> (See attached file: strace_output_corosync-cfgtool_-r.txt)
>
> thx for replies.
>
> kr patrik
>
>
>
>
>
>
> On 15.06.2010 09:12, Patrik.Rapposch at knapp.com wrote to The Pacemaker
> cluster resource manager <pacemaker at oss.clusterlabs.org>
> (subject: [Pacemaker] 2 node cluster with clvm, configuration help
> needed...):
>
> Hi,
>
> as I told you, I am going to test the clvm cluster with the new service
> packs for SLES11 and the HA edition.
>
> The versions in there are the following:
> "pacemaker-1.1.2-0.2.1"
> "corosync-1.2.1-0.5.1"
> "openais-1.1.2-0.5.19".
>
> The problem that only one ring is supported by the dlm is now gone and I
> have it running with 2 rings right now.
>
> With a ping resource included, the loss of connection is also covered and
> works fine.
>
> The only problem I have is when I power off the node that holds the
> volume group and logical volume resources: the resources on the cluster
> go into an unclean state (stonith, vg, lv resources).
> Failover of the resources then doesn't work until the node gets power
> again. I think this may have something to do with my stonith
> resource, because as soon as the asm module gets power again, failover
> of the resources to the running node works. We already updated the asm
> module to the newest version, but this didn't help.
>
> Another question: is it possible for the cluster to check for loss of
> the fiber channel connection to the storage? (We are connected to the
> storage via FC switches and have 2 paths.) We tried pulling the fiber
> channel connection and could see that the volume group we defined
> fails. The group fails, but no failover or anything else happens.
>
> I attach my configuration; maybe you can see a configuration error. If you
> need log files, please tell me.
>
> Thx for your replies.
>
> kr patrik
>
> (See attached file: cib_150610_0909.xml)
>
>
>
>
>
> On 07.06.2010 07:44, Patrik.Rapposch at knapp.com wrote to The Pacemaker
> cluster resource manager <pacemaker at oss.clusterlabs.org>
> (subject: [Pacemaker] 2 node cluster with clvm, configuration help
> needed...):
>
> hy,
>
> thanks for your answers.
> I tried modifying the crm file, but didn't get any new output. I wanted to
> use the openSUSE packages because they were newer than the SLES11 packages
> in the HAE extension.
>
> Novell has finally made SP1 for SLES11 and the HAE extension
> available. I'll download it and try it out in the next few hours; I hope
> it works with the new versions.
> We'll see; I'll let you know.
>
> thx.
>
> kr patrik ;)
>
>
>
>
>
> On 04.06.2010 13:14, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote to
> The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> (subject: Re: [Pacemaker] 2 node cluster with clvm, configuration help
> needed...):
>
> On Fri, Jun 04, 2010 at 10:03:09AM +0200, Dejan Muhamedagic wrote:
>> On Thu, Jun 03, 2010 at 07:57:59AM +0200, Andrew Beekhof wrote:
>> > On Wed, Jun 2, 2010 at 1:25 PM, <Patrik.Rapposch at knapp.com> wrote:
>> > >
>> > > Hi,
>> > >
>> > > thanks for your reply.
>> > >
>> > > I installed python-curses and python-xml, but it didn't help.
>> >
>> > Dejan? Thoughts?
>>
>> For whatever reason "import crm.main" fails. Patrik, could you
>> remove the try/except around it (in /usr/sbin/crm) and try again?
>> Perhaps it'll show a more specific error message.
>
> Looking again at the code, most probably the package just can't be
> used on SLES, i.e. the python paths for modules differ. You can
> verify that with 'rpm -ql <package> | grep /crm/' and compare the
> output to the paths from the error message.
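
The comparison suggested above can also be made from the Python side. The following is a minimal sketch, assuming the crm modules are installed as a directory named `crm` and that the script is run with the same interpreter that /usr/sbin/crm uses; `find_package_dirs` is a hypothetical helper, not part of the crm shell.

```python
import os
import sys


def find_package_dirs(package, search_paths):
    """Return every directory on search_paths that contains `package`."""
    hits = []
    for base in search_paths:
        candidate = os.path.join(base, package)
        if os.path.isdir(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Assumption: the modules live in a directory called "crm".
    found = find_package_dirs("crm", sys.path)
    if found:
        print("crm modules found in: " + ", ".join(found))
    else:
        print("no crm directory on sys.path; compare with the rpm file list")
```

If nothing is found, the directory reported by rpm is not among the paths the interpreter searches, which would match the "couldn't find crm libraries" error quoted further down in this thread.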
>
> Thanks,
>
> Dejan
>
>> Otherwise, why do you want to install opensuse 11.0 packages on
>> SLES11? It probably won't work and anyway you definitely won't
>> get any support for that.
>>
>> Thanks,
>>
>> Dejan
>>
>> > > Yes, at first we used the HAE extension, but since you told us that the
>> > > versions we use are really old and that this could be the problem, we
>> > > tried to upgrade to newer versions to get it running.
>> > >
>> > > Is there maybe another way to get it running with newer versions?
>> >
>> > was there nothing newer from yum?
>> > I'm pretty sure the packages have been updated since then.
>> >
>> > > or could
>> > > you please have a look at my config, which I had in the old running
>> > > versions? I am reattaching it now.
>> > >
>> > > thx.
>> > >
>> > > kr, patrik
>> > >
>> > > (See attached file: cib_aktuell.xml)
>> > >
>> > >
>> > >
>> > >
>> > > On 02.06.2010 12:53, Andrew Beekhof <andrew at beekhof.net> wrote to The
>> > > Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>> > > (subject: Re: [Pacemaker] Antwort: Re: Antwort: Re: 2 node cluster with
>> > > clvm, configuration help needed...):
>> > >
>> > > On Wed, Jun 2, 2010 at 7:50 AM, <Patrik.Rapposch at knapp.com> wrote:
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> Hi,
>> > >>
>> > >> yesterday I tried to update to a newer version. I am using SLES11. At
>> > >> least it worked with the openSUSE 11.0 repo
>> > >> (http://www.clusterlabs.org/rpm/opensuse-11.0/x86_64/) and one additional
>> > >> library, which I got as an rpm.
>> > >>
>> > >> The problem I have now is that when I run the crm command, I get the
>> > >> following error:
>> > >>
>> > >> "abort: couldn't find crm libraries in [/usr/sbin /usr/lib/python26.zip
>> > >> /usr/lib64/python2.6 /usr/lib64/python2.6/plat-linux2
>> > >> /usr/lib64/python2.6/lib-tk /usr/lib64/python2.6/lib-old
>> > >> /usr/lib64/python2.6/lib-dynload /usr/lib64/python2.6/site-packages
>> > >> /usr/lib64/python2.6/site-packages/Numeric
>> > >> /usr/local/lib64/python2.6/site-packages
>> > >> /usr/lib64/python2.6/site-packages/gtk-2.0]
>> > >> (check your install and PYTHONPATH)"
>> > >>
>> > >> I don't know what libraries it is exactly searching for,
>> > >
>> > > you might be missing python-curses and python-xml
>> > >
>> > >> I tried
>> > >> rearranging my PYTHONPATH to some directories, but had no access. The
>> > >> next thing I saw was that it now works with corosync (had to configure it)
>> > >> instead of openais and that the GUI totally disappeared, so I have no
>> > >> commands like "crm_gui" or "hb_gui".
>> > >
>> > > Since you're on SLES, have you thought about using the HAE extension?
>> > > It has all the above plus the gui.
>> > >
>> > >>
>> > >> Do you maybe know how to fix this, or do you know a successful way to
>> > >> implement a newer version in SLES11? The service pack for SLES11 should
>> > >> be available today, but they haven't released it yet, so I don't know
>> > >> if there is also an HAE SP1 with newer versions in it.
>> > >>
>> > >> Thx for your help.
>> > >>
>> > >> Mit freundlichen Grüßen / Best Regards
>> > >>
>> > >> Patrik Rapposch, Bsc.
>> > >> Systemadministration
>> > >>
>> > >> KNAPP Systemintegration GmbH
>> > >> Waltenbachstraße 9
>> > >> 8700 Leoben, Austria
>> > >> Phone: +43 3842 805
>> > >> Mobil:
>> > >> Fax: +43 3842 82930-990
>> > >> patrik.rapposch at knapp.com
>> > >> www.KNAPP.com
>> > >>
>> > >> Commercial register number: FN 138870x
>> > >> Commercial register court: Leoben
>> > >>
>> > >>
>> > >>
>> > >> On 31.05.2010 08:46, Andrew Beekhof <andrew at beekhof.net> wrote to The
>> > >> Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>> > >> (subject: Re: [Pacemaker] Antwort: Re: 2 node cluster with clvm,
>> > >> configuration help needed...):
>> > >>
>> > >> On Mon, May 31, 2010 at 8:37 AM, <Patrik.Rapposch at knapp.com> wrote:
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>> Hi,
>> > >>>
>> > >>> thanks for your reply.
>> > >>> OK, I'll try that in the next few hours.
>> > >>>
>> > >>> Is there any other possible reason for such strange behaviour in
>> > >>> the cluster?
>> > >>>
>> > >>> To briefly describe the main problem again:
>> > >>>
>> > >>> Failover between the nodes works fine; the resources get started on the
>> > >>> remaining node (let it be node2). When node1 comes back online,
>> > >>> the resources on node2 get stopped and started again on node2, which is
>> > >>> very strange. I have already tried a lot, but didn't find a solution.
>> > >>>
>> > >>> So the failback has a bug.
>> > >>
>> > >> Or did have a year ago when 1.0.3 was out... you might have more luck
>> > >> with something a little more recent.
>> > >>
>> > >> _______________________________________________
>> > >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> > >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> > >>
>> > >> Project Home: http://www.clusterlabs.org
>> > >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > >> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker