[Pacemaker] stonith pacemaker problem

Vladislav Bogdanov bubble at hoster-ok.com
Mon Oct 11 19:51:48 UTC 2010


11.10.2010 09:14, Andrew Beekhof wrote:
> strictly speaking you don't.
> but at least on fedora, the policy is that $x-libs always requires $x
> so just building against heartbeat-libs means that yum will suck in
> the main heartbeat package :-(

And this seems to be a slightly incorrect statement, btw: usually an
application (binary) requires some libraries, and some of those
libraries are provided by a -libs package which is built together with
the binary. But the libraries themselves very rarely require anything
from the main package. Those rare cases are configuration files which
are read from inside the libraries without a direct request from the
application. And even in that case those configuration files are
(should be) provided by a -common subpackage (which -libs can depend on).

The only reason for such requirements is the license files, which are
usually included in main packages. But from my point of view nothing
prevents a packager from including the license file in the %doc stanza
for -libs too, so any 'reverse' dependencies could easily be avoided,
leaving only 'straight' ones: what the libraries actually depend on.

This is what surprises me about corosync, openais and pacemaker: I need
to install the corosync and openais packages on a development host only
because I need the corresponding -libs and -devel packages. This is
actually not usual for Fedora, and it is really not needed. The main
idea of -libs is to provide DSOs which can be used by other applications
without the need to install the 'main' package (together with all its
daemons, initscripts and dependencies on other libs). The same goes for
-devel: it really does need -libs, because it provides the .so symlinks
to the libs for ld, but it shouldn't depend on the main application.
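
To illustrate, here is a minimal spec sketch of the split I mean. It is
only an assumption of how such a package could be laid out: the COPYING
file name and the file globs are placeholders, not taken from the real
corosync, openais or pacemaker spec files.

    # Subpackage with the shared libraries only.
    # Note: no Requires on the main package here.
    %package libs
    Summary: Shared libraries for %{name}
    Group: System Environment/Libraries

    %description libs
    Shared libraries that other applications can link against
    without installing the daemons and initscripts.

    # Development subpackage: depends on -libs, not on the main package.
    %package devel
    Summary: Development files for %{name}
    Group: Development/Libraries
    Requires: %{name}-libs = %{version}-%{release}

    %description devel
    Headers and unversioned .so symlinks for building against the libraries.

    %files libs
    %defattr(-,root,root,-)
    # Ship the license with -libs as well, so the main package
    # is not needed just for the license text.
    %doc COPYING
    %{_libdir}/*.so.*

    %files devel
    %defattr(-,root,root,-)
    %{_includedir}/*
    %{_libdir}/*.so

With a layout like this, installing -devel pulls in -libs only, and the
main package stays optional on a build host.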

Best,
Vladislav

> 
> glad you found a path forward though
> 
>>  understand that /usr/lib/ocf/resource.d/heartbeat has ocf scripts
>> provided by heartbeat but that can be part of the "Reusable cluster
>> agents" subsystem.
>>
>> Frankly I thought the way I had installed the system by erasing and
>> installing the fresh packages it should have worked.
>>
>> But all said and done I learned a lot of cluster code by gdbing it.
>> I'll be having a peaceful thanksgiving.
>>
>> Thanks and happy thanks giving.
>> Shravan
>>
>> On Sun, Oct 10, 2010 at 2:46 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>> Not enough information.
>>> We'd need more than just the lrmd's logs, they only show what happened not why.
>>>
>>> On Thu, Oct 7, 2010 at 11:02 PM, Shravan Mishra
>>> <shravan.mishra at gmail.com> wrote:
>>>> Hi,
>>>>
>>>> Description of my environment:
>>>>   corosync=1.2.8
>>>>   pacemaker=1.1.3
>>>>   Linux= 2.6.29.6-0.6.smp.gcc4.1.x86_64 #1 SMP
>>>>
>>>>
>>>> We are having a problem with our pacemaker which is continuously
>>>> canceling the monitoring operation of our stonith devices.
>>>>
>>>> We ran:
>>>>
>>>> stonith -d -t external/safe/ipmi hostname=ha2.itactics.com
>>>> ipaddr=192.168.2.7 userid=hellouser passwd=hello interface=lanplus -S
>>>>
>>>> its output is attached as stonith.output.
>>>>
>>>> We have been trying to debug this issue for a few days now with no success.
>>>> We are hoping that someone can help us, as we are under immense
>>>> pressure to move to RCS unless we can solve this issue in a day or two,
>>>> which I personally don't want to do because we like the product.
>>>>
>>>> Any help will be greatly appreciated.
>>>>
>>>>
>>>> Here is an excerpt from the /var/log/messages:
>>>> =========================
>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>> rsc:ha2.itactics.com-stonith:11155: start
>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>> rsc:ha2.itactics.com-stonith:11156: monitor
>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
>>>> monitor[11156] on
>>>> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
>>>> its parameters: CRM_meta_interval=[20000] target_role=[started]
>>>> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
>>>> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
>>>> hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
>>>> userid=[safe_ipmi_admin]  cancelled
>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11157: stop
>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>> rsc:ha2.itactics.com-stonith:11158: start
>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>> rsc:ha2.itactics.com-stonith:11159: monitor
>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
>>>> monitor[11159] on
>>>> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
>>>> its parameters: CRM_meta_interval=[20000] target_role=[started]
>>>> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
>>>> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
>>>> hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
>>>> userid=[safe_ipmi_admin]  cancelled
>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11160: stop
>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>> rsc:ha2.itactics.com-stonith:11161: start
>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>> rsc:ha2.itactics.com-stonith:11162: monitor
>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
>>>> monitor[11162] on
>>>> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
>>>> its parameters: CRM_meta_interval=[20000] target_role=[started]
>>>> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
>>>> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
>>>> hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
>>>> userid=[safe_ipmi_admin]  cancelled
>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11163: stop
>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>> rsc:ha2.itactics.com-stonith:11164: start
>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>> rsc:ha2.itactics.com-stonith:11165: monitor
>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
>>>> monitor[11165] on
>>>> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
>>>> its parameters: CRM_meta_interval=[20000] target_role=[started]
>>>> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
>>>> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
>>>> hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
>>>> userid=[safe_ipmi_admin]  cancelled
>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11166: stop
>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>> rsc:ha2.itactics.com-stonith:11167: start
>>>> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
>>>> rsc:ha2.itactics.com-stonith:11168: monitor
>>>> Oct  7 16:58:30 ha1 lrmd: [3581]: info: cancel_op: operation
>>>> monitor[11168] on
>>>> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
>>>> its parameters: CRM_meta_interval=[20000] target_role=[started]
>>>> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
>>>> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
>>>> hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
>>>> userid=[safe_ipmi_admin]  cancelled
>>>> Oct  7 16:58:30 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11169: stop
>>>> Oct  7 16:58:30 ha1 lrmd: [3581]: info:
>>>> rsc:ha2.itactics.com-stonith:11170: start
>>>> Oct  7 16:58:30 ha1 lrmd: [3581]: info: stonithRA plugin: got
>>>> metadata: <?xml version="1.0"?> <!DOCTYPE resource-agent SYSTEM
>>>> "ra-api-1.dtd"> <resource-agent name="external/safe/ipmi">
>>>> <version>1.0</version>   <longdesc lang="en"> ipmitool based power
>>>> management. Apparently, the power off method of ipmitool is
>>>> intercepted by ACPI which then makes a regular shutdown. If case of a
>>>> split brain on a two-node it may happen that no node survives. For
>>>> two-node clusters use only the reset method.    </longdesc>
>>>> <shortdesc lang="en">IPMI STONITH external device </shortdesc>
>>>> <parameters> <parameter name="hostname" unique="1"> <content
>>>> type="string" /> <shortdesc lang="en"> Hostname </shortdesc> <longdesc
>>>> lang="en"> The name of the host to be managed by this STONITH device.
>>>> </longdesc> </parameter>  <parameter name="ipaddr" unique="1">
>>>> <content type="string" /> <shortdesc lang="en"> IP Address
>>>> </shortdesc> <longdesc lang="en"> The IP address of the STONITH
>>>> device. </longdesc> </parameter>  <parameter name="userid" unique="1">
>>>> <content type="string" /> <shortdesc lang="en"> Login </shortdesc>
>>>> <longdesc lang="en"> The username used for logging in to the STONITH
>>>> device. </longdesc> </parameter>  <parameter name="passwd" unique="1">
>>>> <content type="string" /> <shortdesc lang="en"> Password </shortdesc>
>>>> <longdesc lang="en"> The password used for logging in to the STONITH
>>>> device. </longdesc> </parameter>  <parameter name="interface"
>>>> unique="1"> <content type="string" default="lan"/> <shortdesc
>>>> lang="en"> IPMI interface </shortdesc> <longdesc lang="en"> IPMI
>>>> interface to use, such as "lan" or "lanplus". </longdesc> </parameter>
>>>>  </parameters>    <actions>     <action name="start"   timeout="15" />
>>>>    <action name="stop"    timeout="15" />     <action name="status"
>>>> timeout="15" />     <action name="monitor" timeout="15" interval="15"
>>>> start-delay="15" />     <action name="meta-data"  timeout="15" />
>>>> </actions>   <special tag="heartbeat">     <version>2.0</version>
>>>> </special> </resource-agent>
>>>> Oct  7 16:58:30 ha1 lrmd: [3581]: info:
>>>> rsc:ha2.itactics.com-stonith:11171: monitor
>>>> Oct  7 16:58:30 ha1 lrmd: [3581]: info: cancel_op: operation
>>>> monitor[11171] on
>>>> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
>>>> its parameters: CRM_meta_interval=[20000] target_role=[started]
>>>> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
>>>> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
>>>> hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
>>>> userid=[safe_ipmi_admin]  cancelled
>>>> Oct  7 16:58:30 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11172: stop
>>>> Oct  7 16:58:30 ha1 lrmd: [3581]: info:
>>>> rsc:ha2.itactics.com-stonith:11173: start
>>>> Oct  7 16:58:30 ha1 lrmd: [3581]: info:
>>>> rsc:ha2.itactics.com-stonith:11174: monitor
>>>>
>>>> ==========================
>>>>
>>>> Thanks
>>>>
>>>> Shravan
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>>>
>>>>