[Pacemaker] Question on ILO stonith resource config and restarting
Dejan Muhamedagic
dejanmm at fastmail.fm
Tue Nov 4 16:26:11 UTC 2008
On Thu, Oct 30, 2008 at 03:07:24PM -0400, Aaron Bush wrote:
> Just realized that I only included the log entries from the node that
> was not experiencing a network disconnect. Attached are the log entries
> from the node (01) that had the stonith resource running before the
> cable disconnect; they look like they provide some more useful
> information. The logs run up through when the network cable was
> reconnected.
The monitor operation on riloe failed. You should definitely
upgrade to the latest external/riloe.
Thanks,
Dejan
>
> -ab
>
> >> I have a 0.6 pacemaker/heartbeat cluster setup in a lab with resources
> >> as follows:
> >>
> >> Group-lvs(ordered): two primitives -> ocf/IPaddr2 and ocf/ldirectord.
> >> Clone-pingd: set to monitor a couple of IPs and used to set a weight for
> >> where to run the LVS group.
> >>
> >> -- This is the area that I have a question on --
> >> Clone-stonith-node1: HP ILO to shoot node1
> >> Clone-stonith-node2: HP ILO to shoot node2
> >>
> >> I read on the old linux-ha site that using a clone for ILO/stonith was
> >> the way to go. I'm not sure I see how this would work correctly and be
> >> preferred over a standard resource. What I am confused about is this:
> >> the external/riloe stonith plugin only knows how to shoot one node so
> >
> >Please make sure that you use the latest edition of
> >external/riloe. The previous one didn't work under all
> >circumstances.
>
> I am using the version that came with heartbeat-common-2.99.0-3.1
> (according to rpm -qf)
>
> To clear my current issue where the stonith resource was not started
> (and since this is still in the lab) I have rebooted both nodes to start
> with a somewhat clean slate. I have attempted to grab some more useful
> information from the logs on why the resource is not restarting.
> Again I disconnected the LAN cable connecting a node to the rest of the
> network (a private HB channel is still available and the ILO is still
> up). I noticed these entries in the log:
>
> Oct 30 13:33:07 wwwlb02 crmd: [6415]: info: do_lrm_rsc_op: Performing
> op=cl_stonith_lb02:0_start_0
> key=18:7:0:efbdb124-d51a-4228-80bc-7a9464d7971a)
> Oct 30 13:33:07 wwwlb02 lrmd: [6412]: info: rsc:cl_stonith_lb02:0: start
> Oct 30 13:33:07 wwwlb02 lrmd: [30788]: info: Try to start STONITH
> resource <rsc_id=cl_stonith_lb02:0> : Device=external/riloe
> Oct 30 13:33:07 wwwlb02 stonithd: [6413]: info: Cannot get parameter
> ilo_can_reset from StonithNVpair
> Oct 30 13:33:07 wwwlb02 stonithd: [6413]: info: Cannot get parameter
> ilo_protocol from StonithNVpair
> Oct 30 13:33:07 wwwlb02 stonithd: [6413]: info: Cannot get parameter
> ilo_powerdown_method from StonithNVpair
> Oct 30 13:33:08 wwwlb02 heartbeat: [6202]: info: Link
> wwwlb01.microcenter.com:eth0 dead.
> Oct 30 13:33:08 wwwlb02 pingd: [8475]: notice: pingd_lstatus_callback:
> Status update: Ping node wwwlb01.microcenter.com now has status [dead]
> Oct 30 13:33:08 wwwlb02 pingd: [8475]: notice: pingd_nstatus_callback:
> Status update: Ping node wwwlb01.microcenter.com now has status [dead]
> Oct 30 13:33:12 wwwlb02 stonithd: [30790]: WARN: host list for
> cl_stonith_lb02:0 is empty, please fix your constraints
> Oct 30 13:33:12 wwwlb02 stonithd: [6413]: WARN: start cl_stonith_lb02:0
> failed, because its hostlist is empty
> Oct 30 13:33:12 wwwlb02 crmd: [6415]: info: process_lrm_event: LRM
> operation cl_stonith_lb02:0_start_0 (call=12, rc=2) complete
> Oct 30 13:33:13 wwwlb02 lrmd: [6412]: info: rsc:cl_stonith_lb02:0: stop
> Oct 30 13:33:13 wwwlb02 stonithd: [6413]: notice: try to stop a resource
> cl_stonith_lb02:0 who is not in started resource queue.
> Oct 30 13:33:13 wwwlb02 crmd: [6415]: info: do_lrm_rsc_op: Performing
> op=cl_stonith_lb02:0_stop_0
> key=1:8:0:efbdb124-d51a-4228-80bc-7a9464d7971a)
> Oct 30 13:33:13 wwwlb02 lrmd: [30842]: info: Try to stop STONITH
> resource <rsc_id=cl_stonith_lb02:0> : Device=external/riloe
> Oct 30 13:33:13 wwwlb02 crmd: [6415]: info: process_lrm_event: LRM
> operation cl_stonith_lb02:0_stop_0 (call=13, rc=0) complete
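
When reading excerpts like this one, it can help to filter out everything below WARN first. What follows is only a minimal sketch in Python, assuming the heartbeat syslog layout seen in the quoted entries (timestamp, host, daemon, [pid], level, message) and assuming each entry has been rejoined onto a single line, since the mail client wrapped them. The names LINE_RE and find_problems and the choice of levels are mine for illustration and are not part of heartbeat or pacemaker.

```python
import re

# Assumed layout, taken from the entries quoted in this thread, e.g.:
#   Oct 30 13:33:12 wwwlb02 stonithd: [30790]: WARN: host list for ...
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+"
    r"(?P<daemon>[\w-]+):\s+\[(?P<pid>\d+)\]:\s+(?P<level>\w+):\s+(?P<msg>.*)$"
)


def find_problems(lines, levels=("WARN", "ERROR", "CRIT")):
    """Return (timestamp, daemon, level, message) for each line at a given level."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("level") in levels:
            hits.append(
                (m.group("ts"), m.group("daemon"), m.group("level"), m.group("msg"))
            )
    return hits


# Three entries from the quoted log, rejoined onto one line each.
SAMPLE = [
    "Oct 30 13:33:07 wwwlb02 lrmd: [6412]: info: rsc:cl_stonith_lb02:0: start",
    "Oct 30 13:33:12 wwwlb02 stonithd: [30790]: WARN: host list for "
    "cl_stonith_lb02:0 is empty, please fix your constraints",
    "Oct 30 13:33:12 wwwlb02 stonithd: [6413]: WARN: start cl_stonith_lb02:0 "
    "failed, because its hostlist is empty",
]

if __name__ == "__main__":
    for ts, daemon, level, msg in find_problems(SAMPLE):
        print(ts, daemon, level, msg)
```

On the sample above this leaves only the two stonithd warnings about the empty host list, which are the entries that matter for the failed start.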
>
>
>
> Looks like I should specify some additional nvpairs for the ILOs. The
> "host list is empty" WARN message is what looks bad to me. Here is the CIB
> section for the clone resource and the CIB constraint for this resource.
> Please let me know if there are any obvious errors in this
> configuration. This is the stonith resource that is to shoot the 02
> node, intended to run on the 01 node (the 01 node was the node that had
> its network cable disconnected).
>
>
> <clone id="cl_stonithset_lb02">
>   <meta_attributes id="cl_stonithset_lb02_meta_attrs">
>     <attributes>
>       <nvpair id="cl_stonithset_lb02_metaattr_target_role"
>               name="target_role" value="started"/>
>       <nvpair id="cl_stonithset_lb02_metaattr_clone_max"
>               name="clone_max" value="1"/>
>       <nvpair id="cl_stonithset_lb02_metaattr_clone_node_max"
>               name="clone_node_max" value="1"/>
>     </attributes>
>   </meta_attributes>
>   <primitive id="cl_stonith_lb02" class="stonith"
>              type="external/riloe" provider="heartbeat">
>     <instance_attributes id="cl_stonith_lb02_instance_attrs">
>       <attributes>
>         <nvpair id="76163fb5-05ea-4cff-9786-a817774d8224"
>                 name="hostlist" value="wwwlb02.microcenter.com"/>
>         <nvpair id="238e0158-81d3-48fd-879a-494c76d96b80"
>                 name="ilo_hostname" value="10.100.254.162"/>
>         <nvpair id="82de3d5d-6f96-44f0-b98f-6eea75704b33"
>                 name="ilo_user" value="Administrator"/>
>         <nvpair id="0fdef60a-fe62-4a0d-8f8f-d8da1d42082a"
>                 name="ilo_password" value="PASSWORD"/>
>       </attributes>
>     </instance_attributes>
>     <operations>
>       <op id="2a33ffe8-371f-4d08-a1ea-373135e85aeb"
>           name="monitor" interval="30" timeout="20" start_delay="15"
>           disabled="false" role="Started" on_fail="restart"/>
>       <op id="4694393c-e89b-4371-af1c-a60d7f305e2f" name="start"
>           timeout="20" start_delay="0" disabled="false" role="Started"
>           on_fail="restart"/>
>     </operations>
>     <meta_attributes id="cl_stonith_lb02:0_meta_attrs">
>       <attributes>
>         <nvpair id="cl_stonith_lb02:0_metaattr_target_role"
>                 name="target_role" value="started"/>
>       </attributes>
>     </meta_attributes>
>   </primitive>
> </clone>
>
> <constraints>
>   <rsc_location id="location_on_lb01" rsc="cl_stonithset_lb02">
>     <rule id="prefered_location_on_lb01" score="INFINITY">
>       <expression attribute="#uname"
>                   id="c9e30917-97e2-4c35-86e7-9df6c7abc497" operation="eq"
>                   value="wwwlb01.microcenter.com"/>
>     </rule>
>   </rsc_location>
> </constraints>
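
Since the log complains about an empty host list, one quick sanity check is whether the stonith primitive in the CIB really carries a non-empty hostlist parameter. The following is a rough sketch in Python using the standard xml.etree.ElementTree module. CIB_FRAGMENT is a trimmed copy of the clone quoted above (only the instance attributes are kept), and the helper names stonith_params and missing_hostlist are hypothetical, made up for this example. It inspects the static XML only and says nothing about what stonithd or the riloe plugin do with it at runtime.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the clone quoted above; meta attributes and operations omitted.
CIB_FRAGMENT = """
<clone id="cl_stonithset_lb02">
  <primitive id="cl_stonith_lb02" class="stonith"
             type="external/riloe" provider="heartbeat">
    <instance_attributes id="cl_stonith_lb02_instance_attrs">
      <attributes>
        <nvpair id="76163fb5-05ea-4cff-9786-a817774d8224"
                name="hostlist" value="wwwlb02.microcenter.com"/>
        <nvpair id="238e0158-81d3-48fd-879a-494c76d96b80"
                name="ilo_hostname" value="10.100.254.162"/>
      </attributes>
    </instance_attributes>
  </primitive>
</clone>
"""


def stonith_params(xml_text):
    """Map each stonith primitive id to its instance-attribute name/value pairs."""
    root = ET.fromstring(xml_text)
    result = {}
    for prim in root.iter("primitive"):
        if prim.get("class") != "stonith":
            continue
        result[prim.get("id")] = {
            nv.get("name"): nv.get("value")
            for nv in prim.findall("./instance_attributes//nvpair")
        }
    return result


def missing_hostlist(xml_text):
    """Return ids of stonith primitives with no (or an empty) hostlist parameter."""
    return [
        rsc_id
        for rsc_id, params in stonith_params(xml_text).items()
        if not params.get("hostlist")
    ]


if __name__ == "__main__":
    print(stonith_params(CIB_FRAGMENT))
    print("primitives without hostlist:", missing_hostlist(CIB_FRAGMENT))
```

For the configuration as posted the check finds nothing missing: hostlist is set to wwwlb02.microcenter.com, so the XML by itself does not account for the "host list is empty" warning.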
>
> Thanks,
> -ab
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker at clusterlabs.org
> http://list.clusterlabs.org/mailman/listinfo/pacemaker