[Pacemaker] R: R: R: R: Stonith external/sbd problem
Dejan Muhamedagic
dejanmm at fastmail.fm
Mon May 10 10:39:03 UTC 2010
Hi,
On Mon, May 10, 2010 at 09:21:03AM +0200, Nicola Sabatelli wrote:
> Hi,
>
> I have solved my problem.
>
> I found a small problem in the script ‘/usr/lib64/stonith/plugins/external/sbd’ when it retrieves the host list.
>
> I substituted these lines:
>
> nodes=$(
>     if is_heartbeat; then
>         crm_node -H -p
>     else
>         crm_node -p
>     fi)
>
> With these:
>
> if is_heartbeat; then
>     nodes=$(crm_node -H -p)
> else
>     nodes=$(crm_node -p)
> fi
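>
> (A quick way to double-check what host list the plugin will see is to run the same commands by hand; a sketch, assuming the cluster stack is already running on the node:)
>
> # on a heartbeat-based cluster
> crm_node -H -p
> # otherwise
> crm_node -p
>
> (Each should print the cluster node names; if the output is empty, the plugin's host list ends up empty as well, which matches the "hostlist is empty" error further down in this thread.)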
Fixed now.
Cheers,
Dejan
>
>
> and now the resource ‘external/sbd’ works very well.
>
>
>
>
>
>
>
> Best regards, Nicola.
>
>
>
> _____
>
> From: Michael Brown [mailto:michael at netdirect.ca]
> Sent: Thursday, April 29, 2010, 16:53
> To: n.sabatelli at ct.rupar.puglia.it
> Subject: Re: R: [Pacemaker] R: R: Stonith external/sbd problem
>
>
>
> Hrm, my limited knowledge is exhausted. Good luck!
>
> M.
>
> _____
>
> From: Nicola Sabatelli
> To: 'Michael Brown'
> Sent: Thu Apr 29 10:36:15 2010
> Subject: R: [Pacemaker] R: R: Stonith external/sbd problem
>
> The response to the query
>
> /usr/sbin/sbd -d /dev/mapper/mpath1p1 list
>
> is
>
> 0 clover-a.rsr.rupar.puglia.it clear
> 1 clover-h.rsr.rupar.puglia.it clear
>
>
>
>
>
> Ciao, Nicola.
>
> _____
>
> From: Michael Brown [mailto:michael at netdirect.ca]
> Sent: Thursday, April 29, 2010, 16:33
> To: The Pacemaker cluster resource manager
> Cc: Nicola Sabatelli
> Subject: Re: [Pacemaker] R: R: Stonith external/sbd problem
>
>
>
> FWIW, here's my setup for sbd on shared storage:
>
> in /etc/init.d/boot.local:
> sbd -d /dev/disk/by-id/dm-uuid-part2-mpath-3600a0b8000266f7e000035414bd00428 -D -W watch
>
> xenhost1:~ # sbd -d /dev/disk/by-id/dm-uuid-part2-mpath-3600a0b8000266f7e000035414bd00428 list
> 0 xenhost1 clear
> 1 xenhost2 clear
>
> excerpt from 'crm configure show':
> primitive sbd stonith:external/sbd \
> operations $id="sbd-operations" \
> op monitor interval="15" timeout="15" start-delay="15" \
> params sbd_device="/dev/disk/by-id/dm-uuid-part2-mpath-3600a0b8000266f7e000035414bd00428"
> clone sbd-clone sbd \
> meta interleave="true"
>
> What do you see if you run '/usr/sbin/sbd -d /dev/mapper/mpath1p1 list'?
>
> M.
>
> On 04/29/2010 10:23 AM, Nicola Sabatelli wrote:
>
> Yes, I created the disk and allocated the nodes, and I created a resource on the cluster in this way:
>
> <clone id="cl_external_sbd_1">
>   <meta_attributes id="cl_external_sbd_1-meta_attributes">
>     <nvpair id="cl_external_sbd_1-meta_attributes-clone-max" name="clone-max" value="2"/>
>   </meta_attributes>
>   <primitive class="stonith" type="external/sbd" id="stonith_external_sbd_LOCK_LUN">
>     <instance_attributes id="stonith_external_sbd_LOCK_LUN-instance_attributes">
>       <nvpair id="nvpair-stonith_external_sbd_LOCK_LUN-sbd_device" name="sbd_device" value="/dev/mapper/mpath1p1"/>
>     </instance_attributes>
>     <operations id="stonith_external_sbd_LOCK_LUN-operations">
>       <op id="op-stonith_external_sbd_LOCK_LUN-stop" interval="0" name="stop" timeout="60"/>
>       <op id="op-stonith_external_sbd_LOCK_LUN-monitor" interval="60" name="monitor" start-delay="0" timeout="60"/>
>       <op id="op-stonith_external_sbd_LOCK_LUN-start" interval="0" name="start" timeout="60"/>
>     </operations>
>     <meta_attributes id="stonith_external_sbd_LOCK_LUN-meta_attributes">
>       <nvpair name="target-role" id="stonith_external_sbd_LOCK_LUN-meta_attributes-target-role" value="stopped"/>
>     </meta_attributes>
>   </primitive>
> </clone>
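>
> (For reference, roughly the same resource in crm shell syntax; this is only a sketch and has not been verified against this cluster:)
>
> crm configure primitive stonith_external_sbd_LOCK_LUN stonith:external/sbd \
>     params sbd_device="/dev/mapper/mpath1p1" \
>     meta target-role="stopped" \
>     op start interval="0" timeout="60" \
>     op monitor interval="60" timeout="60" start-delay="0" \
>     op stop interval="0" timeout="60"
> crm configure clone cl_external_sbd_1 stonith_external_sbd_LOCK_LUN \
>     meta clone-max="2"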
>
>
>
>
>
> Ciao, Nicola.
>
> _____
>
> From: Vit Pelcak [mailto:vpelcak at suse.cz]
> Sent: Thursday, April 29, 2010, 16:08
> To: pacemaker at oss.clusterlabs.org
> Subject: Re: [Pacemaker] R: Stonith external/sbd problem
>
>
>
> Also, you need to add the stonith resource to the CIB:
>
> crm configure primitive sbd_stonith stonith:external/sbd meta target-role="Started" op monitor interval="15" timeout="15" start-delay="15" params sbd_device="/dev/sda1"
>
>
> On 29.4.2010 15:46, Nicola Sabatelli wrote:
>
> I have followed exactly the configuration in the SBD_Fencing documentation.
>
> That is:
>
> /etc/sysconfig/sbd
>
> SBD_DEVICE="/dev/mapper/mpath1p1"
>
> SBD_OPTS="-W"
>
> And I start the daemon in this manner:
>
> /usr/sbin/sbd -d /dev/mapper/mpath1p1 -D -W watch
>
> Is this correct?
>
>
>
> Ciao, Nicola.
>
> _____
>
> From: Vit Pelcak [mailto:vpelcak at suse.cz]
> Sent: Thursday, April 29, 2010, 15:02
> To: pacemaker at oss.clusterlabs.org
> Subject: Re: [Pacemaker] Stonith external/sbd problem
>
>
>
> cat /etc/sysconfig/sbd
>
> SBD_DEVICE="/dev/sda1"
> SBD_OPTS="-W"
>
>
> sbd -d /dev/shared_disk create
> sbd -d /dev/shared_disk allocate your_machine
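>
> (You can then confirm the slots with the list command, using the same device path as above:)
>
> sbd -d /dev/shared_disk list
>
> Each allocated node should show up as a numbered slot in the "clear" state, as in the "sbd ... list" output shown elsewhere in this thread.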
>
>
> On 29.4.2010 14:55, Michael Brown wrote:
>
> Oh, I forgot a piece: I had similar trouble until I actually started sbd properly, and then it worked.
>
> M.
>
> _____
>
> From: Michael Brown
> To: pacemaker at oss.clusterlabs.org
> Sent: Thu Apr 29 08:53:32 2010
> Subject: Re: [Pacemaker] Stonith external/sbd problem
>
>
>
>
> I just set this up myself and it worked fine for me.
>
> Did you follow the guide? You need to configure the sbd daemon to run on bootup with appropriate options before external/sbd can use it.
>
> M.
>
> _____
>
> From: Nicola Sabatelli
> To: pacemaker at oss.clusterlabs.org
> Sent: Thu Apr 29 08:47:04 2010
> Subject: [Pacemaker] Stonith external/sbd problem
>
>
> I have a problem with the STONITH plugin external/sbd.
>
> I have configured the system according to the directions that I found at the URL http://www.linux-ha.org/wiki/SBD_Fencing, and the device that I use is configured with multipath software because this disk resides on a storage system.
>
> I have created a resource on my cluster using a clone directive.
>
> But when I try to start the resource I get these errors:
>
>
>
> from ha-log file:
>
> Apr 29 14:37:51 clover-h stonithd: [16811]: info: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/sbd status' returned 256
> Apr 29 14:37:51 clover-h stonithd: [16811]: CRIT: external_status: 'sbd status' failed with rc 256
> Apr 29 14:37:51 clover-h stonithd: [10615]: WARN: start stonith_external_sbd_LOCK_LUN:0 failed, because its hostlist is empty
>
>
>
> from crm_verify:
>
> crm_verify[18607]: 2010/04/29_14:39:27 info: main: =#=#=#=#= Getting XML =#=#=#=#=
> crm_verify[18607]: 2010/04/29_14:39:27 info: main: Reading XML from: live cluster
> crm_verify[18607]: 2010/04/29_14:39:27 notice: unpack_config: On loss of CCM Quorum: Ignore
> crm_verify[18607]: 2010/04/29_14:39:27 info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> crm_verify[18607]: 2010/04/29_14:39:27 info: determine_online_status: Node clover-a.rsr.rupar.puglia.it is online
> crm_verify[18607]: 2010/04/29_14:39:27 WARN: unpack_rsc_op: Processing failed op stonith_external_sbd_LOCK_LUN:1_start_0 on clover-a.rsr.rupar.puglia.it: unknown error (1)
> crm_verify[18607]: 2010/04/29_14:39:27 info: find_clone: Internally renamed stonith_external_sbd_LOCK_LUN:0 on clover-a.rsr.rupar.puglia.it to stonith_external_sbd_LOCK_LUN:2 (ORPHAN)
> crm_verify[18607]: 2010/04/29_14:39:27 info: determine_online_status: Node clover-h.rsr.rupar.puglia.it is online
> crm_verify[18607]: 2010/04/29_14:39:27 WARN: unpack_rsc_op: Processing failed op stonith_external_sbd_LOCK_LUN:0_start_0 on clover-h.rsr.rupar.puglia.it: unknown error (1)
> crm_verify[18607]: 2010/04/29_14:39:27 notice: clone_print: Master/Slave Set: ms_drbd_1
> crm_verify[18607]: 2010/04/29_14:39:27 notice: short_print: Stopped: [ res_drbd_1:0 res_drbd_1:1 ]
> crm_verify[18607]: 2010/04/29_14:39:27 notice: native_print: res_Filesystem_TEST (ocf::heartbeat:Filesystem): Stopped
> crm_verify[18607]: 2010/04/29_14:39:27 notice: native_print: res_IPaddr2_ip_clover (ocf::heartbeat:IPaddr2): Stopped
> crm_verify[18607]: 2010/04/29_14:39:27 notice: clone_print: Clone Set: cl_external_sbd_1
> crm_verify[18607]: 2010/04/29_14:39:27 notice: native_print: stonith_external_sbd_LOCK_LUN:0 (stonith:external/sbd): Started clover-h.rsr.rupar.puglia.it FAILED
> crm_verify[18607]: 2010/04/29_14:39:27 notice: native_print: stonith_external_sbd_LOCK_LUN:1 (stonith:external/sbd): Started clover-a.rsr.rupar.puglia.it FAILED
> crm_verify[18607]: 2010/04/29_14:39:27 info: get_failcount: cl_external_sbd_1 has failed 1000000 times on clover-h.rsr.rupar.puglia.it
> crm_verify[18607]: 2010/04/29_14:39:27 WARN: common_apply_stickiness: Forcing cl_external_sbd_1 away from clover-h.rsr.rupar.puglia.it after 1000000 failures (max=1000000)
> crm_verify[18607]: 2010/04/29_14:39:27 info: get_failcount: cl_external_sbd_1 has failed 1000000 times on clover-a.rsr.rupar.puglia.it
> crm_verify[18607]: 2010/04/29_14:39:27 WARN: common_apply_stickiness: Forcing cl_external_sbd_1 away from clover-a.rsr.rupar.puglia.it after 1000000 failures (max=1000000)
> crm_verify[18607]: 2010/04/29_14:39:27 info: native_merge_weights: ms_drbd_1: Rolling back scores from res_Filesystem_TEST
> crm_verify[18607]: 2010/04/29_14:39:27 WARN: native_color: Resource res_drbd_1:0 cannot run anywhere
> crm_verify[18607]: 2010/04/29_14:39:27 WARN: native_color: Resource res_drbd_1:1 cannot run anywhere
> crm_verify[18607]: 2010/04/29_14:39:27 info: native_merge_weights: ms_drbd_1: Rolling back scores from res_Filesystem_TEST
> crm_verify[18607]: 2010/04/29_14:39:27 info: master_color: ms_drbd_1: Promoted 0 instances of a possible 1 to master
> crm_verify[18607]: 2010/04/29_14:39:27 info: master_color: ms_drbd_1: Promoted 0 instances of a possible 1 to master
> crm_verify[18607]: 2010/04/29_14:39:27 info: native_merge_weights: res_Filesystem_TEST: Rolling back scores from res_IPaddr2_ip_clover
> crm_verify[18607]: 2010/04/29_14:39:27 WARN: native_color: Resource res_Filesystem_TEST cannot run anywhere
> crm_verify[18607]: 2010/04/29_14:39:27 WARN: native_color: Resource res_IPaddr2_ip_clover cannot run anywhere
> crm_verify[18607]: 2010/04/29_14:39:27 WARN: native_color: Resource stonith_external_sbd_LOCK_LUN:0 cannot run anywhere
> crm_verify[18607]: 2010/04/29_14:39:27 WARN: native_color: Resource stonith_external_sbd_LOCK_LUN:1 cannot run anywhere
> crm_verify[18607]: 2010/04/29_14:39:27 notice: LogActions: Leave resource res_drbd_1:0 (Stopped)
> crm_verify[18607]: 2010/04/29_14:39:27 notice: LogActions: Leave resource res_drbd_1:1 (Stopped)
> crm_verify[18607]: 2010/04/29_14:39:27 notice: LogActions: Leave resource res_Filesystem_TEST (Stopped)
> crm_verify[18607]: 2010/04/29_14:39:27 notice: LogActions: Leave resource res_IPaddr2_ip_clover (Stopped)
> crm_verify[18607]: 2010/04/29_14:39:27 notice: LogActions: Stop resource stonith_external_sbd_LOCK_LUN:0 (clover-h.rsr.rupar.puglia.it)
> crm_verify[18607]: 2010/04/29_14:39:27 notice: LogActions: Stop resource stonith_external_sbd_LOCK_LUN:1 (clover-a.rsr.rupar.puglia.it)
>
> Warnings found during check: config may not be valid
>
>
>
> and from crm_mon:
>
> ============
> Last updated: Thu Apr 29 14:39:57 2010
> Stack: Heartbeat
> Current DC: clover-h.rsr.rupar.puglia.it (e39bb201-2a6f-457a-a308-be6bfe71309c) - partition with quorum
> Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
> 2 Nodes configured, unknown expected votes
> 4 Resources configured.
> ============
>
> Online: [ clover-h.rsr.rupar.puglia.it clover-a.rsr.rupar.puglia.it ]
>
> Clone Set: cl_external_sbd_1
>     stonith_external_sbd_LOCK_LUN:0 (stonith:external/sbd): Started clover-h.rsr.rupar.puglia.it FAILED
>     stonith_external_sbd_LOCK_LUN:1 (stonith:external/sbd): Started clover-a.rsr.rupar.puglia.it FAILED
>
> Operations:
> * Node clover-a.rsr.rupar.puglia.it:
>     stonith_external_sbd_LOCK_LUN:1: migration-threshold=1000000 fail-count=1000000
>     + (24) start: rc=1 (unknown error)
> * Node clover-h.rsr.rupar.puglia.it:
>     stonith_external_sbd_LOCK_LUN:0: migration-threshold=1000000 fail-count=1000000
>     + (25) start: rc=1 (unknown error)
>
> Failed actions:
>     stonith_external_sbd_LOCK_LUN:1_start_0 (node=clover-a.rsr.rupar.puglia.it, call=24, rc=1, status=complete): unknown error
>     stonith_external_sbd_LOCK_LUN:0_start_0 (node=clover-h.rsr.rupar.puglia.it, call=25, rc=1, status=complete): unknown error
>
>
>
>
>
>
>
>
>
> Ciao, Nicola.
>
>
>
> --
> Michael Brown | `One of the main causes of the fall of
> Systems Consultant | the Roman Empire was that, lacking zero,
> Net Direct Inc. | they had no way to indicate successful
> ☎: +1 519 883 1172 x5106 | termination of their C programs.' - Firth
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf