[Pacemaker] Resources don't start on second node after ping fails

Mon Apr 12 05:23:17 UTC 2010

Hi Marco,

No, the physical connection was OK. The DRBD devices weren't connected because of a split-brain situation I had created in a previous test case. I simply didn't check for it and didn't notice. To fix it, I had to reconnect them with drbdadm (see http://www.drbd.org/users-guide-emb/s-resolve-split-brain.html ).
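For anyone hitting the same thing: a manual split-brain recovery along the lines of the DRBD guide linked above might look roughly like this. It is a sketch, assuming DRBD 8.3-era drbdadm syntax (8.4 moved the option after the subcommand) and the DRBD resource name `ora` from the configuration quoted further down. Which node gives up its changes is your own decision; nothing below picks it for you.

```shell
# On the node whose changes you decide to discard (the split-brain "victim"):
drbdadm secondary ora
drbdadm -- --discard-my-data connect ora

# On the surviving node, only if it is also in StandAlone state:
drbdadm connect ora

# On both nodes, check that the connection state is back to "Connected":
drbdadm cstate ora
cat /proc/drbd
```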

I don't think it was due to the "number" issue, since the same failure occurred with and without "number:lte", but I'll check that and post the results here.
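For comparison, the typed variant of the location rule differs from the original only in the comparison operator. In the crm shell, `number:lte` asks Pacemaker to compare `pingval` as a number instead of as a string. This is a config fragment as it would appear in `crm configure show`, reusing the constraint and rule ids from the configuration quoted below:

```shell
location ms_drbd_ora_on_connected_node ms_drbd_ora \
	rule $id="ms_drbd_ora_on_connected_node-rule" -inf: not_defined pingval or pingval number:lte 0
```

With a single host in host_list and multiplier="1000", pingval should only ever be 0 or 1000. Assuming an untyped rule falls back to a plain string comparison, "0" lte "0" is true and "1000" lte "0" is false either way, so both variants should behave the same here. The typed comparison starts to matter when values and thresholds have different digit counts: as strings, "10000" sorts before "2000".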
Greets
Benjamin

-----Original Message-----
From: Marco van Putten [mailto:marco.vanputten at tudelft.nl]
Sent: Sat 10.04.2010 00:19
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Resources don't start on second node after ping fails

Hi Benjamin,

Congratulations!
Do you mean not connected as in physically not connected?

I'm no expert on the matter, but I ran into the "number" problem myself 
just a couple of weeks ago.
Maybe this is no longer an issue in a newer version...

Bye,
Marco.

Benjamin.Benz at t-systems.com wrote:
> Hi everybody!
>
> I fixed this 'problem'...
> My DRBD resource wasn't connected. m(
> The configuration of the ping resource and the location constraint was correct. I implemented Marco's advice, but I'm sure my original solution would also have worked.
> Failover works just fine now.
>
> Thanks for reading!
> Benjamin Benz
>
>
> -----Original Message-----
> From: Benz, Benjamin
> Sent: Thu 08.04.2010 14:46
> To: pacemaker at oss.clusterlabs.org
> Subject: [Pacemaker] Resources don't start on second node after ping fails
>  
> Hi there!
>
> I've got a problem with the configuration.
> I'm using Pacemaker 1.0.7 to move my database from node1 to node2. Everything works fine when I migrate the resources manually or pull out the power plug.
> Since I want the database to stay available in case of network problems, I tried to integrate a ping resource, as you can see below.
> When I pull out the network cable the resources stop on node1 but don't start on node2.
>
> crm_mon output:
>
> Online: [ bb-node1 bb-node2 ]
>
>  Master/Slave Set: ms_drbd_ora
>      Slaves: [ bb-node2 ]
>      Stopped: [ drbd_ora:1 ]
>  Clone Set: connected
>      Started: [ bb-node1 bb-node2 ]
>
>
> I guess there's something wrong with my location constraint, but I can't figure it out.
> It would be great if someone could help me out!
>
> If you have any other helpful hints about my config, feel free to share them!
>
> Regards
> Benjamin Benz
>
>
> crm configure show:
>
> node $id="d109b732-1cfc-4cd8-9cce-ba9323a56087" bb-node2
> node $id="f995b3ac-734f-4cc4-aacb-cbec22e48de5" bb-node1
> primitive drbd_ora ocf:linbit:drbd \
> 	params drbd_resource="ora" \
> 	op monitor interval="5s" timeout="20s" on-fail="restart"
> primitive fs_ora ocf:heartbeat:Filesystem \
> 	params device="/dev/drbd0" directory="/oracle" fstype="ext3" \
> 	op monitor interval="5s" timeout="40s" on-fail="restart"
> primitive ip_ora ocf:heartbeat:IPaddr2 \
> 	params ip="53.113.178.29" cidr_netmask="255.255.255.0" \
> 	op monitor interval="5s" timeout="20s" on-fail="restart"
> primitive oracle_ora ocf:heartbeat:oracle \
> 	params home="/oracle" sid="bbcluster" user="oracle" ipcrm="orauser" \
> 	op monitor interval="5s" timeout="30s" on-fail="restart"
> primitive oralsnr_ora ocf:heartbeat:oralsnr \
> 	params home="/oracle" sid="bbcluster" user="oracle" \
> 	op monitor interval="5s" timeout="30s" on-fail="restart"
> primitive ping ocf:pacemaker:ping \
> 	params dampen="5s" host_list="53.118.160.121" multiplier="1000" name="pingval" \
> 	operations $id="ping-operations" \
> 	op monitor interval="10s" timeout="10s"
> group ora_group fs_ora ip_ora oralsnr_ora oracle_ora \
> 	meta target-role="Started"
> ms ms_drbd_ora drbd_ora \
> 	meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> clone connected ping \
> 	meta globally-unique="false" target-role="Started"
>
> location ms_drbd_ora_on_connected_node ms_drbd_ora \
> 	rule $id="ms_drbd_ora_on_connected_node-rule" -inf: not_defined pingval or pingval lte 0
>
> colocation ora_group_on_ms_drbd_ora inf: ora_group ms_drbd_ora:Master
> order ms_drbd_ora_before_ora_group inf: ms_drbd_ora:promote ora_group:start
> property $id="cib-bootstrap-options" \
> 	dc-version="1.0.7-6e1815972fc236825bf3658d7f8451d33227d420" \
> 	cluster-infrastructure="Heartbeat" \
> 	no-quorum-policy="ignore" \
> 	stonith-enabled="false" \
> 	last-lrm-refresh="1270732011"