[Pacemaker] Resources don't start on second node after ping fails
Benjamin.Benz at t-systems.com
Mon Apr 12 05:23:17 UTC 2010
Hi Marco,
No, the physical connection was OK. The DRBD devices weren't connected because of a split-brain situation I had created in an earlier test case, and I simply didn't check for that. To fix it I had to reconnect them with drbdadm (see http://www.drbd.org/users-guide-emb/s-resolve-split-brain.html ).
I don't think the "number" issue was the cause, since I got the same failure with and without "number:lte", but I'll check that and post the results here.
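For anyone finding this thread later, here is a rough sketch of the manual split-brain recovery described on the linked page. It assumes DRBD 8.3 command syntax and the resource name "ora" from the configuration quoted further down. The helper names (run, show_connection_state, discard_local_changes, reconnect_survivor) are made up for illustration, and the thread does not say which node's data was discarded, so that choice is yours to make.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: prints each command instead of running it.
# Set APPLY=1 in the environment to really execute the commands.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Check the connection state first. StandAlone or WFConnection on both
# nodes hints at an unresolved split brain; Connected is the healthy state.
show_connection_state() {
    run drbdadm cstate "$1"
}

# Run on the node whose changes are thrown away (the split-brain "victim").
# Assumes the DRBD 8.3 form of the discard option.
discard_local_changes() {
    run drbdadm secondary "$1"
    run drbdadm -- --discard-my-data connect "$1"
}

# Run on the node whose data is kept; only needed if it is StandAlone.
reconnect_survivor() {
    run drbdadm connect "$1"
}
```

After sourcing this, "discard_local_changes ora" on the victim and "reconnect_survivor ora" on the other node print the commands to review; nothing is executed until APPLY=1 is set.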
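For reference, the "number:lte" variant differs from the original rule only in the type prefix on the comparison. The sketch below just prints that statement. The ids and the pingval attribute are copied from the configuration quoted further down; the function name is hypothetical. As far as I understand, an untyped "lte" may be evaluated as a string comparison in this Pacemaker version, but that is an assumption I have not verified.

```shell
#!/bin/sh
# Hypothetical helper: prints the location constraint with an explicit
# numeric comparison (number:lte) instead of the untyped lte.
print_ping_location() {
cat <<'EOF'
location ms_drbd_ora_on_connected_node ms_drbd_ora \
    rule $id="ms_drbd_ora_on_connected_node-rule" -inf: not_defined pingval or pingval number:lte 0
EOF
}

print_ping_location
```

The printed statement can be pasted over the existing constraint in "crm configure edit"; with a single ping host and multiplier="1000", pingval should be either 0 or 1000, so both variants are expected to agree here.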
Regards
Benjamin
-----Original Message-----
From: Marco van Putten [mailto:marco.vanputten at tudelft.nl]
Sent: Sat 10.04.2010 00:19
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Resources don't start on second node after ping fails
Hi Benjamin,
Congratulations!
Do you mean not connected as in physically not connected?
I'm no expert on the matter but I just ran into the "number" problem a
couple of weeks ago myself.
Maybe in a newer version this is no longer an issue...
Bye,
Marco.
Benjamin.Benz at t-systems.com wrote:
> Hi everybody!
>
> I fixed this 'problem'...
> My drbd-resource wasn't connected. m(
> The configuration of the ping resource and location were correct. I implemented Marco's advice but I'm sure my solution would've also worked.
> The failover works just fine right now.
>
> Thanks for reading!
> Benjamin Benz
>
>
> -----Original Message-----
> From: Benz, Benjamin
> Sent: Thu 08.04.2010 14:46
> To: pacemaker at oss.clusterlabs.org
> Subject: [Pacemaker] Resources don't start on second node after ping fails
>
> Hi there!
>
> I've got a problem with my configuration.
> I'm using Pacemaker 1.0.7 to move my database from node1 to node2. Everything works fine when I migrate the resources manually or pull out the power plug.
> Since I want the database to stay available in case of network problems, I tried to integrate a ping resource, as you can see below.
> When I pull out the network cable, the resources stop on node1 but don't start on node2.
>
> crm_mon output:
>
> Online: [ bb-node1 bb-node2 ]
>
> Master/Slave Set: ms_drbd_ora
> Slaves: [ bb-node2 ]
> Stopped: [ drbd_ora:1 ]
> Clone Set: connected
> Started: [ bb-node1 bb-node2 ]
>
>
> I guess there's something wrong with my configuration of the location but I can't figure it out.
> It would be great if someone could help me out!
>
> If you have other helpful hints concerning my config feel free to answer!
>
> Regards
> Benjamin Benz
>
>
> crm configure show:
>
> node $id="d109b732-1cfc-4cd8-9cce-ba9323a56087" bb-node2
> node $id="f995b3ac-734f-4cc4-aacb-cbec22e48de5" bb-node1
> primitive drbd_ora ocf:linbit:drbd \
>     params drbd_resource="ora" \
>     op monitor interval="5s" timeout="20s" on-fail="restart"
> primitive fs_ora ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/oracle" fstype="ext3" \
>     op monitor interval="5s" timeout="40s" on-fail="restart"
> primitive ip_ora ocf:heartbeat:IPaddr2 \
>     params ip="53.113.178.29" cidr_netmask="255.255.255.0" \
>     op monitor interval="5s" timeout="20s" on-fail="restart"
> primitive oracle_ora ocf:heartbeat:oracle \
>     params home="/oracle" sid="bbcluster" user="oracle" ipcrm="orauser" \
>     op monitor interval="5s" timeout="30s" on-fail="restart"
> primitive oralsnr_ora ocf:heartbeat:oralsnr \
>     params home="/oracle" sid="bbcluster" user="oracle" \
>     op monitor interval="5s" timeout="30s" on-fail="restart"
> primitive ping ocf:pacemaker:ping \
>     params dampen="5s" host_list="53.118.160.121" multiplier="1000" name="pingval" \
>     operations $id="ping-operations" \
>     op monitor interval="10s" timeout="10s"
> group ora_group fs_ora ip_ora oralsnr_ora oracle_ora \
>     meta target-role="Started"
> ms ms_drbd_ora drbd_ora \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> clone connected ping \
>     meta globally-unique="false" target-role="Started"
>
> location ms_drbd_ora_on_connected_node ms_drbd_ora \
>     rule $id="ms_drbd_ora_on_connected_node-rule" -inf: not_defined pingval or pingval lte 0
>
> colocation ora_group_on_ms_drbd_ora inf: ora_group ms_drbd_ora:Master
> order ms_drbd_ora_before_ora_group inf: ms_drbd_ora:promote ora_group:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.7-6e1815972fc236825bf3658d7f8451d33227d420" \
>     cluster-infrastructure="Heartbeat" \
>     no-quorum-policy="ignore" \
>     stonith-enabled="false" \
>     last-lrm-refresh="1270732011"