[Pacemaker] Preventing Automatic Failback

Michael Monette mmonette at 2keys.ca
Tue Jan 21 16:24:35 UTC 2014


Also, one final thing I want to add.

Corosync and Pacemaker are enabled with chkconfig, so a hard reboot essentially restarts the services as well. The moment Pacemaker starts at boot, the disruption happens. (I have also tried disabling the services and starting them manually after recovering the server, with the same result.)
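
For reference, this is roughly how the services are set up and how I start them by hand when testing (from memory, so treat it as a sketch; these are the el6-style init scripts):

    # normal setup: both start at boot
    chkconfig --list | grep -E 'corosync|pacemaker'
    chkconfig corosync on
    chkconfig pacemaker on

    # what I do for the "disable and start manually" test
    chkconfig corosync off
    chkconfig pacemaker off
    service corosync start
    service pacemaker start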

Thanks again

Mike

David Vossel <dvossel at redhat.com> wrote:
>----- Original Message -----
>> From: "Michael Monette" <mmonette at 2keys.ca>
>> To: pacemaker at oss.clusterlabs.org
>> Sent: Monday, January 20, 2014 8:22:25 AM
>> Subject: [Pacemaker] Preventing Automatic Failback
>> 
>> Hi,
>> 
>> I posted this question before but my question was a bit unclear.
>> 
>> I have 2 nodes with DRBD with Postgresql.
>> 
>> When node-1 fails, everything fails over to node-2. But when node-1 is
>> recovered, things try to fail back to node-1 and all the services running
>> on node-2 get disrupted (things don't ACTUALLY fail back to node-1; they
>> try, fail, and then all services on node-2 are simply restarted, which is
>> very annoying). This does not happen if I perform the same tests on
>> node-2! I can reboot node-2, things fail over to node-1, and node-2 comes
>> back online and waits until it is needed (this is what I want!). It seems
>> to only affect node-1.
>> 
>> I have tried setting resource stickiness, and I have tried everything I
>> can really think of, but whenever the primary has recovered, it always
>> disrupts the services running on node-2.
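
To be concrete about what I tried there: the stickiness was set roughly like this with the crm shell (the same values appear in the config further down):

    crm configure rsc_defaults resource-stickiness="INFINITY"
    crm configure property default-resource-stickiness="INFINITY"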
>> 
>> I also tried removing things from this config to try to isolate the
>> problem. At one point I removed the atlassian_jira and drbd2_var
>> primitives and only had failover-ip and drbd1_opt, but I still had the
>> same problem. Hopefully someone can pinpoint this for me. If I can't
>> avoid it entirely, I would at least like to make this "bug", or whatever
>> it is, happen on node-2 instead of the active node.
>
>I bet this is due to the drbd resource's master score value on node1
>being higher than on node2.  When you recover node1, are you actually
>rebooting that node?  If node1 doesn't lose membership from the cluster
>(reboot), the transient attributes that the drbd agent uses to specify
>which node will be the master instance will stick around.  Otherwise, if
>you are just putting node1 in standby and then bringing the node back
>online, then I believe the resources will come back if the drbd master
>was originally on node1.
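
For what it's worth, this is how I understand that score can be inspected (I'm not certain of the exact attribute name; the drbd agent sets something like master-drbd_data, possibly with a clone suffix such as :0):

    # show the allocation/master scores as the cluster sees them right now
    crm_simulate -sL | grep -i drbd

    # query the transient (reboot-lifetime) attribute on node-1
    crm_attribute --node node-1.comp.com --name master-drbd_data \
        --lifetime reboot --query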
>
>If you provide a policy engine file that shows the unwanted transition
>from node2 back to node1, we'll be able to tell you exactly why it is
>occurring.
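
I'll dig one out. As far as I know, on these el6 boxes the policy engine inputs land under /var/lib/pacemaker/pengine/, and a transition can be replayed with something like the following (the file number is just a placeholder):

    # find which pe-input file a transition was calculated from
    grep -i pe-input /var/log/messages | tail

    # replay it, showing the resulting transition and the scores
    crm_simulate -S -s -x /var/lib/pacemaker/pengine/pe-input-123.bz2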
>
>-- Vossel
>
>
>> 
>> Here is my config:
>> 
>> node node-1.comp.com \
>>         attributes standby="off"
>> node node-2.comp.com \
>>         attributes standby="off"
>> primitive atlassian_jira lsb:jira \
>>         op start interval="0" timeout="240" \
>>         op stop interval="0" timeout="240"
>> primitive drbd1_opt ocf:heartbeat:Filesystem \
>>         params device="/dev/drbd1" directory="/opt/atlassian" fstype="ext4"
>> primitive drbd2_var ocf:heartbeat:Filesystem \
>>         params device="/dev/drbd2" directory="/var/atlassian" fstype="ext4"
>> primitive drbd_data ocf:linbit:drbd \
>>         params drbd_resource="r0" \
>>         op monitor interval="29s" role="Master" \
>>         op monitor interval="31s" role="Slave"
>> primitive failover-ip ocf:heartbeat:IPaddr2 \
>>         params ip="10.199.0.13"
>> group jira_services drbd1_opt drbd2_var failover-ip atlassian_jira
>> ms ms_drbd_data drbd_data \
>>         meta master-max="1" master-node-max="1" clone-max="2" \
>>         clone-node-max="1" notify="true"
>> colocation jira_services_on_drbd inf: atlassian_jira ms_drbd_data:Master
>> order jira_services_after_drbd inf: ms_drbd_data:promote jira_services:start
>> property $id="cib-bootstrap-options" \
>>         dc-version="1.1.10-14.el6_5.1-368c726" \
>>         cluster-infrastructure="classic openais (with plugin)" \
>>         expected-quorum-votes="2" \
>>         stonith-enabled="false" \
>>         no-quorum-policy="ignore" \
>>         last-lrm-refresh="1390183165" \
>>         default-resource-stickiness="INFINITY"
>> rsc_defaults $id="rsc-options" \
>>         resource-stickiness="INFINITY"
>> 
>> Thanks
>> 
>> Mike
>> 

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.