[ClusterLabs] Help required for N+1 redundancy setup

Fri Jan 8 22:01:48 UTC 2016

On 01/08/2016 11:13 AM, Nikhil Utane wrote:
>> I think stickiness will do what you want here. Set a stickiness higher
>> than the original node's preference, and the resource will want to stay
>> where it is.
> 
> Not sure I understand this. Stickiness will ensure that resources don't
> move back when the original node comes back up, won't it?
> But in my case, I want the newly standby node to become the backup node for
> all other nodes, i.e. it should now be able to run all my resource groups,
> albeit with a lower score. How do I achieve that?

Oh right. I forgot to ask whether you had an opt-out
(symmetric-cluster=true, the default) or opt-in
(symmetric-cluster=false) cluster. If you're opt-out, every node can run
every resource unless you give it a negative preference.

Partly it depends on whether there is a good reason to give each
instance a "home" node. Often, there's not. If you just want to balance
resources across nodes, the cluster will do that by default.

If you prefer to put certain resources on certain nodes because they
need more hardware capacity (RAM/CPU/whatever), you can set node
attributes for that and use rules to set node preferences.
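
As a rough sketch of the attribute side of this, the Python helpers below only
build crm_attribute command lines and do not run them. The attribute name
"ram_gb" and the helper names are hypothetical. The rule that turns the
attribute into a node preference would still have to be added to the
configuration separately.

```python
import shlex


def set_node_attribute_cmd(node, name, value):
    """Build a crm_attribute call that sets a node attribute."""
    return ["crm_attribute", "--node", node, "--name", name, "--update", str(value)]


def query_node_attribute_cmd(node, name):
    """Build a crm_attribute call that prints only the attribute's value."""
    return ["crm_attribute", "--node", node, "--name", name, "--query", "--quiet"]


# "ram_gb" is a made-up attribute name for illustration.
print(shlex.join(set_node_attribute_cmd("node1", "ram_gb", 64)))
print(shlex.join(query_node_attribute_cmd("node1", "ram_gb")))
```

On a cluster node, the resulting list could be passed to subprocess.run().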

Either way, you can decide whether you want stickiness with it.
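
Here is a small Python sketch of how stickiness interacts with a location
preference. It is a simplification under stated assumptions, not Pacemaker's
real algorithm: each node's score is its location preference, plus the
stickiness on the node where the resource currently runs, and the highest
non-negative score among online nodes wins. The function names and the
stickiness value of 1000 are made up. The scores 500 and 0 come from the
table quoted below.

```python
def node_scores(location_scores, current_node, stickiness):
    """Per-node score: location preference plus stickiness where the resource runs."""
    scores = dict(location_scores)
    if current_node in scores:
        scores[current_node] += stickiness
    return scores


def choose_node(location_scores, current_node, stickiness, online):
    """Pick the online node with the highest non-negative score, or None."""
    scores = node_scores(location_scores, current_node, stickiness)
    candidates = {n: s for n, s in scores.items() if n in online and s >= 0}
    if not candidates:
        return None
    # Ties are broken by node name only to keep this sketch deterministic.
    return max(sorted(candidates), key=candidates.get)


prefs = {"node1": 500, "node6": 0}  # MyGroup1 preferences from the table
both_up = {"node1", "node6"}

# MyGroup1 has failed over to node6 and node1 has just come back online.
print(choose_node(prefs, "node6", 0, both_up))     # node1: moves back
print(choose_node(prefs, "node6", 1000, both_up))  # node6: stays put
```

With a stickiness above 500, the group stays on node6 after a failover, and
node1 becomes the de facto spare once it rejoins.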

> Also, can you tell me how to get the names of the node that goes active and
> the node that goes down inside the OCF agent? Do I need to use notifications,
> or is some simpler alternative available?
> Thanks.
> 
> 
> On Fri, Jan 8, 2016 at 9:30 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
> 
>> On 01/08/2016 06:55 AM, Nikhil Utane wrote:
>>> Would like to validate my final config.
>>>
>>> As I mentioned earlier, I will be having (up to) 5 active servers and 1
>>> standby server.
>>> The standby server should take over the role of the active server that went
>>> down. Each active has some unique configuration that needs to be preserved.
>>>
>>> 1) So I will create a total of 5 groups. Each group has an
>>> "ocf:heartbeat:IPaddr2" resource (for the virtual IP) and my custom resource.
>>> 2) The virtual IP needs to be read inside my custom OCF agent, so I will
>>> make use of an attribute reference and point to the value of IPaddr2 inside
>>> my custom resource to avoid duplication.
>>> 3) I will then configure location constraints to run the group resource
>>> on the 5 active nodes with a higher score and on the standby with a lower score.
>>> E.g.
>>> Group              Node            Score
>>> ---------------------------------------------
>>> MyGroup1        node1           500
>>> MyGroup1        node6           0
>>>
>>> MyGroup2        node2           500
>>> MyGroup2        node6           0
>>> ..
>>> MyGroup5        node5           500
>>> MyGroup5        node6           0
>>>
>>> 4) Now if say node1 were to go down, then stop action on node1 will first
>>> get called. Haven't decided if I need to do anything specific here.
>>
>> To clarify, if node1 goes down intentionally (e.g. standby or stop),
>> then all resources on it will be stopped first. But if node1 becomes
>> unavailable (e.g. crash or communication outage), it will get fenced.
>>
>>> 5) But when the start action of node 6 gets called, then using crm command
>>> line interface, I will modify the above config to swap node 1 and node 6.
>>> i.e.
>>> MyGroup1        node6           500
>>> MyGroup1        node1           0
>>>
>>> MyGroup2        node2           500
>>> MyGroup2        node1           0
>>>
>>> 6) To do the above, I need the newly active and newly standby node names to
>>> be passed to my start action. What's the best way to get this information
>>> inside my OCF agent?
>>
>> Modifying the configuration from within an agent is dangerous -- too
>> much potential for feedback loops between pacemaker and the agent.
>>
>> I think stickiness will do what you want here. Set a stickiness higher
>> than the original node's preference, and the resource will want to stay
>> where it is.
>>
>>> 7) Apart from node name, there will be other information which I plan to
>>> pass by making use of node attributes. What's the best way to get this
>>> information inside my OCF agent? Use crm command to query?
>>
>> Any of the command-line interfaces for doing so should be fine, but I'd
>> recommend using one of the lower-level tools (crm_attribute or
>> attrd_updater) so you don't have a dependency on a higher-level tool
>> that may not always be installed.
>>
>>> Thank You.
>>>
>>> On Tue, Dec 22, 2015 at 9:44 PM, Nikhil Utane <nikhil.subscribed at gmail.com>
>>> wrote:
>>>
>>>> Thanks to you Ken for giving all the pointers.
>>>> Yes, I can use service start/stop which should be a lot simpler. Thanks
>>>> again. :)
>>>>
>>>> On Tue, Dec 22, 2015 at 9:29 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
>>>>
>>>>> On 12/22/2015 12:17 AM, Nikhil Utane wrote:
>>>>>> I have prepared a write-up explaining my requirements and current
>>>>>> solution that I am proposing based on my understanding so far.
>>>>>> Kindly let me know if what I am proposing is good or there is a better
>>>>>> way to achieve the same.
>>>>>>
>>>>>> https://drive.google.com/file/d/0B0zPvL-Tp-JSTEJpcUFTanhsNzQ/view?usp=sharing
>>>>>>
>>>>>> Let me know if you face any issue in accessing the above link. Thanks.
>>>>>
>>>>> This looks great. Very well thought-out.
>>>>>
>>>>> One comment:
>>>>>
>>>>> "8. In the event of any failover, the standby node will get notified
>>>>> through an event and it will execute a script that will read the
>>>>> configuration specific to the node that went down (again using
>>>>> crm_attribute) and become active."
>>>>>
>>>>> It may not be necessary to use the notifications for this. Pacemaker
>>>>> will call your resource agent with the "start" action on the standby
>>>>> node, after ensuring it is stopped on the previous node. Hopefully the
>>>>> resource agent's start action has (or can have, with configuration
>>>>> options) all the information you need.
>>>>>
>>>>> If you do end up needing notifications, be aware that the feature will
>>>>> be disabled by default in the 1.1.14 release, because changes in syntax
>>>>> are expected in further development. You can define a compile-time
>>>>> constant to enable them.