[Pacemaker] stickiness weirdness please explain
Dan Frincu
df.cluster at gmail.com
Thu Feb 24 10:38:03 UTC 2011
Hi,
On 02/23/2011 06:19 PM, Jelle de Jong wrote:
> Dear Dan,
>
> Thank you for taking the time to read and answer my question.
>
> On 23-02-11 09:42, Dan Frincu wrote:
>> This is something that you should remove from the config, as I
>> understand it, all resources should run together on the same node and
>> migrate together to the other node.
>>
>> location cli-prefer-ip_virtual01 ip_virtual01 \
>>     rule $id="cli-prefer-rule-ip_virtual01" inf: #uname eq finley
>> location cli-prefer-iscsi02_lun1 iscsi02_lun1 \
>>     rule $id="cli-prefer-rule-iscsi02_lun1" inf: #uname eq godfrey
>> location cli-prefer-iscsi02_target iscsi02_target \
>>     rule $id="cli-prefer-rule-iscsi02_target" inf: #uname eq finley
> I am sorry, I don't know what I should do with these rules?
>
After you put a node in standby, if it's the active node, it will migrate
the resources to the passive node and make that one active. However, you
must remember to issue the command crm node online $nodename afterwards,
otherwise the node will not be allowed to run resources again. Just as a
side note.
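As a rough sketch, the standby/online cycle looks like this (using your
node names as an example; adjust as needed):

    crm node standby finley   # resources migrate to the other node
    crm node online finley    # finley may host resources again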
>> This simplifies resource design and thus keeps the CIB smaller, while
>> achieving the same functional goal.
>>
>> Output of ptest -LsVVV and some logs in a pastebin might help.
> I changed my configuration according to your comments, and the standby
> and reboot of both nodes seem to work fine now! Thank you!
>
> http://debian.pastebin.com/LuUGkRLd (configuration and ptest output)
>
> However, I still have the problem that I can't seem to move the resources
> between nodes with the crm resource move command.
The way I used the crm move command was not to specify the node name. I
can't remember now why I did that (probably because I also used it on a
2-node cluster), but the logic was: use crm resource move groupname, and
it will create a location constraint preventing the resources of the
group from running on the node that's currently primary. After the
migration of the resources has occurred, in order to remove the location
constraint (i.e., to allow the resources to move back if necessary) you
must either remove the location constraint from the cib or use crm
resource unmove groupname; I used the unmove command.
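In command form that's simply (rg_iscsi being the group name from your
config):

    crm resource move rg_iscsi    # adds a cli-prefer-* location constraint
    # ...wait for the migration to finish...
    crm resource unmove rg_iscsi  # removes the constraint again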
Just to be clear:
1. resources on finley ==> crm resource move ==> resources move to
godfrey ==> crm resource unmove ==> resources remain on godfrey (we've
just removed the constraint, but the resource stickiness prevents the
ping-pong effect)
2. resources on godfrey ==> crm resource move ==> resources move to
finley ==> crm resource unmove ==> resources remain on finley (same as 1
but from a different view)
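The stickiness that prevents the ping-pong in both cases comes from a
setting along these lines (the value 100 is only an illustration, not
taken from your config):

    crm configure rsc_defaults resource-stickiness=100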
Things to be aware of:
1. resources on a node ==> crm resource move ==> before the resources
finish migrating you issue crm resource unmove ==> the resources don't
finish migrating to the other node and come back to the original node
(so don't get finger happy on the keyboard, give the resources time to
move).
2. resources on finley ==> crm resource move ==> resources move to
godfrey ==> godfrey crashes ==> resources don't migrate to finley
(because the crm resource unmove command was not issued, so the location
constraint preventing the resources from running on finley is still in
place, even if finley is the last node in the cluster) ==> crm resource
unmove ==> resources start on finley
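If you end up in case 2, you can spot and clear the leftover constraint
like this (a sketch; cli-prefer-rg_iscsi is the id the shell generates
for the rg_iscsi group):

    crm configure show | grep cli-prefer   # any leftover move constraint
    crm resource unmove rg_iscsi           # removes it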
One thing to test would be to first remove any config that looks like
this, with a reference either to finley or to godfrey:

location cli-prefer-rg_iscsi rg_iscsi \
    rule $id="cli-prefer-rule-rg_iscsi" inf: #uname eq finley

Reboot both nodes, let
them start and settle on a location, do a crm configure save
initial.config. Issue the crm resource move (let them migrate), then crm
configure save migrated.config, then crm resource unmove, then crm
configure save unmigrated.config, and compare the results. This way
you'll see how the setup looks and what rules are added and removed
during the process.
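Put together, the test sequence would be something like this (rg_iscsi
and the file names are just examples):

    crm configure save initial.config
    crm resource move rg_iscsi
    # ...let the migration finish...
    crm configure save migrated.config
    crm resource unmove rg_iscsi
    crm configure save unmigrated.config
    diff initial.config migrated.config     # shows the cli-prefer rule added
    diff initial.config unmigrated.config   # should show it removed again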
If the move command somehow doesn't work, you might want to check
whether you've configured resource-level fencing for DRBD:
http://www.drbd.org/users-guide/s-pacemaker-fencing.html
The fence-peer handler will add a constraint in some cases (such as when
you put a node in standby) preventing the DRBD resource from running.
When you bring a node online and there have been disk changes, DRBD has
to sync some data, and until the data is synced the constraint is still
there, so issuing a crm resource move while DRBD is syncing won't have
the expected outcome (again, the reference to being finger happy on the
keyboard). After the sync is done, the crm-unfence-peer.sh handler
removes the constraint, and then the move command will work.
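For reference, the resource-level fencing setup from that page looks
roughly like this in drbd.conf (a sketch, check the guide for the exact
syntax of your DRBD version; r0 is a placeholder resource name):

    resource r0 {
      disk {
        fencing resource-only;
      }
      handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
    }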
Just a couple of things to keep in mind.
HTH,
Dan
> Would you be willing to take a look at the pastebin config and ptest
> output and maybe tell how to move the resources?
>
> With kind regards,
>
> Jelle de Jong
--
Dan Frincu
CCNA, RHCE