[Pacemaker] stickiness weirdness please explain
Dan Frincu
df.cluster at gmail.com
Thu Feb 24 10:38:03 UTC 2011
Hi,
On 02/23/2011 06:19 PM, Jelle de Jong wrote:
> Dear Dan,
>
> Thank you for taking the time to read and answer my question.
>
> On 23-02-11 09:42, Dan Frincu wrote:
>> This is something that you should remove from the config, as I
>> understand it, all resources should run together on the same node and
>> migrate together to the other node.
>>
>> location cli-prefer-ip_virtual01 ip_virtual01 \
>>     rule $id="cli-prefer-rule-ip_virtual01" inf: #uname eq finley
>> location cli-prefer-iscsi02_lun1 iscsi02_lun1 \
>>     rule $id="cli-prefer-rule-iscsi02_lun1" inf: #uname eq godfrey
>> location cli-prefer-iscsi02_target iscsi02_target \
>>     rule $id="cli-prefer-rule-iscsi02_target" inf: #uname eq finley
> I am sorry, I don't know what I should do with these rules?
>
After you put a node in standby, if it's the active node, it will migrate
the resources to the passive node and make that one active. However, you
must remember to issue the command crm node online $nodename afterwards,
otherwise the node will not be allowed to run resources again. Just as a
side note.
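As a rough sketch, the standby/online cycle looks like this (using your
node names as an example; adjust as needed):

    crm node standby finley   # resources migrate to the other node
    crm node online finley    # finley may host resources again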
>> This simplifies resource design and thus keeps the CIB smaller, while
>> achieving the same functional goal.
>>
>> Output of ptest -LsVVV and some logs in a pastebin might help.
> I changed my configuration according to your comments, and the standby
> and reboot of both nodes seem to work fine now! Thank you!
>
> http://debian.pastebin.com/LuUGkRLd (configuration and ptest output)
>
> However, I still have the problem that I can't seem to move the resources
> between nodes with the crm resource move command.
The way I used the crm move command was not to specify the node name. I
can't remember now why I did that (probably because I also used it on a
2-node cluster), but the logic was: use crm resource move groupname, and
it will create a location constraint preventing the resources of the
group from running on the node that's currently primary. After the
migration of the resources has occurred, in order to remove the location
constraint (i.e., to allow the resources to move back if necessary) you
must either remove the location constraint from the cib or use crm
resource unmove groupname; I used the unmove command.
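In command form that's simply (rg_iscsi being the group name from your
config):

    crm resource move rg_iscsi    # adds a cli-prefer-* location constraint
    # ...wait for the migration to finish...
    crm resource unmove rg_iscsi  # removes the constraint again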
Just to be clear:
1. resources on finley ==> crm resource move ==> resources move to
godfrey ==> crm resource unmove ==> resources remain on godfrey (we've
just removed the constraint, but the resource stickiness prevents the
ping-pong effect)
2. resources on godfrey ==> crm resource move ==> resources move to
finley ==> crm resource unmove ==> resources remain on finley (same as 1
but from a different view)
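The stickiness that prevents the ping-pong in both cases comes from a
setting along these lines (the value 100 is only an illustration, not
taken from your config):

    crm configure rsc_defaults resource-stickiness=100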
Things to be aware of:
1. resources on a node ==> crm resource move ==> before the resources
finish migrating you issue crm resource unmove ==> the resources don't
finish migrating to the other node and come back to the original node
(so don't get finger happy on the keyboard, give the resources time to
move).
2. resources on finley ==> crm resource move ==> resources move to
godfrey ==> godfrey crashes ==> resources don't migrate to finley
(because the crm resource unmove command was not issued, so the location
constraint preventing the resources from running on finley is still in
place, even if finley is the last node in the cluster) ==> crm resource
unmove ==> resources start on finley
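If you end up in case 2, you can spot and clear the leftover constraint
like this (a sketch; cli-prefer-rg_iscsi is the id the shell generates
for the rg_iscsi group):

    crm configure show | grep cli-prefer   # any leftover move constraint
    crm resource unmove rg_iscsi           # removes it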
One thing to test would be to first remove any config that looks like
this, with a reference either to finley or to godfrey:

location cli-prefer-rg_iscsi rg_iscsi \
    rule $id="cli-prefer-rule-rg_iscsi" inf: #uname eq finley

Reboot both nodes, let
them start and settle on a location, do a crm configure save
initial.config. Issue the crm resource move (let them migrate), then crm
configure save migrated.config, then crm resource unmove, then crm
configure save unmigrated.config, and compare the results. This way
you'll see how the setup looks and what rules are added and removed
during the process.
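Put together, the test sequence would be something like this (rg_iscsi
and the file names are just examples):

    crm configure save initial.config
    crm resource move rg_iscsi
    # ...let the migration finish...
    crm configure save migrated.config
    crm resource unmove rg_iscsi
    crm configure save unmigrated.config
    diff initial.config migrated.config     # shows the cli-prefer rule added
    diff initial.config unmigrated.config   # should show it removed again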
If the move command somehow doesn't work, you might want to check
whether you've configured resource-level fencing for DRBD:
http://www.drbd.org/users-guide/s-pacemaker-fencing.html
The fence-peer handler will add a constraint in some cases (such as when
you put a node in standby) preventing the DRBD resource from running.
When you bring a node online and there have been disk changes, DRBD has
to sync some data, and until the data is synced the constraint is still
there, so issuing a crm resource move while DRBD is syncing won't have
the expected outcome (again, the reference to being finger happy on the
keyboard). After the sync is done, the crm-unfence-peer.sh handler
removes the constraint, and then the move command will work.
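For reference, the resource-level fencing setup from that page looks
roughly like this in drbd.conf (a sketch, check the guide for the exact
syntax of your DRBD version; r0 is a placeholder resource name):

    resource r0 {
      disk {
        fencing resource-only;
      }
      handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
    }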
Just a couple of things to keep in mind.
HTH,
Dan
> Would you be willing to take a look at the pastebin config and ptest
> output and maybe tell how to move the resources?
>
> With kind regards,
>
> Jelle de Jong
--
Dan Frincu
CCNA, RHCE