[Pacemaker] Resource Group Scoring - failover node showing -1000000
Bobbie Lind
blind at sms-fed.com
Mon Aug 15 13:54:16 CET 2011
Some more relevant info:
OS: RHEL 5.6
Pacemaker: 1.0.11
Corosync 1.2.8
OpenAIS 1.1.4
Would I be better off dropping the group altogether and just using location
and ordering constraints? I thought that was what groups were supposed to take
care of, to simplify the configuration and constraints.
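
For reference, a group-free version of the same setup might look roughly like
the sketch below. It is untested and only illustrates the idea: the constraint
IDs colocMDTwithLVM and ordLVMthenMDT are made up, the primitives stay as
already defined, and the existing location scores are simply moved from
MDSgroup onto resMDTLVM.

```
# Hypothetical replacement for "group MDSgroup resMDTLVM resMDT0000"
# Keep the filesystem on the same node as the volume group
colocation colocMDTwithLVM inf: resMDT0000 resMDTLVM
# Activate the volume group before mounting the filesystem
order ordLVMthenMDT inf: resMDTLVM resMDT0000
# Node preferences now point at the first resource instead of the group
location locMDSprimary resMDTLVM inf: s02ns070
location locMDSsecondary resMDTLVM 5000: s02ns090
```

The -inf colocation constraints against anchorOSS1 through anchorOSS4 would
also have to name one of the primitives instead of MDSgroup.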
Bobbie Lind
Systems Engineer
*Solutions Made Simple, Inc (SMSi)*
On Thu, Aug 11, 2011 at 8:43 AM, Bobbie Lind <blind at sms-fed.com> wrote:
> I'm having a hard time trying to understand the scoring that is displayed
> for a Resource group.
>
> I'm trying to set up a Resource group with two resources (an LVM and a
> LUN) that runs in Active/Passive mode and only attempts to run on two
> nodes: the primary, s02ns070, and the secondary, s02ns090.
>
> Everything appears to work correctly for s02ns070 (Primary node) except
> that the scoring for the group_color and native_color of the lun
> (resMDT0000) is not displaying properly, see snip below (but it works).
> The backup node s02ns090 does not seem to have the proper scoring for the
> native_color for both the LVM and the LUN. It currently shows -1000000,
> which is why it's not failing over.
>
> I'm trying to figure out where the -1000000 score is coming from.
>
> Here are the relevant portions of my config file:
>
> <snip>
> primitive resMDT0000 ocf:heartbeat:Filesystem \
> meta target-role="Started" \
> operations $id="resMDT0000-operations" \
> op monitor interval="120" timeout="60" \
> op start interval="0" timeout="300" \
> op stop interval="0" timeout="300" \
> params device="/dev/mapper/dsdw_mdt_vg-dsdw_mdt_vol" \
> directory="/lustre/dsdw-MDT0000" fstype="lustre"
> primitive resMDTLVM ocf:heartbeat:LVM \
> params volgrpname="dsdw_mdt_vg"
> group MDSgroup resMDTLVM resMDT0000
> location locMDSprimary MDSgroup inf: s02ns070
> location locMDSsecondary MDSgroup 5000: s02ns090
> colocation colocMDSOSS1 -inf: anchorOSS1 MDSgroup
> colocation colocMDSOSS2 -inf: anchorOSS2 MDSgroup
> colocation colocMDSOSS3 -inf: anchorOSS3 MDSgroup
> colocation colocMDSOSS4 -inf: anchorOSS4 MDSgroup
> <snip>
>
> On first startup of the cluster, the following scores are assigned to the
> relevant nodes (found using ptest -Ls):
>
> <snip>
> group_color: MDSgroup allocation score on s02ns070: 1000000
> group_color: MDSgroup allocation score on s02ns090: 5000
> group_color: resMDTLVM allocation score on s02ns070: 1000000
> group_color: resMDTLVM allocation score on s02ns090: 5000
>
> group_color: resMDT0000 allocation score on s02ns070: 0
> group_color: resMDT0000 allocation score on s02ns090: 0
>
> native_color: resMDTLVM allocation score on s02ns070: 1000000
> native_color: resMDTLVM allocation score on s02ns090: -1000000
>
> native_color: resMDT0000 allocation score on s02ns070: 0
> native_color: resMDT0000 allocation score on s02ns090: -1000000
> <snip>
>
> On top of this, the secondary node is trying to start resources that it
> shouldn't have access to (according to how I think I have colocation set up).
>
> I have attached an hb_report from the time I start both nodes until it
> settles in the odd configuration of primary node holding the resource and
> the secondary node trying to start other seemingly random resources.
>
> I have looked into the asymmetrical "opt-in" clusters described at
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ch06s02s02.html
> and I am wondering if this will fix some (if not all) of my issues with the
> secondary node. I have also checked out the Master/Slave configuration but
> I'm not sure that's what I am looking for, since the LVM and the LUN cannot
> (and should not) be started in more than one place.
>
> My questions are:
> 1) Why does the resource resMDT0000 not seem to pick up the proper score in
> either the group_color or the native_color? And what did I set up wrong in
> my configuration to make this happen?
> 2) Is there a way to 'reset' scoring or force a score recalculation?
> 3) What would be the proper debug tool to use to find out where and what is
> changing/affecting the scores?
>
> Any help would be greatly appreciated.
>
>
> Bobbie Lind
> Systems Engineer
> *Solutions Made Simple, Inc (SMSi)*
>
>