[Pacemaker] Pacemaker 1.1.6 order possible bug ?

Tomáš Vavřička vavricka at ttc.cz
Mon Sep 10 08:15:54 EDT 2012


On 09/06/2012 07:41 AM, Tomáš Vavřička wrote:
> On 09/05/2012 04:54 PM, Dejan Muhamedagic wrote:
>> Hi,
>>
>> On Wed, Sep 05, 2012 at 03:09:15PM +0200, Tomáš Vavřička wrote:
>>> On 09/05/2012 11:44 AM, Dejan Muhamedagic wrote:
>>>> On Wed, Sep 05, 2012 at 07:51:35AM +1000, Andrew Beekhof wrote:
>>>>> On Mon, Sep 3, 2012 at 3:41 PM, Tomáš Vavřička <vavricka at ttc.cz> 
>>>>> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> Sorry if I am sending the same question twice, but my message did
>>>>>> not appear on the mailing list.
>>>>>>
>>>>>> I have a problem with orders in pacemaker 1.1.6 and corosync 1.4.1.
>>>>>>
>>>>>> The order below works for failover, but it does not work when one
>>>>>> cluster node starts up (DRBD stays in Slave state and ms_toponet is
>>>>>> started before DRBD gets promoted).
>>>>>>
>>>>>> order o_start inf: ms_drbd_postgres:promote postgres:start
>>>>>> ms_toponet:promote monitor_cluster:start
>>>>>>
>>>>>> The order below does not work for failover (it kills the slave
>>>>>> toponet app and starts it again), but it works correctly when the
>>>>>> cluster starts up.
>>>>>>
>>>>>> order o_start inf: ms_drbd_postgres:promote postgres:start 
>>>>>> ms_toponet:start
>>>>>> ms_toponet:promote monitor_cluster:start
>>>>> I would recommend breaking this into "basic" constraints.
>>>>> The shell syntax for constraint sets has been a source of 
>>>>> confusion for a while.
>>>> Nothing's wrong with the shell syntax here. I believe this has
>>>> been discussed before. When in doubt about what the shell does,
>>>> just use "show xml".
>>> crm configure show xml output:
>> There are no resource_set elements in this configuration.
>>
>> Thanks,
>>
>> Dejan
> Yesterday I posted a configuration with the colocations already split
> into c1, c2, and c3.
>
> Automatically created resource_set for colocation pg_on_drbd:
> colocation pg_on_drbd inf: monitor_cluster ms_toponet:Master postgres 
> ms_drbd_postgres:Master
>
>       <rsc_colocation id="pg_on_drbd" score="INFINITY">
>         <resource_set id="pg_on_drbd-0">
>           <resource_ref id="monitor_cluster"/>
>         </resource_set>
>         <resource_set id="pg_on_drbd-1" role="Master">
>           <resource_ref id="ms_toponet"/>
>         </resource_set>
>         <resource_set id="pg_on_drbd-2">
>           <resource_ref id="postgres"/>
>         </resource_set>
>         <resource_set id="pg_on_drbd-3" role="Master">
>           <resource_ref id="ms_drbd_postgres"/>
>         </resource_set>
>       </rsc_colocation>
>
> Then I tried:
>
>       <rsc_colocation id="pg_on_drbd" score="INFINITY">
>         <resource_set id="pg_on_drbd-0" role="Master" sequential="true">
>           <resource_ref id="monitor_cluster"/>
>           <resource_ref id="ms_toponet"/>
>         </resource_set>
>         <resource_set id="pg_on_drbd-1" role="Master" sequential="true">
>           <resource_ref id="postgres"/>
>           <resource_ref id="ms_drbd_postgres"/>
>         </resource_set>
>       </rsc_colocation>
>
> or
>
>       <rsc_colocation id="pg_on_drbd" score="INFINITY">
>         <resource_set id="pg_on_drbd-0" role="Master" sequential="true">
>           <resource_ref id="monitor_cluster"/>
>           <resource_ref id="ms_toponet"/>
>           <resource_ref id="postgres"/>
>           <resource_ref id="ms_drbd_postgres"/>
>         </resource_set>
>       </rsc_colocation>
>
> None of them helped the situation.
>

Are these valid resource_set elements?

Does anyone have an idea where the problem is? Nothing else that I
could try comes to mind.
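
For reference, the pairwise colocations I posted earlier (c1, c2, c3,
visible in the quoted CIB below) look like this in crm shell syntax; the
resource_set variants above were meant to replace them:

```
colocation c1 inf: monitor_cluster ms_toponet:Master
colocation c2 inf: ms_toponet:Master postgres
colocation c3 inf: postgres ms_drbd_postgres:Master
```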

Tomas
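
P.S. The only remaining thing I can think of is making the asymmetry
explicit on the individual rsc_order elements in XML rather than via the
shell directive. An untested sketch, using the existing o1 constraint as
the example (symmetrical is a standard rsc_order attribute, but I have
not verified this behaves differently on 1.1.6):

```
<rsc_order first="ms_drbd_postgres" first-action="promote" id="o1"
           score="INFINITY" symmetrical="false"
           then="postgres" then-action="start"/>
```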

>>> <?xml version="1.0" ?>
>>> <cib admin_epoch="0" cib-last-written="Wed Sep  5 07:56:52 2012"
>>> crm_feature_set="3.0.5" dc-uuid="toponet30" epoch="28" have-quorum="1"
>>> num_updates="119" update-client="cibadmin" update-origin="toponet31"
>>> update-user="root" validate-with="pacemaker-1.2">
>>>    <configuration>
>>>      <crm_config>
>>>        <cluster_property_set id="cib-bootstrap-options">
>>>          <nvpair id="cib-bootstrap-options-dc-version" 
>>> name="dc-version"
>>> value="1.1.6-b988976485d15cb702c9307df55512d323831a5e"/>
>>>          <nvpair id="cib-bootstrap-options-cluster-infrastructure"
>>> name="cluster-infrastructure" value="openais"/>
>>>          <nvpair id="cib-bootstrap-options-expected-quorum-votes"
>>> name="expected-quorum-votes" value="2"/>
>>>          <nvpair id="cib-bootstrap-options-no-quorum-policy"
>>> name="no-quorum-policy" value="ignore"/>
>>>          <nvpair id="cib-bootstrap-options-stonith-enabled"
>>> name="stonith-enabled" value="true"/>
>>>          <nvpair id="cib-bootstrap-options-last-lrm-refresh"
>>> name="last-lrm-refresh" value="1346417427"/>
>>>        </cluster_property_set>
>>>      </crm_config>
>>>      <nodes>
>>>        <node id="toponet31" type="normal" uname="toponet31"/>
>>>        <node id="toponet30" type="normal" uname="toponet30"/>
>>>      </nodes>
>>>      <resources>
>>>        <group id="postgres">
>>>          <primitive class="ocf" id="pg_fs" provider="heartbeat"
>>> type="Filesystem">
>>>            <instance_attributes id="pg_fs-instance_attributes">
>>>              <nvpair id="pg_fs-instance_attributes-device" 
>>> name="device"
>>> value="/dev/drbd0"/>
>>>              <nvpair id="pg_fs-instance_attributes-directory"
>>> name="directory" value="/var/lib/pgsql"/>
>>>              <nvpair id="pg_fs-instance_attributes-fstype" 
>>> name="fstype"
>>> value="ext3"/>
>>>            </instance_attributes>
>>>          </primitive>
>>>          <primitive class="ocf" id="PGIP" provider="heartbeat"
>>> type="IPaddr2">
>>>            <instance_attributes id="PGIP-instance_attributes">
>>>              <nvpair id="PGIP-instance_attributes-ip" name="ip"
>>> value="192.168.100.3"/>
>>>              <nvpair id="PGIP-instance_attributes-cidr_netmask"
>>> name="cidr_netmask" value="29"/>
>>>            </instance_attributes>
>>>            <operations>
>>>              <op id="PGIP-monitor-5s" interval="5s" name="monitor"/>
>>>            </operations>
>>>          </primitive>
>>>          <primitive class="ocf" id="postgresql" provider="heartbeat"
>>> type="pgsql">
>>>            <operations>
>>>              <op id="postgresql-start-0" interval="0" name="start"
>>> timeout="80s"/>
>>>              <op id="postgresql-stop-0" interval="0" name="stop"
>>> timeout="60s"/>
>>>              <op id="postgresql-monitor-10s" interval="10s"
>>> name="monitor" timeout="10s">
>>>                <instance_attributes
>>> id="postgresql-monitor-10s-instance_attributes">
>>>                  <nvpair
>>> id="postgresql-monitor-10s-instance_attributes-depth" name="depth"
>>> value="0"/>
>>>                </instance_attributes>
>>>              </op>
>>>            </operations>
>>>          </primitive>
>>>        </group>
>>>        <primitive class="ocf" id="monitor_cluster" provider="heartbeat"
>>> type="monitor_cluster">
>>>          <operations>
>>>            <op id="monitor_cluster-monitor-30s" interval="30s"
>>> name="monitor"/>
>>>            <op id="monitor_cluster-start-0" interval="0" name="start"
>>> timeout="30s"/>
>>>          </operations>
>>>          <meta_attributes id="monitor_cluster-meta_attributes">
>>>            <nvpair id="monitor_cluster-meta_attributes-target-role"
>>> name="target-role" value="Started"/>
>>>          </meta_attributes>
>>>        </primitive>
>>>        <master id="ms_toponet">
>>>          <meta_attributes id="ms_toponet-meta_attributes">
>>>            <nvpair id="ms_toponet-meta_attributes-master-max"
>>> name="master-max" value="1"/>
>>>            <nvpair id="ms_toponet-meta_attributes-master-node-max"
>>> name="master-node-max" value="1"/>
>>>            <nvpair id="ms_toponet-meta_attributes-clone-max"
>>> name="clone-max" value="2"/>
>>>            <nvpair id="ms_toponet-meta_attributes-clone-node-max"
>>> name="clone-node-max" value="1"/>
>>>            <nvpair id="ms_toponet-meta_attributes-target-role"
>>> name="target-role" value="Master"/>
>>>          </meta_attributes>
>>>          <primitive class="ocf" id="toponet" provider="heartbeat"
>>> type="toponet">
>>>            <operations>
>>>              <op id="toponet-start-0" interval="0" name="start"
>>> timeout="180s"/>
>>>              <op id="toponet-stop-0" interval="0" name="stop"
>>> timeout="60s"/>
>>>              <op id="toponet-monitor-10s" interval="10s" name="monitor"
>>> on-fail="standby" role="Master" timeout="20s"/>
>>>              <op id="toponet-monitor-20s" interval="20s" name="monitor"
>>> role="Slave" timeout="40s"/>
>>>              <op id="toponet-promote-0" interval="0" name="promote"
>>> timeout="120s"/>
>>>              <op id="toponet-demote-0" interval="0" name="demote"
>>> timeout="120s"/>
>>>            </operations>
>>>          </primitive>
>>>        </master>
>>>        <master id="ms_drbd_postgres">
>>>          <meta_attributes id="ms_drbd_postgres-meta_attributes">
>>>            <nvpair id="ms_drbd_postgres-meta_attributes-master-max"
>>> name="master-max" value="1"/>
>>>            <nvpair 
>>> id="ms_drbd_postgres-meta_attributes-master-node-max"
>>> name="master-node-max" value="1"/>
>>>            <nvpair id="ms_drbd_postgres-meta_attributes-clone-max"
>>> name="clone-max" value="2"/>
>>>            <nvpair id="ms_drbd_postgres-meta_attributes-clone-node-max"
>>> name="clone-node-max" value="1"/>
>>>            <nvpair id="ms_drbd_postgres-meta_attributes-notify"
>>> name="notify" value="true"/>
>>>            <nvpair id="ms_drbd_postgres-meta_attributes-target-role"
>>> name="target-role" value="Master"/>
>>>          </meta_attributes>
>>>          <primitive class="ocf" id="drbd_postgres" provider="linbit"
>>> type="drbd">
>>>            <instance_attributes id="drbd_postgres-instance_attributes">
>>>              <nvpair 
>>> id="drbd_postgres-instance_attributes-drbd_resource"
>>> name="drbd_resource" value="postgres"/>
>>>            </instance_attributes>
>>>            <operations>
>>>              <op id="drbd_postgres-start-0" interval="0" name="start"
>>> timeout="240s"/>
>>>              <op id="drbd_postgres-stop-0" interval="0" name="stop"
>>> timeout="120s"/>
>>>              <op id="drbd_postgres-monitor-5s" interval="5s"
>>> name="monitor" role="Master" timeout="10s"/>
>>>              <op id="drbd_postgres-monitor-10s" interval="10s"
>>> name="monitor" role="Slave" timeout="20s"/>
>>>            </operations>
>>>          </primitive>
>>>        </master>
>>>        <primitive class="stonith" id="st_primary" type="external/xen0">
>>>          <operations>
>>>            <op id="st_primary-start-0" interval="0" name="start"
>>> timeout="60s"/>
>>>          </operations>
>>>          <instance_attributes id="st_primary-instance_attributes">
>>>            <nvpair id="st_primary-instance_attributes-hostlist"
>>> name="hostlist" value="toponet31:/etc/xen/vm/toponet31"/>
>>>            <nvpair id="st_primary-instance_attributes-dom0" name="dom0"
>>> value="172.16.103.54"/>
>>>          </instance_attributes>
>>>        </primitive>
>>>        <primitive class="stonith" id="st_secondary" 
>>> type="external/xen0">
>>>          <operations>
>>>            <op id="st_secondary-start-0" interval="0" name="start"
>>> timeout="60s"/>
>>>          </operations>
>>>          <instance_attributes id="st_secondary-instance_attributes">
>>>            <nvpair id="st_secondary-instance_attributes-hostlist"
>>> name="hostlist" value="toponet30:/etc/xen/vm/toponet30"/>
>>>            <nvpair id="st_secondary-instance_attributes-dom0" 
>>> name="dom0"
>>> value="172.16.103.54"/>
>>>          </instance_attributes>
>>>        </primitive>
>>>      </resources>
>>>      <constraints>
>>>        <rsc_colocation id="c1" rsc="monitor_cluster" score="INFINITY"
>>> with-rsc="ms_toponet" with-rsc-role="Master"/>
>>>        <rsc_location id="master-prefer-node1" node="toponet30"
>>> rsc="postgres" score="100"/>
>>>        <rsc_location id="loc_st_sec" node="toponet30" 
>>> rsc="st_secondary"
>>> score="-INFINITY"/>
>>>        <rsc_order first="ms_toponet" first-action="promote" id="o4"
>>> score="INFINITY" then="monitor_cluster" then-action="start"/>
>>>        <rsc_colocation id="c3" rsc="postgres" score="INFINITY"
>>> with-rsc="ms_drbd_postgres" with-rsc-role="Master"/>
>>>        <rsc_order first="ms_toponet" first-action="start" id="o3"
>>> score="INFINITY" then="ms_toponet" then-action="promote"/>
>>>        <rsc_colocation id="c2" rsc="ms_toponet" rsc-role="Master"
>>> score="INFINITY" with-rsc="postgres"/>
>>>        <rsc_location id="loc_st_pri" node="toponet31" rsc="st_primary"
>>> score="-INFINITY"/>
>>>        <rsc_order first="ms_drbd_postgres" first-action="promote" 
>>> id="o1"
>>> score="INFINITY" then="postgres" then-action="start"/>
>>>        <rsc_order first="postgres" first-action="start" id="o2"
>>> score="INFINITY" then="ms_toponet" then-action="start"/>
>>>      </constraints>
>>>      <rsc_defaults>
>>>        <meta_attributes id="rsc-options">
>>>          <nvpair id="rsc-options-resource-stickiness"
>>> name="resource-stickiness" value="5000"/>
>>>        </meta_attributes>
>>>      </rsc_defaults>
>>>    </configuration>
>>> </cib>
>>>
>>>
>>>
>>>
>>>>> order o1 inf: ms_drbd_postgres:promote postgres:start
>>>>> order o2 inf: postgres:start ms_toponet:start
>>>>> order o3 inf: ms_toponet:start ms_toponet:promote
>>>>> order o4 inf: ms_toponet:promote monitor_cluster:start
>>>>>
>>>>> If you still have problems with the expanded form, let us know.
>>>> Resource sets are not an issue in order constraints, but rather
>>>> in collocations.
>>>>
>>>> Thanks,
>>>>
>>>> Dejan
>>>>
>>>>>> I want Pacemaker to act as it did in version 1.0.12:
>>>>>> * when the toponet master app is killed, move the postgres
>>>>>> resource to the other node and promote ms_toponet and
>>>>>> ms_drbd_postgres to Master
>>>>>> * when one node is starting, promote DRBD to Master if it is
>>>>>> UpToDate
>>>>>>
>>>>>> Am I doing something wrong?
>>>>>>
>>>>>> It looks to me as if Pacemaker ignores some orders (Pacemaker
>>>>>> should wait for the DRBD promotion before starting the toponet
>>>>>> app, but the toponet app is started right after DRBD starts as
>>>>>> Slave). I tried to solve this with different orders combined with
>>>>>> symmetrical=false, split orders, and different orders for start
>>>>>> and stop, but with no success at all (it seems to me that the
>>>>>> symmetrical=false directive is completely ignored).
>>>>>>
>>>>>> Pacemaker 1.1.7 does not work for me, because its on-fail
>>>>>> directive is broken.
>>>>>>
>>>>>> crm_mon output:
>>>>>>
>>>>>> ============
>>>>>> Last updated: Fri Aug 31 14:51:11 2012
>>>>>> Last change: Fri Aug 31 14:50:27 2012 by hacluster via crmd on 
>>>>>> toponet30
>>>>>> Stack: openais
>>>>>> Current DC: toponet30 - partition WITHOUT quorum
>>>>>> Version: 1.1.6-b988976485d15cb702c9307df55512d323831a5e
>>>>>> 2 Nodes configured, 2 expected votes
>>>>>> 10 Resources configured.
>>>>>> ============
>>>>>>
>>>>>> Online: [ toponet30 toponet31 ]
>>>>>>
>>>>>> st_primary      (stonith:external/xen0):        Started toponet30
>>>>>> st_secondary    (stonith:external/xen0):        Started toponet31
>>>>>>    Master/Slave Set: ms_drbd_postgres
>>>>>>        Masters: [ toponet30 ]
>>>>>>        Slaves: [ toponet31 ]
>>>>>>    Resource Group: postgres
>>>>>>        pg_fs      (ocf::heartbeat:Filesystem): Started toponet30
>>>>>>        PGIP       (ocf::heartbeat:IPaddr2): Started toponet30
>>>>>>        postgresql (ocf::heartbeat:pgsql): Started toponet30
>>>>>> monitor_cluster (ocf::heartbeat:monitor_cluster): Started toponet30
>>>>>>    Master/Slave Set: ms_toponet
>>>>>>        Masters: [ toponet30 ]
>>>>>>        Slaves: [ toponet31 ]
>>>>>>
>>>>>> configuration:
>>>>>>
>>>>>> node toponet30
>>>>>> node toponet31
>>>>>> primitive PGIP ocf:heartbeat:IPaddr2 \
>>>>>>           params ip="192.168.100.3" cidr_netmask="29" \
>>>>>>           op monitor interval="5s"
>>>>>> primitive drbd_postgres ocf:linbit:drbd \
>>>>>>           params drbd_resource="postgres" \
>>>>>>           op start interval="0" timeout="240s" \
>>>>>>           op stop interval="0" timeout="120s" \
>>>>>>           op monitor interval="5s" role="Master" timeout="10s" \
>>>>>>           op monitor interval="10s" role="Slave" timeout="20s"
>>>>>> primitive monitor_cluster ocf:heartbeat:monitor_cluster \
>>>>>>           op monitor interval="30s" \
>>>>>>           op start interval="0" timeout="30s" \
>>>>>>           meta target-role="Started"
>>>>>> primitive pg_fs ocf:heartbeat:Filesystem \
>>>>>>           params device="/dev/drbd0" directory="/var/lib/pgsql" 
>>>>>> fstype="ext3"
>>>>>> primitive postgresql ocf:heartbeat:pgsql \
>>>>>>           op start interval="0" timeout="80s" \
>>>>>>           op stop interval="0" timeout="60s" \
>>>>>>           op monitor interval="10s" timeout="10s" depth="0"
>>>>>> primitive st_primary stonith:external/xen0 \
>>>>>>           op start interval="0" timeout="60s" \
>>>>>>           params hostlist="toponet31:/etc/xen/vm/toponet31"
>>>>>> dom0="172.16.103.54"
>>>>>> primitive st_secondary stonith:external/xen0 \
>>>>>>           op start interval="0" timeout="60s" \
>>>>>>           params hostlist="toponet30:/etc/xen/vm/toponet30"
>>>>>> dom0="172.16.103.54"
>>>>>> primitive toponet ocf:heartbeat:toponet \
>>>>>>           op start interval="0" timeout="180s" \
>>>>>>           op stop interval="0" timeout="60s" \
>>>>>>           op monitor interval="10s" role="Master" timeout="20s"
>>>>>> on-fail="standby" \
>>>>>>           op monitor interval="20s" role="Slave" timeout="40s" \
>>>>>>           op promote interval="0" timeout="120s" \
>>>>>>           op demote interval="0" timeout="120s"
>>>>>> group postgres pg_fs PGIP postgresql
>>>>>> ms ms_drbd_postgres drbd_postgres \
>>>>>>           meta master-max="1" master-node-max="1" clone-max="2"
>>>>>> clone-node-max="1" notify="true" target-role="Master"
>>>>>> ms ms_toponet toponet \
>>>>>>           meta master-max="1" master-node-max="1" clone-max="2"
>>>>>> clone-node-max="1" target-role="Master"
>>>>>> location loc_st_pri st_primary -inf: toponet31
>>>>>> location loc_st_sec st_secondary -inf: toponet30
>>>>>> location master-prefer-node1 postgres 100: toponet30
>>>>>> colocation pg_on_drbd inf: monitor_cluster ms_toponet:Master 
>>>>>> postgres
>>>>>> ms_drbd_postgres:Master
>>>>>> order o_start inf: ms_drbd_postgres:start ms_drbd_postgres:promote
>>>>>> postgres:start ms_toponet:start ms_toponet:promote 
>>>>>> monitor_cluster:start
>>>>>> property $id="cib-bootstrap-options" \
>>>>>> dc-version="1.1.6-b988976485d15cb702c9307df55512d323831a5e" \
>>>>>>           cluster-infrastructure="openais" \
>>>>>>           expected-quorum-votes="2" \
>>>>>>           no-quorum-policy="ignore" \
>>>>>>           stonith-enabled="true"
>>>>>> rsc_defaults $id="rsc-options" \
>>>>>>           resource-stickiness="5000"
>>>>>>
>>>>>> _______________________________________________
>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>
>>>>>> Project Home: http://www.clusterlabs.org
>>>>>> Getting started: 
>>>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>> Bugs: http://bugs.clusterlabs.org
>>>
>
>




