[Pacemaker] Pacemaker 1.1.6 order possible bug ?
Dejan Muhamedagic
dejanmm at fastmail.fm
Wed Sep 5 14:54:53 UTC 2012
Hi,
On Wed, Sep 05, 2012 at 03:09:15PM +0200, Tomáš Vavřička wrote:
> On 09/05/2012 11:44 AM, Dejan Muhamedagic wrote:
>> On Wed, Sep 05, 2012 at 07:51:35AM +1000, Andrew Beekhof wrote:
>>> On Mon, Sep 3, 2012 at 3:41 PM, Tomáš Vavřička <vavricka at ttc.cz> wrote:
>>>> Hello,
>>>>
>>>> Sorry if I am sending the same question twice, but my message did not
>>>> appear on the mailing list.
>>>>
>>>> I have a problem with orders in pacemaker 1.1.6 and corosync 1.4.1.
>>>>
>>>> The order below works for failover, but it does not work when one cluster
>>>> node starts up (DRBD stays in the Slave state and ms_toponet is started
>>>> before DRBD gets promoted).
>>>>
>>>> order o_start inf: ms_drbd_postgres:promote postgres:start
>>>> ms_toponet:promote monitor_cluster:start
>>>>
>>>> The order below does not work for failover (it kills the slave toponet app
>>>> and starts it again), but it works correctly when the cluster starts up.
>>>>
>>>> order o_start inf: ms_drbd_postgres:promote postgres:start ms_toponet:start
>>>> ms_toponet:promote monitor_cluster:start
>>> I would recommend breaking this into "basic" constraints.
>>> The shell syntax for constraint sets has been a source of confusion for a while.
>> Nothing's wrong with the shell syntax here. I believe this has
>> been discussed before. When in doubt about what the shell does,
>> just use "show xml".
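For example:

  # dump the whole configuration as CIB XML
  crm configure show xml

and, if I recall correctly, show also accepts individual object ids, so
"crm configure show xml o_start" should print just that one constraint as
it ends up in the CIB.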
>
> crm configure show xml output:
There are no resource_set elements in this configuration.
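Just to illustrate what I would be looking for: had the shell generated a
set-based order, the CIB would contain something roughly like the following
(the ids here are made up):

  <rsc_order id="o_start" score="INFINITY">
    <resource_set id="o_start-0" action="promote">
      <resource_ref id="ms_drbd_postgres"/>
    </resource_set>
    <resource_set id="o_start-1" action="start">
      <resource_ref id="postgres"/>
    </resource_set>
    ...
  </rsc_order>

Your configuration has only the plain two-resource rsc_order elements o1
through o4 instead.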
Thanks,
Dejan
> <?xml version="1.0" ?>
> <cib admin_epoch="0" cib-last-written="Wed Sep 5 07:56:52 2012"
> crm_feature_set="3.0.5" dc-uuid="toponet30" epoch="28" have-quorum="1"
> num_updates="119" update-client="cibadmin" updat
> e-origin="toponet31" update-user="root" validate-with="pacemaker-1.2">
> <configuration>
> <crm_config>
> <cluster_property_set id="cib-bootstrap-options">
> <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
> value="1.1.6-b988976485d15cb702c9307df55512d323831a5e"/>
> <nvpair id="cib-bootstrap-options-cluster-infrastructure"
> name="cluster-infrastructure" value="openais"/>
> <nvpair id="cib-bootstrap-options-expected-quorum-votes"
> name="expected-quorum-votes" value="2"/>
> <nvpair id="cib-bootstrap-options-no-quorum-policy"
> name="no-quorum-policy" value="ignore"/>
> <nvpair id="cib-bootstrap-options-stonith-enabled"
> name="stonith-enabled" value="true"/>
> <nvpair id="cib-bootstrap-options-last-lrm-refresh"
> name="last-lrm-refresh" value="1346417427"/>
> </cluster_property_set>
> </crm_config>
> <nodes>
> <node id="toponet31" type="normal" uname="toponet31"/>
> <node id="toponet30" type="normal" uname="toponet30"/>
> </nodes>
> <resources>
> <group id="postgres">
> <primitive class="ocf" id="pg_fs" provider="heartbeat"
> type="Filesystem">
> <instance_attributes id="pg_fs-instance_attributes">
> <nvpair id="pg_fs-instance_attributes-device" name="device"
> value="/dev/drbd0"/>
> <nvpair id="pg_fs-instance_attributes-directory"
> name="directory" value="/var/lib/pgsql"/>
> <nvpair id="pg_fs-instance_attributes-fstype" name="fstype"
> value="ext3"/>
> </instance_attributes>
> </primitive>
> <primitive class="ocf" id="PGIP" provider="heartbeat"
> type="IPaddr2">
> <instance_attributes id="PGIP-instance_attributes">
> <nvpair id="PGIP-instance_attributes-ip" name="ip"
> value="192.168.100.3"/>
> <nvpair id="PGIP-instance_attributes-cidr_netmask"
> name="cidr_netmask" value="29"/>
> </instance_attributes>
> <operations>
> <op id="PGIP-monitor-5s" interval="5s" name="monitor"/>
> </operations>
> </primitive>
> <primitive class="ocf" id="postgresql" provider="heartbeat"
> type="pgsql">
> <operations>
> <op id="postgresql-start-0" interval="0" name="start"
> timeout="80s"/>
> <op id="postgresql-stop-0" interval="0" name="stop"
> timeout="60s"/>
> <op id="postgresql-monitor-10s" interval="10s"
> name="monitor" timeout="10s">
> <instance_attributes
> id="postgresql-monitor-10s-instance_attributes">
> <nvpair
> id="postgresql-monitor-10s-instance_attributes-depth" name="depth"
> value="0"/>
> </instance_attributes>
> </op>
> </operations>
> </primitive>
> </group>
> <primitive class="ocf" id="monitor_cluster" provider="heartbeat"
> type="monitor_cluster">
> <operations>
> <op id="monitor_cluster-monitor-30s" interval="30s"
> name="monitor"/>
> <op id="monitor_cluster-start-0" interval="0" name="start"
> timeout="30s"/>
> </operations>
> <meta_attributes id="monitor_cluster-meta_attributes">
> <nvpair id="monitor_cluster-meta_attributes-target-role"
> name="target-role" value="Started"/>
> </meta_attributes>
> </primitive>
> <master id="ms_toponet">
> <meta_attributes id="ms_toponet-meta_attributes">
> <nvpair id="ms_toponet-meta_attributes-master-max"
> name="master-max" value="1"/>
> <nvpair id="ms_toponet-meta_attributes-master-node-max"
> name="master-node-max" value="1"/>
> <nvpair id="ms_toponet-meta_attributes-clone-max"
> name="clone-max" value="2"/>
> <nvpair id="ms_toponet-meta_attributes-clone-node-max"
> name="clone-node-max" value="1"/>
> <nvpair id="ms_toponet-meta_attributes-target-role"
> name="target-role" value="Master"/>
> </meta_attributes>
> <primitive class="ocf" id="toponet" provider="heartbeat"
> type="toponet">
> <operations>
> <op id="toponet-start-0" interval="0" name="start"
> timeout="180s"/>
> <op id="toponet-stop-0" interval="0" name="stop"
> timeout="60s"/>
> <op id="toponet-monitor-10s" interval="10s" name="monitor"
> on-fail="standby" role="Master" timeout="20s"/>
> <op id="toponet-monitor-20s" interval="20s" name="monitor"
> role="Slave" timeout="40s"/>
> <op id="toponet-promote-0" interval="0" name="promote"
> timeout="120s"/>
> <op id="toponet-demote-0" interval="0" name="demote"
> timeout="120s"/>
> </operations>
> </primitive>
> </master>
> <master id="ms_drbd_postgres">
> <meta_attributes id="ms_drbd_postgres-meta_attributes">
> <nvpair id="ms_drbd_postgres-meta_attributes-master-max"
> name="master-max" value="1"/>
> <nvpair id="ms_drbd_postgres-meta_attributes-master-node-max"
> name="master-node-max" value="1"/>
> <nvpair id="ms_drbd_postgres-meta_attributes-clone-max"
> name="clone-max" value="2"/>
> <nvpair id="ms_drbd_postgres-meta_attributes-clone-node-max"
> name="clone-node-max" value="1"/>
> <nvpair id="ms_drbd_postgres-meta_attributes-notify"
> name="notify" value="true"/>
> <nvpair id="ms_drbd_postgres-meta_attributes-target-role"
> name="target-role" value="Master"/>
> </meta_attributes>
> <primitive class="ocf" id="drbd_postgres" provider="linbit"
> type="drbd">
> <instance_attributes id="drbd_postgres-instance_attributes">
> <nvpair id="drbd_postgres-instance_attributes-drbd_resource"
> name="drbd_resource" value="postgres"/>
> </instance_attributes>
> <operations>
> <op id="drbd_postgres-start-0" interval="0" name="start"
> timeout="240s"/>
> <op id="drbd_postgres-stop-0" interval="0" name="stop"
> timeout="120s"/>
> <op id="drbd_postgres-monitor-5s" interval="5s"
> name="monitor" role="Master" timeout="10s"/>
> <op id="drbd_postgres-monitor-10s" interval="10s"
> name="monitor" role="Slave" timeout="20s"/>
> </operations>
> </primitive>
> </master>
> <primitive class="stonith" id="st_primary" type="external/xen0">
> <operations>
> <op id="st_primary-start-0" interval="0" name="start"
> timeout="60s"/>
> </operations>
> <instance_attributes id="st_primary-instance_attributes">
> <nvpair id="st_primary-instance_attributes-hostlist"
> name="hostlist" value="toponet31:/etc/xen/vm/toponet31"/>
> <nvpair id="st_primary-instance_attributes-dom0" name="dom0"
> value="172.16.103.54"/>
> </instance_attributes>
> </primitive>
> <primitive class="stonith" id="st_secondary" type="external/xen0">
> <operations>
> <op id="st_secondary-start-0" interval="0" name="start"
> timeout="60s"/>
> </operations>
> <instance_attributes id="st_secondary-instance_attributes">
> <nvpair id="st_secondary-instance_attributes-hostlist"
> name="hostlist" value="toponet30:/etc/xen/vm/toponet30"/>
> <nvpair id="st_secondary-instance_attributes-dom0" name="dom0"
> value="172.16.103.54"/>
> </instance_attributes>
> </primitive>
> </resources>
> <constraints>
> <rsc_colocation id="c1" rsc="monitor_cluster" score="INFINITY"
> with-rsc="ms_toponet" with-rsc-role="Master"/>
> <rsc_location id="master-prefer-node1" node="toponet30"
> rsc="postgres" score="100"/>
> <rsc_location id="loc_st_sec" node="toponet30" rsc="st_secondary"
> score="-INFINITY"/>
> <rsc_order first="ms_toponet" first-action="promote" id="o4"
> score="INFINITY" then="monitor_cluster" then-action="start"/>
> <rsc_colocation id="c3" rsc="postgres" score="INFINITY"
> with-rsc="ms_drbd_postgres" with-rsc-role="Master"/>
> <rsc_order first="ms_toponet" first-action="start" id="o3"
> score="INFINITY" then="ms_toponet" then-action="promote"/>
> <rsc_colocation id="c2" rsc="ms_toponet" rsc-role="Master"
> score="INFINITY" with-rsc="postgres"/>
> <rsc_location id="loc_st_pri" node="toponet31" rsc="st_primary"
> score="-INFINITY"/>
> <rsc_order first="ms_drbd_postgres" first-action="promote" id="o1"
> score="INFINITY" then="postgres" then-action="start"/>
> <rsc_order first="postgres" first-action="start" id="o2"
> score="INFINITY" then="ms_toponet" then-action="start"/>
> </constraints>
> <rsc_defaults>
> <meta_attributes id="rsc-options">
> <nvpair id="rsc-options-resource-stickiness"
> name="resource-stickiness" value="5000"/>
> </meta_attributes>
> </rsc_defaults>
> </configuration>
> </cib>
>
>
>
>
>>> order o1 inf: ms_drbd_postgres:promote postgres:start
>>> order o2 inf: postgres:start ms_toponet:start
>>> order o3 inf: ms_toponet:start ms_toponet:promote
>>> order o4 inf: ms_toponet:promote monitor_cluster:start
>>>
>>> If you still have problems with the expanded form, let us know.
>> Resource sets are not an issue in order constraints, but rather
>> in collocations.
>>
>> Thanks,
>>
>> Dejan
>>
>>>> I want Pacemaker to behave as it did in version 1.0.12:
>>>> * when the toponet master app is killed, move the postgres resource to the
>>>> other node and promote ms_toponet and ms_drbd_postgres to Master
>>>> * when one node is starting, promote DRBD to Master if it is UpToDate
>>>>
>>>> Am I doing something wrong?
>>>>
>>>> It looks to me like Pacemaker ignores some orders (it should wait for the
>>>> DRBD promotion before starting the toponet app, but the toponet app is
>>>> started right after DRBD starts as Slave). I tried to solve this with
>>>> different orders combined with symmetrical=false, split orders, and
>>>> separate orders for start and stop, but with no success at all (it seems
>>>> to me that the symmetrical=false directive is completely ignored).
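For clarity, by "split orders" with symmetrical=false I read something
roughly like the following, i.e. the basic constraints with symmetry turned
off (the constraint ids are only examples):

  order o_drbd inf: ms_drbd_postgres:promote postgres:start symmetrical=false
  order o_pg inf: postgres:start ms_toponet:start symmetrical=false
  order o_toponet inf: ms_toponet:start ms_toponet:promote symmetrical=false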
>>>>
>>>> Pacemaker 1.1.7 does not work for me because its on-fail directive is
>>>> broken.
>>>>
>>>> crm_mon output:
>>>>
>>>> ============
>>>> Last updated: Fri Aug 31 14:51:11 2012
>>>> Last change: Fri Aug 31 14:50:27 2012 by hacluster via crmd on toponet30
>>>> Stack: openais
>>>> Current DC: toponet30 - partition WITHOUT quorum
>>>> Version: 1.1.6-b988976485d15cb702c9307df55512d323831a5e
>>>> 2 Nodes configured, 2 expected votes
>>>> 10 Resources configured.
>>>> ============
>>>>
>>>> Online: [ toponet30 toponet31 ]
>>>>
>>>> st_primary (stonith:external/xen0): Started toponet30
>>>> st_secondary (stonith:external/xen0): Started toponet31
>>>> Master/Slave Set: ms_drbd_postgres
>>>> Masters: [ toponet30 ]
>>>> Slaves: [ toponet31 ]
>>>> Resource Group: postgres
>>>> pg_fs (ocf::heartbeat:Filesystem): Started toponet30
>>>> PGIP (ocf::heartbeat:IPaddr2): Started toponet30
>>>> postgresql (ocf::heartbeat:pgsql): Started toponet30
>>>> monitor_cluster (ocf::heartbeat:monitor_cluster): Started toponet30
>>>> Master/Slave Set: ms_toponet
>>>> Masters: [ toponet30 ]
>>>> Slaves: [ toponet31 ]
>>>>
>>>> configuration:
>>>>
>>>> node toponet30
>>>> node toponet31
>>>> primitive PGIP ocf:heartbeat:IPaddr2 \
>>>> params ip="192.168.100.3" cidr_netmask="29" \
>>>> op monitor interval="5s"
>>>> primitive drbd_postgres ocf:linbit:drbd \
>>>> params drbd_resource="postgres" \
>>>> op start interval="0" timeout="240s" \
>>>> op stop interval="0" timeout="120s" \
>>>> op monitor interval="5s" role="Master" timeout="10s" \
>>>> op monitor interval="10s" role="Slave" timeout="20s"
>>>> primitive monitor_cluster ocf:heartbeat:monitor_cluster \
>>>> op monitor interval="30s" \
>>>> op start interval="0" timeout="30s" \
>>>> meta target-role="Started"
>>>> primitive pg_fs ocf:heartbeat:Filesystem \
>>>> params device="/dev/drbd0" directory="/var/lib/pgsql" fstype="ext3"
>>>> primitive postgresql ocf:heartbeat:pgsql \
>>>> op start interval="0" timeout="80s" \
>>>> op stop interval="0" timeout="60s" \
>>>> op monitor interval="10s" timeout="10s" depth="0"
>>>> primitive st_primary stonith:external/xen0 \
>>>> op start interval="0" timeout="60s" \
>>>> params hostlist="toponet31:/etc/xen/vm/toponet31"
>>>> dom0="172.16.103.54"
>>>> primitive st_secondary stonith:external/xen0 \
>>>> op start interval="0" timeout="60s" \
>>>> params hostlist="toponet30:/etc/xen/vm/toponet30"
>>>> dom0="172.16.103.54"
>>>> primitive toponet ocf:heartbeat:toponet \
>>>> op start interval="0" timeout="180s" \
>>>> op stop interval="0" timeout="60s" \
>>>> op monitor interval="10s" role="Master" timeout="20s"
>>>> on-fail="standby" \
>>>> op monitor interval="20s" role="Slave" timeout="40s" \
>>>> op promote interval="0" timeout="120s" \
>>>> op demote interval="0" timeout="120s"
>>>> group postgres pg_fs PGIP postgresql
>>>> ms ms_drbd_postgres drbd_postgres \
>>>> meta master-max="1" master-node-max="1" clone-max="2"
>>>> clone-node-max="1" notify="true" target-role="Master"
>>>> ms ms_toponet toponet \
>>>> meta master-max="1" master-node-max="1" clone-max="2"
>>>> clone-node-max="1" target-role="Master"
>>>> location loc_st_pri st_primary -inf: toponet31
>>>> location loc_st_sec st_secondary -inf: toponet30
>>>> location master-prefer-node1 postgres 100: toponet30
>>>> colocation pg_on_drbd inf: monitor_cluster ms_toponet:Master postgres
>>>> ms_drbd_postgres:Master
>>>> order o_start inf: ms_drbd_postgres:start ms_drbd_postgres:promote
>>>> postgres:start ms_toponet:start ms_toponet:promote monitor_cluster:start
>>>> property $id="cib-bootstrap-options" \
>>>> dc-version="1.1.6-b988976485d15cb702c9307df55512d323831a5e" \
>>>> cluster-infrastructure="openais" \
>>>> expected-quorum-votes="2" \
>>>> no-quorum-policy="ignore" \
>>>> stonith-enabled="true"
>>>> rsc_defaults $id="rsc-options" \
>>>> resource-stickiness="5000"
>>>>