[Pacemaker] Ordered resource is not restarting after migration if it's already started on new host

Mon Dec 17 19:28:25 UTC 2012

On Dec 16, 2012, at 7:29 PM, pacemaker-request at oss.clusterlabs.org wrote:

> Message: 5
> Date: Mon, 17 Dec 2012 14:23:15 +1100
> From: Andrew Beekhof <andrew at beekhof.net>
> To: The Pacemaker cluster resource manager
> 	<pacemaker at oss.clusterlabs.org>
> Subject: Re: [Pacemaker] Ordered resource is not restarting after
> 	migration if it's already started on new host
> 
> On Sat, Dec 15, 2012 at 10:58 AM, Neal Peters <nealppeters at gmail.com> wrote:
>> Hello-
>> 
>> I'm running Pacemaker v. 1.1 (pacemaker-1.1.7-6.el6.x86_64) on CentOS 6.3 and am observing behavior on my systems that differs from the behavior described in the manual.
>> 
>> Basically, the desired behavior (and the behavior described in Pacemaker Explained Section 6.3.1) is that when a "first" resource in an ordered set is moved to a host where the "then" resource is already running, the "then" resource will be restarted.
>> 
>> From Pacemaker Explained 6.3.1 Mandatory Ordering:
>> -If the first resource is (re)started while the then resource is running, the then resource will be stopped and restarted.
>> 
>> I am not seeing this behavior, however. Instead, the "then" resource is left running.
>> 
>> 
>> I have 2 servers running a basic setup that is close to the one described in the Clusters from Scratch document. Config follows:
>> 
>> node host2
>> node host1
>> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>>        params ip="192.168.0.225" cidr_netmask="32" \
>>        op monitor interval="1s" \
>>        meta target-role="Started"
>> primitive DNSserver lsb:named \
>>        op monitor interval="1s"
>> colocation ip-with-DNSserver inf: DNSserver ClusterIP
>> order DNS-server-after-ip inf: ClusterIP DNSserver
>> property $id="cib-bootstrap-options" \
>>        dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
>>        cluster-infrastructure="openais" \
>>        expected-quorum-votes="2" \
>>        stonith-enabled="false" \
>>        no-quorum-policy="ignore" \
>>        last-lrm-refresh="1355268791"
>> rsc_defaults $id="rsc-options" \
>>        resource-stickiness="102"
>> 
>> When the DNSserver resource is migrated from one node to the other and named is already started on the other node (for whatever reason), named is not restarted.
> 
> 1) Ordering constraints are behaving as expected, DNSserver is started
> after ClusterIP
> 2) Starting something (DNSserver) that is already started is a no-op
> 3) Don't start cluster services outside of the cluster
> 
> 3 is the root problem in your case
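
Point 3 can be checked on each node by seeing whether the init system also starts named at boot. What follows is a minimal sketch, not a definitive procedure. It assumes a CentOS 6 style system where `chkconfig --list named` prints one line of the form `named 0:off 1:off 2:on ...`. The helper name `enabled_runlevels` is hypothetical and only illustrates how to read that output.

```shell
#!/bin/sh
# Hypothetical helper: reads `chkconfig --list <service>` output on stdin
# and prints the runlevels in which the service is enabled at boot.
# Assumes lines shaped like: "named  0:off 1:off 2:on 3:on 4:on 5:on 6:off".
enabled_runlevels() {
    awk '{
        out = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^[0-6]:on$/) {
                split($i, parts, ":")
                out = out (out == "" ? "" : " ") parts[1]
            }
        }
        print out
    }'
}

# Intended use on a cluster node (not executed here):
#   chkconfig --list named | enabled_runlevels
#   chkconfig named off   # leave starting named to Pacemaker
```

If the helper prints any runlevels, the init system may start named independently of the cluster, which is one way the service can already be running on the target node.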

Thank you for your prompt reply.  It sounds as though Pacemaker is operating in the way that you expect in this situation.

Your description of Pacemaker's behavior:
> 2) Starting something (DNSserver) that is already started is a no-op

differs from the behavior described in the documentation:
>> -If the first resource is (re)started while the then resource is running, the then resource will be stopped and restarted.
(Section 6.3.1: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/s-resource-ordering.html#_mandatory_ordering )

Is there a place that I can/should report this discrepancy between actual behavior and behavior described in the documentation?

Thank you.

> 
>> 
>> Dec 14 15:32:28 host1 snmpd[5296]: Connection from UDP: [192.168.0.129]:51000->[192.168.0.93]
>> Dec 14 15:32:40 host1 lrmd: [8733]: info: rsc:ClusterIP:5: start
>> Dec 14 15:32:40 host1 IPaddr2(ClusterIP)[9542]: INFO: ip -f inet addr add 192.168.0.225/32 brd 192.168.0.225 dev eth1
>> Dec 14 15:32:40 host1 IPaddr2(ClusterIP)[9542]: INFO: ip link set eth1 up
>> Dec 14 15:32:40 host1 IPaddr2(ClusterIP)[9542]: INFO: /usr/lib64/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp-192.168.0.225 eth1 192.168.0.225 auto not_used not_used
>> Dec 14 15:32:41 host1 crmd[8736]:     info: process_lrm_event: LRM operation ClusterIP_start_0 (call=5, rc=0, cib-update=10, confirmed=true) ok
>> Dec 14 15:32:41 host1 lrmd: [8733]: info: rsc:ClusterIP:6: monitor
>> Dec 14 15:32:41 host1 lrmd: [8733]: info: rsc:DNSserver:7: start
>> Dec 14 15:32:41 host1 lrmd: [9601]: WARN: For LSB init script, no additional parameters are needed.
>> Dec 14 15:32:41 host1 lrmd: [8733]: info: RA output: (DNSserver:start:stdout) Starting named:
>> Dec 14 15:32:41 host1 lrmd: [8733]: info: RA output: (DNSserver:start:stdout) named: already running
>> Dec 14 15:32:41 host1 lrmd: [8733]: info: RA output: (DNSserver:start:stdout) [  OK
>> Dec 14 15:32:41 host1 lrmd: [8733]: info: RA output: (DNSserver:start:stdout) ]#015
>> Dec 14 15:32:41 host1 lrmd: [8733]: info: RA output: (DNSserver:start:stdout)
>> Dec 14 15:32:41 host1 crmd[8736]:     info: process_lrm_event: LRM operation DNSserver_start_0 (call=7, rc=0, cib-update=11, confirmed=true) ok
>> Dec 14 15:32:41 host1 lrmd: [8733]: info: rsc:DNSserver:8: monitor
>> Dec 14 15:32:41 host1 crmd[8736]:     info: process_lrm_event: LRM operation ClusterIP_monitor_1000 (call=6, rc=0, cib-update=12, confirmed=false) ok
>> Dec 14 15:32:41 host1 crmd[8736]:     info: process_lrm_event: LRM operation DNSserver_monitor_1000 (call=8, rc=0, cib-update=13, confirmed=false) ok
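
The log excerpt above shows the DNSserver start action printing "named: already running" and still completing with rc=0. A small sketch of that pattern follows, using a hypothetical stand-in service rather than the real named init script. The function names and the `STATE_FILE` marker are assumptions for illustration only. The point it shows is that an LSB-style `start` on an already running service reports success without stopping or restarting anything.

```shell
#!/bin/sh
# Hypothetical stand-in for an LSB-style init script; this is NOT the real
# /etc/init.d/named. A marker file plays the role of the running daemon.
STATE_FILE="${STATE_FILE:-/tmp/demo_service.state}"

demo_status() {
    # LSB convention: 0 means running, 3 means not running.
    if [ -f "$STATE_FILE" ]; then
        return 0
    fi
    return 3
}

demo_start() {
    if demo_status; then
        # Already running: report success and leave the service untouched.
        echo "demo: already running"
        return 0
    fi
    : > "$STATE_FILE"
    echo "demo: started"
    return 0
}

demo_stop() {
    rm -f "$STATE_FILE"
    echo "demo: stopped"
    return 0
}
```

Calling `demo_start` twice in a row succeeds both times, but only the first call changes any state. That matches the no-op start described in the reply.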
>> 
>> 
>> Are there errors in my config that are keeping the restart from happening?
>> 
>> Thanks in advance.
>> 
>> 
>> -Neal
>> 
> 
