[Pacemaker] Stopping resource using pcs

K Mehta kiranmehta1981 at gmail.com
Mon Mar 3 06:40:47 EST 2014


Has no one ever faced this issue?
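For what it's worth, the mismatch could be worked around by also matching clone/master instance ids when comparing. A rough sketch of such a helper (untested; `resource_id_matches` is a hypothetical name, not actual pcs code):

```python
def resource_id_matches(reported_id, requested_name):
    # Hypothetical helper, not actual pcs code: treat clone/master
    # instance ids like "fs:0" or "fs:1" as instances of "fs".
    if reported_id == requested_name:
        return True
    base, sep, suffix = reported_id.rpartition(":")
    return sep == ":" and base == requested_name and suffix.isdigit()
```

is_resource_started could then call this helper instead of the exact-equality check on res.getAttribute("id").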


On Fri, Feb 28, 2014 at 11:51 PM, K Mehta <kiranmehta1981 at gmail.com> wrote:

> Yes, the issue is seen only with multi-state resources. Non-multi-state
> resources work fine. It looks like the is_resource_started function in
> utils.py does not compare the resource name properly. Let fs be the
> resource name: is_resource_started compares fs with fs:0 and fs:1, so no
> match is found and False is returned.
>
>
> def resource_disable(argv):
>     if len(argv) < 1:
>         utils.err("You must specify a resource to disable")
>
>     resource = argv[0]
>     args = ["crm_resource", "-r", argv[0], "-m", "-p", "target-role",
>             "-v", "Stopped"]
>     output, retval = utils.run(args)
>     if retval != 0:
>         utils.err(output)
>
>     if "--wait" in utils.pcs_options:
>         wait = utils.pcs_options["--wait"]
>         if not wait.isdigit():
>             utils.err("%s is not a valid number of seconds to wait" % wait)
>             sys.exit(1)
>         did_stop = utils.is_resource_started(resource, int(wait), True)  <<< did_stop is False
>
>         if did_stop:
>             return True
>         else:
>             utils.err("unable to stop: '%s', please check logs for failure information" % resource)
>
>
>
> def is_resource_started(resource,wait,stopped=False):
>     expire_time = int(time.time()) + wait
>     while True:
>         state = getClusterState()
>         resources = state.getElementsByTagName("resource")
>         for res in resources:
>             if res.getAttribute("id") == resource:  <<<< never succeeds
>                 if (res.getAttribute("role") == "Started" and not stopped) or \
>                    (res.getAttribute("role") == "Stopped" and stopped):
>                     return True
>                 break
>         if (expire_time < int(time.time())):
>             break
>         time.sleep(1)
>     return False    <<< False is returned
>
>
>
>
>
> On Fri, Feb 28, 2014 at 10:49 PM, David Vossel <dvossel at redhat.com> wrote:
>
>>
>>
>>
>>
>> ----- Original Message -----
>> > From: "K Mehta" <kiranmehta1981 at gmail.com>
>> > To: "The Pacemaker cluster resource manager" <
>> pacemaker at oss.clusterlabs.org>
>> > Sent: Friday, February 28, 2014 7:05:47 AM
>> > Subject: Re: [Pacemaker] Stopping resource using pcs
>> >
>> > Can anyone tell me why the --wait parameter always causes pcs resource
>> > disable to return failure even though the resource actually stops within
>> > the allotted time?
>>
>> Does it only show an error with multi-state resources? It is probably a
>> bug.
>>
>> -- Vossel
>>
>> >
>> >
>> > On Wed, Feb 26, 2014 at 10:45 PM, K Mehta < kiranmehta1981 at gmail.com >
>> wrote:
>> >
>> >
>> >
>> > Deleting the master resource id does not work; I see the same issue.
>> > However, uncloning helps. Delete works after disabling and uncloning.
>> >
>> > I see an issue in using the --wait option with disable. Resources move
>> > into the stopped state, but an error message is still printed. When the
>> > --wait option is not provided, no error message is seen.
>> >
>> > [root at sys11 ~]# pcs resource
>> > Master/Slave Set: ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > [vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8]
>> > Masters: [ sys11 ]
>> > Slaves: [ sys12 ]
>> > [root at sys11 ~]# pcs resource disable
>> ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > --wait
>> > Error: unable to stop: 'ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8', please
>> > check logs for failure information
>> > [root at sys11 ~]# pcs resource
>> > Master/Slave Set: ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > [vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8]
>> > Stopped: [ vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8:0
>> > vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8:1 ]
>> > [root at sys11 ~]# pcs resource disable
>> ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > --wait
>> > Error: unable to stop: 'ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8', please
>> > check logs for failure information <<<<<error message
>> > [root at sys11 ~]# pcs resource enable
>> ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > [root at sys11 ~]# pcs resource
>> > Master/Slave Set: ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > [vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8]
>> > Masters: [ sys11 ]
>> > Slaves: [ sys12 ]
>> > [root at sys11 ~]# pcs resource disable
>> ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > [root at sys11 ~]# pcs resource
>> > Master/Slave Set: ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > [vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8]
>> > Stopped: [ vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8:0
>> > vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8:1 ]
>> >
>> >
>> >
>> >
>> >
>> > On Wed, Feb 26, 2014 at 8:55 PM, David Vossel < dvossel at redhat.com >
>> wrote:
>> >
>> >
>> >
>> > ----- Original Message -----
>> > > From: "Frank Brendel" < frank.brendel at eurolog.com >
>> > > To: pacemaker at oss.clusterlabs.org
>> > > Sent: Wednesday, February 26, 2014 8:53:19 AM
>> > > Subject: Re: [Pacemaker] Stopping resource using pcs
>> > >
>> > > I guess we need some real experts here.
>> > >
>> > > I think it's because you're attempting to delete the resource and not
>> the
>> > > Master.
>> > > Try deleting the Master instead of the resource.
>> >
>> > Yes, delete the master resource id, not the primitive resource within
>> > the master. When using pcs, you should always refer to the resource's
>> > topmost parent id, not the ids of the child resources within the parent.
>> > If you make a resource a clone, start using the clone id; the same goes
>> > for a master. If you add a resource to a group, reference the group id
>> > from then on, not any of the child resources within the group.
>> >
>> > As a general practice, it is always better to stop a resource (pcs
>> resource
>> > disable) and only delete the resource after the stop has completed.
>> >
>> > This is especially important for group resources where stop order
>> matters. If
>> > you delete a group, then we have no information on what order to stop
>> the
>> > resources in that group. This can cause stop failures when the orphaned
>> > resources are cleaned up.
>> >
>> > Recently pcs gained the ability to attempt to stop resources before
>> > deleting them, in order to avoid scenarios like the one I described
>> > above. Pcs will block for a period of time waiting for the resource to
>> > stop before deleting it. Even with this logic in place, it is preferable
>> > to stop the resource manually and delete it only once you have verified
>> > that it stopped.
>> >
>> > -- Vossel
>> >
>> > >
>> > > I had a similar problem with a cloned group and solved it by
>> un-cloning
>> > > before deleting the group.
>> > > Maybe un-cloning the multi-state resource could help too.
>> > > It's easy to reproduce.
>> > >
>> > > # pcs resource create resPing ping host_list="10.0.0.1 10.0.0.2" op
>> monitor
>> > > on-fail="restart"
>> > > # pcs resource group add groupPing resPing
>> > > # pcs resource clone groupPing clone-max=3 clone-node-max=1
>> > > # pcs resource
>> > > Clone Set: groupPing-clone [groupPing]
>> > > Started: [ node1 node2 node3 ]
>> > > # pcs resource delete groupPing-clone
>> > > Deleting Resource (and group) - resPing
>> > > Error: Unable to remove resource 'resPing' (do constraints exist?)
>> > > # pcs resource unclone groupPing
>> > > # pcs resource delete groupPing
>> > > Removing group: groupPing (and all resources within group)
>> > > Stopping all resources in group: groupPing...
>> > > Deleting Resource (and group) - resPing
>> > >
>> > > Log:
>> > > Feb 26 15:43:16 node1 cibadmin[2368]: notice: crm_log_args: Invoked:
>> > > /usr/sbin/cibadmin -o resources -D --xml-text <group
>> id="groupPing">#012
>> > > <primitive class="ocf" id="resPing" provider="pacemaker"
>> type="ping">#012
>> > > <instance_attributes id="resPing-instance_attributes">#012 <nvpair
>> > > id="resPing-instance_attributes-host_list" name="host_list"
>> value="10.0.0.1
>> > > 10.0.0.2"/>#012 </instance_attributes>#012 <operations>#012 <op
>> > > id="resPing-monitor-on-fail-restart" interval="60s" name="monitor"
>> > > on-fail="restart"/>#012 </operations>#012 </primi
>> > > Feb 26 15:43:16 node1 cib[1820]: error: xml_log: Expecting an element
>> > > meta_attributes, got nothing
>> > > Feb 26 15:43:16 node1 cib[1820]: error: xml_log: Invalid sequence in
>> > > interleave
>> > > Feb 26 15:43:16 node1 cib[1820]: error: xml_log: Element clone failed
>> to
>> > > validate content
>> > > Feb 26 15:43:16 node1 cib[1820]: error: xml_log: Element resources has
>> > > extra
>> > > content: primitive
>> > > Feb 26 15:43:16 node1 cib[1820]: error: xml_log: Invalid sequence in
>> > > interleave
>> > > Feb 26 15:43:16 node1 cib[1820]: error: xml_log: Element cib failed to
>> > > validate content
>> > > Feb 26 15:43:16 node1 cib[1820]: warning: cib_perform_op: Updated CIB
>> does
>> > > not validate against pacemaker-1.2 schema/dtd
>> > > Feb 26 15:43:16 node1 cib[1820]: warning: cib_diff_notify: Update
>> (client:
>> > > cibadmin, call:2): 0.516.7 -> 0.517.1 (Update does not conform to the
>> > > configured schema)
>> > > Feb 26 15:43:16 node1 stonith-ng[1821]: warning: update_cib_cache_cb:
>> > > [cib_diff_notify] ABORTED: Update does not conform to the configured
>> schema
>> > > (-203)
>> > > Feb 26 15:43:16 node1 cib[1820]: warning: cib_process_request:
>> Completed
>> > > cib_delete operation for section resources: Update does not conform
>> to the
>> > > configured schema (rc=-203, origin=local/cibadmin/2, version=0.516.7)
>> > >
>> > >
>> > > Frank
>> > >
>> > > On 26.02.2014 15:00, K Mehta wrote:
>> > >
>> > >
>> > >
>> > > Here is the config and output of few commands
>> > >
>> > > [root at sys11 ~]# pcs config
>> > > Cluster Name: kpacemaker1.1
>> > > Corosync Nodes:
>> > >
>> > > Pacemaker Nodes:
>> > > sys11 sys12
>> > >
>> > > Resources:
>> > > Master: ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > > Meta Attrs: clone-max=2 globally-unique=false target-role=Started
>> > > Resource: vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8 (class=ocf
>> > > provider=heartbeat type=vgc-cm-agent.ocf)
>> > > Attributes: cluster_uuid=de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > > Operations: monitor interval=30s role=Master timeout=100s
>> > > (vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-monitor-interval-30s)
>> > > monitor interval=31s role=Slave timeout=100s
>> > > (vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-monitor-interval-31s)
>> > >
>> > > Stonith Devices:
>> > > Fencing Levels:
>> > >
>> > > Location Constraints:
>> > > Resource: ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > > Enabled on: sys11 (score:200)
>> > > (id:location-ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-sys11-200)
>> > > Enabled on: sys12 (score:200)
>> > > (id:location-ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-sys12-200)
>> > > Resource: vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > > Enabled on: sys11 (score:200)
>> > > (id:location-vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-sys11-200)
>> > > Enabled on: sys12 (score:200)
>> > > (id:location-vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-sys12-200)
>> > > Ordering Constraints:
>> > > Colocation Constraints:
>> > >
>> > > Cluster Properties:
>> > > cluster-infrastructure: cman
>> > > dc-version: 1.1.8-7.el6-394e906
>> > > no-quorum-policy: ignore
>> > > stonith-enabled: false
>> > > symmetric-cluster: false
>> > >
>> > >
>> > >
>> > > [root at sys11 ~]# pcs resource
>> > > Master/Slave Set: ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > > [vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8]
>> > > Masters: [ sys11 ]
>> > > Slaves: [ sys12 ]
>> > >
>> > >
>> > >
>> > > [root at sys11 ~]# pcs resource disable
>> > > vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > >
>> > > [root at sys11 ~]# pcs resource
>> > > Master/Slave Set: ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > > [vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8]
>> > > Stopped: [ vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8:0
>> > > vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8:1 ]
>> > >
>> > >
>> > > [root at sys11 ~]# pcs resource delete
>> > > vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > > Removing Constraint -
>> > > location-ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-sys11-200
>> > > Removing Constraint -
>> > > location-ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-sys12-200
>> > > Removing Constraint -
>> > > location-vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-sys11-200
>> > > Removing Constraint -
>> > > location-vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-sys12-200
>> > > Attempting to stop: vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8...Error:
>> > > Unable
>> > > to stop: vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8 before deleting
>> (re-run
>> > > with --force to force deletion)
>> > >
>> > >
>> > > [root at sys11 ~]# pcs resource delete
>> > > vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > > Attempting to stop: vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8...Error:
>> > > Unable
>> > > to stop: vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8 before deleting
>> (re-run
>> > > with --force to force deletion)
>> > >
>> > > [root at sys11 ~]# pcs resource
>> > > Master/Slave Set: ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > > [vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8]
>> > > Stopped: [ vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8:0
>> > > vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8:1 ]
>> > >
>> > > [root at sys11 ~]# pcs resource delete
>> > > vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > > --force
>> > > Deleting Resource - vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
>> > > [root at sys11 ~]# pcs resource
>> > > NO resources configured
>> > > [root at sys11 ~]#
>> > >
>> > >
>> > >
>> > >
>> > > _______________________________________________
>> > > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> > >
>> > > Project Home: http://www.clusterlabs.org
>> > > Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > > Bugs: http://bugs.clusterlabs.org
>> > >
>> >
>> >
>>
>>
>
>