[Pacemaker] Stopping resource using pcs
Andrew Beekhof
andrew at beekhof.net
Wed Mar 5 00:08:41 UTC 2014
On 3 Mar 2014, at 10:40 pm, K Mehta <kiranmehta1981 at gmail.com> wrote:
> Has no one else ever faced this issue?
>
>
> On Fri, Feb 28, 2014 at 11:51 PM, K Mehta <kiranmehta1981 at gmail.com> wrote:
> Yes, the issue is seen only with multi-state resources. Non-multi-state resources work fine. It looks like the is_resource_started function in utils.py does not compare the resource name properly. Let fs be the resource name.
_is_ that the resource name though?
From one of your earlier examples:
> > > # pcs resource clone groupPing clone-max=3 clone-node-max=1
> > > # pcs resource
> > > Clone Set: groupPing-clone [groupPing]
The name you should be passing to pcs is groupPing-clone, _not_ groupPing.
Please ignore me if you are already doing this. In that case it's a bug for Chris.
Chris: let me know if we're not exposing something you need.
> is_resource_started compares fs with fs:0 and fs:1, so no match is found and False is returned.
>
>
> def resource_disable(argv):
>     if len(argv) < 1:
>         utils.err("You must specify a resource to disable")
>
>     resource = argv[0]
>     args = ["crm_resource", "-r", argv[0], "-m", "-p", "target-role", "-v", "Stopped"]
>     output, retval = utils.run(args)
>     if retval != 0:
>         utils.err(output)
>
>     if "--wait" in utils.pcs_options:
>         wait = utils.pcs_options["--wait"]
>         if not wait.isdigit():
>             utils.err("%s is not a valid number of seconds to wait" % wait)
>             sys.exit(1)
>         did_stop = utils.is_resource_started(resource,int(wait),True)  # <<< did_stop is false
>
>         if did_stop:
>             return True
>         else:
>             utils.err("unable to stop: '%s', please check logs for failure information" % resource)
>
>
>
> def is_resource_started(resource,wait,stopped=False):
>     expire_time = int(time.time()) + wait
>     while True:
>         state = getClusterState()
>         resources = state.getElementsByTagName("resource")
>         for res in resources:
>             if res.getAttribute("id") == resource:  # <<<< never succeeds
>                 if (res.getAttribute("role") == "Started" and not stopped) or (res.getAttribute("role") == "Stopped" and stopped):
>                     return True
>                 break
>         if (expire_time < int(time.time())):
>             break
>         time.sleep(1)
>     return False  # <<< False is returned
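
The mismatch described above (fs compared against fs:0 and fs:1) can be illustrated with a small sketch. This is not the pcs code and not a proposed patch; instance_matches and all_instances_in_role are hypothetical helpers, and the sample XML is an assumption modeled only on the two attributes (id, role) that the quoted function reads.

```python
from xml.dom.minidom import parseString


def instance_matches(instance_id, resource):
    """Return True if instance_id is `resource` or a clone instance `resource:N`."""
    base, sep, suffix = instance_id.rpartition(":")
    if sep and suffix.isdigit():
        # Numbered clone instance such as "fs:0": compare the part before ":N".
        return base == resource
    return instance_id == resource


def all_instances_in_role(state_xml, resource, role):
    """Return True if at least one instance matches and every match has `role`."""
    dom = parseString(state_xml)
    matched = [
        res for res in dom.getElementsByTagName("resource")
        if instance_matches(res.getAttribute("id"), resource)
    ]
    return bool(matched) and all(r.getAttribute("role") == role for r in matched)


# Hypothetical cluster state, shaped only after the attributes used above.
SAMPLE_STATE = (
    '<state>'
    '<resource id="fs:0" role="Stopped"/>'
    '<resource id="fs:1" role="Stopped"/>'
    '<resource id="other" role="Started"/>'
    '</state>'
)

if __name__ == "__main__":
    print(all_instances_in_role(SAMPLE_STATE, "fs", "Stopped"))
```

Requiring every matching instance to be in the requested role, rather than returning on the first match, is a design choice of this sketch; the quoted function stops at the first id it finds.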
>
>
>
>
>
> On Fri, Feb 28, 2014 at 10:49 PM, David Vossel <dvossel at redhat.com> wrote:
>
>
>
>
> ----- Original Message -----
> > From: "K Mehta" <kiranmehta1981 at gmail.com>
> > To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> > Sent: Friday, February 28, 2014 7:05:47 AM
> > Subject: Re: [Pacemaker] Stopping resource using pcs
> >
> > Can anyone tell me why the --wait parameter always causes pcs resource disable to
> > return failure even though the resource actually stops within the time limit?
>
> Does it only show an error with multi-state resources? It is probably a bug.
>
> -- Vossel
>
> >
> >
> > On Wed, Feb 26, 2014 at 10:45 PM, K Mehta < kiranmehta1981 at gmail.com > wrote:
> >
> >
> >
> > Deleting the master resource id does not work. I see the same issue.
> > However, uncloning helps. Delete works after disabling and uncloning.
> >
> > I see an issue when using the --wait option with disable. The resource moves into
> > the stopped state, but an error message is still printed.
> > When the --wait option is not provided, the error message is not seen.
> >
> > [root at sys11 ~]# pcs resource
> > Master/Slave Set: ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > [vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8]
> > Masters: [ sys11 ]
> > Slaves: [ sys12 ]
> > [root at sys11 ~]# pcs resource disable ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > --wait
> > Error: unable to stop: 'ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8', please
> > check logs for failure information
> > [root at sys11 ~]# pcs resource
> > Master/Slave Set: ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > [vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8]
> > Stopped: [ vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8:0
> > vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8:1 ]
> > [root at sys11 ~]# pcs resource disable ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > --wait
> > Error: unable to stop: 'ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8', please
> > check logs for failure information <<<<<error message
> > [root at sys11 ~]# pcs resource enable ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > [root at sys11 ~]# pcs resource
> > Master/Slave Set: ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > [vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8]
> > Masters: [ sys11 ]
> > Slaves: [ sys12 ]
> > [root at sys11 ~]# pcs resource disable ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > [root at sys11 ~]# pcs resource
> > Master/Slave Set: ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > [vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8]
> > Stopped: [ vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8:0
> > vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8:1 ]
> >
> >
> >
> >
> >
> > On Wed, Feb 26, 2014 at 8:55 PM, David Vossel < dvossel at redhat.com > wrote:
> >
> >
> >
> > ----- Original Message -----
> > > From: "Frank Brendel" < frank.brendel at eurolog.com >
> > > To: pacemaker at oss.clusterlabs.org
> > > Sent: Wednesday, February 26, 2014 8:53:19 AM
> > > Subject: Re: [Pacemaker] Stopping resource using pcs
> > >
> > > I guess we need some real experts here.
> > >
> > > I think it's because you're attempting to delete the resource and not the
> > > Master.
> > > Try deleting the Master instead of the resource.
> >
> > Yes, delete the Master resource id, not the primitive resource within the
> > master. When using pcs, you should always refer to the resource's topmost
> > parent id, not the id of the child resources within the parent. If you
> > make a resource a clone, start using the clone id. Same with master. If you
> > add a resource to a group, reference the group id from then on and not any
> > of the child resources within the group.
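
The "topmost parent id" rule can be sketched as a walk up the resources section of the CIB. This is only an illustration under stated assumptions: topmost_parent_id is a hypothetical helper, not a pcs function, and the sample XML is a trimmed stand-in built from the groupPing example elsewhere in this thread.

```python
from xml.dom.minidom import parseString

# CIB elements that can wrap other resources.
CONTAINER_TAGS = ("group", "clone", "master")

# Trimmed, illustrative resources section (an assumption, not a full CIB).
SAMPLE_CIB = """
<resources>
  <clone id="groupPing-clone">
    <group id="groupPing">
      <primitive id="resPing" class="ocf" provider="pacemaker" type="ping"/>
    </group>
  </clone>
</resources>
"""


def topmost_parent_id(cib_xml, resource_id):
    """Return the id of the outermost group/clone/master around resource_id.

    Returns resource_id itself if it has no container, or None if not found.
    """
    dom = parseString(cib_xml)
    for tag in ("primitive",) + CONTAINER_TAGS:
        for node in dom.getElementsByTagName(tag):
            if node.getAttribute("id") != resource_id:
                continue
            top = node
            # Climb while the parent is itself a resource container.
            while (top.parentNode is not None
                   and top.parentNode.nodeType == top.ELEMENT_NODE
                   and top.parentNode.tagName in CONTAINER_TAGS):
                top = top.parentNode
            return top.getAttribute("id")
    return None


if __name__ == "__main__":
    print(topmost_parent_id(SAMPLE_CIB, "resPing"))
```

With the sample above, both resPing and groupPing resolve to groupPing-clone, which is the id to pass to pcs.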
> >
> > As a general practice, it is always better to stop a resource (pcs resource
> > disable) and only delete the resource after the stop has completed.
> >
> > This is especially important for group resources where stop order matters. If
> > you delete a group, then we have no information on what order to stop the
> > resources in that group. This can cause stop failures when the orphaned
> > resources are cleaned up.
> >
> > Recently pcs gained the ability to attempt to stop resources before deleting
> > them in order to avoid scenarios like the one I described above. Pcs will block for
> > a period of time waiting for the resource to stop before deleting it. Even
> > with this logic in place, it is preferred to stop the resource manually and then
> > delete the resource once you have verified it stopped.
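
The manual sequence recommended above (disable, verify the stop, then delete) could be scripted roughly as follows. This is a sketch, not part of pcs: stop_then_delete and run_pcs are hypothetical names, and is_stopped is a caller-supplied check (for example one that inspects cluster status), since how to verify the stop is left open here. Only the two pcs subcommands shown in this thread, resource disable and resource delete, are invoked.

```python
import subprocess
import time


def run_pcs(args):
    """Run a pcs command and return (combined output, exit code)."""
    proc = subprocess.run(["pcs"] + args, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.stdout, proc.returncode


def stop_then_delete(resource_id, is_stopped, timeout=60, poll=1.0,
                     run=run_pcs, sleep=time.sleep, clock=time.monotonic):
    """Disable resource_id, wait until is_stopped(resource_id), then delete it.

    resource_id should be the topmost parent id (clone, master, or group).
    Raises RuntimeError if a pcs command fails and TimeoutError if the
    resource is not reported stopped within `timeout` seconds.
    """
    output, rc = run(["resource", "disable", resource_id])
    if rc != 0:
        raise RuntimeError(output)

    deadline = clock() + timeout
    while not is_stopped(resource_id):
        if clock() >= deadline:
            # Do not delete a resource that may still be running.
            raise TimeoutError("%s did not stop within %s seconds"
                               % (resource_id, timeout))
        sleep(poll)

    output, rc = run(["resource", "delete", resource_id])
    if rc != 0:
        raise RuntimeError(output)
    return output
```

The runner, sleep, and clock are parameters only so the flow can be exercised without a cluster; on a real node the defaults would be used.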
> >
> > -- Vossel
> >
> > >
> > > I had a similar problem with a cloned group and solved it by un-cloning
> > > before deleting the group.
> > > Maybe un-cloning the multi-state resource could help too.
> > > It's easy to reproduce.
> > >
> > > # pcs resource create resPing ping host_list="10.0.0.1 10.0.0.2" op monitor
> > > on-fail="restart"
> > > # pcs resource group add groupPing resPing
> > > # pcs resource clone groupPing clone-max=3 clone-node-max=1
> > > # pcs resource
> > > Clone Set: groupPing-clone [groupPing]
> > > Started: [ node1 node2 node3 ]
> > > # pcs resource delete groupPing-clone
> > > Deleting Resource (and group) - resPing
> > > Error: Unable to remove resource 'resPing' (do constraints exist?)
> > > # pcs resource unclone groupPing
> > > # pcs resource delete groupPing
> > > Removing group: groupPing (and all resources within group)
> > > Stopping all resources in group: groupPing...
> > > Deleting Resource (and group) - resPing
> > >
> > > Log:
> > > Feb 26 15:43:16 node1 cibadmin[2368]: notice: crm_log_args: Invoked:
> > > /usr/sbin/cibadmin -o resources -D --xml-text <group id="groupPing">#012
> > > <primitive class="ocf" id="resPing" provider="pacemaker" type="ping">#012
> > > <instance_attributes id="resPing-instance_attributes">#012 <nvpair
> > > id="resPing-instance_attributes-host_list" name="host_list" value="10.0.0.1
> > > 10.0.0.2"/>#012 </instance_attributes>#012 <operations>#012 <op
> > > id="resPing-monitor-on-fail-restart" interval="60s" name="monitor"
> > > on-fail="restart"/>#012 </operations>#012 </primi
> > > Feb 26 15:43:16 node1 cib[1820]: error: xml_log: Expecting an element
> > > meta_attributes, got nothing
> > > Feb 26 15:43:16 node1 cib[1820]: error: xml_log: Invalid sequence in
> > > interleave
> > > Feb 26 15:43:16 node1 cib[1820]: error: xml_log: Element clone failed to
> > > validate content
> > > Feb 26 15:43:16 node1 cib[1820]: error: xml_log: Element resources has
> > > extra
> > > content: primitive
> > > Feb 26 15:43:16 node1 cib[1820]: error: xml_log: Invalid sequence in
> > > interleave
> > > Feb 26 15:43:16 node1 cib[1820]: error: xml_log: Element cib failed to
> > > validate content
> > > Feb 26 15:43:16 node1 cib[1820]: warning: cib_perform_op: Updated CIB does
> > > not validate against pacemaker-1.2 schema/dtd
> > > Feb 26 15:43:16 node1 cib[1820]: warning: cib_diff_notify: Update (client:
> > > cibadmin, call:2): 0.516.7 -> 0.517.1 (Update does not conform to the
> > > configured schema)
> > > Feb 26 15:43:16 node1 stonith-ng[1821]: warning: update_cib_cache_cb:
> > > [cib_diff_notify] ABORTED: Update does not conform to the configured schema
> > > (-203)
> > > Feb 26 15:43:16 node1 cib[1820]: warning: cib_process_request: Completed
> > > cib_delete operation for section resources: Update does not conform to the
> > > configured schema (rc=-203, origin=local/cibadmin/2, version=0.516.7)
> > >
> > >
> > > Frank
> > >
> > > On 26.02.2014 at 15:00, K Mehta wrote:
> > >
> > >
> > >
> > > Here is the config and the output of a few commands
> > >
> > > [root at sys11 ~]# pcs config
> > > Cluster Name: kpacemaker1.1
> > > Corosync Nodes:
> > >
> > > Pacemaker Nodes:
> > > sys11 sys12
> > >
> > > Resources:
> > > Master: ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > > Meta Attrs: clone-max=2 globally-unique=false target-role=Started
> > > Resource: vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8 (class=ocf
> > > provider=heartbeat type=vgc-cm-agent.ocf)
> > > Attributes: cluster_uuid=de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > > Operations: monitor interval=30s role=Master timeout=100s
> > > (vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-monitor-interval-30s)
> > > monitor interval=31s role=Slave timeout=100s
> > > (vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-monitor-interval-31s)
> > >
> > > Stonith Devices:
> > > Fencing Levels:
> > >
> > > Location Constraints:
> > > Resource: ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > > Enabled on: sys11 (score:200)
> > > (id:location-ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-sys11-200)
> > > Enabled on: sys12 (score:200)
> > > (id:location-ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-sys12-200)
> > > Resource: vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > > Enabled on: sys11 (score:200)
> > > (id:location-vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-sys11-200)
> > > Enabled on: sys12 (score:200)
> > > (id:location-vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-sys12-200)
> > > Ordering Constraints:
> > > Colocation Constraints:
> > >
> > > Cluster Properties:
> > > cluster-infrastructure: cman
> > > dc-version: 1.1.8-7.el6-394e906
> > > no-quorum-policy: ignore
> > > stonith-enabled: false
> > > symmetric-cluster: false
> > >
> > >
> > >
> > > [root at sys11 ~]# pcs resource
> > > Master/Slave Set: ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > > [vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8]
> > > Masters: [ sys11 ]
> > > Slaves: [ sys12 ]
> > >
> > >
> > >
> > > [root at sys11 ~]# pcs resource disable
> > > vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > >
> > > [root at sys11 ~]# pcs resource
> > > Master/Slave Set: ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > > [vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8]
> > > Stopped: [ vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8:0
> > > vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8:1 ]
> > >
> > >
> > > [root at sys11 ~]# pcs resource delete
> > > vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > > Removing Constraint -
> > > location-ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-sys11-200
> > > Removing Constraint -
> > > location-ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-sys12-200
> > > Removing Constraint -
> > > location-vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-sys11-200
> > > Removing Constraint -
> > > location-vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8-sys12-200
> > > Attempting to stop: vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8...Error:
> > > Unable
> > > to stop: vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8 before deleting (re-run
> > > with --force to force deletion)
> > >
> > >
> > > [root at sys11 ~]# pcs resource delete
> > > vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > > Attempting to stop: vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8...Error:
> > > Unable
> > > to stop: vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8 before deleting (re-run
> > > with --force to force deletion)
> > >
> > > [root at sys11 ~]# pcs resource
> > > Master/Slave Set: ms-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > > [vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8]
> > > Stopped: [ vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8:0
> > > vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8:1 ]
> > >
> > > [root at sys11 ~]# pcs resource delete
> > > vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > > --force
> > > Deleting Resource - vha-de5566b1-c2a3-4dc6-9712-c82bb43f19d8
> > > [root at sys11 ~]# pcs resource
> > > NO resources configured
> > > [root at sys11 ~]#
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >
> > > Project Home: http://www.clusterlabs.org
> > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > Bugs: http://bugs.clusterlabs.org
> > >
> >