[Pacemaker] Issue in deleting multi state resource

Andrew Beekhof andrew at beekhof.net
Thu May 22 06:32:57 EDT 2014


On 21 May 2014, at 3:13 pm, K Mehta <kiranmehta1981 at gmail.com> wrote:

> Andrew,
> 1. Is there a workaround for this issue?

For now it's basically just "retry the command".
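
A minimal retry wrapper, as a sketch (the resource name is the one from your config below; the retry count and delay are arbitrary):

    # Retry the delete a few times; the failure is transient and the
    # CIB recovers on its own, so a later attempt usually succeeds.
    for i in 1 2 3 4 5; do
        pcs resource delete vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2 && break
        sleep 2
    done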

> 2. Also, can you let me know whether there are more issues with old versions when deleting a multi-state resource, as mentioned in http://www.gossamer-threads.com/lists/linuxha/pacemaker/91230

That looks like an issue with pcs not removing constraints that reference the resource you're trying to delete.
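
For example (using the constraint IDs from your config below; yours may differ), you can list and remove the referencing constraints by ID before deleting the resource:

    # List all constraints together with their IDs
    pcs constraint --full
    # Remove each constraint that references the resource, then delete it
    pcs constraint remove location-vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2-vsanqa11-INFINITY
    pcs constraint remove location-vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2-vsanqa12-INFINITY
    pcs resource delete vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2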

> 
> Regards,
>  Kiran
> 
> 
> On Wed, May 21, 2014 at 9:44 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
> 
> On 19 May 2014, at 5:43 pm, K Mehta <kiranmehta1981 at gmail.com> wrote:
> 
> > Please see my reply inline. Attached is the crm_report output.
> >
> >
> > On Thu, May 8, 2014 at 5:45 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
> >
> > On 8 May 2014, at 12:38 am, K Mehta <kiranmehta1981 at gmail.com> wrote:
> >
> > > I created a multi state resource ms-2be6c088-a1fa-464a-b00d-f4bccb4f5af2 (vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2).
> > >
> > > Here is the configuration:
> > > ==========================
> > > [root@vsanqa11 ~]# pcs config
> > > Cluster Name: vsanqa11_12
> > > Corosync Nodes:
> > >
> > > Pacemaker Nodes:
> > >  vsanqa11 vsanqa12
> > >
> > > Resources:
> > >  Master: ms-2be6c088-a1fa-464a-b00d-f4bccb4f5af2
> > >   Meta Attrs: clone-max=2 globally-unique=false target-role=started
> > >   Resource: vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2 (class=ocf provider=heartbeat type=vgc-cm-agent.ocf)
> > >    Attributes: cluster_uuid=2be6c088-a1fa-464a-b00d-f4bccb4f5af2
> > >    Operations: monitor interval=30s role=Master timeout=100s (vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2-monitor-interval-30s)
> > >                monitor interval=31s role=Slave timeout=100s (vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2-monitor-interval-31s)
> > >
> > > Stonith Devices:
> > > Fencing Levels:
> > >
> > > Location Constraints:
> > >   Resource: ms-2be6c088-a1fa-464a-b00d-f4bccb4f5af2
> > >     Enabled on: vsanqa11 (score:INFINITY) (id:location-ms-2be6c088-a1fa-464a-b00d-f4bccb4f5af2-vsanqa11-INFINITY)
> > >     Enabled on: vsanqa12 (score:INFINITY) (id:location-ms-2be6c088-a1fa-464a-b00d-f4bccb4f5af2-vsanqa12-INFINITY)
> > >   Resource: vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2
> > >     Enabled on: vsanqa11 (score:INFINITY) (id:location-vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2-vsanqa11-INFINITY)
> > >     Enabled on: vsanqa12 (score:INFINITY) (id:location-vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2-vsanqa12-INFINITY)
> > > Ordering Constraints:
> > > Colocation Constraints:
> > >
> > > Cluster Properties:
> > >  cluster-infrastructure: cman
> > >  dc-version: 1.1.10-14.el6_5.2-368c726
> > >  last-lrm-refresh: 1399466204
> > >  no-quorum-policy: ignore
> > >  stonith-enabled: false
> > >
> > > ==============================================
> > > When I try to create and delete this resource in a loop,
> >
> > Why would you do that? :-)
> >
> > Just to test that things work when a resource is created and deleted in quick succession. But the issue also shows up arbitrarily; it is sometimes seen even on the first iteration of the loop.
> >
> > > after a few iterations, the delete fails as shown below. This is easy to reproduce; a sketch of the loop follows. I make sure to unclone the resource before deleting it, and the unclone succeeds.
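> > >
> > > (Roughly, the loop looks like this - a sketch, with the create options abbreviated and the names taken from the config above:)
> > >
> > >     # Sketch of the reproduction loop; create options abbreviated
> > >     while true; do
> > >         pcs resource create vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2 \
> > >             ocf:heartbeat:vgc-cm-agent.ocf \
> > >             cluster_uuid=2be6c088-a1fa-464a-b00d-f4bccb4f5af2
> > >         pcs resource master ms-2be6c088-a1fa-464a-b00d-f4bccb4f5af2 \
> > >             vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2
> > >         pcs resource unclone vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2
> > >         pcs resource delete vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2
> > >     done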
> 
> [snip]
> 
> > > May  7 07:20:13 vsanqa12 attrd[4317]:   notice: attrd_trigger_update: Sending flush op to all hosts for: master-vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2 (<null>)
> > > May  7 07:20:13 vsanqa12 attrd[4317]:   notice: attrd_perform_update: Sent delete 4404: node=vsanqa12, attr=master-vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2, id=<n/a>, set=(null), section=status
> > > May  7 07:20:13 vsanqa12 crmd[4319]:   notice: process_lrm_event: LRM operation vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2_stop_0 (call=1379, rc=0, cib-update=1161, confirmed=true) ok
> > > May  7 07:20:13 vsanqa12 attrd[4317]:   notice: attrd_perform_update: Sent delete 4406: node=vsanqa12, attr=master-vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2, id=<n/a>, set=(null), section=status
> > > May  7 07:20:13 vsanqa12 attrd[4317]:  warning: attrd_cib_callback: Update 4404 for master-vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2=(null) failed: Application of an update diff failed
> > > May  7 07:20:13 vsanqa12 cib[4314]:  warning: cib_process_diff: Diff 0.6804.2 -> 0.6804.3 from vsanqa11 not applied to 0.6804.2: Failed application of an update diff
> > > May  7 07:20:13 vsanqa12 cib[4314]:   notice: cib_server_process_diff: Not applying diff 0.6804.3 -> 0.6804.4 (sync in progress)
> 
> 
> Ah. Now I recognise this :-(
> 
> First the good news: this will be fixed when the new CIB code arrives in 6.6.
> 
> The way the old CIB works is that one node makes the change and sends it out as a patch to the other nodes.
> Great in theory, except the old patch format wasn't very good at preserving ordering changes - but it can detect them, hence:
> 
> > May  7 07:20:13 vsanqa12 cib[4314]:  warning: cib_process_diff: Diff 0.6804.2 -> 0.6804.3 from vsanqa11 not applied to 0.6804.2: Failed application of an update diff
> 
> The CIB does recover, but the operation is reported back to pcs as having failed.
> 
> We are considering a couple of options that may make it into 6.5.
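>
> (If you want to confirm the recovery by hand, a rough check - not an official procedure - is to compare the CIB version counters on each node with cibadmin; matching values mean the resync finished:)
>
>     # Run on each node; the <cib> root element carries the version
>     # counters seen in the log ("0.6804.2" is admin_epoch.epoch.num_updates)
>     cibadmin --query --local | head -n 1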
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
