[Pacemaker] Issue in deleting multi state resource

K Mehta kiranmehta1981 at gmail.com
Wed May 21 01:13:36 EDT 2014


Andrew,
1. Is there a workaround for this issue?
2. Also, can you let me know whether there are more issues with deleting
multi-state resources on old versions, as mentioned in
http://www.gossamer-threads.com/lists/linuxha/pacemaker/91230 ?

Regards,
 Kiran


On Wed, May 21, 2014 at 9:44 AM, Andrew Beekhof <andrew at beekhof.net> wrote:

>
> On 19 May 2014, at 5:43 pm, K Mehta <kiranmehta1981 at gmail.com> wrote:
>
> > Please see my reply inline. Attached is the crm_report output.
> >
> >
> > On Thu, May 8, 2014 at 5:45 AM, Andrew Beekhof <andrew at beekhof.net>
> wrote:
> >
> > On 8 May 2014, at 12:38 am, K Mehta <kiranmehta1981 at gmail.com> wrote:
> >
> > > I created a multi-state resource
> ms-2be6c088-a1fa-464a-b00d-f4bccb4f5af2
> (vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2).
> > >
> > > Here is the configuration:
> > > ==========================
> > > [root@vsanqa11 ~]# pcs config
> > > Cluster Name: vsanqa11_12
> > > Corosync Nodes:
> > >
> > > Pacemaker Nodes:
> > >  vsanqa11 vsanqa12
> > >
> > > Resources:
> > >  Master: ms-2be6c088-a1fa-464a-b00d-f4bccb4f5af2
> > >   Meta Attrs: clone-max=2 globally-unique=false target-role=started
> > >   Resource: vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2 (class=ocf
> provider=heartbeat type=vgc-cm-agent.ocf)
> > >    Attributes: cluster_uuid=2be6c088-a1fa-464a-b00d-f4bccb4f5af2
> > >    Operations: monitor interval=30s role=Master timeout=100s
> (vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2-monitor-interval-30s)
> > >                monitor interval=31s role=Slave timeout=100s
> (vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2-monitor-interval-31s)
> > >
> > > Stonith Devices:
> > > Fencing Levels:
> > >
> > > Location Constraints:
> > >   Resource: ms-2be6c088-a1fa-464a-b00d-f4bccb4f5af2
> > >     Enabled on: vsanqa11 (score:INFINITY)
> (id:location-ms-2be6c088-a1fa-464a-b00d-f4bccb4f5af2-vsanqa11-INFINITY)
> > >     Enabled on: vsanqa12 (score:INFINITY)
> (id:location-ms-2be6c088-a1fa-464a-b00d-f4bccb4f5af2-vsanqa12-INFINITY)
> > >   Resource: vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2
> > >     Enabled on: vsanqa11 (score:INFINITY)
> (id:location-vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2-vsanqa11-INFINITY)
> > >     Enabled on: vsanqa12 (score:INFINITY)
> (id:location-vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2-vsanqa12-INFINITY)
> > > Ordering Constraints:
> > > Colocation Constraints:
> > >
> > > Cluster Properties:
> > >  cluster-infrastructure: cman
> > >  dc-version: 1.1.10-14.el6_5.2-368c726
> > >  last-lrm-refresh: 1399466204
> > >  no-quorum-policy: ignore
> > >  stonith-enabled: false
> > >
> > > ==============================================
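> > >
> > > For reference, a configuration like this is roughly what the following
> > > commands produce (a sketch only, not necessarily the exact invocation
> > > used; pcs syntax varies between versions):
> > >
> > >   pcs resource create vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2 \
> > >       ocf:heartbeat:vgc-cm-agent.ocf \
> > >       cluster_uuid=2be6c088-a1fa-464a-b00d-f4bccb4f5af2 \
> > >       op monitor interval=30s role=Master timeout=100s \
> > >       op monitor interval=31s role=Slave timeout=100s
> > >   pcs resource master ms-2be6c088-a1fa-464a-b00d-f4bccb4f5af2 \
> > >       vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2 \
> > >       clone-max=2 globally-unique=false target-role=started
> > >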
> > > When I try to create and delete this resource in a loop,
> >
> > Why would you do that? :-)
> >
> > Just to test whether things behave correctly when a resource is created
> and deleted in quick succession. But the issue is also seen arbitrarily;
> it sometimes appears even in the first iteration of the loop.
> >
> > > after a few iterations, the delete fails as shown below. This can be
> reproduced easily. I make sure to unclone the resource before deleting it,
> and the unclone always succeeds.
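> > >
> > > In essence the loop is (a condensed sketch, reusing the creation
> > > commands above; the iteration count is arbitrary):
> > >
> > >   uuid=2be6c088-a1fa-464a-b00d-f4bccb4f5af2
> > >   for i in $(seq 1 100); do
> > >       pcs resource create vha-$uuid ocf:heartbeat:vgc-cm-agent.ocf \
> > >           cluster_uuid=$uuid \
> > >           op monitor interval=30s role=Master timeout=100s \
> > >           op monitor interval=31s role=Slave timeout=100s
> > >       pcs resource master ms-$uuid vha-$uuid \
> > >           clone-max=2 globally-unique=false
> > >       # the unclone always succeeds ...
> > >       pcs resource unclone vha-$uuid
> > >       # ... but the delete intermittently fails
> > >       pcs resource delete vha-$uuid
> > >   done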
>
> [snip]
>
> > > May  7 07:20:13 vsanqa12 attrd[4317]:   notice: attrd_trigger_update:
> Sending flush op to all hosts for:
> master-vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2 (<null>)
> > > May  7 07:20:13 vsanqa12 attrd[4317]:   notice: attrd_perform_update:
> Sent delete 4404: node=vsanqa12,
> attr=master-vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2, id=<n/a>, set=(null),
> section=status
> > > May  7 07:20:13 vsanqa12 crmd[4319]:   notice: process_lrm_event: LRM
> operation vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2_stop_0 (call=1379, rc=0,
> cib-update=1161, confirmed=true) ok
> > > May  7 07:20:13 vsanqa12 attrd[4317]:   notice: attrd_perform_update:
> Sent delete 4406: node=vsanqa12,
> attr=master-vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2, id=<n/a>, set=(null),
> section=status
> > > May  7 07:20:13 vsanqa12 attrd[4317]:  warning: attrd_cib_callback:
> Update 4404 for master-vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2=(null)
> failed: Application of an update diff failed
> > > May  7 07:20:13 vsanqa12 cib[4314]:  warning: cib_process_diff: Diff
> 0.6804.2 -> 0.6804.3 from vsanqa11 not applied to 0.6804.2: Failed
> application of an update diff
> > > May  7 07:20:13 vsanqa12 cib[4314]:   notice: cib_server_process_diff:
> Not applying diff 0.6804.3 -> 0.6804.4 (sync in progress)
>
>
> Ah. Now I recognise this :-(
>
> First the good news: this will be fixed when the new CIB code arrives in
> 6.6.
>
> The way the old cib works is that one node makes the change and sends it
> out as a patch to the other nodes.
> Great in theory, except the old patch format wasn't very good at
> preserving ordering changes - but it can detect them, hence:
>
> > May  7 07:20:13 vsanqa12 cib[4314]:  warning: cib_process_diff: Diff
> 0.6804.2 -> 0.6804.3 from vsanqa11 not applied to 0.6804.2: Failed
> application of an update diff
>
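> (The 0.6804.2 -> 0.6804.3 above is the cib's
> admin_epoch.epoch.num_updates version; each node's current version is
> visible on the root <cib> element, e.g.:
>
>   cibadmin --query | head -n 1
>   # <cib epoch="6804" num_updates="2" admin_epoch="0" ...>
>
> Here the patch fails to apply on vsanqa12 even though its version
> matches, so it requests a full resync and skips later diffs until that
> completes - the "sync in progress" message.)
>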
> The cib does recover, but the operation is reported to pcs as having
> failed.
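>
> Since the cib itself recovers, a possible interim workaround (a sketch,
> not an official fix) is to verify and retry when the delete reports a
> spurious failure:
>
>   rsc=vha-2be6c088-a1fa-464a-b00d-f4bccb4f5af2
>   pcs resource delete $rsc || {
>       sleep 2
>       # treat "already gone" as success; otherwise retry the delete
>       pcs resource show $rsc >/dev/null 2>&1 && pcs resource delete $rsc
>   }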
>
> We are considering a couple of options that may make it into 6.5.
>