[Pacemaker] Pacemaker unnecessarily (?) restarts a vm on active node when other node brought out of standby

Wed May 14 14:34:35 UTC 2014

Andrew Beekhof wrote:
> On 14 May 2014, at 5:23 am, Ian <cl-3627 at jusme.com> wrote:
> 
> Hmmm, master-max=2... I'd bet that is something the code might not be
> handling optimally.
> Can you attach a crm_report tarball for the period covered by your 
> test?

Attached. The sequence was:

[root@sv07 ~]# date; ssh sv06 date
Wed May 14 15:06:23 BST 2014
Wed May 14 15:06:23 BST 2014

[root@sv07 ~]# pcs status
Cluster name: jusme
Last updated: Wed May 14 15:06:35 2014
Last change: Wed May 14 15:02:05 2014 via crm_attribute on sv07
Stack: cman
Current DC: sv07 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured
7 Resources configured

Node sv06: standby
Online: [ sv07 ]

Full list of resources:

  Master/Slave Set: vm_storage_core_dev-master [vm_storage_core_dev]
      Masters: [ sv07 ]
      Stopped: [ sv06 ]
  Clone Set: vm_storage_core-clone [vm_storage_core]
      Started: [ sv07 ]
      Stopped: [ sv06 ]
  Master/Slave Set: nfs_server_dev-master [nfs_server_dev]
      Masters: [ sv07 ]
      Stopped: [ sv06 ]
  res_vm_nfs_server      (ocf::heartbeat:VirtualDomain): Started sv07

[root@sv07 ~]# pcs cluster unstandby sv06

[root@sv07 ~]# date; ssh sv06 date
Wed May 14 15:07:18 BST 2014
Wed May 14 15:07:18 BST 2014

[root@sv07 ~]# pcs status
Cluster name: jusme
Last updated: Wed May 14 15:07:29 2014
Last change: Wed May 14 15:06:52 2014 via crm_attribute on sv07
Stack: cman
Current DC: sv07 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured
7 Resources configured

Online: [ sv06 sv07 ]

Full list of resources:

  Master/Slave Set: vm_storage_core_dev-master [vm_storage_core_dev]
      Masters: [ sv07 ]
      Slaves: [ sv06 ]
  Clone Set: vm_storage_core-clone [vm_storage_core]
      Started: [ sv07 ]
      Stopped: [ sv06 ]
  Master/Slave Set: nfs_server_dev-master [nfs_server_dev]
      Masters: [ sv07 ]
      Slaves: [ sv06 ]
  res_vm_nfs_server      (ocf::heartbeat:VirtualDomain): Started sv07

## About 1 minute later vm_storage_core_dev gets automatically promoted to
## master/master, provoking the unwanted gfs/vm restart...
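To timestamp the moment the second node is promoted, rather than re-running `pcs status` by hand, a small filter can pull the "Masters:" list for one Master/Slave set out of the status text. This is a hypothetical helper: `masters_of` is not a pcs or Pacemaker command, and it assumes the `pcs status` layout shown in the transcripts here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pcs/Pacemaker): read `pcs status` text on
# stdin and print the node list from the "Masters:" line of the named
# Master/Slave set. Assumes the indentation/layout shown in this thread.
masters_of() {
  awk -v name="$1" '
    # Header line of the requested Master/Slave set: start of its block
    $0 ~ ("Master/Slave Set: " name " ") { inset = 1; next }
    # Any line that is not a role/state member line ends the block
    inset && !/^ *(Masters|Slaves|Started|Stopped):/ { inset = 0 }
    inset && /^ *Masters:/ {
      sub(/^.*Masters: *\[ */, "")   # drop the "Masters: [ " prefix
      sub(/ *\].*$/, "")             # drop the trailing " ]"
      print
      exit
    }
  '
}

# Example usage (assumes it is run on a cluster node with pcs available):
#   while sleep 5; do
#     printf '%s %s\n' "$(date +%T)" \
#       "$(pcs status | masters_of vm_storage_core_dev-master)"
#   done
```

Polling every few seconds like this should show the Masters list for vm_storage_core_dev-master changing from `sv07` to `sv06 sv07`, which can then be lined up against the timestamps in the crm_report logs.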

[root@sv07 ~]# date; ssh sv06 date
Wed May 14 15:08:27 BST 2014
Wed May 14 15:08:27 BST 2014

[root@sv07 ~]# pcs status
Cluster name: jusme
Last updated: Wed May 14 15:08:28 2014
Last change: Wed May 14 15:06:52 2014 via crm_attribute on sv07
Stack: cman
Current DC: sv07 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured
7 Resources configured

Online: [ sv06 sv07 ]

Full list of resources:

  Master/Slave Set: vm_storage_core_dev-master [vm_storage_core_dev]
      Masters: [ sv06 sv07 ]
  Clone Set: vm_storage_core-clone [vm_storage_core]
      Started: [ sv06 sv07 ]
  Master/Slave Set: nfs_server_dev-master [nfs_server_dev]
      Masters: [ sv07 ]
      Slaves: [ sv06 ]
  res_vm_nfs_server      (ocf::heartbeat:VirtualDomain): Started sv07

[root@sv07 ~]# crm_report -f "2014-05-14 15:05:00" report-20140514-1

Ian.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: report-20140514-1.tar.bz2
Type: application/x-bzip2
Size: 227398 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20140514/792131fa/attachment-0004.bz2>