[Pacemaker] [Problem] Fail-over is delayed. (State transition is not calculated.)

Andrew Beekhof andrew at beekhof.net
Tue Feb 18 22:16:17 EST 2014


I'll follow up on the bug.

On 19 Feb 2014, at 10:55 am, renayama19661014 at ybb.ne.jp wrote:

> Hi David,
> 
> Thank you for comments.
> 
>> You have resource-stickiness=INFINITY; this is what is preventing the failover from occurring. Set resource-stickiness to 1 or 0 and the failover should occur.
>> 
> 
> However, the resource does move when the next state transition is calculated.
> Shouldn't the resource be able to move on the transition calculated for the first failure?
> 
> In addition, the resource moves if I delete the following colocation constraint:
> 
> colocation rsc_colocation-master-3 INFINITY: vip-rep msPostgresql:Master
> 
> Is there a problem with Pacemaker's handling of this colocation?
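(For context, a sketch of how the constraint removal described above might be done with crmsh; the constraint id is taken from the message, and the exact syntax may vary with the crmsh version in use.)

```shell
# Illustrative only: remove the colocation constraint named in the thread
# (crmsh syntax; verify against your crmsh version before running).
crm configure delete rsc_colocation-master-3

# Re-check the scheduler's view to see whether fail-over is now calculated:
crm_simulate -L -s
```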
> 
> Best Regards,
> Hideo Yamauchi.
> 
> --- On Wed, 2014/2/19, David Vossel <dvossel at redhat.com> wrote:
> 
>> 
>> ----- Original Message -----
>>> From: renayama19661014 at ybb.ne.jp
>>> To: "PaceMaker-ML" <pacemaker at oss.clusterlabs.org>
>>> Sent: Monday, February 17, 2014 7:06:53 PM
>>> Subject: [Pacemaker] [Problem] Fail-over is delayed. (State transition is not calculated.)
>>> 
>>> Hi All,
>>> 
>>> I checked the behavior at the time of a failure on one node of a
>>> Master/Slave resource in Pacemaker 1.1.11.
>>> 
>>> -------------------------------------
>>> 
>>> Step 1) Build the cluster.
>>> 
>>> [root at srv01 ~]# crm_mon -1 -Af
>>> Last updated: Tue Feb 18 18:07:24 2014
>>> Last change: Tue Feb 18 18:05:46 2014 via crmd on srv01
>>> Stack: corosync
>>> Current DC: srv01 (3232238180) - partition with quorum
>>> Version: 1.1.10-9d39a6b
>>> 2 Nodes configured
>>> 6 Resources configured
>>> 
>>> 
>>> Online: [ srv01 srv02 ]
>>> 
>>>   vip-master     (ocf::heartbeat:Dummy): Started srv01
>>>   vip-rep        (ocf::heartbeat:Dummy): Started srv01
>>>   Master/Slave Set: msPostgresql [pgsql]
>>>       Masters: [ srv01 ]
>>>       Slaves: [ srv02 ]
>>>   Clone Set: clnPingd [prmPingd]
>>>       Started: [ srv01 srv02 ]
>>> 
>>> Node Attributes:
>>> * Node srv01:
>>>      + default_ping_set                  : 100
>>>      + master-pgsql                      : 10
>>> * Node srv02:
>>>      + default_ping_set                  : 100
>>>      + master-pgsql                      : 5
>>> 
>>> Migration summary:
>>> * Node srv01:
>>> * Node srv02:
>>> 
>>> Step 2) Cause a monitor error in vip-master.
>>> 
>>> [root at srv01 ~]# rm -rf /var/run/resource-agents/Dummy-vip-master.state
>>> 
>>> [root at srv01 ~]# crm_mon -1 -Af
>>> Last updated: Tue Feb 18 18:07:58 2014
>>> Last change: Tue Feb 18 18:05:46 2014 via crmd on srv01
>>> Stack: corosync
>>> Current DC: srv01 (3232238180) - partition with quorum
>>> Version: 1.1.10-9d39a6b
>>> 2 Nodes configured
>>> 6 Resources configured
>>> 
>>> 
>>> Online: [ srv01 srv02 ]
>>> 
>>>   Master/Slave Set: msPostgresql [pgsql]
>>>       Masters: [ srv01 ]
>>>       Slaves: [ srv02 ]
>>>   Clone Set: clnPingd [prmPingd]
>>>       Started: [ srv01 srv02 ]
>>> 
>>> Node Attributes:
>>> * Node srv01:
>>>      + default_ping_set                  : 100
>>>      + master-pgsql                      : 10
>>> * Node srv02:
>>>      + default_ping_set                  : 100
>>>      + master-pgsql                      : 5
>>> 
>>> Migration summary:
>>> * Node srv01:
>>>     vip-master: migration-threshold=1 fail-count=1 last-failure='Tue Feb 18
>>>     18:07:50 2014'
>>> * Node srv02:
>>> 
>>> Failed actions:
>>>      vip-master_monitor_10000 on srv01 'not running' (7): call=30,
>>>      status=complete, last-rc-change='Tue Feb 18 18:07:50 2014', queued=0ms,
>>>      exec=0ms
>>> -------------------------------------
>>> 
>>> However, the resource does not fail over.
>>> 
>>> But fail-over is calculated when I check the CIB with crm_simulate at this
>>> point in time.
>>> 
>>> -------------------------------------
>>> [root at srv01 ~]# crm_simulate -L -s
>>> 
>>> Current cluster status:
>>> Online: [ srv01 srv02 ]
>>> 
>>>   vip-master     (ocf::heartbeat:Dummy): Stopped
>>>   vip-rep        (ocf::heartbeat:Dummy): Stopped
>>>   Master/Slave Set: msPostgresql [pgsql]
>>>       Masters: [ srv01 ]
>>>       Slaves: [ srv02 ]
>>>   Clone Set: clnPingd [prmPingd]
>>>       Started: [ srv01 srv02 ]
>>> 
>>> Allocation scores:
>>> clone_color: clnPingd allocation score on srv01: 0
>>> clone_color: clnPingd allocation score on srv02: 0
>>> clone_color: prmPingd:0 allocation score on srv01: INFINITY
>>> clone_color: prmPingd:0 allocation score on srv02: 0
>>> clone_color: prmPingd:1 allocation score on srv01: 0
>>> clone_color: prmPingd:1 allocation score on srv02: INFINITY
>>> native_color: prmPingd:0 allocation score on srv01: INFINITY
>>> native_color: prmPingd:0 allocation score on srv02: 0
>>> native_color: prmPingd:1 allocation score on srv01: -INFINITY
>>> native_color: prmPingd:1 allocation score on srv02: INFINITY
>>> clone_color: msPostgresql allocation score on srv01: 0
>>> clone_color: msPostgresql allocation score on srv02: 0
>>> clone_color: pgsql:0 allocation score on srv01: INFINITY
>>> clone_color: pgsql:0 allocation score on srv02: 0
>>> clone_color: pgsql:1 allocation score on srv01: 0
>>> clone_color: pgsql:1 allocation score on srv02: INFINITY
>>> native_color: pgsql:0 allocation score on srv01: INFINITY
>>> native_color: pgsql:0 allocation score on srv02: 0
>>> native_color: pgsql:1 allocation score on srv01: -INFINITY
>>> native_color: pgsql:1 allocation score on srv02: INFINITY
>>> pgsql:1 promotion score on srv02: 5
>>> pgsql:0 promotion score on srv01: 1
>>> native_color: vip-master allocation score on srv01: -INFINITY
>>> native_color: vip-master allocation score on srv02: INFINITY
>>> native_color: vip-rep allocation score on srv01: -INFINITY
>>> native_color: vip-rep allocation score on srv02: INFINITY
>>> 
>>> Transition Summary:
>>>   * Start   vip-master   (srv02)
>>>   * Start   vip-rep      (srv02)
>>>   * Demote  pgsql:0      (Master -> Slave srv01)
>>>   * Promote pgsql:1      (Slave -> Master srv02)
>>> 
>>> -------------------------------------
>>> 
>>> In addition, fail-over is carried out when the "cluster-recheck-interval"
>>> timer expires.
>>> 
>>> Fail-over is also carried out if I run cibadmin -B.
>>> 
>>> -------------------------------------
>>> [root at srv01 ~]# cibadmin -B
>>> 
>>> [root at srv01 ~]# crm_mon -1 -Af
>>> Last updated: Tue Feb 18 18:21:15 2014
>>> Last change: Tue Feb 18 18:21:00 2014 via cibadmin on srv01
>>> Stack: corosync
>>> Current DC: srv01 (3232238180) - partition with quorum
>>> Version: 1.1.10-9d39a6b
>>> 2 Nodes configured
>>> 6 Resources configured
>>> 
>>> 
>>> Online: [ srv01 srv02 ]
>>> 
>>>   vip-master     (ocf::heartbeat:Dummy): Started srv02
>>>   vip-rep        (ocf::heartbeat:Dummy): Started srv02
>>>   Master/Slave Set: msPostgresql [pgsql]
>>>       Masters: [ srv02 ]
>>>       Slaves: [ srv01 ]
>>>   Clone Set: clnPingd [prmPingd]
>>>       Started: [ srv01 srv02 ]
>>> 
>>> Node Attributes:
>>> * Node srv01:
>>>      + default_ping_set                  : 100
>>>      + master-pgsql                      : 5
>>> * Node srv02:
>>>      + default_ping_set                  : 100
>>>      + master-pgsql                      : 10
>>> 
>>> Migration summary:
>>> * Node srv01:
>>>     vip-master: migration-threshold=1 fail-count=1 last-failure='Tue Feb 18
>>>     18:07:50 2014'
>> 
>> You have resource-stickiness=INFINITY; this is what is preventing the failover from occurring. Set resource-stickiness to 1 or 0 and the failover should occur.
>> 
>> -- Vossel
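
(A sketch of how the stickiness change suggested above could be applied with crmsh; this is illustrative only, and whether stickiness is set cluster-wide via rsc_defaults or per resource depends on the existing configuration.)

```shell
# Illustrative only: lower the default stickiness so the scheduler is
# allowed to move the failed resource to the other node.
crm configure rsc_defaults resource-stickiness=1

# Alternatively, set it on a single resource (resource id from the thread):
crm resource meta vip-master set resource-stickiness 1
```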
>> 
>>> * Node srv02:
>>> 
>>> Failed actions:
>>>      vip-master_monitor_10000 on srv01 'not running' (7): call=30,
>>>      status=complete, last-rc-change='Tue Feb 18 18:07:50 2014', queued=0ms,
>>>      exec=0ms
>>> 
>>> -------------------------------------
>>> 
>>> The delay in carrying out fail-over is a problem.
>>> I believe the cause of this delayed fail-over after the error lies in Pacemaker.
>>> 
>>> I have filed these details and log information in Bugzilla:
>>>   * http://bugs.clusterlabs.org/show_bug.cgi?id=5197
>>> 
>>> Best Regards,
>>> Hideo Yamauchi.
>>> 
>>> 
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>> 
>> 
> 
