[ClusterLabs] Re: No slave is promoted to be master

Andrei Borzenkov arvidjaar at gmail.com
Tue Apr 17 01:10:00 EDT 2018



Sent from my iPhone

> On 17 Apr 2018, at 7:16, 范国腾 <fanguoteng at highgo.com> wrote:
> 
> I checked the status again. The slave is not left unpromoted; it is promoted about 15 minutes after the cluster starts.
> 
> I tried this in three labs and the results are the same: the promotion happens 15 minutes after the cluster starts.
> 
> Why is there an approximately 15-minute delay every time?
> 

That rings a bell. Fifteen minutes is the default interval for time-based rule re-evaluation (the cluster-recheck-interval property); my understanding so far is that this timer also picks up other configuration changes (basically, it runs the policy engine to make a decision).

I saw a similar effect when I attempted to change the quorum state directly, without going through external node events.

So it looks like whatever sets the master scores does not trigger the policy engine.
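If that is indeed the cause, one workaround worth trying (my suggestion, not something confirmed in this thread) is to shorten the recheck interval so the policy engine re-evaluates the cluster state much sooner than the default 15 minutes:

```shell
# Lower Pacemaker's recheck timer so the policy engine re-runs
# every minute instead of the default 15 minutes. Note that this
# increases background CPU load slightly and only papers over the
# missing-trigger problem; it does not fix the root cause.
pcs property set cluster-recheck-interval=60s

# Confirm the property is now set in the CIB.
pcs property show cluster-recheck-interval
```

As a side effect, writing the property into the CIB itself triggers an immediate policy-engine run, so the pending promotion should happen right away.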


> 
> Apr 16 22:08:32 node1 attrd[16618]:  notice: Node sds1 state is now member
> Apr 16 22:08:32 node1 attrd[16618]:  notice: Node sds2 state is now member
> 
> ......
> 
> Apr 16 22:21:36 node1 pgsqlms(pgsqld)[18230]: INFO: Execute action monitor and the result 0
> Apr 16 22:21:52 node1 pgsqlms(pgsqld)[18257]: INFO: Execute action monitor and the result 0
> Apr 16 22:22:09 node1 pgsqlms(pgsqld)[18296]: INFO: Execute action monitor and the result 0
> Apr 16 22:22:25 node1 pgsqlms(pgsqld)[18315]: INFO: Execute action monitor and the result 0
> Apr 16 22:22:41 node1 pgsqlms(pgsqld)[18343]: INFO: Execute action monitor and the result 0
> Apr 16 22:22:57 node1 pgsqlms(pgsqld)[18362]: INFO: Execute action monitor and the result 0
> Apr 16 22:23:13 node1 pgsqlms(pgsqld)[18402]: INFO: Execute action monitor and the result 0
> Apr 16 22:23:29 node1 pgsqlms(pgsqld)[18421]: INFO: Execute action monitor and the result 0
> Apr 16 22:23:45 node1 pgsqlms(pgsqld)[18449]: INFO: Execute action monitor and the result 0
> Apr 16 22:23:57 node1 crmd[16620]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
> Apr 16 22:23:57 node1 pengine[16619]:  notice: Promote pgsqld:0#011(Slave -> Master sds1)
> Apr 16 22:23:57 node1 pengine[16619]:  notice: Start   master-vip#011(sds1)
> Apr 16 22:23:57 node1 pengine[16619]:  notice: Start   pgsql-master-ip#011(sds1)
> Apr 16 22:23:57 node1 pengine[16619]:  notice: Calculated transition 1, saving inputs in /var/lib/pacemaker/pengine/pe-input-18.bz2
> Apr 16 22:23:57 node1 crmd[16620]:  notice: Initiating cancel operation pgsqld_monitor_16000 locally on sds1
> Apr 16 22:23:57 node1 crmd[16620]:  notice: Initiating notify operation pgsqld_pre_notify_promote_0 locally on sds1
> Apr 16 22:23:57 node1 crmd[16620]:  notice: Initiating notify operation pgsqld_pre_notify_promote_0 on sds2
> Apr 16 22:23:58 node1 pgsqlms(pgsqld)[18467]: INFO: Promoting instance on node "sds1"
> Apr 16 22:23:58 node1 pgsqlms(pgsqld)[18467]: INFO: Current node TL#LSN: 4#117440512
> Apr 16 22:23:58 node1 pgsqlms(pgsqld)[18467]: INFO: Execute action notify and the result 0
> Apr 16 22:23:58 node1 crmd[16620]:  notice: Result of notify operation for pgsqld on sds1: 0 (ok)
> Apr 16 22:23:58 node1 crmd[16620]:  notice: Initiating promote operation pgsqld_promote_0 locally on sds1
> Apr 16 22:23:58 node1 pgsqlms(pgsqld)[18499]: INFO: Waiting for the promote to complete
> Apr 16 22:23:59 node1 pgsqlms(pgsqld)[18499]: INFO: Promote complete
> 
> 
> 
> [root at node1 ~]# crm_simulate -sL
> 
> Current cluster status:
> Online: [ sds1 sds2 ]
> 
> Master/Slave Set: pgsql-ha [pgsqld]
>     Masters: [ sds1 ]
>     Slaves: [ sds2 ]
> Resource Group: mastergroup
>     master-vip (ocf::heartbeat:IPaddr2):       Started sds1
> pgsql-master-ip        (ocf::heartbeat:IPaddr2):       Started sds1
> 
> Allocation scores:
> clone_color: pgsql-ha allocation score on sds1: 1
> clone_color: pgsql-ha allocation score on sds2: 1
> clone_color: pgsqld:0 allocation score on sds1: 1003
> clone_color: pgsqld:0 allocation score on sds2: 1
> clone_color: pgsqld:1 allocation score on sds1: 1
> clone_color: pgsqld:1 allocation score on sds2: 1002
> native_color: pgsqld:0 allocation score on sds1: 1003
> native_color: pgsqld:0 allocation score on sds2: 1
> native_color: pgsqld:1 allocation score on sds1: -INFINITY
> native_color: pgsqld:1 allocation score on sds2: 1002
> pgsqld:0 promotion score on sds1: 1002
> pgsqld:1 promotion score on sds2: 1001
> group_color: mastergroup allocation score on sds1: 0
> group_color: mastergroup allocation score on sds2: 0
> group_color: master-vip allocation score on sds1: 0
> group_color: master-vip allocation score on sds2: 0
> native_color: master-vip allocation score on sds1: 1003
> native_color: master-vip allocation score on sds2: -INFINITY
> native_color: pgsql-master-ip allocation score on sds1: 1003
> native_color: pgsql-master-ip allocation score on sds2: -INFINITY
> 
> Transition Summary:
> [root at node1 ~]#
> 
> You can reproduce the issue on two nodes by executing the following commands, then running "pcs cluster stop --all" followed by "pcs cluster start --all".
> 
> pcs resource create pgsqld ocf:heartbeat:pgsqlms bindir=/home/highgo/highgo/database/4.3.1/bin pgdata=/home/highgo/highgo/database/4.3.1/data op start timeout=600s op stop timeout=60s op promote timeout=300s op demote timeout=120s op monitor interval=10s timeout=100s role="Master" op monitor interval=16s timeout=100s role="Slave" op notify timeout=60s
> pcs resource master pgsql-ha pgsqld notify=true interleave=true
> 
> 
> 
> 
> 
> -----Original Message-----
> From: 范国腾 
> Sent: 17 April 2018 10:25
> To: 'Jehan-Guillaume de Rorthais' <jgdr at dalibo.com>
> Cc: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> Subject: [ClusterLabs] No slave is promoted to be master
> 
> Hi,
> 
> We set up a new lab which has only the postgres resource and the VIP resource. After the cluster is installed, the status is OK: one node is master and the other is slave. Then I run "pcs cluster stop --all" to stop the cluster, and "pcs cluster start --all" to start it again. All of the pgsql instances are now in slave status and can no longer be promoted to master, like this:
> 
> Master/Slave Set: pgsql-ha [pgsqld]
>     Slaves: [ sds1 sds2 ] 
> 
> 
> There is no error in the log, and "crm_simulate -sL" shows the following; the scores seem OK too. The detailed log and config are in the attachment.
> 
> [root at node1 ~]# crm_simulate -sL
> 
> Current cluster status:
> Online: [ sds1 sds2 ]
> 
> Master/Slave Set: pgsql-ha [pgsqld]
>     Slaves: [ sds1 sds2 ]
> Resource Group: mastergroup
>     master-vip (ocf::heartbeat:IPaddr2):       Stopped
> pgsql-master-ip        (ocf::heartbeat:IPaddr2):       Stopped
> 
> Allocation scores:
> clone_color: pgsql-ha allocation score on sds1: 1
> clone_color: pgsql-ha allocation score on sds2: 1
> clone_color: pgsqld:0 allocation score on sds1: 1003
> clone_color: pgsqld:0 allocation score on sds2: 1
> clone_color: pgsqld:1 allocation score on sds1: 1
> clone_color: pgsqld:1 allocation score on sds2: 1002
> native_color: pgsqld:0 allocation score on sds1: 1003
> native_color: pgsqld:0 allocation score on sds2: 1
> native_color: pgsqld:1 allocation score on sds1: -INFINITY
> native_color: pgsqld:1 allocation score on sds2: 1002
> pgsqld:0 promotion score on sds1: 1002
> pgsqld:1 promotion score on sds2: 1001
> group_color: mastergroup allocation score on sds1: 0
> group_color: mastergroup allocation score on sds2: 0
> group_color: master-vip allocation score on sds1: 0
> group_color: master-vip allocation score on sds2: 0
> native_color: master-vip allocation score on sds1: 1003
> native_color: master-vip allocation score on sds2: -INFINITY
> native_color: pgsql-master-ip allocation score on sds1: 1003
> native_color: pgsql-master-ip allocation score on sds2: -INFINITY
> 
> Transition Summary:
> * Promote pgsqld:0     (Slave -> Master sds1)
> * Start   master-vip   (sds1)
> * Start   pgsql-master-ip      (sds1)
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


