[ClusterLabs] pcs create master/slave resource doesn't work

Fri Nov 24 11:00:19 CET 2017

Jan,

Thank you very much for your help. I am getting further, but it still
looks very strange.

1. To use "debug-promote", I upgraded Pacemaker from 1.1.12 to 1.1.16 and pcs
to 0.9.160.

2. Recreated the resources with the commands below:
pcs resource create ovndb_servers ocf:ovn:ovndb-servers \
  master_ip=192.168.0.99 \
  op monitor interval="10s" \
  op monitor interval="11s" role=Master
pcs resource master ovndb_servers-master ovndb_servers \
  meta notify="true" master-max="1" master-node-max="1" clone-max="3" \
  clone-node-max="1"
pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.99 \
    op monitor interval=10s
pcs constraint colocation add VirtualIP with master ovndb_servers-master \
  score=INFINITY

3. pcs status
 Master/Slave Set: ovndb_servers-master [ovndb_servers]
     Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
 VirtualIP (ocf::heartbeat:IPaddr2): Stopped
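To compare the state after each step without rereading the whole status screen, the node lists that pcs status prints for the master/slave set can be pulled out with a small filter. This is a minimal POSIX shell sketch, assuming the Pacemaker 1.1 layout shown above (a "Master/Slave Set: NAME [...]" header followed by "Masters:", "Slaves:" or "Stopped:" lines with a bracketed node list); the helper name nodes_in_role is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print, one per line, the nodes that `pcs status`
# lists under ROLE (Masters, Slaves or Stopped) for master/slave set SET.
# Usage: pcs status | nodes_in_role SET ROLE
nodes_in_role() {
    set_name=$1
    role=$2
    awk -v want="$set_name" -v role="$role" '
        # A "... Set: NAME [...]" header starts a new set; track whether it is ours.
        /Set: / { inset = (index($0, "Set: " want " ") > 0); next }
        # Role lines look like: "     Stopped: [ node-1 node-2 node-3 ]"
        inset && $1 == (role ":") {
            for (i = 3; i < NF; i++) print $i   # skip "ROLE:", "[" and the final "]"
        }
    '
}
```

For example, `pcs status | nodes_in_role ovndb_servers-master Masters` prints nothing as long as the cluster itself does not consider any instance promoted, regardless of what the agent has done on its own.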

4. Manually ran 'debug-start' on all 3 nodes and 'debug-promote' on one of them.
Ran the following on [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]:
# pcs resource debug-start ovndb_servers --full
Ran the following on [ node-1.domain.tld ]:
# pcs resource debug-promote ovndb_servers --full

5. pcs status
 Master/Slave Set: ovndb_servers-master [ovndb_servers]
     Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
 VirtualIP (ocf::heartbeat:IPaddr2): Stopped

6. However, I can see that one ovndb_servers instance has indeed been promoted
to master, yet pcs status still shows all instances as 'Stopped'.
What am I missing?

 >  stderr: + 17:45:59: ocf_log:327: __OCF_MSG='ovndb_servers: Promoting node-1.domain.tld as the master'
 >  stderr: + 17:45:59: ocf_log:329: case "${__OCF_PRIO}" in
 >  stderr: + 17:45:59: ocf_log:333: __OCF_PRIO=INFO
 >  stderr: + 17:45:59: ocf_log:338: '[' INFO = DEBUG ']'
 >  stderr: + 17:45:59: ocf_log:341: ha_log 'INFO: ovndb_servers: Promoting node-1.domain.tld as the master'
 >  stderr: + 17:45:59: ha_log:253: __ha_log 'INFO: ovndb_servers: Promoting node-1.domain.tld as the master'
 >  stderr: + 17:45:59: __ha_log:185: local ignore_stderr=false
 >  stderr: + 17:45:59: __ha_log:186: local loglevel
 >  stderr: + 17:45:59: __ha_log:188: '[' 'xINFO: ovndb_servers: Promoting node-1.domain.tld as the master' = x--ignore-stderr ']'
 >  stderr: + 17:45:59: __ha_log:190: '[' none = '' ']'
 >  stderr: + 17:45:59: __ha_log:192: tty
 >  stderr: + 17:45:59: __ha_log:193: '[' x = x0 -a x = xdebug ']'
 >  stderr: + 17:45:59: __ha_log:195: '[' false = true ']'
 >  stderr: + 17:45:59: __ha_log:199: '[' '' ']'
 >  stderr: + 17:45:59: __ha_log:202: echo 'INFO: ovndb_servers: Promoting node-1.domain.tld as the master'
 >  stderr: INFO: ovndb_servers: Promoting node-1.domain.tld as the master
 >  stderr: + 17:45:59: __ha_log:204: return 0
 >  stderr: + 17:45:59: ovsdb_server_promote:378: /usr/sbin/crm_attribute --type crm_config --name OVN_REPL_INFO -s ovn_ovsdb_master_server -v node-1.domain.tld
 >  stderr: + 17:45:59: ovsdb_server_promote:379: ovsdb_server_master_update 8
 >  stderr: + 17:45:59: ovsdb_server_master_update:214: case $1 in
 >  stderr: + 17:45:59: ovsdb_server_master_update:218: /usr/sbin/crm_master -l reboot -v 10
 >  stderr: + 17:45:59: ovsdb_server_promote:380: return 0
 >  stderr: + 17:45:59: 458: rc=0
 >  stderr: + 17:45:59: 459: exit 0
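The end of this trace shows the promote action calling `crm_master -l reboot -v 10` and then exiting with 0, so the agent did record a master preference for its node. A minimal sketch for pulling that value out of a saved or piped trace, assuming the call is spelled `crm_master ... -v VALUE` with a numeric value as above (the long `--value` form and non-numeric scores are not handled); the helper name master_score_from_trace is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read an OCF trace on stdin and print the value
# passed to the last "crm_master ... -v N" call (numeric scores only).
master_score_from_trace() {
    # Lines without a crm_master call (e.g. crm_attribute ... -v node-1)
    # do not match and are dropped by sed -n.
    sed -n 's/.*crm_master .*-v \(-\{0,1\}[0-9][0-9]*\).*/\1/p' | tail -n 1
}
```

For example: `pcs resource debug-promote ovndb_servers --full 2>&1 | master_score_from_trace`.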

On 23/11/17 23:52 +0800, Hui Xiang wrote:
> I am working on HA with 3 nodes, which have the configuration below:
>
> """
> pcs resource create ovndb_servers ocf:ovn:ovndb-servers \
>   master_ip=168.254.101.2 \
>   op monitor interval="10s" \
>   op monitor interval="11s" role=Master
> pcs resource master ovndb_servers-master ovndb_servers \
>   meta notify="true" master-max="1" master-node-max="1" clone-max="3" \
>   clone-node-max="1"
> pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=168.254.101.2 \
>     op monitor interval=10s
> pcs constraint order promote ovndb_servers-master then VirtualIP
> pcs constraint colocation add VirtualIP with master ovndb_servers-master \
>   score=INFINITY
> """

(Out of curiosity, this looks like a mix of output from
pcs config export pcs-commands [or clufter cib2pcscmd -s]
and manual editing.  Is this a good guess?)

It's the output of "pcs status".

> However, after setting it up as above, the master is not being selected and
> all instances are stopped, although the pacemaker log shows that node-1 has
> been chosen as the master. I am confused about what is wrong; can anybody
> help? It would be very much appreciated.
>
>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>      Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
>  VirtualIP (ocf::heartbeat:IPaddr2): Stopped
>
> # pacemaker log
> Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++ /cib/configuration/resources:  <primitive class="ocf" id="ovndb_servers" provider="ovn" type="ovndb-servers"/>
> Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                  <instance_attributes id="ovndb_servers-instance_attributes">
> Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                    <nvpair id="ovndb_servers-instance_attributes-master_ip" name="master_ip" value="168.254.101.2"/>
> Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                  </instance_attributes>
> Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                  <operations>
> Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                    <op id="ovndb_servers-start-timeout-30s" interval="0s" name="start" timeout="30s"/>
> Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                    <op id="ovndb_servers-stop-timeout-20s" interval="0s" name="stop" timeout="20s"/>
> Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                    <op id="ovndb_servers-promote-timeout-50s" interval="0s" name="promote" timeout="50s"/>
> Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                    <op id="ovndb_servers-demote-timeout-50s" interval="0s" name="demote" timeout="50s"/>
> Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                    <op id="ovndb_servers-monitor-interval-10s" interval="10s" name="monitor"/>
> Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                    <op id="ovndb_servers-monitor-interval-11s-role-Master" interval="11s" name="monitor" role="Master"/>
> Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                  </operations>
> Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                </primitive>
>
> Nov 23 23:06:03 [665249] node-1.domain.tld      attrd:     info: attrd_peer_update: Setting master-ovndb_servers[node-1.domain.tld]: (null) -> 5 from node-1.domain.tld

If your ocf:ovn:ovndb-servers agent plausibly runs something like
"attrd_updater -n master-ovndb_servers -U 5" in master mode, then it was
indeed launched OK, and if it does not continue to run as expected,
there may be a problem with the agent itself.
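To see which master score the cluster actually holds for each node, the transient attribute can be queried with crm_attribute. A minimal sketch, assuming the attribute is named master-ovndb_servers as in the attrd log above; the helper name show_master_scores is hypothetical. A score only tells Pacemaker where an instance may be promoted; it does not by itself mean the cluster has started or promoted anything.

```shell
#!/bin/sh
# Hypothetical helper: print the transient master score that Pacemaker
# currently stores for resource $1 on each node given after it.
# Assumes the attribute is named "master-<resource>", as in the attrd log above.
show_master_scores() {
    rsc=$1
    shift
    for node in "$@"; do
        # -l reboot: transient (status-section) attribute;
        # -G: query; -q: print only the value.
        if score=$(crm_attribute -N "$node" -n "master-$rsc" -l reboot -G -q 2>/dev/null); then
            printf '%s %s\n' "$node" "$score"
        else
            printf '%s %s\n' "$node" "(unset)"
        fi
    done
}
```

For example: `show_master_scores ovndb_servers node-1.domain.tld node-2.domain.tld node-3.domain.tld`.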

No change.

You can try running "pcs resource debug-promote ovndb_servers --full"
to examine the execution details (assuming the agent responds to the
OCF_TRACE_RA=1 environment variable, as shell-based agents built on top
of the ocf-shellfuncs sourceable shell library from the resource-agents
project, including the agents that project ships, customarily do).
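The `+ 17:45:59: ocf_log:327:` prefixes in such a trace look like bash xtrace output with a custom PS4 (time, function name, line number; the function part is absent at top level, as in `+ 17:45:59: 458: rc=0`). The sketch below reproduces that format; the PS4 string is an approximation inferred from the trace, not copied from ocf-shellfuncs, and demo_promote and trace_demo are hypothetical names. It needs bash because of FUNCNAME.

```shell
#!/usr/bin/env bash
# Requires bash: FUNCNAME and LINENO are used in PS4.

# Hypothetical stand-in for an agent's promote function.
demo_promote() {
    echo "INFO: demo: promoting"
    return 0
}

# Run demo_promote under xtrace and print only the trace lines.
# Assumed PS4: "+ HH:MM:SS: function:line: ", function part omitted
# outside any function.
trace_demo() {
    (
        PS4='+ $(date +%T): ${FUNCNAME[0]:+${FUNCNAME[0]}:}${LINENO}: '
        set -x
        demo_promote
    ) 2>&1 >/dev/null   # keep stderr (the trace), discard normal output
}
```

Running `trace_demo` prints lines of the form `+ HH:MM:SS: demo_promote:LINE: echo 'INFO: demo: promoting'`.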

Yes, thanks, that's helpful.

> Nov 23 23:06:03 [665251] node-1.domain.tld       crmd:   notice: process_lrm_event: Operation ovndb_servers_monitor_0: ok (node=node-1.domain.tld, call=185, rc=0, cib-update=88, confirmed=true)
> <29>Nov 23 23:06:03 node-1 crmd[665251]:   notice: process_lrm_event: Operation ovndb_servers_monitor_0: ok (node=node-1.domain.tld, call=185, rc=0, cib-update=88, confirmed=true)
> Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: Diff: --- 0.630.2 2
> Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: Diff: +++ 0.630.3 (null)
> Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: +  /cib:  @num_updates=3
> Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++ /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']: <nvpair id="status-1-master-ovndb_servers" name="master-ovndb_servers" value="5"/>
> Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node-3.domain.tld/attrd/80, version=0.630.3)

It also depends on whether there's anything interesting after this point...

-- 
Jan (Poki)