[ClusterLabs] Cannot stop cluster due to order constraint
Leon Steffens
leon at steffensonline.com
Fri Sep 8 01:31:22 EDT 2017
Hi all,
We are running Pacemaker 1.1.15 under CentOS 6.9, and have a simple 3-node
cluster with 6 sets of "main" and "backup" resources (just Dummy ones):
main1
backup1
main2
backup2
etc.
We have the following co-location constraint between main1 and backup1 (a
score of -200 because we don't want them on the same node, although under
some circumstances they can still end up together):
pcs constraint colocation add backup1 with main1 -200
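The score is deliberately finite; -INFINITY would forbid the pair from ever
sharing a node, which is not what we want. The resulting constraints can be
double-checked with (the IDs in the output are whatever pcs generates):

pcs constraint colocation show --full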
We also have the following order constraint between main1 and backup1.
This caters for the scenario where they end up on the same node: we want
to make sure that "main" gets started before "backup" gets stopped and
started somewhere else (because of the co-location score):
pcs constraint order start main1 then stop backup1 kind=Serialize
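For anyone unfamiliar with the kinds: Serialize only prevents the two
operations from running concurrently; it doesn't force either one to
happen. A comparison sketch for the first pair (alternatives, not meant to
be added together):

# pcs constraint order start main1 then stop backup1 kind=Mandatory  # stop must wait for a completed start
# pcs constraint order start main1 then stop backup1 kind=Optional   # ordered only if both occur in one transition
pcs constraint order start main1 then stop backup1 kind=Serialize    # never concurrent (our choice)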
When the cluster is started, everything works fine:
main1 (ocf::heartbeat:Dummy): Started straddie1
main2 (ocf::heartbeat:Dummy): Started straddie2
main3 (ocf::heartbeat:Dummy): Started straddie3
main4 (ocf::heartbeat:Dummy): Started straddie1
main5 (ocf::heartbeat:Dummy): Started straddie2
main6 (ocf::heartbeat:Dummy): Started straddie3
backup1 (ocf::heartbeat:Dummy): Started straddie2
backup2 (ocf::heartbeat:Dummy): Started straddie1
backup3 (ocf::heartbeat:Dummy): Started straddie1
backup4 (ocf::heartbeat:Dummy): Started straddie2
backup5 (ocf::heartbeat:Dummy): Started straddie1
backup6 (ocf::heartbeat:Dummy): Started straddie2
When we do a "pcs cluster stop --all", things do not go so well. pcs
cluster stop hangs and the cluster state is as follows:
main1 (ocf::heartbeat:Dummy): Stopped
main2 (ocf::heartbeat:Dummy): Stopped
main3 (ocf::heartbeat:Dummy): Stopped
main4 (ocf::heartbeat:Dummy): Stopped
main5 (ocf::heartbeat:Dummy): Stopped
main6 (ocf::heartbeat:Dummy): Stopped
backup1 (ocf::heartbeat:Dummy): Started straddie2
backup2 (ocf::heartbeat:Dummy): Started straddie1
backup3 (ocf::heartbeat:Dummy): Started straddie1
backup4 (ocf::heartbeat:Dummy): Started straddie2
backup5 (ocf::heartbeat:Dummy): Started straddie1
backup6 (ocf::heartbeat:Dummy): Started straddie2
The corosync.log clearly shows why this is happening. Pacemaker wants to
stop the backup resources, but the order constraint says the "main"
resources must be started first. At this stage the "main" resources have
already been stopped, and because the cluster is shutting down they cannot
be started again, so we are stuck:
Sep 08 15:15:07 [23862] straddie3 crmd: info: match_graph_event: Action main1_stop_0 (14) confirmed on straddie1 (rc=0)
Sep 08 15:15:07 [23862] straddie3 crmd: warning: run_graph: Transition 48 (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=10, Source=/var/lib/pacemaker/pengine/pe-input-496.bz2): Terminated
Sep 08 15:15:07 [23862] straddie3 crmd: warning: te_graph_trigger: Transition failed: terminated
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_graph: Graph 48 with 16 actions: batch-limit=0 jobs, network-delay=60000ms
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: [Action 14]: Completed rsc op main1_stop_0 on straddie1 (priority: 0, waiting: none)
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: [Action 15]: Completed rsc op main4_stop_0 on straddie1 (priority: 0, waiting: none)
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: [Action 16]: Pending rsc op backup2_stop_0 on straddie1 (priority: 0, waiting: none)
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: * [Input 31]: Unresolved dependency rsc op main2_start_0
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: [Action 17]: Pending rsc op backup3_stop_0 on straddie1 (priority: 0, waiting: none)
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: * [Input 32]: Unresolved dependency rsc op main3_start_0
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: [Action 18]: Pending rsc op backup5_stop_0 on straddie1 (priority: 0, waiting: none)
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: * [Input 34]: Unresolved dependency rsc op main5_start_0
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: [Action 19]: Completed rsc op main2_stop_0 on straddie2 (priority: 0, waiting: none)
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: [Action 20]: Completed rsc op main5_stop_0 on straddie2 (priority: 0, waiting: none)
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: [Action 21]: Pending rsc op backup1_stop_0 on straddie2 (priority: 0, waiting: none)
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: * [Input 30]: Unresolved dependency rsc op main1_start_0
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: [Action 22]: Pending rsc op backup4_stop_0 on straddie2 (priority: 0, waiting: none)
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: * [Input 33]: Unresolved dependency rsc op main4_start_0
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: [Action 23]: Pending rsc op backup6_stop_0 on straddie2 (priority: 0, waiting: none)
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: * [Input 35]: Unresolved dependency rsc op main6_start_0
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: [Action 24]: Completed rsc op main3_stop_0 on straddie3 (priority: 0, waiting: none)
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: [Action 25]: Completed rsc op main6_stop_0 on straddie3 (priority: 0, waiting: none)
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: [Action 29]: Pending crm op do_shutdown-straddie3 on straddie3 (priority: 0, waiting: 27 28)
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: [Action 28]: Pending crm op do_shutdown-straddie2 on straddie2 (priority: 0, waiting: 21 22 23)
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: [Action 27]: Pending crm op do_shutdown-straddie1 on straddie1 (priority: 0, waiting: 16 17 18)
Sep 08 15:15:07 [23862] straddie3 crmd: notice: print_synapse: [Action 13]: Pending pseudo op all_stopped on N/A (priority: 0, waiting: 16 17 18 21 22 23)
Sep 08 15:15:07 [23862] straddie3 crmd: info: do_log: Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
Sep 08 15:15:07 [23862] straddie3 crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
Sep 08 15:15:07 [23862] straddie3 crmd: info: do_state_transition: (Re)Issuing shutdown request now that we are the DC
Sep 08 15:15:07 [23862] straddie3 crmd: info: do_shutdown_req: Sending shutdown request to straddie3
Sep 08 15:15:07 [23862] straddie3 crmd: info: handle_shutdown_request: Creating shutdown request for straddie3 (state=S_IDLE)
Our current workaround is to delete the constraints before calling "pcs
cluster stop --all", but we would prefer not to do that.
If I add "symmetrical=false" it seems to work fine, but we need the
constraint to work in both directions. I've tried adding a separate order
constraint for "start backup then stop main kind=Serialized", but I hit the
same issue.
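For the first pair, that combination looked roughly like this (a sketch;
the second constraint just reintroduces the same shutdown deadlock with
the roles swapped, since stopping main1 then waits on a backup1 start that
can never happen):

pcs constraint order start main1 then stop backup1 kind=Serialize symmetrical=false
pcs constraint order start backup1 then stop main1 kind=Serialize symmetrical=false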
I've also added another optional order constraint between main and backup,
saying that backup must be stopped before main is stopped, but this didn't
seem to work either.
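Concretely, that attempt looked like this (shown for the first pair only):

pcs constraint order stop backup1 then stop main1 kind=Optional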
Does anyone have any ideas on how to solve this?
Thanks,
Leon
PS: The full script to create the resources on 3 nodes is:
echo "Creating main and backup"
pcs resource create main1 ocf:heartbeat:Dummy
pcs resource create main2 ocf:heartbeat:Dummy
pcs resource create main3 ocf:heartbeat:Dummy
pcs resource create main4 ocf:heartbeat:Dummy
pcs resource create main5 ocf:heartbeat:Dummy
pcs resource create main6 ocf:heartbeat:Dummy
pcs resource create backup1 ocf:heartbeat:Dummy
pcs resource create backup2 ocf:heartbeat:Dummy
pcs resource create backup3 ocf:heartbeat:Dummy
pcs resource create backup4 ocf:heartbeat:Dummy
pcs resource create backup5 ocf:heartbeat:Dummy
pcs resource create backup6 ocf:heartbeat:Dummy
pcs constraint order start main1 then stop backup1 kind=Serialize
pcs constraint order start main2 then stop backup2 kind=Serialize
pcs constraint order start main3 then stop backup3 kind=Serialize
pcs constraint order start main4 then stop backup4 kind=Serialize
pcs constraint order start main5 then stop backup5 kind=Serialize
pcs constraint order start main6 then stop backup6 kind=Serialize
pcs constraint colocation add backup1 with main1 -200
pcs constraint colocation add backup2 with main2 -200
pcs constraint colocation add backup3 with main3 -200
pcs constraint colocation add backup4 with main4 -200
pcs constraint colocation add backup5 with main5 -200
pcs constraint colocation add backup6 with main6 -200
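Then, to reproduce the hang, wait until all resources have started and run:

pcs cluster stop --all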