[ClusterLabs] Pacemaker stopped monitoring the resource
Abhay B
abhayyb at gmail.com
Thu Aug 31 02:41:26 EDT 2017
Hi,
I have a two-node HA cluster on CentOS 7, configured with the pcs command.
Below are the cluster properties:
# pcs property
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: SVSDEHA
cluster-recheck-interval: 2s
dc-deadtime: 5
dc-version: 1.1.15-11.el7_3.5-e174ec8
have-watchdog: false
last-lrm-refresh: 1504090367
no-quorum-policy: ignore
start-failure-is-fatal: false
stonith-enabled: false
Please find the CIB attached.
Also attached is the corosync.log from around the time the issue below occurred.
After around 10 hours and multiple failures, Pacemaker stops monitoring the
resource on one of the nodes in the cluster.
So even when the resource on the other node fails, it is never migrated to
the node on which the resource is no longer monitored.
I would like to know what could have triggered this and how to avoid getting
into such a scenario.
I have been going through the logs but could not find why this happened.
Monitoring stopped after this log entry:
Aug 29 11:01:44 [16500] TPC-D12-10-002.phaedrus.sandvine.com crmd: info: process_lrm_event: Result of monitor operation for SVSDEHA on TPC-D12-10-002.phaedrus.sandvine.com: 0 (ok) | call=538 key=SVSDEHA_monitor_2000 confirmed=false cib-update=50013
The log entry below appears to say that the resource is leaving the cluster:
Aug 29 11:01:44 [16499] TPC-D12-10-002.phaedrus.sandvine.com pengine: info: LogActions: Leave SVSDEHA:0 (Slave TPC-D12-10-002.phaedrus.sandvine.com)
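To pin down when monitoring stopped on each node, the log could be scanned for the last monitor result recorded per resource and node. The following is only a rough sketch: it assumes the crmd "Result of monitor operation" line format seen in the excerpt above, and the function name and regular expression are illustrative, not part of Pacemaker or pcs.

```python
import re

# Assumed line format (taken from the crmd excerpt in this post):
#   <Mon DD HH:MM:SS> [pid] <host> crmd: <level>: process_lrm_event:
#   Result of monitor operation for <resource> on <node>: <rc> (<status>) | ...
MONITOR_RE = re.compile(
    r"(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\[\d+\]\s+\S+\s+crmd:\s+\w+:\s+"
    r"process_lrm_event:\s+Result of monitor operation for (?P<rsc>\S+) on "
    r"(?P<node>\S+): (?P<rc>\d+) \((?P<status>[^)]*)\)"
)


def last_monitor_results(lines):
    """Return {(resource, node): (timestamp, rc, status)} for the last
    monitor result seen for each resource/node pair."""
    last = {}
    for line in lines:
        match = MONITOR_RE.search(line)
        if match:
            key = (match["rsc"], match["node"])
            # Later lines overwrite earlier ones, so the final value is the
            # most recent monitor result in the log for that pair.
            last[key] = (match["ts"], int(match["rc"]), match["status"])
    return last
```

Passing an open log file (for example `last_monitor_results(open("corosync.log"))`, where the path is a placeholder) would give the timestamp of the final monitor result per node. A node whose last timestamp is far older than the end of the log is the one that is no longer being monitored.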
Let me know if anything more is needed.
Regards,
Abhay
PS: 'pcs resource cleanup' brought the cluster back into a good state.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cib.xml
Type: text/xml
Size: 7659 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20170831/e49deebf/attachment-0002.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: corosyn_filt.log
Type: application/octet-stream
Size: 327548 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20170831/e49deebf/attachment-0002.obj>