[ClusterLabs] Pacemaker stopped monitoring the resource

Ken Gaillot kgaillot at redhat.com
Tue Sep 5 09:13:56 EDT 2017


On Tue, 2017-09-05 at 06:54 +0000, Abhay B wrote:
> Ken,
> 
> 
> I have another set of logs : 
> 
> 
> Sep 01 09:10:05 [1328] TPC-F9-26.phaedrus.sandvine.com       crmd:
> info: do_lrm_rsc_op: Performing
> key=5:50864:0:86160921-abd7-4e14-94d4-f53cee278858
> op=SVSDEHA_monitor_2000
> SvsdeStateful(SVSDEHA)[6174]:   2017/09/01_09:10:06 ERROR: Resource is
> in failed state
> Sep 01 09:10:06 [1328] TPC-F9-26.phaedrus.sandvine.com       crmd:
> info: action_synced_wait:    Managed SvsdeStateful_meta-data_0 process
> 6274 exited with rc=4
> Sep 01 09:10:06 [1328] TPC-F9-26.phaedrus.sandvine.com       crmd:
> error: generic_get_metadata:  Failed to receive meta-data for
> ocf:pacemaker:SvsdeStateful
> Sep 01 09:10:06 [1328] TPC-F9-26.phaedrus.sandvine.com       crmd:
> error: build_operation_update:    No metadata for
> ocf::pacemaker:SvsdeStateful
> Sep 01 09:10:06 [1328] TPC-F9-26.phaedrus.sandvine.com       crmd:
> info: process_lrm_event: Result of monitor operation for SVSDEHA on
> TPC-F9-26.phaedrus.sandvine.com: 0 (ok) | call=939
> key=SVSDEHA_monitor_2000 confirmed=false cib-update=476
> Sep 01 09:10:06 [1325] TPC-F9-26.phaedrus.sandvine.com        cib:
> info: cib_process_request:   Forwarding cib_modify operation for
> section status to all (origin=local/crmd/476)
> Sep 01 09:10:06 [1325] TPC-F9-26.phaedrus.sandvine.com        cib:
> info: cib_perform_op:    Diff: --- 0.37.4054 2
> Sep 01 09:10:06 [1325] TPC-F9-26.phaedrus.sandvine.com        cib:
> info: cib_perform_op:    Diff: +++ 0.37.4055 (null)
> Sep 01 09:10:06 [1325] TPC-F9-26.phaedrus.sandvine.com        cib:
> info: cib_perform_op:    +  /cib:  @num_updates=4055
> Sep 01 09:10:06 [1325] TPC-F9-26.phaedrus.sandvine.com        cib:
> info: cib_perform_op:
> ++ /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='SVSDEHA']:  <lrm_rsc_op id="SVSDEHA_monitor_2000" operation_key="SVSDEHA_monitor_2000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="5:50864:0:86160921-abd7-4e14-94d4-f53cee278858" transition-magic="0:0;5:50864:0:86160921-abd7-4e14-94d4-f53cee278858" on_node="TPC-F9-26.phaedrus.sandvi
> Sep 01 09:10:06 [1325] TPC-F9-26.phaedrus.sandvine.com        cib:
> info: cib_process_request:   Completed cib_modify operation for
> section status: OK (rc=0,
> origin=TPC-F9-26.phaedrus.sandvine.com/crmd/476, version=0.37.4055)
> Sep 01 09:10:12 [1325] TPC-F9-26.phaedrus.sandvine.com        cib:
> info: cib_process_ping:  Reporting our current digest to
> TPC-E9-23.phaedrus.sandvine.com: 74bbb7e9f35fabfdb624300891e32018 for
> 0.37.4055 (0x7f5719954560 0)
> Sep 01 09:15:33 [1325] TPC-F9-26.phaedrus.sandvine.com        cib:
> info: cib_perform_op:    Diff: --- 0.37.4055 2
> Sep 01 09:15:33 [1325] TPC-F9-26.phaedrus.sandvine.com        cib:
> info: cib_perform_op:    Diff: +++ 0.37.4056 (null)
> Sep 01 09:15:33 [1325] TPC-F9-26.phaedrus.sandvine.com        cib:
> info: cib_perform_op:    +  /cib:  @num_updates=4056
> Sep 01 09:15:33 [1325] TPC-F9-26.phaedrus.sandvine.com        cib:
> info: cib_perform_op:
> ++ /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='SVSDEHA']:  <lrm_rsc_op id="SVSDEHA_last_failure_0" operation_key="SVSDEHA_monitor_1000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="7:50662:8:86160921-abd7-4e14-94d4-f53cee278858" transition-magic="2:1;7:50662:8:86160921-abd7-4e14-94d4-f53cee278858" on_node="TPC-E9-23.phaedrus.sand
> Sep 01 09:15:33 [1325] TPC-F9-26.phaedrus.sandvine.com        cib:
> info: cib_process_request:   Completed cib_modify operation for
> section status: OK (rc=0,
> origin=TPC-E9-23.phaedrus.sandvine.com/crmd/53508, version=0.37.4056)
> Sep 01 09:15:33 [1327] TPC-F9-26.phaedrus.sandvine.com      attrd:
> info: attrd_peer_update: Setting
> fail-count-SVSDEHA[TPC-E9-23.phaedrus.sandvine.com]: (null) -> 1 from
> TPC-E9-23.phaedrus.sandvine.com
> Sep 01 09:15:33 [1327] TPC-F9-26.phaedrus.sandvine.com      attrd:
> info: attrd_peer_update: Setting
> last-failure-SVSDEHA[TPC-E9-23.phaedrus.sandvine.com]: (null) ->
> 1504271733 from TPC-E9-23.phaedrus.sandvine.com
> Sep 01 09:15:33 [1325] TPC-F9-26.phaedrus.sandvine.com        cib:
> info: cib_perform_op:    Diff: --- 0.37.4056 2
> Sep 01 09:15:33 [1325] TPC-F9-26.phaedrus.sandvine.com        cib:
> info: cib_perform_op:    Diff: +++ 0.37.4057 (null)
> Sep 01 09:15:33 [1325] TPC-F9-26.phaedrus.sandvine.com        cib:
> info: cib_perform_op:    +  /cib:  @num_updates=4057
> Sep 01 09:15:33 [1325] TPC-F9-26.phaedrus.sandvine.com        cib:
> info: cib_perform_op:
> ++ /cib/status/node_state[@id='2']/transient_attributes[@id='2']/instance_attributes[@id='status-2']:  <nvpair id="status-2-fail-count-SVSDEHA" name="fail-count-SVSDEHA" 
> value="1"/>
> 
> 
> My suspicion is around the highlighted parts of the logs above.
> After 09:10:12 the next log is at 09:15:33. During this time the
> resource on the other node failed several times but was never migrated
> here.

One of the nodes is elected the "DC" at any given time. That node
calculates what needs to be done about failures. It looks like the other
node was DC at this time, so its logs will be more relevant. It's fine
for this node not to have logs if the DC didn't ask it to do anything.

Logs with "pengine:" on the other node will show the decisions made.
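
For example, something along these lines on the DC should show what the
policy engine decided around that time (log location taken from your
corosync.conf; the exact message names can vary a bit between versions):

# grep -e "pengine.*LogActions" -e "pengine.*process_pe_message" /var/log/cluster/corosync.log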

> 
> 
> I am yet to check with sbd fencing with  the patch shared by Klaus.
> I am on CentOS. 
> 
> 
> # cat /etc/centos-release
> CentOS Linux release 7.3.1611 (Core)
> 
> 
> Regards,
> Abhay
> 
> 
> 
> 
> 
> 
> 
> On Sat, 2 Sep 2017 at 15:23 Klaus Wenninger <kwenning at redhat.com>
> wrote:
> 
>         On 09/01/2017 11:45 PM, Ken Gaillot wrote:
>         > On Fri, 2017-09-01 at 15:06 +0530, Abhay B wrote:
>         >>         Are you sure the monitor stopped? Pacemaker only
>         logs
>         >>         recurring monitors
>         >>         when the status changes. Any successful monitors
>         after this
>         >>         wouldn't be
>         >>         logged.
>         >>
>         >> Yes. There were no logs saying "RecurringOp: Start
>         >> recurring monitor" on the node after it had failed.
>         >> Also there were no logs for any actions pertaining to
>         >> The problem was that even though the one node was failing,
>         the
>         >> resources were never moved to the other node(the node on
>         which I
>         >> suspect monitoring had stopped).
>         >>
>         >>
>         >>         There are a lot of resource action failures, so I'm
>         not sure
>         >>         where the
>         >>         issue is, but I'm guessing it has to do with
>         >>         migration-threshold=1 --
>         >>         once a resource has failed once on a node, it won't
>         be allowed
>         >>         back on
>         >>         that node until the failure is cleaned up. Of
>         course you also
>         >>         have
>         >>         failure-timeout=1s, which should clean it up
>         immediately, so
>         >>         I'm not
>         >>         sure.
>         >>
>         >>
>         >> migration-threshold=1
>         >> failure-timeout=1s
>         >>
>         >> cluster-recheck-interval=2s
>         >>
>         >>
>         >>         first, set "two_node:
>         >>         1" in corosync.conf and let no-quorum-policy
>         default in
>         >>         pacemaker
>         >>
>         >>
>         >> This is already configured.
>         >> # cat /etc/corosync/corosync.conf
>         >> totem {
>         >>     version: 2
>         >>     secauth: off
>         >>     cluster_name: SVSDEHA
>         >>     transport: udpu
>         >>     token: 5000
>         >> }
>         >>
>         >>
>         >> nodelist {
>         >>     node {
>         >>         ring0_addr: 2.0.0.10
>         >>         nodeid: 1
>         >>     }
>         >>
>         >>
>         >>     node {
>         >>         ring0_addr: 2.0.0.11
>         >>         nodeid: 2
>         >>     }
>         >> }
>         >>
>         >>
>         >> quorum {
>         >>     provider: corosync_votequorum
>         >>     two_node: 1
>         >> }
>         >>
>         >>
>         >> logging {
>         >>     to_logfile: yes
>         >>     logfile: /var/log/cluster/corosync.log
>         >>     to_syslog: yes
>         >> }
>         >>
>         >>
>         >>         let no-quorum-policy default in pacemaker; then,
>         >>         get stonith configured, tested, and enabled
>         >>
>         >>
>         >> By not configuring no-quorum-policy, would it ignore quorum
>         for a 2
>         >> node cluster?
>         > With two_node, corosync always provides quorum to pacemaker,
>         so
>         > pacemaker doesn't see any quorum loss. The only significant
>         difference
>         > from ignoring quorum is that corosync won't form a cluster
>         from a cold
>         > start unless both nodes can reach each other (a safety
>         feature).
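>         >
>         > You can double-check what corosync itself thinks with:
>         >
>         > # corosync-quorumtool -s
>         >
>         > which should report the cluster as quorate and list 2Node among
>         > the flags.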
>         >
>         >> For my use case I don't need stonith enabled. My intention
>         is to have
>         >> a highly available system all the time.
>         > Stonith is the only way to recover from certain types of
>         failure, such
>         > as the "split brain" scenario, and a resource that fails to
>         stop.
>         >
>         > If your nodes are physical machines with hardware watchdogs,
>         you can set
>         > up sbd for fencing without needing any extra equipment.
>         
>         Small caveat here:
>         If I get it right you have a 2-node-setup. In this case the
>         watchdog-only
>         sbd-setup would not be usable as it relies on 'real' quorum.
>         In 2-node-setups sbd needs at least a single shared disk.
>         For the sbd-single-disk-setup working with 2-node
>         you need the patch from
>         https://github.com/ClusterLabs/sbd/pull/23
>         in place. (Saw you mentioning RHEL documentation - RHEL 7.4 has
>         had it in since GA.)
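>
>         As a rough sketch (the device path and values below are only
>         placeholders, not taken from your setup), the shared-disk variant
>         comes down to initializing the disk once and pointing
>         /etc/sysconfig/sbd at it on both nodes:
>
>         # sbd -d /dev/disk/by-id/<your-shared-disk> create
>         # grep -E '^SBD_(DEVICE|WATCHDOG_DEV)=' /etc/sysconfig/sbd
>         SBD_DEVICE="/dev/disk/by-id/<your-shared-disk>"
>         SBD_WATCHDOG_DEV="/dev/watchdog"
>
>         plus enabling the sbd service and a matching stonith resource in
>         pacemaker.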
>         
>         Regards,
>         Klaus
>         
>         >
>         >> I will test my RA again as suggested with
>         no-quorum-policy=default.
>         >>
>         >>
>         >> One more doubt.
>         >> Why do we see this in 'pcs property'?
>         >> last-lrm-refresh: 1504090367
>         >>
>         >>
>         >>
>         >> Never seen this on a healthy cluster.
>         >> From RHEL documentation:
>         >> last-lrm-refresh
>         >>
>         >> Last refresh of the Local Resource Manager, given in units of
>         >> seconds since epoch. Used for diagnostic purposes; not
>         >> user-configurable.
>         >>
>         >>
>         >> Doesn't explain much.
>         > Whenever a cluster property changes, the cluster rechecks
>         the current
>         > state to see if anything needs to be done. last-lrm-refresh
>         is just a
>         > dummy property that the cluster uses to trigger that. It's
>         set in
>         > certain rare circumstances when a resource cleanup is done.
>         You should
>         > see a line in your logs like "Triggering a refresh after ...
>         deleted ...
>         > from the LRM". That might give some idea of why.
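>         >
>         > For example (log path from your corosync.conf):
>         >
>         > # grep "Triggering a refresh" /var/log/cluster/corosync.log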
>         >
>         >> Also, does avg. CPU load impact resource monitoring?
>         >>
>         >>
>         >> Regards,
>         >> Abhay
>         > Well, it could cause the monitor to take so long that it
>         times out. The
>         > only direct effect of load on pacemaker is that the cluster
>         might lower
>         > the number of agent actions that it can execute
>         simultaneously.
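>         >
>         > If you suspect that, one option is to give the monitor more
>         > headroom, roughly along these lines -- the values are only
>         > illustrative, and with a master/slave resource each monitor op
>         > (per role/interval) may need to be adjusted separately:
>         >
>         > # pcs resource update SVSDEHA op monitor interval=2s timeout=30s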
>         >
>         >
>         >> On Thu, 31 Aug 2017 at 20:11 Ken Gaillot
>         <kgaillot at redhat.com> wrote:
>         >>
>         >>         On Thu, 2017-08-31 at 06:41 +0000, Abhay B wrote:
>         >>         > Hi,
>         >>         >
>         >>         >
>         >>         > I have a 2 node HA cluster configured on CentOS 7
>         with pcs
>         >>         command.
>         >>         >
>         >>         >
>         >>         > Below are the properties of the cluster :
>         >>         >
>         >>         >
>         >>         > # pcs property
>         >>         > Cluster Properties:
>         >>         >  cluster-infrastructure: corosync
>         >>         >  cluster-name: SVSDEHA
>         >>         >  cluster-recheck-interval: 2s
>         >>         >  dc-deadtime: 5
>         >>         >  dc-version: 1.1.15-11.el7_3.5-e174ec8
>         >>         >  have-watchdog: false
>         >>         >  last-lrm-refresh: 1504090367
>         >>         >  no-quorum-policy: ignore
>         >>         >  start-failure-is-fatal: false
>         >>         >  stonith-enabled: false
>         >>         >
>         >>         >
>         >>         > PFA the cib.
>         >>         > Also attached is the corosync.log around the time
>         the below
>         >>         issue
>         >>         > happened.
>         >>         >
>         >>         >
>         >>         > After around 10 hrs and multiple failures,
>         pacemaker stops
>         >>         monitoring
>         >>         > resource on one of the nodes in the cluster.
>         >>         >
>         >>         >
>         >>         > So even though the resource on other node fails,
>         it is never
>         >>         migrated
>         >>         > to the node on which the resource is not
>         monitored.
>         >>         >
>         >>         >
>         >>         > Wanted to know what could have triggered this and
>         how to
>         >>         avoid getting
>         >>         > into such scenarios.
>         >>         > I am going through the logs and couldn't find why
>         this
>         >>         happened.
>         >>         >
>         >>         >
>         >>         > After this log the monitoring stopped.
>         >>         >
>         >>         > Aug 29 11:01:44 [16500]
>         TPC-D12-10-002.phaedrus.sandvine.com
>         >>         > crmd:     info: process_lrm_event:   Result of
>         monitor
>         >>         operation for
>         >>         > SVSDEHA on TPC-D12-10-002.phaedrus.sandvine.com:
>         0 (ok) |
>         >>         call=538
>         >>         > key=SVSDEHA_monitor_2000 confirmed=false
>         cib-update=50013
>         >>
>         >>         Are you sure the monitor stopped? Pacemaker only
>         logs
>         >>         recurring monitors
>         >>         when the status changes. Any successful monitors
>         after this
>         >>         wouldn't be
>         >>         logged.
>         >>
>         >>         > Below log says the resource is leaving the
>         cluster.
>         >>         > Aug 29 11:01:44 [16499]
>         TPC-D12-10-002.phaedrus.sandvine.com
>         >>         > pengine:     info: LogActions:  Leave   SVSDEHA:0
>         >>          (Slave
>         >>         > TPC-D12-10-002.phaedrus.sandvine.com)
>         >>
>         >>         This means that the cluster will leave the resource
>         where it
>         >>         is (i.e. it
>         >>         doesn't need a start, stop, move, demote, promote,
>         etc.).
>         >>
>         >>         > Let me know if anything more is needed.
>         >>         >
>         >>         >
>         >>         > Regards,
>         >>         > Abhay
>         >>         >
>         >>         >
>         >>         > PS:'pcs resource cleanup' brought the cluster
>         back into good
>         >>         state.
>         >>
>         >>         There are a lot of resource action failures, so I'm
>         not sure
>         >>         where the
>         >>         issue is, but I'm guessing it has to do with
>         >>         migration-threshold=1 --
>         >>         once a resource has failed once on a node, it won't
>         be allowed
>         >>         back on
>         >>         that node until the failure is cleaned up. Of
>         course you also
>         >>         have
>         >>         failure-timeout=1s, which should clean it up
>         immediately, so
>         >>         I'm not
>         >>         sure.
>         >>
>         >>         My gut feeling is that you're trying to do too many
>         things at
>         >>         once. I'd
>         >>         start over from scratch and proceed more slowly:
>         first, set
>         >>         "two_node:
>         >>         1" in corosync.conf and let no-quorum-policy
>         default in
>         >>         pacemaker; then,
>         >>         get stonith configured, tested, and enabled; then,
>         test your
>         >>         resource
>         >>         agent manually on the command line to make sure it
>         conforms to
>         >>         the
>         >>         expected return values
>         >>
>         >>         ( http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf );
>         >>         then add your resource to the cluster without
>         >>         migration-threshold or failure-timeout, and work out any
>         >>         issues with frequent failures; then finally set
>         >>         migration-threshold and failure-timeout to reflect how you
>         >>         want recovery to proceed.
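>         >>
>         >>         For instance, a manual smoke test of the agent could look
>         >>         roughly like this (the path assumes the usual OCF layout
>         >>         for an ocf:pacemaker:SvsdeStateful agent; export any
>         >>         OCF_RESKEY_* parameters your agent needs beforehand):
>         >>
>         >>         # export OCF_ROOT=/usr/lib/ocf
>         >>         # /usr/lib/ocf/resource.d/pacemaker/SvsdeStateful monitor; echo $?
>         >>         # /usr/lib/ocf/resource.d/pacemaker/SvsdeStateful start; echo $?
>         >>
>         >>         Once that behaves, the recovery tuning can be added back
>         >>         with something like (values purely illustrative):
>         >>
>         >>         # pcs resource meta SVSDEHA migration-threshold=3 failure-timeout=60s
>         >>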
>         >>         --
>         >>         Ken Gaillot <kgaillot at redhat.com>
>         >>
>         >>
>         >>
>         >>
>         >>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
Ken Gaillot <kgaillot at redhat.com>
