[ClusterLabs] big trouble with a DRBD resource
Ken Gaillot
kgaillot at redhat.com
Wed Aug 16 10:30:53 EDT 2017
On Wed, 2017-08-16 at 15:20 +0200, Lentes, Bernd wrote:
>
> > Hi,
> >
>
> >
> > What happened:
> > I tried to configure a simple drbd resource following
> > http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html#idm140457860751296
> > I used this simple snip from the doc:
> > configure primitive WebData ocf:linbit:drbd params drbd_resource=wwwdata \
> > op monitor interval=60s
> >
> > I did it on the live cluster, which is currently in testing. I will never do this
> > again. Shadow will be my friend.
> >
> > The cluster reacted promptly:
> > crm(live)# configure primitive prim_drbd_idcc_devel ocf:linbit:drbd params
> > drbd_resource=idcc-devel \
> > > op monitor interval=60
> > WARNING: prim_drbd_idcc_devel: default timeout 20s for start is smaller than the
> > advised 240
> > WARNING: prim_drbd_idcc_devel: default timeout 20s for stop is smaller than the
> > advised 100
> > WARNING: prim_drbd_idcc_devel: action monitor not advertised in meta-data, it
> > may not be supported by the RA
> >
> > As far as I understand so far, I didn't configure start/stop
> > operations, so the cluster uses the default from default-action-timeout.
> > It didn't configure the monitor operation, because that is not in the meta-data.
>
> >
> > The log says:
> > Aug 1 14:19:33 ha-idg-1 drbd(prim_drbd_idcc_devel)[11325]: ERROR: meta
> > parameter misconfigured, expected clone-max -le 2, but found unset.
> > ^^^^^^^^^
> > Aug 1 14:19:33 ha-idg-1 crmd[4692]: notice: process_lrm_event: Operation
> > prim_drbd_idcc_devel_monitor_0: not configured (node=ha-idg-1, call=73, rc=6,
> > cib-update=37, confirmed=true)
> > Aug 1 14:19:33 ha-idg-1 crmd[4692]: notice: process_lrm_event: Operation
> > prim_drbd_idcc_devel_stop_0: not configured (node=ha-idg-1, call=74, rc=6,
> > cib-update=38, confirmed=true)
> >
>
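For what it's worth, the "expected clone-max -le 2, but found unset" error suggests the agent expects to run inside a master/slave (ms) resource rather than as a bare primitive. Below is a minimal sketch of what that might look like, assuming crmsh. The name ms_drbd_idcc_devel, the monitor intervals, the file path and the two helper function names are placeholders of mine, not anything from your configuration; the start/stop timeouts are simply the values advised in the warnings above. The definition is written to a file first, so nothing touches the cluster until it is loaded explicitly.

```shell
#!/bin/sh
# Write a crmsh definition of the DRBD primitive plus its master/slave
# wrapper to the file given as $1. Nothing is sent to the cluster here.
write_drbd_config() {
    cat > "$1" <<'EOF'
primitive prim_drbd_idcc_devel ocf:linbit:drbd \
    params drbd_resource=idcc-devel \
    op start interval=0 timeout=240 \
    op stop interval=0 timeout=100 \
    op monitor interval=29s role=Master \
    op monitor interval=31s role=Slave
ms ms_drbd_idcc_devel prim_drbd_idcc_devel \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
EOF
}

# Load the file in one step, so the primitive is never defined without
# its ms wrapper. Requires crmsh on a cluster node.
load_drbd_config() {
    crm configure load update "$1"
}

# Example (not run here; the path is a placeholder):
#   write_drbd_config /tmp/drbd_idcc_devel.crm
#   load_drbd_config /tmp/drbd_idcc_devel.crm
```

The ms meta attributes are the ones Clusters from Scratch uses for its WebData example. The separate monitor operations per role are how DRBD setups are usually shown, but treat the intervals as an assumption and check the agent's meta-data. Trying the load against a shadow CIB before the live one would be the cautious route.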
> >
> > crm_mon said:
> > Failed actions:
> > prim_drbd_idcc_devel_stop_0 on ha-idg-1 'not configured' (6): call=6967,
> > status=complete, exit-reason='none', last-rc-change='Tue Aug 1 14:28:33 2017',
> > queued=0ms, exec=41ms
> > prim_drbd_idcc_devel_monitor_60000 on ha-idg-1 'not configured' (6): call=6968,
> > status=complete, exit-reason='none', last-rc-change='Tue Aug 1 14:28:33 2017',
> > queued=0ms, exec=41ms
> > prim_drbd_idcc_devel_stop_0 on ha-idg-2 'not configured' (6): call=6963,
> > status=complete, exit-reason='none', last-rc-change='Tue Aug 1 14:28:33 2017',
> > queued=0ms, exec=40ms
> >
> > A big problem was that I have a ClusterMon resource running on each node. It
> > triggered about 20000 SNMP traps in 193 seconds to my management station, which
> > triggered 20000 e-mails ...
> > Where does this incredible number of traps come from? Nearly all traps said that
> > stop is not configured for the drbd resource. Why does it complain so often? And
> > why did it stop after ~20,000 traps?
> > And it complained about the unconfigured monitor operation just 8 times.
>
> Ok. I configured the drbd resource wrongly/incompletely, and that caused the trouble.
> What I would like to know:
> - where does crm_mon retrieve its information from?
It uses the C API to be notified of stonith events and of changes to
the CIB (which holds all the cluster state), and additionally polls the
state every couple of seconds.
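If you want to look at the same state by hand, something like the following should work on a cluster node. It is only a sketch: the function name is made up, and it just wraps the stock crm_mon and cibadmin tools.

```shell
#!/bin/sh
# Print a one-shot status summary, then the raw status section of the
# CIB that the summary is derived from. Must be run on a cluster node.
show_cluster_state() {
    crm_mon -1                        # one-shot mode: print status once and exit
    cibadmin --query --scope status   # dump only the status section of the CIB
}

# Example (not run here):
#   show_cluster_state | less
```

The status section is where the recorded operation results (including failures like the ones above) live, which is why they keep showing up in crm_mon until they are cleaned up.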
> - why did I get tons of lines in syslog? One message that the resource isn't configured correctly/completely would be enough.
> I got thousands and thousands of lines saying the same thing.
I can't tell from this information alone. Most commonly, a flood like
that is the result of retrying start/stop repeatedly, which happens when
a resource agent's start fails and migration-threshold is left at the
default (1,000,000). However, "not configured" is a fatal error, so
pacemaker wouldn't retry that particular operation. It would log the
message every time a new operation was executed and returned that
result, and every time it did a policy engine run (until the error was
cleaned up).
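To narrow down which operations were actually executed over and over, it may help to count the crmd result lines per operation and node. A rough sketch, assuming each log entry is on a single line in the format quoted above; the function name and the log path in the example are placeholders:

```shell
#!/bin/sh
# Count "not configured" operation results per operation and node,
# most frequent first. $1 is the path to a syslog file.
summarize_not_configured() {
    sed -n 's/.*Operation \([^:]*\): not configured (node=\([^,]*\),.*/\1 on \2/p' "$1" \
        | sort | uniq -c | sort -rn
}

# Example (not run here; the log location varies by distribution):
#   summarize_not_configured /var/log/messages
```

If one stop operation accounts for nearly all of the lines, that points at new operations being executed repeatedly rather than at the policy engine merely re-logging an old failure. Once the configuration is fixed, `crm resource cleanup prim_drbd_idcc_devel` should clear the recorded failures.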
>
> Bernd
>
>
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671