[ClusterLabs] Pacemaker resource parameter reload confusion

Ken Gaillot kgaillot at redhat.com
Fri Sep 22 10:56:05 EDT 2017


On Fri, 2017-09-22 at 16:23 +0200, Ferenc Wágner wrote:
> Hi,
> 
> I'm running a custom resource agent under Pacemaker 1.1.16, which has
> several reloadable parameters:
> 
> $ /usr/sbin/crm_resource --show-metadata=ocf:niif:TransientDomain | fgrep unique=
> <parameter name="domxml" unique="1" required="1">
> <parameter name="graceful" unique="0" required="0">
> <parameter name="desturi_template" unique="0" required="1">
> <parameter name="migrateuri_template" unique="0" required="0">
> <parameter name="migr_timeout" unique="0" required="0">
> <parameter name="admins" unique="0" required="0">
> <parameter name="expect_startup_signal" unique="0" required="0">
> <parameter name="dummy" unique="1" required="0">
> <parameter name="dummy_delay" unique="0" required="0">
> 
> I used to routinely change the unique="0" parameters without having the
> corresponding resources restarted.  But now something like
> 
> $ sudo crm_resource -r vm-alder -p admins -v "kissg wferi"
> 
> restarts the resource in a somewhat strange way:
> 
> crmd[27037]:   notice: State transition S_IDLE -> S_POLICY_ENGINE
> pengine[27036]:   notice: Reload  vm-alder#011(Started vhbl05)
> pengine[27036]:   notice: Calculated transition 1309, saving inputs in /var/lib/pacemaker/pengine/pe-input-1033.bz2
> crmd[27037]:   notice: Initiating stop operation vm-alder_stop_0 on vhbl05
> crmd[27037]:   notice: Initiating reload operation vm-alder_reload_0 on vhbl05
> crmd[27037]:   notice: Transition aborted by deletion of lrm_rsc_op[@id='vm-alder_last_failure_0']: Resource operation removal
> crmd[27037]:   notice: Transition 1309 (Complete=10, Pending=0, Fired=0, Skipped=1, Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-input-1033.bz2): Stopped
> pengine[27036]:   notice: Calculated transition 1310, saving inputs in /var/lib/pacemaker/pengine/pe-input-1034.bz2

Hmm, stop+reload is definitely a bug. Can you attach (or email it to me
privately, or file a bz with it attached) the above pe-input file with
any sensitive info removed?
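
In case it helps anyone spot the same pattern in their own logs, here is a rough sketch (not Pacemaker code) that flags transitions in which both a stop and a reload were initiated for the same resource. The function name and regexes are made up for illustration, and it assumes each "Initiating ..." line belongs to the most recently logged "Calculated transition N" line, which holds for the excerpt above but may not hold in general.

```python
import re
from collections import defaultdict

# Patterns for the two kinds of log lines we care about (unwrapped lines assumed).
CALC = re.compile(r"Calculated transition (\d+)")
INIT = re.compile(r"Initiating (\w+) operation (\S+?)_\1_\d+ on (\S+)")


def stop_and_reload(lines):
    """Return {transition: [resources]} where stop AND reload were both initiated."""
    ops = defaultdict(lambda: defaultdict(set))
    current = None
    for line in lines:
        calc = CALC.search(line)
        if calc:
            current = int(calc.group(1))
            continue
        init = INIT.search(line)
        if init and current is not None:
            # group(1) is the action name, group(2) the resource id
            ops[current][init.group(2)].add(init.group(1))
    suspicious = {}
    for transition, resources in ops.items():
        hits = sorted(r for r, actions in resources.items()
                      if {"stop", "reload"} <= actions)
        if hits:
            suspicious[transition] = hits
    return suspicious


# Lines taken from the quoted log, with the mail wrapping undone.
LOG = [
    "pengine[27036]:   notice: Calculated transition 1309, saving inputs in /var/lib/pacemaker/pengine/pe-input-1033.bz2",
    "crmd[27037]:   notice: Initiating stop operation vm-alder_stop_0 on vhbl05",
    "crmd[27037]:   notice: Initiating reload operation vm-alder_reload_0 on vhbl05",
    "pengine[27036]:   notice: Calculated transition 1312, saving inputs in /var/lib/pacemaker/pengine/pe-input-1036.bz2",
    "crmd[27037]:   notice: Initiating stop operation vm-alder_stop_0 on vhbl05",
    "crmd[27037]:   notice: Initiating start operation vm-alder_start_0 on vhbl05",
]

print(stop_and_reload(LOG))  # {1309: ['vm-alder']}
```

Transition 1312 is not flagged because stop followed by start is an ordinary recovery; only the stop+reload combination is reported.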

> crmd[27037]:   notice: Initiating monitor operation vm-alder_monitor_60000 on vhbl05
> crmd[27037]:  warning: Action 228 (vm-alder_monitor_60000) on vhbl05 failed (target: 0 vs. rc: 7): Error
> crmd[27037]:   notice: Transition aborted by operation vm-alder_monitor_60000 'create' on vhbl05: Event failed
> crmd[27037]:   notice: Transition 1310 (Complete=7, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1034.bz2): Complete
> pengine[27036]:  warning: Processing failed op monitor for vm-alder on vhbl05: not running (7)
> pengine[27036]:   notice: Recover vm-alder#011(Started vhbl05)
> pengine[27036]:   notice: Calculated transition 1311, saving inputs in /var/lib/pacemaker/pengine/pe-input-1035.bz2
> pengine[27036]:  warning: Processing failed op monitor for vm-alder on vhbl05: not running (7)
> pengine[27036]:   notice: Recover vm-alder#011(Started vhbl05)
> pengine[27036]:   notice: Calculated transition 1312, saving inputs in /var/lib/pacemaker/pengine/pe-input-1036.bz2
> crmd[27037]:   notice: Initiating stop operation vm-alder_stop_0 on vhbl05
> crmd[27037]:   notice: Initiating start operation vm-alder_start_0 on vhbl05
> crmd[27037]:   notice: Initiating monitor operation vm-alder_monitor_60000 on vhbl05
> crmd[27037]:   notice: Transition 1312 (Complete=10, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1036.bz2): Complete
> crmd[27037]:   notice: State transition S_TRANSITION_ENGINE -> S_IDLE
> 
> I've got info-level logs as well, but those are rather long, and maybe
> someone can pinpoint my problem without going through them.  I remember
> past discussions about "doing reload right", but I'm not sure what was
> implemented in the end, and I can't find anything in the changelog
> either.  So, what am I missing here?  A parallel reload and stop looks
> rather suspicious, though...

Nothing's been done about reload yet. It's waiting until we get around
to an overhaul of the OCF resource agent standard, so we can define the
semantics more clearly. It will involve replacing "unique" with
separate meta-data for reloadability and GUI hinting, and possibly
changes to the reload operation. Of course we'll try to stay backward-
compatible.
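
To illustrate how "unique" is doing double duty as the reloadability hint today, here is a minimal sketch (not Pacemaker's actual implementation) that splits an agent's parameters by that attribute. The metadata below is a trimmed, made-up version of the output quoted above, the function name is hypothetical, and treating a missing "unique" as "0" is an assumption. Whether a change really results in a reload also depends on the agent advertising a reload action.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative metadata; a real agent's output has more elements.
METADATA = """\
<resource-agent name="TransientDomain">
  <parameters>
    <parameter name="domxml" unique="1" required="1"/>
    <parameter name="admins" unique="0" required="0"/>
    <parameter name="dummy" unique="1" required="0"/>
    <parameter name="dummy_delay" unique="0" required="0"/>
  </parameters>
</resource-agent>
"""


def split_by_unique(metadata_xml):
    """Return (reload_candidates, restart_required) parameter name lists."""
    root = ET.fromstring(metadata_xml)
    reload_candidates, restart_required = [], []
    for param in root.iter("parameter"):
        # Assumption: a missing "unique" attribute is treated like unique="0".
        if param.get("unique", "0") == "1":
            restart_required.append(param.get("name"))
        else:
            reload_candidates.append(param.get("name"))
    return reload_candidates, restart_required


reload_candidates, restart_required = split_by_unique(METADATA)
print("reload candidates:", reload_candidates)
print("restart required: ", restart_required)
```

With the sample above, "admins" and "dummy_delay" come out as reload candidates, which matches the expectation that changing "admins" should not restart the resource.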
-- 
Ken Gaillot <kgaillot at redhat.com>






