[ClusterLabs] Doing reload right

Wed Jul 20 12:47:13 EDT 2016

Ken Gaillot <kgaillot at redhat.com> wrote:
> Hello all,
> 
> I've been meaning to address the implementation of "reload" in Pacemaker
> for a while now, and I think the next release will be a good time, as it
> seems to be coming up more frequently.

[snipped]

I don't want to comment directly on any of the excellent points which
have been raised in this thread, but it seems like a good time to make
a plea for easier reload / restart of individual instances of cloned
services, one node at a time.  Currently, if nodes are all managed by
a configuration management system (such as Chef in our case), then
whenever that system performs a configuration run on a node (e.g. to
update a service's configuration file from a template), it has to
place the entire node in maintenance mode before reloading or
restarting that service on it.  It works OK, but can result in ugly
effects such as the node getting stuck in maintenance mode if the
chef-client run fails, without any easy way to track down the
original cause.

I went through several design iterations before settling on this
approach, and they are detailed in a lengthy comment here, which may
help you better understand the challenges we encountered:

  https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/service.rb#L61

Similar challenges are posed during upgrade of Pacemaker-managed
OpenStack infrastructure.

Cheers,
Adam