[Pacemaker] Stopping/restarting pacemaker without stopping resources?

Mon Oct 27 02:40:50 EDT 2014

On Mon, Oct 27, 2014 at 6:34 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>
>> On 27 Oct 2014, at 2:30 pm, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
>>
>> On Mon, 27 Oct 2014 11:09:08 +1100
>> Andrew Beekhof <andrew at beekhof.net> wrote:
>>
>>>
>>>> On 25 Oct 2014, at 12:38 am, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
>>>>
>>>> On Fri, Oct 24, 2014 at 9:17 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>>>
>>>>>> On 16 Oct 2014, at 9:31 pm, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
>>>>>>
>>>>>> The primary goal is to update software in the cluster transparently.
>>>>>> I just updated the HA suite with a plain RPM upgrade and observed
>>>>>> that the RPM attempts to restart the stack (rcopenais try-restart). So
>>>>>>
>>>>>> a) if it worked, it would mean resources had been migrated from this
>>>>>> node - interruption
>>>>>>
>>>>>> b) it did not work - apparently the new versions of the installed
>>>>>> utils were incompatible with the running pacemaker, so the request to
>>>>>> shut down crm failed and openais hung forever.
>>>>>>
>>>>>> The usual workflow with one cluster product I worked with before was:
>>>>>> stop cluster processes without stopping resources; update; restart
>>>>>> cluster processes. They would detect that resources were already
>>>>>> started and return to the same state as before stopping. Is something
>>>>>> like this possible with pacemaker?
>>>>>
>>>>> absolutely.  this should be of some help:
>>>>>
>>>>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_disconnect_and_reattach.html
>>>>>
>>>>
>>>> Did not work. It ended up moving master to another node and leaving
>>>> slave on original node stopped after that.
>>>
>>> When you stopped the cluster or when you started it after an upgrade?
>>
>> When I started it
>>
>> crm_attribute -t crm_config -n is-managed-default -v false
>> rcopenais stop on both nodes
>> rcopenais start on both nodes; wait for them to stabilize
>> crm_attribute -t crm_config -n is-managed-default -v true
>>
>> It stopped running master/slave, moved master and left slave stopped.
>
> What did crm_mon say before you set is-managed-default back to true?
> Did the resource agent properly detect it as running in the master state?

You are right, it returned 0, not 8.
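
For reference, a minimal sketch of what the monitor action of a master/slave
RA is expected to report. The numeric codes are the standard OCF ones; the
function name demo_monitor and the way the role is passed in are hypothetical,
chosen only so that the sketch runs on its own. A real agent would have to
discover the role from the service itself rather than from a stored attribute.

```shell
#!/bin/sh
# Standard OCF return codes relevant to monitor/probe.
OCF_SUCCESS=0          # running (as slave, for a master/slave resource)
OCF_NOT_RUNNING=7      # cleanly stopped
OCF_RUNNING_MASTER=8   # running in the master role

# Hypothetical monitor: $1 stands in for the role the agent detected
# by inspecting the real service ("master", "slave", or anything else).
demo_monitor() {
    case "$1" in
        master)
            # A real agent would typically also (re)set its promotion
            # preference at this point, e.g. with crm_master -l reboot -v <score>.
            return $OCF_RUNNING_MASTER
            ;;
        slave)
            return $OCF_SUCCESS
            ;;
        *)
            return $OCF_NOT_RUNNING
            ;;
    esac
}
```

Returning 0 for an instance that is actually master tells the cluster it found
a slave, which matches what happened here.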

> Did the resource agent properly (re)set a preference for being promoted during the initial monitor operation?
>

It did, but it was too late - after it had already been demoted.

> Pacemaker can do it, but it is dependent on the resources behaving correctly.
>

I see.

Well, this would be a problem ... The RA keeps track of the current
promoted/demoted status in the CIB as a transient attribute, which gets
reset after reboot. This would entail quite a bit of redesign ...

But what got me confused were these errors during initial probing, like

Oct 24 17:26:54 n1 crmd[32425]:  warning: status_from_rc: Action 9
(rsc_ip_VIP_monitor_0) on n2 failed (target: 7 vs. rc: 0): Error

This looks as if pacemaker expects the resource to be in the stopped state
and interprets the "running" state as an error. I mean, the normal
reaction to such a monitor result would be to stop the resource to bring
it into the target state, no?
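
For reading that log line: "target" and "rc" are OCF return codes. A small
sketch for decoding them follows. The helper names ocf_rc_name and
describe_probe are made up for illustration and are not part of pacemaker;
only the code-to-name mapping comes from the OCF resource agent API.

```shell
#!/bin/sh
# Map an OCF return code to its symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        8) echo "OCF_RUNNING_MASTER" ;;
        9) echo "OCF_FAILED_MASTER" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Hypothetical helper: spell out a "target: X vs. rc: Y" pair.
describe_probe() {
    printf 'expected %s (%s), got %s (%s)\n' \
        "$(ocf_rc_name "$1")" "$1" "$(ocf_rc_name "$2")" "$2"
}

# The pair from the warning above: target 7, rc 0.
describe_probe 7 0
```

So in the message above, the probe expected "not running" and the agent
reported "running".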