[Pacemaker] DRBD promotion timeout after pacemaker stop on other node

Vladislav Bogdanov bubble at hoster-ok.com
Mon Nov 11 01:00:59 EST 2013


11.11.2013 06:32, Vladislav Bogdanov wrote:
> 11.11.2013 02:30, Andrew Beekhof wrote:
>>
>> On 5 Nov 2013, at 2:22 am, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>
>>> Hi Andrew, David, all,
>>>
>>> I just found an interesting fact; I don't know whether it is a bug or not.
>>>
>>> When running "service pacemaker stop" on a node that has a DRBD resource
>>> promoted, that resource is not promoted on the other node, and the promote
>>> operation times out.
>>>
>>> This is related to the DRBD fencing integration with Pacemaker and to an
>>> insufficient default (recommended) promote timeout for the DRBD resource.
>>>
>>> crm-fence-peer.sh places the constraint in the CIB one second after the
>>> promote operation times out (the promote op has a 90s timeout;
>>> crm-fence-peer.sh uses that value as its own timeout and uses all of it if
>>> it cannot say for sure that the peer node is in a "sane" state: online or
>>> cleanly offline).
>>>
>>> Increasing the promote op timeout seems to help, but I'd expect the
>>> promotion to complete almost immediately instead of waiting an extra
>>> 90 seconds for nothing.
>>>
>>> Looking at the crm-fence-peer.sh script, it would determine the peer state
>>> as offline immediately if the node state meets all of the following:
>>> * doesn't contain "expected" tag or has it set to "down"
>>> * has "in_ccm" tag set to false
>>> * has "crmd" tag set to anything except "online"
>>>
>>> On the other hand, crmd sets "expected" = "down" only after fencing is
>>> complete (probably the same for "in_ccm"?). Shouldn't it do the same (or
>>> maybe just remove that tag) when a clean shutdown is about to complete?
>>
>> That would make sense.  Are you using the plugin, cman or corosync 2?
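
To make the check quoted above concrete, here is a minimal sketch in C of
the three conditions. The real script is shell and parses the CIB XML; the
struct and function names below are made up for illustration, and treating
a missing "crmd" or "in_ccm" value the way it does is my assumption, not
something taken from crm-fence-peer.sh.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one node_state entry from the CIB
 * status section. A NULL pointer means the attribute is absent. */
struct node_state {
    const char *expected; /* e.g. "member", "down", or NULL */
    const char *in_ccm;   /* e.g. "true" or "false" */
    const char *crmd;     /* e.g. "online" or "offline" */
};

/* Returns true only when all three conditions from the list hold:
 *   - "expected" is absent or set to "down"
 *   - "in_ccm" is set to "false"
 *   - "crmd" is anything except "online" (absent counts as not online;
 *     that last part is an assumption of this sketch) */
static bool peer_is_cleanly_offline(const struct node_state *ns)
{
    assert(ns != NULL);

    bool expected_down = ns->expected == NULL
                         || strcmp(ns->expected, "down") == 0;
    bool not_in_ccm = ns->in_ccm != NULL
                      && strcmp(ns->in_ccm, "false") == 0;
    bool crmd_not_online = ns->crmd == NULL
                           || strcmp(ns->crmd, "online") != 0;

    return expected_down && not_in_ccm && crmd_not_online;
}
```

With "expected" still left at something other than "down" after a clean
stop, the first condition fails even if the other two hold, so the script
cannot decide immediately and waits out the whole timeout.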

Is this OK, or am I missing something?

From a8398bb73a2b66103793c360d0081589f526acf2 Mon Sep 17 00:00:00 2001
From: Vladislav Bogdanov <bubble at hoster-ok.com>
Date: Mon, 11 Nov 2013 05:59:17 +0000
Subject: [PATCH] Update node values in cib on clean shutdown

---
 crmd/callbacks.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/crmd/callbacks.c b/crmd/callbacks.c
index 3dae17b..8cabffb 100644
--- a/crmd/callbacks.c
+++ b/crmd/callbacks.c
@@ -221,7 +221,9 @@ peer_update_callback(enum crm_status_type type, crm_node_t * node, const void *d
             crm_trace("Other %p", down);
         }
 
-        update = do_update_node_cib(node, node_update_peer, NULL, __FUNCTION__);
+        update = do_update_node_cib(node,
+            node_update_peer | node_update_cluster | node_update_join | node_update_expected,
+            NULL, __FUNCTION__);
         fsa_cib_anon_update(XML_CIB_TAG_STATUS, update,
                             cib_scope_local | cib_quorum_override | cib_can_create);
         free_xml(update);
-- 
1.7.1
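
For anyone not familiar with the flags argument: the patch simply ORs more
bits into the mask passed to do_update_node_cib(), so that more node_state
fields get written in the same update. A rough sketch of the idea follows.
The enum names, the numeric values and the helper are illustrative only
(the real definitions are in the crmd headers), and the field named in each
comment is my understanding of the mapping, not something verified here.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the node_update_* flags; values are made up. */
enum sketch_update_flags {
    sketch_update_cluster  = 0x10, /* assumed to cover "in_ccm" */
    sketch_update_peer     = 0x20, /* assumed to cover "crmd" */
    sketch_update_join     = 0x40, /* assumed to cover the join state */
    sketch_update_expected = 0x80  /* assumed to cover "expected" */
};

/* Hypothetical helper: does this mask request an update of the field? */
static bool updates_field(int flags, enum sketch_update_flags field)
{
    assert(field != 0);
    return (flags & field) != 0;
}
```

Before the patch the mask contains only the peer bit, so a check like
updates_field(mask, sketch_update_expected) is false; after the patch all
four bits are set and the same check is true.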




