[Pacemaker] crmd was aborted at pacemaker 1.1.11

Kazunori INOUE kazunori.inoue3 at gmail.com
Mon Mar 17 05:51:11 EDT 2014


2014-03-17 16:37 GMT+09:00 Kazunori INOUE <kazunori.inoue3 at gmail.com>:
> 2014-03-15 4:08 GMT+09:00 David Vossel <dvossel at redhat.com>:
>>
>>
>> ----- Original Message -----
>>> From: "Kazunori INOUE" <kazunori.inoue3 at gmail.com>
>>> To: "pm" <pacemaker at oss.clusterlabs.org>
>>> Sent: Friday, March 14, 2014 5:52:38 AM
>>> Subject: [Pacemaker] crmd was aborted at pacemaker 1.1.11
>>>
>>> Hi,
>>>
>>> When I run crm_resource with the node name given in UPPER case,
>>> crmd aborts.
>>> (The real node name is in LOWER case.)
>>
>> https://github.com/ClusterLabs/pacemaker/pull/462
>>
>> does that fix it?
>>
>
> The result is NO; glib seems to behave strangely here.
> I tested this branch.
> https://github.com/davidvossel/pacemaker/tree/lrm-segfault
> * Red Hat Enterprise Linux Server release 6.4 (Santiago)
> * glib2-2.22.5-7.el6.x86_64
>
> strcase_equal() is not called from g_hash_table_lookup().
>
> [x3650h ~]$ gdb /usr/libexec/pacemaker/crmd 17409
> ...snip...
> (gdb) b lrm.c:1232
> Breakpoint 1 at 0x4251d0: file lrm.c, line 1232.
> (gdb) b strcase_equal
> Breakpoint 2 at 0x429828: file lrm_state.c, line 95.
> (gdb) c
> Continuing.
>
> Breakpoint 1, do_lrm_invoke (action=288230376151711744,
> cause=C_IPC_MESSAGE, cur_state=S_NOT_DC, current_input=I_ROUTER,
> msg_data=0x7fff8d679540) at lrm.c:1232
> 1232        lrm_state = lrm_state_find(target_node);
> (gdb) s
> lrm_state_find (node_name=0x1d4c650 "X3650H") at lrm_state.c:267
> 267     {
> (gdb) n
> 268         if (!node_name) {
> (gdb) n
> 271         return g_hash_table_lookup(lrm_state_table, node_name);
> (gdb) p g_hash_table_size(lrm_state_table)
> $1 = 1
> (gdb) p (char*)((GList*)g_hash_table_get_keys(lrm_state_table))->data
> $2 = 0x1c791a0 "x3650h"
> (gdb) p node_name
> $3 = 0x1d4c650 "X3650H"
> (gdb) n
> 272     }
> (gdb) n
> do_lrm_invoke (action=288230376151711744, cause=C_IPC_MESSAGE,
> cur_state=S_NOT_DC, current_input=I_ROUTER, msg_data=0x7fff8d679540)
> at lrm.c:1234
> 1234        if (lrm_state == NULL && is_remote_node) {
> (gdb) n
> 1240        CRM_ASSERT(lrm_state != NULL);
> (gdb) n
>
> Program received signal SIGABRT, Aborted.
> 0x0000003787e328a5 in raise () from /lib64/libc.so.6
> (gdb)
>
>
> I do not know why yet, so I will continue investigating.
>
>

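I read the code of g_hash_table_lookup().
It first compares the hash value of the key, generated by crm_str_hash,
and calls strcase_equal() only when the hash values match.
Since crm_str_hash is case-sensitive, "X3650H" and "x3650h" produce
different hash values, so strcase_equal() is not called here.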

*** This is a quick-fix solution. ***

 crmd/lrm_state.c   |    4 ++--
 include/crm/crm.h  |    2 ++
 lib/common/utils.c |   11 +++++++++++
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/crmd/lrm_state.c b/crmd/lrm_state.c
index d20d74a..ae036fd 100644
--- a/crmd/lrm_state.c
+++ b/crmd/lrm_state.c
@@ -234,13 +234,13 @@ lrm_state_init_local(void)
     }

     lrm_state_table =
-        g_hash_table_new_full(crm_str_hash, strcase_equal, NULL, internal_lrm_state_destroy);
+        g_hash_table_new_full(crm_str_hash2, strcase_equal, NULL, internal_lrm_state_destroy);
     if (!lrm_state_table) {
         return FALSE;
     }

     proxy_table =
-        g_hash_table_new_full(crm_str_hash, strcase_equal, NULL, remote_proxy_free);
+        g_hash_table_new_full(crm_str_hash2, strcase_equal, NULL, remote_proxy_free);
     if (!proxy_table) {
          g_hash_table_destroy(lrm_state_table);
         return FALSE;
diff --git a/include/crm/crm.h b/include/crm/crm.h
index b763cc0..46fe5df 100644
--- a/include/crm/crm.h
+++ b/include/crm/crm.h
@@ -195,7 +195,9 @@ typedef GList *GListPtr;
 #  include <crm/error.h>

 #  define crm_str_hash g_str_hash_traditional
+#  define crm_str_hash2 g_str_hash_traditional2

 guint g_str_hash_traditional(gconstpointer v);
+guint g_str_hash_traditional2(gconstpointer v);

 #endif
diff --git a/lib/common/utils.c b/lib/common/utils.c
index 29d7965..50fa6c0 100644
--- a/lib/common/utils.c
+++ b/lib/common/utils.c
@@ -2368,6 +2368,17 @@ g_str_hash_traditional(gconstpointer v)

     return h;
 }
+guint
+g_str_hash_traditional2(gconstpointer v)
+{
+    const signed char *p;
+    guint32 h = 0;
+
+    for (p = v; *p != '\0'; p++)
+        h = (h << 5) - h + g_ascii_tolower(*p);
+
+    return h;
+}

 void *
 find_library_function(void **handle, const char *lib, const char *fn, gboolean fatal)


>>> # crm_resource -C -r p1 -N X3650H
>>> Cleaning up p1 on X3650H
>>> Waiting for 1 replies from the CRMdNo messages received in 60 seconds..
>>> aborting
>>>
>>> Mar 14 18:33:10 x3650h crmd[10718]:    error: crm_abort:
>>> do_lrm_invoke: Triggered fatal assert at lrm.c:1240 : lrm_state !=
>>> NULL
>>> ...snip...
>>> Mar 14 18:33:10 x3650h pacemakerd[10708]:    error: child_waitpid:
>>> Managed process 10718 (crmd) dumped core
>>>
>>>
>>> * The state before performing crm_resource.
>>> ----
>>> Stack: corosync
>>> Current DC: x3650g (3232261383) - partition with quorum
>>> Version: 1.1.10-38c5972
>>> 2 Nodes configured
>>> 3 Resources configured
>>>
>>>
>>> Online: [ x3650g x3650h ]
>>>
>>> Full list of resources:
>>>
>>> f-g     (stonith:external/ibmrsa-telnet):       Started x3650h
>>> f-h     (stonith:external/ibmrsa-telnet):       Started x3650g
>>> p1      (ocf::pacemaker:Dummy): Stopped
>>>
>>> Migration summary:
>>> * Node x3650g:
>>> * Node x3650h:
>>>    p1: migration-threshold=1 fail-count=1 last-failure='Fri Mar 14
>>> 18:32:48 2014'
>>>
>>> Failed actions:
>>>     p1_monitor_10000 on x3650h 'not running' (7): call=16,
>>> status=complete, last-rc-change='Fri Mar 14 18:32:48 2014',
>>> queued=0ms, exec=0ms
>>> ----
>>>
>>> Just for reference, the same problem did not occur with crm_standby.
>>> $ crm_standby -U X3650H -v on
>>>
>>>
>>> Best Regards,
>>> Kazunori INOUE
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>>
>>



