[Pacemaker] Pacemaker core dumps

Andrew Beekhof andrew at beekhof.net
Mon May 6 04:46:06 UTC 2013


It was tripping over the '\033' escape character in lrmd_rsc_output ("tomcat6 (pid 3199) is running...\033[60G[\033[0;32m  OK \033[0;39m]...")

I'll commit the following patch shortly. Thanks for reporting this and following up!


diff --git a/lib/common/xml.c b/lib/common/xml.c
index b6df79f..7585c46 100644
--- a/lib/common/xml.c
+++ b/lib/common/xml.c
@@ -1011,6 +1011,15 @@ crm_xml_escape(const char *text)
                 copy = crm_xml_escape_shuffle(copy, index, &length, "&");
                 changes++;
                 break;
+            default:
+                /* Check for and replace non-printing characters with underscores */
+                if(copy[index] == 0) {
+                    break;
+                } else if(copy[index] < ' ') {
+                    copy = crm_xml_escape_shuffle(copy, index, &length, "_");
+                } else if(copy[index] > '~') {
+                    copy = crm_xml_escape_shuffle(copy, index, &length, "_");
+                }
         }
     }
 


On 03/05/2013, at 11:13 PM, Xavier Lashmar <xlashmar at uottawa.ca> wrote:

> Here it is:
> 
> (gdb) bt
> #0  0x00007f81896ac8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1  0x00007f81896ae085 in abort () at abort.c:92
> #2  0x00007f818bb8a56b in crm_abort (file=0x7f818bba9d58 "xml.c", function=0x7f818bbab6b4 "string2xml", line=650, 
>    assert_condition=0x7f818bbaa01a "String parsing error", do_core=<value optimized out>, do_fork=<value optimized out>) at utils.c:1073
> #3  0x00007f818bb933af in string2xml (
>    input=0x1e745f8 "<lrmd_notify lrmd_origin=\"send_cmd_complete_notify\" lrmd_timeout=\"30000\" lrmd_rsc_interval=\"15000\" lrmd_rsc_start_delay=\"15000\" lrmd_exec_rc=\"0\" lrmd_exec_op_status=\"1\" lrmd_callid=\"3407\" lrmd_rsc_del"...) at xml.c:650
> #4  0x00007f818b76a2fc in lrmd_ipc_dispatch (buffer=<value optimized out>, length=<value optimized out>, userdata=0x1e72910) at lrmd_client.c:310
> #5  0x00007f818bba2e90 in mainloop_gio_callback (gio=<value optimized out>, condition=G_IO_IN, data=0x1e73be0) at mainloop.c:585
> #6  0x00007f8188fbbf0e in g_main_dispatch (context=0x1d4f120) at gmain.c:1960
> #7  IA__g_main_context_dispatch (context=0x1d4f120) at gmain.c:2513
> #8  0x00007f8188fbf938 in g_main_context_iterate (context=0x1d4f120, block=1, dispatch=1, self=<value optimized out>) at gmain.c:2591
> #9  0x00007f8188fbfd55 in IA__g_main_loop_run (loop=0x1e734a0) at gmain.c:2799
> #10 0x00000000004052ce in crmd_init () at main.c:154
> #11 0x00000000004055cc in main (argc=1, argv=0x7fffe77a4f88) at main.c:120
> (gdb) up
> #1  0x00007f81896ae085 in abort () at abort.c:92
> 92            raise (SIGABRT);
> (gdb) up
> #2  0x00007f818bb8a56b in crm_abort (file=0x7f818bba9d58 "xml.c", function=0x7f818bbab6b4 "string2xml", line=650, 
>    assert_condition=0x7f818bbaa01a "String parsing error", do_core=<value optimized out>, do_fork=<value optimized out>) at utils.c:1073
> 1073                abort();
> (gdb) up
> #3  0x00007f818bb933af in string2xml (
>    input=0x1e745f8 "<lrmd_notify lrmd_origin=\"send_cmd_complete_notify\" lrmd_timeout=\"30000\" lrmd_rsc_interval=\"15000\" lrmd_rsc_start_delay=\"15000\" lrmd_exec_rc=\"0\" lrmd_exec_op_status=\"1\" lrmd_callid=\"3407\" lrmd_rsc_del"...) at xml.c:650
> 650                 crm_abort(__FILE__, __PRETTY_FUNCTION__, __LINE__, "String parsing error", TRUE, TRUE);
> (gdb) print input
> $1 = 0x1e745f8 "<lrmd_notify lrmd_origin=\"send_cmd_complete_notify\" lrmd_timeout=\"30000\" lrmd_rsc_interval=\"15000\" lrmd_rsc_start_delay=\"15000\" lrmd_exec_rc=\"0\" lrmd_exec_op_status=\"1\" lrmd_callid=\"3407\" lrmd_rsc_del"...
> (gdb) print input+100
> $2 = 0x1e7465c "rmd_rsc_start_delay=\"15000\" lrmd_exec_rc=\"0\" lrmd_exec_op_status=\"1\" lrmd_callid=\"3407\" lrmd_rsc_deleted=\"0\" lrmd_run_time=\"0\" lrmd_rcchange_time=\"0\" lrmd_exec_time=\"0\" lrmd_queue_time=\"0\" lrmd_op=\"lr"...
> (gdb) print input+200
> $3 = 0x1e746c0 "eted=\"0\" lrmd_run_time=\"0\" lrmd_rcchange_time=\"0\" lrmd_exec_time=\"0\" lrmd_queue_time=\"0\" lrmd_op=\"lrmd_rsc_exec\" lrmd_rsc_id=\"res_tomcat6_1\" lrmd_rsc_action=\"monitor\" lrmd_rsc_userdata_str=\"4:664:0:59"...
> (gdb) print input+300
> $4 = 0x1e74724 "md_rsc_exec\" lrmd_rsc_id=\"res_tomcat6_1\" lrmd_rsc_action=\"monitor\" lrmd_rsc_userdata_str=\"4:664:0:596925c4-4bfa-46e2-9295-c3f9b6bd1ef9\" lrmd_rsc_output=\"tomcat6 (pid 3199) is running...\033[60G[\033[0;32m  "...
> (gdb) print input+400
> $5 = 0x1e74788 "6925c4-4bfa-46e2-9295-c3f9b6bd1ef9\" lrmd_rsc_output=\"tomcat6 (pid 3199) is running...\033[60G[\033[0;32m  OK  \033[0;39m]\r\n\"><attributes CRM_meta_OCF_CHECK_LEVEL=\"0\" CRM_meta_name=\"monitor\" crm_feature_set=\"3."...
> (gdb) print input+500
> $6 = 0x1e747ec "OK  \033[0;39m]\r\n\"><attributes CRM_meta_OCF_CHECK_LEVEL=\"0\" CRM_meta_name=\"monitor\" crm_feature_set=\"3.0.7\" OCF_CHECK_LEVEL=\"0\" CRM_meta_interval=\"15000\" CRM_meta_timeout=\"30000\" CRM_meta_start_delay=\"15"...
> (gdb) print input+600
> $7 = 0x1e74850 "0.7\" OCF_CHECK_LEVEL=\"0\" CRM_meta_interval=\"15000\" CRM_meta_timeout=\"30000\" CRM_meta_start_delay=\"15000\"/></lrmd_notify>"
> 
> Xavier Lashmar
> X2120
> 
> 
> -----Original Message-----
> From: Andrew Beekhof [mailto:andrew at beekhof.net] 
> Sent: Thursday, May 2, 2013 7:38 PM
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Pacemaker core dumps
> 
> 
> On 02/05/2013, at 11:37 PM, Xavier Lashmar <xlashmar at uottawa.ca> wrote:
> 
>> Ah, finally got it.
> 
> Can you go to frame 3 (up <ret> up <ret> up <ret>) and run:
> 
>    print input
>    print input+100
>    print input+200
>    ...etc...
> 
> until you reach the end of the string?
> 
> Then I'll be able to reproduce (and fix) locally.
> 
>> 
>> Core was generated by `/usr/libexec/pacemaker/crmd'.
>> Program terminated with signal 6, Aborted.
>> #0  0x00007f81896ac8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>> 64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
>> Missing separate debuginfos, use: debuginfo-install libtool-ltdl-2.2.6-15.5.el6.x86_64
>> (gdb) bt
>> #0  0x00007f81896ac8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>> #1  0x00007f81896ae085 in abort () at abort.c:92
>> #2  0x00007f818bb8a56b in crm_abort (file=0x7f818bba9d58 "xml.c", function=0x7f818bbab6b4 "string2xml", line=650, 
>>   assert_condition=0x7f818bbaa01a "String parsing error", do_core=<value optimized out>, do_fork=<value optimized out>) at utils.c:1073
>> #3  0x00007f818bb933af in string2xml (
>>   input=0x1e745f8 "<lrmd_notify lrmd_origin=\"send_cmd_complete_notify\" lrmd_timeout=\"30000\" lrmd_rsc_interval=\"15000\" lrmd_rsc_start_delay=\"15000\" lrmd_exec_rc=\"0\" lrmd_exec_op_status=\"1\" lrmd_callid=\"2747\" lrmd_rsc_del"...) at xml.c:650
>> #4  0x00007f818b76a2fc in lrmd_ipc_dispatch (buffer=<value optimized out>, length=<value optimized out>, userdata=0x1e72910) at lrmd_client.c:310
>> #5  0x00007f818bba2e90 in mainloop_gio_callback (gio=<value optimized out>, condition=G_IO_IN, data=0x1e73be0) at mainloop.c:585
>> #6  0x00007f8188fbbf0e in g_main_dispatch (context=0x1d4f120) at gmain.c:1960
>> #7  IA__g_main_context_dispatch (context=0x1d4f120) at gmain.c:2513
>> #8  0x00007f8188fbf938 in g_main_context_iterate (context=0x1d4f120, block=1, dispatch=1, self=<value optimized out>) at gmain.c:2591
>> #9  0x00007f8188fbfd55 in IA__g_main_loop_run (loop=0x1e734a0) at gmain.c:2799
>> #10 0x00000000004052ce in crmd_init () at main.c:154
>> #11 0x00000000004055cc in main (argc=1, argv=0x7fffe77a4f88) at main.c:120
>> 
>> 
>> Xavier Lashmar
>> Analyste de Systèmes | Systems Analyst Service étudiants, service de 
>> l'informatique et des communications/Student services, computing and communications services.
>> 1 Nicholas Street (810)
>> Ottawa ON K1N 7B7
>> Tél. | Tel. 613-562-5800 (2120)
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: Andrew Beekhof [mailto:andrew at beekhof.net]
>> Sent: Wednesday, May 1, 2013 7:07 PM
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] Pacemaker core dumps
>> 
>> 
>> On 01/05/2013, at 11:36 PM, Xavier Lashmar <xlashmar at uottawa.ca> wrote:
>> 
>>> I'm not sure if anyone has run into this issue but I can't seem to 
>>> find a debuginfo package for one of the libraries for CentOS 6.3 with 
>>> Kernel 2.6.32-279.9.1el6.x86_64 : libtool-ltdl
>>> 
>>> Here's what I get so far from the core dump, but I think it's incomplete:
>>> 
>>> ...
>>> ...
>>> ...
>>> Reading symbols from /lib64/libfreebl3.so...
>>> warning: the debug information found in "/usr/lib/debug//lib64/libfreebl3.so.debug" does not match "/lib64/libfreebl3.so" (CRC mismatch).
>>> 
>>> warning: the debug information found in "/usr/lib/debug/lib64/libfreebl3.so.debug" does not match "/lib64/libfreebl3.so" (CRC mismatch).
>>> 
>>> Missing separate debuginfo for /lib64/libfreebl3.so
>>> Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/68/195872ecfb188389d29aaf01031a976fd18168.debug
>>> (no debugging symbols found)...done.
>>> Loaded symbols for /lib64/libfreebl3.so
>>> Reading symbols from /lib64/libnss_files-2.12.so...Reading symbols from /usr/lib/debug/lib64/libnss_files-2.12.so.debug...done.
>>> done.
>>> Loaded symbols for /lib64/libnss_files-2.12.so
>>> Core was generated by `/usr/libexec/pacemaker/crmd'.
>>> Program terminated with signal 6, Aborted.
>>> #0  0x00007f81896ac8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>>> 64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
>>> Missing separate debuginfos, use: debuginfo-install libtool-ltdl-2.2.6-15.5.el6.x86_64
>>> 
>>> Any info about either finding the right debuginfo files, or about the error itself would be greatly appreciated.
>> 
>> The libtool parts aren't so interesting.
>> Were there no other frames? (lines starting with # and a number)
>> 
>>> 
>>> Xavier Lashmar
>>> Analyste de Systèmes | Systems Analyst Service étudiants, service de 
>>> l'informatique et des communications/Student services, computing and communications services.
>>> 1 Nicholas Street (810)
>>> Ottawa ON K1N 7B7
>>> Tél. | Tel. 613-562-5800 (2120)
>>> 
>>> 
>>> 
>>> 
>>> -----Original Message-----
>>> From: Andrew Beekhof [mailto:andrew at beekhof.net]
>>> Sent: Monday, April 29, 2013 11:00 PM
>>> To: The Pacemaker cluster resource manager
>>> Subject: Re: [Pacemaker] Pacemaker core dumps
>>> 
>>> 
>>> On 30/04/2013, at 1:32 AM, Xavier Lashmar <xlashmar at uottawa.ca> wrote:
>>> 
>>>> Hello Andrew,
>>>> 
>>>> Thanks for your help.  We've upgraded to Pacemaker 1.1.9 and still have the same issue.  
>>> 
>>> That's a disappointing but useful data point.
>>> 
>>>> 
>>>> We are trying to get the core information but we are missing some debuginfo files which we are trying to get our hands on.  I'll try to forward this information soon.   
>>> 
>>> Great
>>> 
>>>> 
>>>> Is there something we need to do to the CIB when we upgrade?
>>> 
>>> No, anything that needs to happen will be done under the hood.
>>> 
>>>> 
>>>> 
>>>> Xavier Lashmar
>>>> Analyste de Systèmes | Systems Analyst Service étudiants, service de 
>>>> l'informatique et des communications/Student services, computing and communications services.
>>>> 1 Nicholas Street (810)
>>>> Ottawa ON K1N 7B7
>>>> Tél. | Tel. 613-562-5800 (2120)
>>>> 
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: Andrew Beekhof [mailto:andrew at beekhof.net]
>>>> Sent: Thursday, April 25, 2013 8:15 PM
>>>> To: The Pacemaker cluster resource manager
>>>> Subject: Re: [Pacemaker] Pacemaker core dumps
>>>> 
>>>> 
>>>> On 26/04/2013, at 10:06 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>> 
>>>>> 
>>>>> On 25/04/2013, at 11:59 PM, Xavier Lashmar <xlashmar at uottawa.ca> wrote:
>>>>> 
>>>>>> Following further investigation, we were able to determine that upgrading both nodes (in a two-node cluster) from Pacemaker 1.1.7-6 to Pacemaker 1.1.8-7 (CentOS 6.3 or CentOS 6.4) caused these errors to begin happening:
>>>>> 
>>>>> Would you be able to try the 1.1.9 packages from http://www.clusterlabs.org/rpm-next to see if they are also affected?
>>>>> 
>>>>>> 
>>>>>> We were able to replicate the initiation of the errors by upgrading another cluster in the same manner.  This other cluster is now experiencing the same core-dumping and errors as the previous cluster:
>>>>>> 
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error: Entity: line 1: parser error : invalid character in attribute value
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error: a-72fc-47e1-81b4-51b500c967f9" lrmd_rsc_output="tomcat6 (pid 3282) is running...
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error:                                                                                ^
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error: Entity: line 1: parser error : attributes construct error
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error: a-72fc-47e1-81b4-51b500c967f9" lrmd_rsc_output="tomcat6 (pid 3282) is running...
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error:                                                                                ^
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error: Entity: line 1: parser error : Couldn't find end of Start Tag lrmd_notify line 1
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error: a-72fc-47e1-81b4-51b500c967f9" lrmd_rsc_output="tomcat6 (pid 3282) is running...
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error:                                                                                ^
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error: Entity: line 1: parser error : Extra content at the end of the document
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error: a-72fc-47e1-81b4-51b500c967f9" lrmd_rsc_output="tomcat6 (pid 3282) is running...
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error:                                                                                ^
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:  warning: string2xml: Parsing failed (domain=1, level=3, code=5): Extra content at the end of the document
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:  warning: string2xml: String start: <lrmd_notify lrmd_origin="send_cmd_complete_notify
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:  warning: string2xml: String start+688: 0000" CRM_meta_start_delay="15000"/></lrmd_notify>
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_abort: string2xml: Forked child 4182 to record non-fatal assert at xml.c:605 : String parsing error
>>>> 
>>>> Also, it would be very useful if you could open up the core file for
>>>> 4182 and print the contents of the input passed to string2xml() 
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>> 
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 




