[Pacemaker] Pengine assert in qb_log_from_external_source()

Angus Salkeld asalkeld at redhat.com
Thu Nov 29 01:36:39 EST 2012


On 29/11/12 10:15 +1100, Angus Salkeld wrote:
>On 27/11/12 09:34 +0300, Vladislav Bogdanov wrote:
>>22.11.2012 14:18, Angus Salkeld wrote:
>>>On 22/11/12 11:48 +1100, Andrew Beekhof wrote:
>>>>On Tue, Nov 20, 2012 at 5:32 PM, Vladislav Bogdanov
>>>><bubble at hoster-ok.com> wrote:
>>>>>Hi,
>>>>>
>>>>>I'm running 06229e9 with qb 0.14.3 and noticed the following assert() in the
>>>>>trace logging path:
>>>>>
>>>>>#0  0x00007f40451688a5 in raise () from /lib64/libc.so.6
>>>>>#1  0x00007f404516a085 in abort () from /lib64/libc.so.6
>>>>>#2  0x00007f4045161a1e in __assert_fail_base () from /lib64/libc.so.6
>>>>>#3  0x00007f4045161ae0 in __assert_fail () from /lib64/libc.so.6
>>>>>#4  0x00007f40445e918b in ?? () from /usr/lib64/libqb.so.0
>>>>>#5  0x00007f40445e9385 in qb_log_dcs_get () from /usr/lib64/libqb.so.0
>>>>>#6  0x00007f40445e7949 in qb_log_callsite_get () from
>>>>>/usr/lib64/libqb.so.0
>>>>>#7  0x00007f40445e7e4b in qb_log_from_external_source () from
>>>>>/usr/lib64/libqb.so.0
>>>>>#8  0x00007f4046fb12f5 in dump_node_scores_worker (level=9,
>>>>>file=0x7f4046d8e1bf "native.c", function=0x7f4046d90210
>>>>>"native_choose_node", line=148,
>>>>>    rsc=0x2411a70, comment=0x7f4046d8e4a1 "Post-utilization",
>>>>>nodes=0x26bede0) at utils.c:189
>>>>>#9  0x00007f4046d6ba65 in native_choose_node (rsc=0x2411a70,
>>>>>prefer=0x2c35b20, data_set=0x7fff24356dc0) at native.c:148
>>>>>#10 native_color (rsc=0x2411a70, prefer=0x2c35b20,
>>>>>data_set=0x7fff24356dc0) at native.c:531
>>>>>#11 0x00007f4046d7b40c in color_instance (rsc=0x2411a70,
>>>>>prefer=0x2c35b20, all_coloc=<value optimized out>,
>>>>>data_set=0x7fff24356dc0) at clone.c:430
>>>>>#12 0x00007f4046d7f459 in clone_color (rsc=0x25cde30, prefer=<value
>>>>>optimized out>, data_set=0x7fff24356dc0) at clone.c:578
>>>>>#13 0x00007f4046d6b020 in native_color (rsc=0x2624a50, prefer=0x0,
>>>>>data_set=0x7fff24356dc0) at native.c:459
>>>>>#14 0x00007f4046d5cc2f in stage5 (data_set=0x7fff24356dc0) at
>>>>>allocate.c:1130
>>>>>#15 0x00007f4046d53b3d in do_calculations (data_set=0x7fff24356dc0,
>>>>>xml_input=<value optimized out>, now=<value optimized out>) at
>>>>>pengine.c:247
>>>>>#16 0x00007f4046d54722 in process_pe_message (msg=0x2c24650,
>>>>>xml_data=0x2c08a50, sender=0x2337350) at pengine.c:126
>>>>>#17 0x000000000040124e in pe_ipc_dispatch (c=0x2337350, data=<value
>>>>>optimized out>, size=<value optimized out>) at main.c:75
>>>>>#18 0x00007f40445e3954 in ?? () from /usr/lib64/libqb.so.0
>>>>>#19 0x00007f40445e3ca4 in qb_ipcs_dispatch_connection_request () from
>>>>>/usr/lib64/libqb.so.0
>>>>>#20 0x00007f40471ef1c0 in gio_read_socket (gio=<value optimized out>,
>>>>>condition=G_IO_IN, data=0x2336b50) at mainloop.c:367
>>>>>#21 0x00007f4044a77f0e in g_main_context_dispatch () from
>>>>>/lib64/libglib-2.0.so.0
>>>>>#22 0x00007f4044a7b938 in ?? () from /lib64/libglib-2.0.so.0
>>>>>#23 0x00007f4044a7bd55 in g_main_loop_run () from
>>>>>/lib64/libglib-2.0.so.0
>>>>>#24 0x00000000004014c8 in main (argc=1, argv=0x7fff24357398) at
>>>>>main.c:159
>>>>>
>>>>>
>>>>>
>>>>>Frame #4 should be in _log_dcs_new_cs(); I do not see any other call made
>>>>>from qb_log_from_external_source() that has an assert() inside.
>>>>>
>>>>>Is this a pacemaker or a qb problem?
>>>>
>>>>I'd be inclined to claim libqb at this point.
>>>
>>>You would, wouldn't you ;)
>>>
>>>We have had a problem with Ubuntu, where some strange linking/stripping
>>>has caused problems with libqb logging.
>>>
>>>So to confirm (if this is a reproducible bug), rebuild with:
>>>
>>>    ./configure ac_cv_link_attribute_section=no
>>>
>>
>>I still wasn't able to do that, but I looked at the core dump more closely.
>>(gdb) up 4
>>#4  0x00007fab9ceb418b in _log_dcs_new_cs (function=0x7fab9f65b210
>>"native_choose_node", filename=0x7fab9f6591bf "native.c",
>>   format=0x7fab9f88dc90 "%s: %s allocation score on %s: %s",
>>priority=<value optimized out>, lineno=148, tags=0) at log_dcs.c:70
>>70      log_dcs.c: No such file or directory.
>>       in log_dcs.c
>>(gdb) p callsite_arr_next
>>$3 = 65537
>>
>>So qb_array_index() fails once idx crosses the uint16_t boundary (0xffff) and
>>(uint16_t)idx > 0.
>>IMHO this naturally points to some kind of integer overflow.
>
>Well done, I'll have a closer look at it.

Patch here:
https://github.com/asalkeld/libqb/commit/30a7871646c1f5bbb602e0a01f5550a4516b36f8

-Angus

>
>>
>>Vladislav
>>
>>
>>_______________________________________________
>>Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>>Project Home: http://www.clusterlabs.org
>>Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>Bugs: http://bugs.clusterlabs.org
>



