[Pacemaker] lrmd segfault at pacemaker 1.1.11-rc1

Kazunori INOUE kazunori.inoue3 at gmail.com
Tue Dec 17 06:43:53 EST 2013


Hi,

When I repeatedly ran 'node standby' and 'node online', lrmd crashed
with SIGSEGV because "op->id" in cancel_recurring_action() was NULL.

Dec 17 19:01:21 vm3 crmd[2433]:     info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Dec 17 19:01:21 vm3 crmd[2433]:     info: do_te_invoke: Processing
graph 437 (ref=pe_calc-dc-1387274481-5672) derived from
/var/lib/pacemaker/pengine/pe-input-437.bz2
Dec 17 19:01:21 vm3 crmd[2433]:   notice: te_rsc_command: Initiating
action 17: stop prmStonith4_stop_0 on vm3 (local)
Dec 17 19:01:21 vm3 crmd[2433]:     info: do_lrm_rsc_op: Performing
key=17:437:0:40d7b9a2-c373-4459-a811-9c225d1a9555
op=prmStonith4_stop_0
Dec 17 19:01:21 vm3 lrmd[2430]:     info: log_execute: executing -
rsc:prmStonith4 action:stop call_id:3487
Dec 17 19:01:21 vm3 stonith-ng[2429]:     info: stonith_command:
Processed st_device_remove from lrmd.2430: OK (0)
Dec 17 19:01:21 vm3 lrmd[2430]:     info: log_finished: finished -
rsc:prmStonith4 action:stop call_id:3487  exit-code:0 exec-time:0ms
queue-time:0ms
Dec 17 19:01:21 vm3 pengine[2432]:   notice: process_pe_message:
Calculated Transition 437: /var/lib/pacemaker/pengine/pe-input-437.bz2
Dec 17 19:01:21 vm3 crmd[2433]:   notice: te_rsc_command: Initiating
action 33: stop prmPg_stop_0 on vm3 (local)
Dec 17 19:01:21 vm3 lrmd[2430]:     info: cancel_recurring_action:
Cancelling operation prmPg_monitor_10000
Dec 17 19:01:21 vm3 crmd[2433]:     info: do_lrm_rsc_op: Performing
key=33:437:0:40d7b9a2-c373-4459-a811-9c225d1a9555 op=prmPg_stop_0
Dec 17 19:01:21 vm3 lrmd[2430]:     info: log_execute: executing -
rsc:prmPg action:stop call_id:3489
Dec 17 19:01:21 vm3 crmd[2433]:     info: process_lrm_event: LRM
operation prmStonith4_monitor_3600000 (call=3473, status=1,
cib-update=0, confirmed=true) Cancelled
Dec 17 19:01:21 vm3 crmd[2433]:   notice: process_lrm_event: LRM
operation prmStonith4_stop_0 (call=3487, rc=0, cib-update=3090,
confirmed=true) ok
Dec 17 19:01:21 vm3 crmd[2433]:     info: process_lrm_event: LRM
operation prmPg_monitor_10000 (call=3485, status=1, cib-update=0,
confirmed=true) Cancelled
Dec 17 19:01:21 vm3 crmd[2433]:     info: match_graph_event: Action
prmStonith4_stop_0 (17) confirmed on vm3 (rc=0)
Dec 17 19:01:21 vm3 crmd[2433]:   notice: te_rsc_command: Initiating
action 40: stop prmPing_stop_0 on vm3 (local)
Dec 17 19:01:21 vm3 cib[2428]:     info: cib_process_request:
Completed cib_modify operation for section status: OK (rc=0,
origin=local/crmd/3090, version=0.440.2)
Dec 17 19:01:21 vm3 stonith-ng[2429]:     info: crm_client_destroy:
Destroying 0 events
Dec 17 19:01:21 vm3 pacemakerd[2424]:    error: child_death_dispatch:
Managed process 2430 (lrmd) dumped core
Dec 17 19:01:21 vm3 pacemakerd[2424]:   notice: pcmk_child_exit: Child
process lrmd terminated with signal 11 (pid=2430, core=1)
Dec 17 19:01:21 vm3 pacemakerd[2424]:   notice: pcmk_process_exit:
Respawning failed child process: lrmd
Dec 17 19:01:21 vm3 pacemakerd[2424]:    error: pcmk_process_exit:
Rebooting system
Dec 17 19:10:40 vm3 root: Mark:pcmk:1387275040

$ gdb /usr/libexec/pacemaker/lrmd core.2430
(gdb) bt
#0  0x000000323f8480ac in vfprintf () from /lib64/libc.so.6
#1  0x000000323f86f9d2 in vsnprintf () from /lib64/libc.so.6
#2  0x0000003fcb81726d in qb_log_real_va_ (cs=0x3fcf208658,
ap=0x7ffff6f5fc80) at log.c:230
#3  0x0000003fcb8173ea in qb_log_real_ (cs=0x3fcf208658) at log.c:255
#4  0x0000003fcf003a9c in cancel_recurring_action (op=0xb9fae0) at
services.c:356
#5  0x0000003fcf003bc6 in services_action_cancel (name=0xb9f350
"prmPing", action=0xb9ee90 "monitor", interval=10000) at
services.c:381
#6  0x0000000000406595 in cancel_op (rsc_id=0xb9f350 "prmPing",
action=0xb9ee90 "monitor", interval=10000) at lrmd.c:1197
#7  0x00000000004067aa in process_lrmd_rsc_cancel (client=0xb926c0,
id=7030, request=0xb95ad0) at lrmd.c:1261
#8  0x0000000000406a51 in process_lrmd_message (client=0xb926c0,
id=7030, request=0xb95ad0) at lrmd.c:1300
#9  0x0000000000402a06 in lrmd_ipc_dispatch (c=0xb91af0,
data=0x7f9f30acbc08, size=362) at main.c:141
#10 0x0000003fcb8126f8 in _process_request_ (c=0xb91af0,
ms_timeout=10) at ipcs.c:698
#11 0x0000003fcb812ad5 in qb_ipcs_dispatch_connection_request (fd=5,
revents=1, data=0xb91af0) at ipcs.c:801
#12 0x0000003fcc0327b1 in gio_read_socket (gio=0xb92880,
condition=G_IO_IN, data=0xb91138) at mainloop.c:437
#13 0x0000003fc9c3feb2 in g_main_context_dispatch () from
/lib64/libglib-2.0.so.0
#14 0x0000003fc9c43d68 in ?? () from /lib64/libglib-2.0.so.0
#15 0x0000003fc9c44275 in g_main_loop_run () from /lib64/libglib-2.0.so.0
#16 0x00000000004030cc in main (argc=1, argv=0x7ffff6f606c8) at main.c:314
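
For illustration, frames #0 to #4 show the crash happening inside the
log call made from cancel_recurring_action(). Below is a minimal sketch
of a defensive guard, assuming the log message formats op->id with "%s"
(the log above prints "Cancelling operation prmPg_monitor_10000").
The struct and the helper names are hypothetical and are not Pacemaker
code. The guard only avoids handing a NULL pointer to the formatter;
it does not explain why op->id is NULL in the first place.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the real action structure; only the field
 * mentioned in this report is modelled. */
struct example_action {
    const char *id;
};

/* Return a printable string even when s is NULL. Passing NULL for a
 * "%s" conversion is undefined behaviour in C. */
static const char *safe_str(const char *s)
{
    return (s != NULL) ? s : "<null>";
}

/* Format the "Cancelling operation ..." message into buf.
 * Returns what snprintf() returns: the length the full message needs,
 * not counting the terminating NUL. */
static int format_cancel_message(char *buf, size_t len,
                                 const struct example_action *op)
{
    const char *id;

    assert(buf != NULL && len > 0);

    /* Guard both the action pointer and its id before formatting. */
    id = (op != NULL) ? safe_str(op->id) : "<no op>";
    return snprintf(buf, len, "Cancelling operation %s", id);
}
```

With a guard like this, a NULL id would show up in the log as
"Cancelling operation <null>", which would also make the bad state
visible before the crash.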

I am still investigating, but I have not found the cause yet.

Because the crm_report is large, I uploaded it here:
https://drive.google.com/file/d/0B9eNn1AWfKD4WGY5bllMQW1BbDA/edit?usp=sharing

Best Regards,
Kazunori INOUE



