[Pacemaker] crm_mon -W crash
Kazunori INOUE
inouekazu at intellilink.co.jp
Thu Nov 29 02:56:00 UTC 2012
Hi David,
I confirmed that this problem was solved with your patch.
Thanks.
(12.11.29 01:06), David Vossel wrote:
>
>
> ----- Original Message -----
>> From: "Kazunori INOUE" <inouekazu at intellilink.co.jp>
>> To: "pacemaker at oss" <pacemaker at oss.clusterlabs.org>
>> Cc: shimazakik at intellilink.co.jp
>> Sent: Wednesday, November 28, 2012 2:54:56 AM
>> Subject: [Pacemaker] crm_mon -W crash
>>
>> Hi,
>>
>> I am trying to handle SNMP traps with crm_mon.
>> However, crm_mon crashes with SIGSEGV when a node is fenced.
>>
>> [environment]
>> - Red Hat Enterprise Linux Server release 6.3 (Santiago)
>> - ClusterLabs/pacemaker 9c13d14640(Nov 27, 2012)
>> - corosync 92e0f9c7bb(Nov 07, 2012)
>>
>> [root at dev1 ~]$ pacemakerd -F
>> Pacemaker 1.1.8 (Build: 9c13d14)
>> Supporting: generated-manpages agent-manpages ascii-docs
>> publican-docs ncurses libqb-logging libqb-ipc lha-fencing
>> corosync-native snmp
>>
>> [root at dev1 ~]$ rpm -qi net-snmp-libs
>> Name        : net-snmp-libs              Relocations: (not relocatable)
>> Version     : 5.5                        Vendor: Red Hat, Inc.
>> Release     : 41.el6                     Build Date: Fri May 18 19:20:24 2012
>> Install Date: Mon Jul 2 14:15:53 2012    Build Host: x86-003.build.bos.redhat.com
>> -snip-
>>
>>
>> [test case]
>> 1. Configure only STONITH resources and run crm_mon with the -W option.
>>
>> [root at dev1 ~]$ crm_mon -S 192.168.133.148 -W
>> Last updated: Wed Nov 28 11:44:46 2012
>> Last change: Wed Nov 28 11:44:35 2012 via cibadmin on dev1
>> Stack: corosync
>> Current DC: dev1 (2506467520) - partition with quorum
>> Version: 1.1.8-9c13d14
>> 2 Nodes configured, unknown expected votes
>> 2 Resources configured.
>>
>>
>> Online: [ dev1 dev2 ]
>>
>> prmStonith1 (stonith:external/libvirt): Started dev2
>> prmStonith2 (stonith:external/libvirt): Started dev1
>>
>> 2. Fence dev2.
>>
>> [root at dev1 ~]$ crm node fence dev2
>> Do you really want to shoot dev2? y
>>
>> 3. crm_mon then crashes with a segmentation fault:
>>
>> [root at dev1 ~]$ crm_mon -S 192.168.133.148 -W
>> Last updated: Wed Nov 28 11:45:32 2012
>> Last change: Wed Nov 28 11:44:35 2012 via cibadmin on dev1
>> Stack: corosync
>> Current DC: dev1 (2506467520) - partition WITHOUT quorum
>> Version: 1.1.8-9c13d14
>> 2 Nodes configured, unknown expected votes
>> 2 Resources configured.
>>
>>
>> Node dev2 (2472913088): UNCLEAN (offline)
>> Online: [ dev1 ]
>>
>> prmStonith1 (stonith:external/libvirt): Started dev2
>> prmStonith2 (stonith:external/libvirt): Started dev1
>> Segmentation fault (core dumped)
>> [root at dev1 ~]$
>>
>>
>> GDB shows the following backtrace:
>> [root at dev1 ~]$ gdb `which crm_mon` core.28326
>> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
>> -snip-
>> Core was generated by `crm_mon -S 192.168.133.148 -W'.
>> Program terminated with signal 11, Segmentation fault.
>> #0 0x0000003f808805a1 in __strlen_sse2 () from /lib64/libc.so.6
>> -snip-
>> (gdb) bt
>> #0 0x0000003f808805a1 in __strlen_sse2 () from /lib64/libc.so.6
>> #1 0x0000003f81c39481 in snmp_add_var () from
>> /usr/lib64/libnetsnmp.so.20
>> #2 0x00000000004099ba in send_snmp_trap (node=0x24408a0 "dev2",
>> rsc=0x0, task=0x245c850 "st_notify_fence", target_rc=0, rc=0,
>> status=0, desc=0x2462b80 "Operation st_notify_fence requested by
>> dev1 for peer dev2: OK
>> (ref=c520e07b-907f-48b9-a216-4786289b61da)") at crm_mon.c:1716
>> #3 0x000000000040af6b in mon_st_callback (st=0x2409520,
>> e=0x243aa30) at crm_mon.c:2241
>> #4 0x00007fc23639598d in stonith_send_notification
>> (data=0x2408390, user_data=0x7fff159e1410) at st_client.c:1960
>> #5 0x000000364263688c in g_list_foreach () from
>> /lib64/libglib-2.0.so.0
>> #6 0x00007fc236396638 in stonith_dispatch_internal
>> (buffer=0x2429a08 "<notify t=\"st_notify\"
>> subt=\"st_notify_fence\" st_op=\"st_notify_fence\"
>> st_rc=\"0\"><st_calldata><st_notify_fence state=\"2\" st_rc=\"0\"
>> st_target=\"dev2\" st_device_action=\"reboot\"
>> st_delegate=\"dev1\" st_remote"..., length=387,
>> userdata=0x2409520) at st_client.c:2128
>> #7 0x00007fc235d03391 in mainloop_gio_callback (gio=0x2433fe0,
>> condition=G_IO_IN, data=0x240a2e0) at mainloop.c:565
>> #8 0x0000003642638f0e in g_main_context_dispatch () from
>> /lib64/libglib-2.0.so.0
>> #9 0x000000364263c938 in ?? () from /lib64/libglib-2.0.so.0
>> #10 0x000000364263cd55 in g_main_loop_run () from
>> /lib64/libglib-2.0.so.0
>> #11 0x0000000000404e23 in main (argc=4, argv=0x7fff159e1778) at
>> crm_mon.c:590
>> (gdb)
>
> Looking at the backtrace, this might fix it.
>
>
> diff --git a/tools/crm_mon.c b/tools/crm_mon.c
> index 2e2ca16..5c2e687 100644
> --- a/tools/crm_mon.c
> +++ b/tools/crm_mon.c
> @@ -1713,7 +1713,9 @@ send_snmp_trap(const char *node, const char *rsc, const char *task, int target_r
> }
>
> /* Add extries to the trap */
> - add_snmp_field(trap_pdu, snmp_crm_oid_rsc, rsc);
> + if (rsc) {
> + add_snmp_field(trap_pdu, snmp_crm_oid_rsc, rsc);
> + }
> add_snmp_field(trap_pdu, snmp_crm_oid_node, node);
> add_snmp_field(trap_pdu, snmp_crm_oid_task, task);
> add_snmp_field(trap_pdu, snmp_crm_oid_desc, desc);
>
>
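> The failure mode is visible in frame #2 of the backtrace: for an
> st_notify_fence event there is no resource, so rsc is NULL, and
> net-snmp's snmp_add_var() eventually runs strlen() on that NULL
> pointer. The patch skips the field when the value is NULL. A minimal
> standalone sketch of that guard pattern (the function names here are
> hypothetical stand-ins, not the real crm_mon/net-snmp API):
>
> ```c
> #include <assert.h>
> #include <stddef.h>
> #include <string.h>
>
> /* Stand-in for a library call that, like snmp_add_var(), assumes a
>  * non-NULL string and calls strlen() on it; passing NULL segfaults. */
> static int add_field_unsafe(const char *value)
> {
>     return (int) strlen(value); /* crashes if value == NULL */
> }
>
> /* Guarded wrapper mirroring the patch: omit the field entirely when
>  * the value is NULL (as rsc is for fencing notifications). */
> static int add_field_guarded(const char *value)
> {
>     if (value == NULL) {
>         return -1; /* nothing to add; no crash */
>     }
>     return add_field_unsafe(value);
> }
>
> int main(void)
> {
>     assert(add_field_guarded("prmStonith1") == 11);
>     assert(add_field_guarded(NULL) == -1); /* fencing case: no segfault */
>     return 0;
> }
> ```
>
> Checking for NULL before handing a string to the SNMP layer is the
> whole fix; the trap is still sent, just without the resource field.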
>> Is this a known issue?
>>
>> Best Regards,
>> Kazunori INOUE
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>