[Pacemaker] node can't join cluster after reboot
Vladislav Bogdanov
bubble at hoster-ok.com
Sun Nov 4 10:54:12 UTC 2012
03.11.2012 18:22, Vladimir Elisseev wrote:
> Vladislav,
>
> Thanks for the hint! Upgrading glib from 2.30.3 to 2.32.4 triggers this
> behavior of corosync. Do you know where I can find more info regarding
> this problem?
That is not corosync but pacemaker, which uses glib heavily internally.
And glib is the only package in your list that may affect pacemaker.
I would say this is a regression in that specific glib version or build:
library behavior changed without a bump of the major so-number.
You'd better talk to your distribution maintainers. Also, the -r1 in the
glib version you installed looks suspicious. Do you know what it means?
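If you want to narrow the list quoted below, a small Python sketch along
these lines would drop the entries that were only rebuilt at the same
version and keep the real version changes. The function name
changed_packages and the regular expression are my own illustration, and
the name/version split is an assumption based only on the lines quoted in
this thread. It does not tell you which of those packages pacemaker links
against; that you still have to check yourself.

```python
import re

# Matches the "<<<" (removed) and ">>>" (installed) entries of the quoted list.
# Assumption: the version is everything after the first "-" that is followed
# by a digit. That holds for the lines in this thread, not for every package.
ENTRY = re.compile(r"(<<<|>>>) (\S+?)-(\d\S*)$")


def changed_packages(lines):
    """Return {package: (old_version, new_version)} for real version changes."""
    removed, changed = {}, {}
    for line in lines:
        m = ENTRY.search(line.strip())
        if not m:
            continue  # not a package entry (blank line, prose, ...)
        marker, name, version = m.groups()
        if marker == "<<<":
            removed[name] = version
        elif removed.get(name) not in (None, version):
            # ">>>" entry whose version differs from the one removed earlier
            changed[name] = (removed[name], version)
    return changed
```

Fed with the quoted list, this would report glib as 2.30.3 -> 2.32.4-r1
and skip entries such as busybox, where the same version was reinstalled.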
One more note: cib exits with signal 6 (SIGABRT), which usually means
you hit an assert in the code. That usually produces a core dump. Check
/var/lib/heartbeat/cores or /var/lib/pacemaker/cores for relevant core
files. If there are none, you need to enable core dumps.
Then install the debuginfo packages for pacemaker and glib (that is very
distribution specific, so I cannot help with that). After that you can
analyze the relevant core files with 'gdb <full_path_to_cib_binary>
<core_dump_file>'.
Just run 'bt full' and that should be enough to find exactly which code
path caused the SIGABRT.
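
The steps above could be scripted roughly as follows. This is only a
sketch: the helper names find_cores and gdb_command are made up for
illustration, and the recursive search is an assumption in case the core
files sit in subdirectories of the two paths I mentioned. It relies on
gdb's --batch and -ex options to run 'bt full' without an interactive
session.

```python
import glob
import os

# Directories named above; which one exists depends on the pacemaker build.
CORE_DIRS = ["/var/lib/heartbeat/cores", "/var/lib/pacemaker/cores"]


def find_cores(dirs=CORE_DIRS):
    """Return regular files found under the given directories, newest first."""
    found = []
    for d in dirs:
        # "**" also matches zero directories, so files directly in d are included.
        for path in glob.glob(os.path.join(d, "**", "*"), recursive=True):
            if os.path.isfile(path):
                found.append(path)
    return sorted(found, key=os.path.getmtime, reverse=True)


def gdb_command(binary, core):
    """Build a non-interactive gdb call that prints 'bt full' and exits."""
    return ["gdb", "--batch", "-ex", "bt full", binary, core]
```

You could pass the result to subprocess.run(), with the cib path taken
from the 'Invoked:' line in your log (/usr/lib64/heartbeat/cib there).
Without the debuginfo packages the backtrace will mostly lack symbols.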
Vladislav
>
> Vlad.
>
> On Sat, 2012-11-03 at 16:22 +0300, Vladislav Bogdanov wrote:
>> 03.11.2012 15:26, Vladimir Elisseev wrote:
>>> I've been able to reproduce the problem. I've attached crm_report
>>> tarballs from both nodes. I don't know which particular package
>>> triggers this problem, but below is the list of what has been
>>> updated. Hopefully this helps.
>>
>> I bet that is glib.
>>
>> Vladislav
>>
>>>
>>> Regards,
>>> Vlad.
>>>
>>> Sat Nov 3 12:15:40 2012 <<< sys-apps/busybox-1.20.2
>>> Sat Nov 3 12:15:42 2012 >>> sys-apps/busybox-1.20.2
>>> Sat Nov 3 12:15:50 2012 <<< sys-fs/dosfstools-3.0.9
>>> Sat Nov 3 12:15:52 2012 >>> sys-fs/dosfstools-3.0.12
>>> Sat Nov 3 12:16:00 2012 <<< dev-lang/nasm-2.10.01
>>> Sat Nov 3 12:16:02 2012 >>> dev-lang/nasm-2.10.05
>>> Sat Nov 3 12:16:11 2012 <<< dev-libs/libgamin-0.1.10-r2
>>> Sat Nov 3 12:16:13 2012 >>> dev-libs/libgamin-0.1.10-r3
>>> Sat Nov 3 12:16:40 2012 <<< media-fonts/droid-113-r1
>>> Sat Nov 3 12:16:46 2012 >>> media-fonts/droid-113-r2
>>> Sat Nov 3 12:16:54 2012 <<< media-libs/libpng-1.5.10
>>> Sat Nov 3 12:16:56 2012 >>> media-libs/libpng-1.5.13-r1
>>> Sat Nov 3 12:17:04 2012 <<< app-arch/unzip-6.0-r1
>>> Sat Nov 3 12:17:05 2012 >>> app-arch/unzip-6.0-r3
>>> Sat Nov 3 12:17:12 2012 <<< app-arch/rpm2targz-9.0.0.4g
>>> Sat Nov 3 12:17:14 2012 >>> app-arch/rpm2targz-9.0.0.5g
>>> Sat Nov 3 12:17:22 2012 <<< app-arch/pbzip2-1.1.5
>>> Sat Nov 3 12:17:24 2012 >>> app-arch/pbzip2-1.1.8
>>> Sat Nov 3 12:17:34 2012 <<< app-arch/zip-3.0
>>> Sat Nov 3 12:17:35 2012 >>> app-arch/zip-3.0-r1
>>> Sat Nov 3 12:17:43 2012 <<< sys-process/htop-1.0.1
>>> Sat Nov 3 12:17:45 2012 >>> sys-process/htop-1.0.1-r1
>>> Sat Nov 3 12:17:55 2012 <<< media-libs/tiff-4.0.2
>>> Sat Nov 3 12:17:57 2012 >>> media-libs/tiff-4.0.2-r1
>>> Sat Nov 3 12:18:04 2012 <<< net-ftp/tftp-hpa-5.1
>>> Sat Nov 3 12:18:06 2012 >>> net-ftp/tftp-hpa-5.2
>>> Sat Nov 3 12:18:18 2012 <<< media-video/ffmpeg-0.10.3
>>> Sat Nov 3 12:18:20 2012 >>> media-video/ffmpeg-0.10.3
>>> Sat Nov 3 12:18:35 2012 <<< sys-devel/gettext-0.18.1.1-r1
>>> Sat Nov 3 12:18:37 2012 >>> sys-devel/gettext-0.18.1.1-r3
>>> Sat Nov 3 12:18:44 2012 <<< app-admin/logrotate-3.8.1
>>> Sat Nov 3 12:18:46 2012 >>> app-admin/logrotate-3.8.2
>>> Sat Nov 3 12:18:54 2012 <<< media-libs/libwebp-0.1.3
>>> Sat Nov 3 12:18:55 2012 >>> media-libs/libwebp-0.2.0
>>> Sat Nov 3 12:19:03 2012 <<< dev-perl/Convert-ASN1-0.220.0
>>> Sat Nov 3 12:19:05 2012 >>> dev-perl/Convert-ASN1-0.260.0
>>> Sat Nov 3 12:19:13 2012 <<< dev-perl/net-server-0.97
>>> Sat Nov 3 12:19:15 2012 >>> dev-perl/net-server-2.6.0
>>> Sat Nov 3 12:19:24 2012 <<< dev-perl/Config-IniFiles-2.710.0
>>> Sat Nov 3 12:19:26 2012 >>> dev-perl/Config-IniFiles-2.760.0
>>> Sat Nov 3 12:19:33 2012 <<< dev-perl/HTTP-Date-6.0.0
>>> Sat Nov 3 12:19:35 2012 >>> dev-perl/HTTP-Date-6.20.0
>>> Sat Nov 3 12:19:44 2012 <<< sys-boot/syslinux-4.06_pre11
>>> Sat Nov 3 12:19:46 2012 >>> sys-boot/syslinux-4.06
>>> Sat Nov 3 12:20:05 2012 <<< dev-libs/glib-2.30.3
>>> Sat Nov 3 12:20:08 2012 >>> dev-libs/glib-2.32.4-r1
>>> Sat Nov 3 12:20:16 2012 <<< dev-util/pkgconfig-0.27
>>> Sat Nov 3 12:20:18 2012 >>> dev-util/pkgconfig-0.27.1
>>> Sat Nov 3 12:20:28 2012 <<< net-analyzer/jnettop-0.13.0-r1
>>> Sat Nov 3 12:20:29 2012 >>> net-analyzer/jnettop-0.13.0-r1
>>> Sat Nov 3 12:20:41 2012 <<< x11-libs/pango-1.29.4
>>> Sat Nov 3 12:20:43 2012 >>> x11-libs/pango-1.30.1
>>> Sat Nov 3 12:20:53 2012 <<< net-analyzer/rrdtool-1.4.5-r1
>>> Sat Nov 3 12:20:56 2012 >>> net-analyzer/rrdtool-1.4.7-r1
>>> Sat Nov 3 12:21:03 2012 <<< app-shells/gentoo-bashcomp-20101217
>>> Sat Nov 3 12:21:05 2012 >>> app-shells/gentoo-bashcomp-20101217-r1
>>> Sat Nov 3 12:21:12 2012 <<< dev-perl/MIME-tools-5.502.0
>>> Sat Nov 3 12:21:14 2012 >>> dev-perl/MIME-tools-5.503.0
>>> Sat Nov 3 12:21:24 2012 <<< dev-perl/Convert-TNEF-0.170.0
>>> Sat Nov 3 12:21:26 2012 >>> dev-perl/Convert-TNEF-0.180.0
>>> Sat Nov 3 12:21:35 2012 <<< net-misc/curl-7.25.0-r1
>>> Sat Nov 3 12:21:36 2012 >>> net-misc/curl-7.26.0
>>> Sat Nov 3 12:21:51 2012 <<< mail-mta/postfix-2.9.3
>>> Sat Nov 3 12:21:53 2012 >>> mail-mta/postfix-2.9.4
>>> Sat Nov 3 12:22:01 2012 <<< dev-perl/Net-SSLeay-1.360.0
>>> Sat Nov 3 12:22:03 2012 >>> dev-perl/Net-SSLeay-1.480.0-r1
>>> Sat Nov 3 12:22:12 2012 <<< sys-auth/nss_ldap-264-r1
>>> Sat Nov 3 12:22:14 2012 >>> sys-auth/nss_ldap-265-r1
>>> Sat Nov 3 12:22:25 2012 <<< net-mail/fetchmail-6.3.21
>>> Sat Nov 3 12:22:27 2012 >>> net-mail/fetchmail-6.3.22
>>> Sat Nov 3 12:22:37 2012 <<< net-misc/dhcp-4.2.4_p1
>>> Sat Nov 3 12:22:39 2012 >>> net-misc/dhcp-4.2.4_p2
>>> Sat Nov 3 12:22:48 2012 <<< net-analyzer/tcpdump-3.9.8-r1
>>> Sat Nov 3 12:22:50 2012 >>> net-analyzer/tcpdump-4.3.0
>>> Sat Nov 3 12:23:07 2012 <<< dev-util/cmake-2.8.8-r3
>>> Sat Nov 3 12:23:09 2012 >>> dev-util/cmake-2.8.9
>>> Sat Nov 3 12:23:21 2012 <<< dev-vcs/subversion-1.6.17-r7
>>> Sat Nov 3 12:23:24 2012 >>> dev-vcs/subversion-1.6.17-r7
>>> Sat Nov 3 12:27:56 2012 <<< media-gfx/imagemagick-6.7.8.7
>>> Sat Nov 3 12:27:58 2012 >>> media-gfx/imagemagick-6.7.8.7
>>>
>>>
>>>
>>> On Thu, 2012-11-01 at 07:08 +0100, Vladimir Elisseev wrote:
>>>> Yes, hb_report is there, thanks!
>>>>
>>>> On Thu, 2012-11-01 at 11:40 +1100, Andrew Beekhof wrote:
>>>>> On Tue, Oct 30, 2012 at 4:35 PM, Vladimir Elisseev <vovan at vovan.nl> wrote:
>>>>>> Thanks for trying to help! Currently I can't provide crm_report from the
>>>>>> failed node, as I've decided to restore the complete node from backup.
>>>>>> The versions I use are corosync-1.3.0 and pacemaker-1.0.10. Actually the
>>>>>> problem occurred after updating quite a few system packages, but all the
>>>>>> cluster related software was untouched. I've found exactly the same
>>>>>> issue described in the mailing list earlier:
>>>>>> http://www.gossamer-threads.com/lists/linuxha/pacemaker/77881?do=post_view_threaded#77881
>>>>>> At least the symptoms are exactly the same, as are the pasted log
>>>>>> files. I've also tried enabling debug logging and saw that crm tries
>>>>>> to connect to the cib sockets (/var/run/crm_*) too early (IMO) and
>>>>>> fails because cib hasn't started yet.
>>>>>> I'm planning to repeat the update of this system, but I'll do it
>>>>>> more carefully in order to understand which particular package leads to
>>>>>> this behavior. BTW, how can I create a crm_report? I can't find this
>>>>>> binary anywhere on the system.
>>>>>
>>>>> It's included in subsequent 1.0.x releases.
>>>>> You should have hb_report available though.
>>>>>
>>>>>> Let me know what kind of input you'll
>>>>>> need if I'm able to reproduce this problem.
>>>>>>
>>>>>> Regards,
>>>>>> Vlad.
>>>>>>
>>>>>>
>>>>>> On Tue, 2012-10-30 at 16:00 +1100, Andrew Beekhof wrote:
>>>>>>> On Sun, Oct 28, 2012 at 9:05 PM, Vladimir Elisseev <vovan at vovan.nl> wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I have a problem: after a reboot, one cluster node can't join the cluster
>>>>>>>> anymore. From the log file I can't understand what is actually going on.
>>>>>>>> I can only see that cib and crm are both respawned frequently. I'd
>>>>>>>> appreciate any help. Below is the relevant part of the log file:
>>>>>>>
>>>>>>> I appreciate that you're trying to keep it brief, but problems often
>>>>>>> originate much earlier than people suspect.
>>>>>>> Can you instead attach a crm_report tarball? That will have everything
>>>>>>> (from both nodes) that we need to be able to help.
>>>>>>>
>>>>>>> What version is this btw?
>>>>>>>
>>>>>>>>
>>>>>>>> Oct 28 10:52:22 srv2 cib: [10646]: info: cib_server_process_diff: Requesting re-sync from peer
>>>>>>>> Oct 28 10:52:22 srv2 cib: [10646]: WARN: cib_diff_notify: Local-only Change (client:crmd, call: 4770): -1.-1.-1 (Application of an update diff failed, requesting a full refresh)
>>>>>>>> Oct 28 10:52:22 srv2 cib: [10653]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.qJTUAV (digest: /var/lib/heartbeat/crm/cib.XwOKXQ)
>>>>>>>> Oct 28 10:52:22 srv2 cib: [10646]: WARN: cib_server_process_diff: Not applying diff 0.1298.5 -> 0.1299.1 (sync in progress)
>>>>>>>> Oct 28 10:52:22 srv2 cib: [10646]: info: cib_replace_notify: Local-only Replace: -1.-1.-1 from srv1
>>>>>>>> Oct 28 10:52:22 corosync [pcmk]: ] info: pcmk_ipc_exit: Client cib (conn=0x1837340, async-conn=0x1837340) left
>>>>>>>> Oct 28 10:52:22 corosync [pcmk]: ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 6 (pid=10646, core=true)
>>>>>>>> Oct 28 10:52:22 corosync [pcmk]: ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
>>>>>>>> Oct 28 10:52:22 corosync [pcmk]: ] info: spawn_child: Forked child 10656 for process cib
>>>>>>>> Oct 28 10:52:22 srv2 cib: [10656]: info: Invoked: /usr/lib64/heartbeat/cib
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Vlad.
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>>
>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>> Bugs: http://bugs.clusterlabs.org