[Pacemaker] hangs pending
Andrew Beekhof
andrew at beekhof.net
Mon Jan 13 20:18:51 UTC 2014
On 13 Jan 2014, at 8:31 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>
>
> 13.01.2014, 02:51, "Andrew Beekhof" <andrew at beekhof.net>:
>> On 10 Jan 2014, at 9:55 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>
>>> 10.01.2014, 14:31, "Andrey Groshev" <greenx at yandex.ru>:
>>>> 10.01.2014, 14:01, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>> On 10 Jan 2014, at 5:03 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>> 10.01.2014, 05:29, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>> On 9 Jan 2014, at 11:11 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>> 08.01.2014, 06:22, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>> On 29 Nov 2013, at 7:17 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>> Hi, ALL.
>>>>>>>>>>
>>>>>>>>>> I'm still struggling with the fact that, after being fenced, a node hangs in "pending".
>>>>>>>>> Please define "pending". Where did you see this?
>>>>>>>> In crm_mon:
>>>>>>>> ......
>>>>>>>> Node dev-cluster2-node2 (172793105): pending
>>>>>>>> ......
>>>>>>>>
>>>>>>>> The experiment was like this:
>>>>>>>> Four nodes in cluster.
>>>>>>>> On one of them, kill corosync or pacemakerd (signal 4, 6 or 11).
>>>>>>>> After that, the remaining nodes keep rebooting it under various pretexts: "softly whistling", "fly low", "not a cluster member!" ...
>>>>>>>> Then "Too many failures ...." appeared in the log.
>>>>>>>> All this time the status in crm_mon is "pending".
>>>>>>>> Depending on the wind direction, it changes to "UNCLEAN".
>>>>>>>> A lot of time has passed, so I cannot describe the behavior accurately...
>>>>>>>>
>>>>>>>> Now I am in the following state:
>>>>>>>> I tried to locate the problem, and this is where I got to.
>>>>>>>> I set a large value for the property stonith-timeout="600s".
>>>>>>>> And got the following behavior:
>>>>>>>> 1. pkill -4 corosync
>>>>>>>> 2. The DC node calls my fence agent "sshbykey".
>>>>>>>> 3. It reboots the victim and waits until it comes back to life.
>>>>>>> Hmmm.... what version of pacemaker?
>>>>>>> This sounds like a timing issue that we fixed a while back
>>>>>> It was version 1.1.11 from December 3.
>>>>>> I will now do a full update and retest.
>>>>> That should be recent enough. Can you create a crm_report the next time you reproduce?
>>>> Yes, of course. There is a small delay, though.... :)
>>>>
>>>> ......
>>>> cc1: warnings being treated as errors
>>>> upstart.c: In function ‘upstart_job_property’:
>>>> upstart.c:264: error: implicit declaration of function ‘g_variant_lookup_value’
>>>> upstart.c:264: error: nested extern declaration of ‘g_variant_lookup_value’
>>>> upstart.c:264: error: assignment makes pointer from integer without a cast
>>>> gmake[2]: *** [libcrmservice_la-upstart.lo] Error 1
>>>> gmake[2]: Leaving directory `/root/ha/pacemaker/lib/services'
>>>> make[1]: *** [all-recursive] Error 1
>>>> make[1]: Leaving directory `/root/ha/pacemaker/lib'
>>>> make: *** [core] Error 1
>>>>
>>>> I'm trying to solve this problem.
>>> This will not be solved quickly...
>>>
>>> https://developer.gnome.org/glib/2.28/glib-GVariant.html#g-variant-lookup-value
>>> g_variant_lookup_value () Since 2.28
>>>
>>> # yum list installed glib2
>>> Loaded plugins: fastestmirror, rhnplugin, security
>>> This system is receiving updates from RHN Classic or Red Hat Satellite.
>>> Loading mirror speeds from cached hostfile
>>> Installed Packages
>>> glib2.x86_64 2.26.1-3.el6 installed
>>>
>>> # cat /etc/issue
>>> CentOS release 6.5 (Final)
>>> Kernel \r on an \m
>>
>> Can you try this patch?
>> Upstart jobs won't work, but the code will compile
>>
>> diff --git a/lib/services/upstart.c b/lib/services/upstart.c
>> index 831e7cf..195c3a4 100644
>> --- a/lib/services/upstart.c
>> +++ b/lib/services/upstart.c
>> @@ -231,12 +231,21 @@ upstart_job_exists(const char *name)
>> static char *
>> upstart_job_property(const char *obj, const gchar * iface, const char *name)
>> {
>> + char *output = NULL;
>> +
>> +#if !GLIB_CHECK_VERSION(2,28,0)
>> + static bool err = TRUE;
>> +
>> + if(err) {
>> + crm_err("This version of glib is too old to support upstart jobs");
>> + err = FALSE;
>> + }
>> +#else
>> GError *error = NULL;
>> GDBusProxy *proxy;
>> GVariant *asv = NULL;
>> GVariant *value = NULL;
>> GVariant *_ret = NULL;
>> - char *output = NULL;
>>
>> crm_info("Calling GetAll on %s", obj);
>> proxy = get_proxy(obj, BUS_PROPERTY_IFACE);
>> @@ -272,6 +281,7 @@ upstart_job_property(const char *obj, const gchar * iface, const char *name)
>>
>> g_object_unref(proxy);
>> g_variant_unref(_ret);
>> +#endif
>> return output;
>> }
>>
>
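[Editor's sketch] The patch above compiles the upstart code only when GLIB_CHECK_VERSION(2,28,0) holds. The same comparison can be sketched in shell before starting a build. This is only an illustration: version_at_least is a hypothetical helper name, it assumes GNU sort with -V, and the version string is the glib2 version from the yum output quoted earlier with the release suffix dropped.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker): succeeds when version $1 >= $2.
version_at_least() {
    # sort -V (GNU coreutils) orders version strings; if the required
    # version sorts first (or both are equal), the installed one is new enough.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

installed="2.26.1"   # glib2 version reported by yum in this thread
required="2.28.0"    # g_variant_lookup_value() exists since glib 2.28

if version_at_least "$installed" "$required"; then
    echo "glib $installed is new enough for g_variant_lookup_value"
else
    echo "glib $installed is too old: upstart support would be compiled out"
fi
```

With the versions from this thread the check takes the second branch, which matches the compile error reported on CentOS 6.5.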
> Ok :) I patched the source.
> I ran "make rc" - the same error.
Because it's not building your local changes
> I made a new copy via "fetch" - the same error.
> It seems that if ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz does not exist, it downloads it.
> Otherwise the existing archive is used.
> Trimmed log .......
>
> # make rc
> make TAG=Pacemaker-1.1.11-rc3 rpm
> make[1]: Entering directory `/root/ha/pacemaker'
> rm -f pacemaker-dirty.tar.* pacemaker-tip.tar.* pacemaker-HEAD.tar.*
> if [ ! -f ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz ]; then \
> rm -f pacemaker.tar.*; \
> if [ Pacemaker-1.1.11-rc3 = dirty ]; then \
> git commit -m "DO-NOT-PUSH" -a; \
> git archive --prefix=ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3/ HEAD | gzip > ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
> git reset --mixed HEAD^; \
> else \
> git archive --prefix=ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3/ Pacemaker-1.1.11-rc3 | gzip > ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
> fi; \
> echo `date`: Rebuilt ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
> else \
> echo `date`: Using existing tarball: ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
> fi
> Mon Jan 13 13:23:21 MSK 2014: Using existing tarball: ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz
> .......
>
> Well, "make rpm" built the rpms and I created the cluster.
> I ran the same tests and confirmed the behavior.
> The crm_report log is here - http://send2me.ru/crmrep.tar.bz2
Thanks!
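
[Editor's sketch] The recipe quoted above explains why "make rc" kept failing after the patch. An existing tarball is reused as-is, and otherwise the archive is cut from the tag unless TAG is "dirty". The branch logic can be restated as follows. This is only a reading of the quoted log, not the actual Makefile; tarball_source and its two arguments are hypothetical names.

```shell
#!/bin/sh
# Hypothetical restatement of the recipe's branches: prints which source
# the tarball would come from for a given TAG and tarball path.
tarball_source() {
    tag="$1"
    tarball="$2"
    if [ -f "$tarball" ]; then
        echo "existing tarball"      # reused as-is, local edits are ignored
    elif [ "$tag" = dirty ]; then
        echo "working tree"          # temporary commit of local edits, then git archive HEAD
    else
        echo "tag $tag"              # git archive of the tag, local edits are ignored
    fi
}

# Example: no tarball on disk yet, building the rc3 tag.
tarball_source "Pacemaker-1.1.11-rc3" "/nonexistent/ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz"
```

Under this reading, only the "dirty" branch archives uncommitted changes, which is consistent with a patched tree still producing the old error under "make rc".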