[Pacemaker] hangs pending
Andrew Beekhof
andrew at beekhof.net
Thu Feb 20 23:55:12 UTC 2014
On 20 Feb 2014, at 10:04 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>
>
> 20.02.2014, 13:57, "Andrew Beekhof" <andrew at beekhof.net>:
>> On 20 Feb 2014, at 5:33 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>
>>> 20.02.2014, 01:22, "Andrew Beekhof" <andrew at beekhof.net>:
>>>> On 20 Feb 2014, at 4:18 am, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>> 19.02.2014, 06:47, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>> On 18 Feb 2014, at 9:29 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>> Hi, ALL and Andrew!
>>>>>>>
>>>>>>> Today is a good day - I killed a lot, and got shot at a lot.
>>>>>>> In general - I am happy (almost like an elephant) :)
>>>>>>> Besides the resources, eight processes on the node matter to me: corosync, pacemakerd, cib, stonithd, lrmd, attrd, pengine, crmd.
>>>>>>> I killed them with different signals (4, 6, 11 and even 9).
>>>>>>> The behavior does not depend on the signal number, which is good.
>>>>>>> If STONITH sends a reboot to the node, it reboots and rejoins the cluster, which is also good.
>>>>>>> But the behavior differs depending on which daemon is killed.
>>>>>>>
>>>>>>> They fall into four groups:
>>>>>>> 1. corosync, cib - STONITH works 100% of the time.
>>>>>>> Killing them with any signal triggers STONITH and a reboot.
>>>>>> excellent
>>>>>>> 3. stonithd, attrd, pengine - no STONITH needed.
>>>>>>> These daemons simply restart; the resources stay running.
>>>>>> right
>>>>>>> 2. lrmd, crmd - strange STONITH behavior.
>>>>>>> Sometimes STONITH is called, with the corresponding reaction.
>>>>>>> Sometimes the daemon restarts
>>>>>> The daemon will always try to restart; the only variable is how long it takes the peer to notice and initiate fencing.
>>>>>> If the failure happens just before they're due to receive the totem token, the failure will be detected very quickly and the node fenced.
>>>>>> If the failure happens just after, then detection will take longer, giving the node longer to recover and not be fenced.
>>>>>>
>>>>>> So fence/not fence is normal and to be expected.
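To make the timing argument concrete, here is a toy model in C. It is only an illustration under a simplifying assumption that is not taken from corosync itself: the peer can notice the failure only at the next token visit, which comes round every token_period_ms. The names detection_delay_ms(), node_is_fenced(), token_period_ms and respawn_ms are all hypothetical.

```c
#include <assert.h>

/* Toy model, NOT corosync's algorithm: assume the peer notices a dead
 * daemon only when the next token visit comes due, once per period. */
static long detection_delay_ms(long failure_time_ms, long token_period_ms)
{
    long since_last_visit;

    assert(token_period_ms > 0 && failure_time_ms >= 0);
    since_last_visit = failure_time_ms % token_period_ms;
    return token_period_ms - since_last_visit;
}

/* The node escapes fencing only if the daemon is back up (respawn_ms)
 * before the peer would have noticed that it was gone. */
static int node_is_fenced(long failure_time_ms, long token_period_ms,
                          long respawn_ms)
{
    return respawn_ms >= detection_delay_ms(failure_time_ms, token_period_ms);
}
```

With a 1000 ms period and a daemon that needs 200 ms to respawn, a failure at t=990 ms leaves only 10 ms before detection, so node_is_fenced(990, 1000, 200) is 1. A failure at t=10 ms leaves 990 ms, so node_is_fenced(10, 1000, 200) is 0 and the daemon simply restarts.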
>>>>>>> and the resources restart with a large delay (MS:pgsql).
>>>>>>> Once, after crmd restarted, pgsql did not restart.
>>>>>> I would not expect pgsql to ever restart - if the RA does its job properly anyway.
>>>>>> If the node is not fenced, the crmd will respawn and the PE will request that it re-detect the state of all resources.
>>>>>>
>>>>>> If the agent reports "all good", then there is nothing more to do.
>>>>>> If the agent is not reporting "all good", you should really be asking why.
>>>>>>> 4. pacemakerd - nothing happens.
>>>>>> On non-systemd based machines, correct.
>>>>>>
>>>>>> On a systemd based machine pacemakerd is respawned and reattaches to the existing daemons.
>>>>>> Any subsequent daemon failure will be detected and the daemon respawned.
>>>>> And! I almost forgot about IT!
>>>>> Are there any other (NORMAL) variants, methods, ideas?
>>>>> Without this ... @$%#$%&$%^&$%^&##@#$$^$%& !!!!!
>>>>> Otherwise - it's a full epic fail ;)
>>>> -ENOPARSE
>>> OK, I'll set aside my personal attitude towards "systemd".
>>> Let me explain.
>>>
>>> Somewhere in the beginning of this topic, I wrote:
>>> A.G.:Who knows who runs lrmd?
>>> A.B.:Pacemakerd.
>>> That's one!
>>>
>>> Let's see the list of processes:
>>> #ps -axf
>>> .....
>>> 6067 ? Ssl 7:24 corosync
>>> 6092 ? S 0:25 pacemakerd
>>> 6094 ? Ss 116:13 \_ /usr/libexec/pacemaker/cib
>>> 6095 ? Ss 0:25 \_ /usr/libexec/pacemaker/stonithd
>>> 6096 ? Ss 1:27 \_ /usr/libexec/pacemaker/lrmd
>>> 6097 ? Ss 0:49 \_ /usr/libexec/pacemaker/attrd
>>> 6098 ? Ss 0:25 \_ /usr/libexec/pacemaker/pengine
>>> 6099 ? Ss 0:29 \_ /usr/libexec/pacemaker/crmd
>>> .....
>>> That's two!
>>
>> What's two? I don't follow.
> In the sense that it creates other processes. But it does not matter.
>
>
>>> And more, more...
>>> Now you must understand why I want this process to always be running.
>>> I don't even think anyone here needs it explained!
>>>
>>> And now you say "pacemakerd works nicely, but only on systemd distros" !!!
>>
>> No, I'm saying it works _better_ on systemd distros.
>> On non-systemd distros you still need quite a few unlikely-to-happen failures to trigger a situation in which the node still gets fenced and recovered (assuming no one saw any of the error messages and ran "service pacemaker restart" prior to the additional failures).
>>
> Can you show me the place where:
> "On a systemd based machine pacemakerd is respawned and reattaches to the existing daemons."?
The code for it is in mcp/pacemaker.c; look for find_and_track_existing_processes().
The ps tree will look different though:
6094 ? Ss 116:13 /usr/libexec/pacemaker/cib
6095 ? Ss 0:25 /usr/libexec/pacemaker/stonithd
6096 ? Ss 1:27 /usr/libexec/pacemaker/lrmd
6097 ? Ss 0:49 /usr/libexec/pacemaker/attrd
6098 ? Ss 0:25 /usr/libexec/pacemaker/pengine
6099 ? Ss 0:29 /usr/libexec/pacemaker/crmd
...
6666 ? S 0:25 pacemakerd
But pacemakerd will be watching the old children and respawning them on failure,
at which point you might see:
6094 ? Ss 116:13 /usr/libexec/pacemaker/cib
6096 ? Ss 1:27 /usr/libexec/pacemaker/lrmd
6097 ? Ss 0:49 /usr/libexec/pacemaker/attrd
6098 ? Ss 0:25 /usr/libexec/pacemaker/pengine
6099 ? Ss 0:29 /usr/libexec/pacemaker/crmd
...
6666 ? S 0:25 pacemakerd
6667 ? Ss 0:25 \_ /usr/libexec/pacemaker/stonithd
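As a rough illustration of the idea only (this is not the code in mcp/pacemaker.c, and it makes no claim about how find_and_track_existing_processes() is implemented), a supervisor that did not fork a daemon could adopt it along these lines. The sketch assumes Linux procfs; find_pid_by_name() and pid_is_alive() are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Return the PID of the first process whose /proc/<pid>/comm equals name,
 * or 0 if there is none. Linux-only; the kernel truncates comm to 15
 * characters, so longer names never match. */
static pid_t find_pid_by_name(const char *name)
{
    DIR *proc;
    struct dirent *entry;
    pid_t found = 0;

    assert(name != NULL);
    proc = opendir("/proc");
    if (proc == NULL) {
        return 0;               /* no procfs: nothing to adopt */
    }
    while (found == 0 && (entry = readdir(proc)) != NULL) {
        char path[300];
        char comm[64];
        FILE *fp;

        if (!isdigit((unsigned char) entry->d_name[0])) {
            continue;           /* not a PID directory */
        }
        snprintf(path, sizeof(path), "/proc/%s/comm", entry->d_name);
        fp = fopen(path, "r");
        if (fp == NULL) {
            continue;           /* process exited while we were scanning */
        }
        if (fgets(comm, sizeof(comm), fp) != NULL) {
            comm[strcspn(comm, "\n")] = '\0';
            if (strcmp(comm, name) == 0) {
                found = (pid_t) atol(entry->d_name);
            }
        }
        fclose(fp);
    }
    closedir(proc);
    return found;
}

/* An adopted process is not our child, so waitpid() cannot report its
 * exit. Poll instead: signal 0 only checks that the PID still exists. */
static int pid_is_alive(pid_t pid)
{
    if (pid <= 0) {
        return 0;
    }
    return kill(pid, 0) == 0 || errno == EPERM;
}
```

A supervisor built this way would call find_pid_by_name("stonithd") at startup, remember the PID, and check pid_is_alive() on a timer. A replacement it starts itself is a real child again, which would be consistent with the respawned stonithd appearing under pacemakerd in the listing above.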
> If I respawn the pacemakerd process via upstart, does it "reattach to the existing daemons"?
If upstart is capable of detecting the pacemakerd failure and automagically respawning it, then yes - the same process will happen.
>
>>> What should I do now?
>>> * Integrate systemd in CentOS?
>>> * Migrate to Fedora?
>>> * Buy RHEL7 !?
>>
>> Option 3 is particularly good :)
>
> It's too easy. Real heroes always take the long way round :)
>
>>> Each of these options is great, but none of them fits me.
>>>
>>> P.S. And I'm not even talking about distros which haven't migrated to systemd (and won't).
>>
>> Are there any? Even debian and ubuntu have raised the white flag.
>
> That is certainly a digression, but potentially it can be any Unix-like system.
>
>
>>> Do not be offended! We also do so.
>>> We are building a secret military factory,
>>> large concrete fence around it,
>>> wall barbed wire, but forget to install the gates. :)
>>>>>>> And then I can kill any process of the third group. They do not restart.
>>>>>> Until they become needed.
>>>>>> Eg. if the DC goes to invoke the policy engine, that will fail causing the crmd to fail and the node to be fenced.
>>>>>>> Generally, don't touch corosync, cib and maybe lrmd, crmd.
>>>>>>>
>>>>>>> What do you think about this?
>>>>>>> The main question of this thread has been resolved.
>>>>>>> But this varied behavior is another big problem.
>>>>>>>
>>>>>>> 17.02.2014, 08:52, "Andrey Groshev" <greenx at yandex.ru>:
>>>>>>>> 17.02.2014, 02:27, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>> With no quick follow-up, dare one hope that means the patch worked? :-)
>>>>>>>> Hi,
>>>>>>>> No, unfortunately my boss changed my plans on Friday and I spent the whole day on a parallel project.
>>>>>>>> I hope to have time today to carry out the necessary tests.
>>>>>>>>> On 14 Feb 2014, at 3:37 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>> Yes, of course. Starting to build world and test now )
>>>>>>>>>>
>>>>>>>>>> 14.02.2014, 04:41, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>>> The previous patch wasn't quite right.
>>>>>>>>>>> Could you try this new one?
>>>>>>>>>>>
>>>>>>>>>>> http://paste.fedoraproject.org/77123/13923376/
>>>>>>>>>>>
>>>>>>>>>>> [11:23 AM] beekhof at f19 ~/Development/sources/pacemaker/devel ☺ # git diff
>>>>>>>>>>> diff --git a/crmd/callbacks.c b/crmd/callbacks.c
>>>>>>>>>>> index ac4b905..d49525b 100644
>>>>>>>>>>> --- a/crmd/callbacks.c
>>>>>>>>>>> +++ b/crmd/callbacks.c
>>>>>>>>>>> @@ -199,8 +199,7 @@ peer_update_callback(enum crm_status_type type, crm_node_t * node, const void *d
>>>>>>>>>>> stop_te_timer(down->timer);
>>>>>>>>>>>
>>>>>>>>>>> flags |= node_update_join | node_update_expected;
>>>>>>>>>>> - crm_update_peer_join(__FUNCTION__, node, crm_join_none);
>>>>>>>>>>> - crm_update_peer_expected(__FUNCTION__, node, CRMD_JOINSTATE_DOWN);
>>>>>>>>>>> + crmd_peer_down(node, FALSE);
>>>>>>>>>>> check_join_state(fsa_state, __FUNCTION__);
>>>>>>>>>>>
>>>>>>>>>>> update_graph(transition_graph, down);
>>>>>>>>>>> diff --git a/crmd/crmd_utils.h b/crmd/crmd_utils.h
>>>>>>>>>>> index bc472c2..1a2577a 100644
>>>>>>>>>>> --- a/crmd/crmd_utils.h
>>>>>>>>>>> +++ b/crmd/crmd_utils.h
>>>>>>>>>>> @@ -100,6 +100,7 @@ void crmd_join_phase_log(int level);
>>>>>>>>>>> const char *get_timer_desc(fsa_timer_t * timer);
>>>>>>>>>>> gboolean too_many_st_failures(void);
>>>>>>>>>>> void st_fail_count_reset(const char * target);
>>>>>>>>>>> +void crmd_peer_down(crm_node_t *peer, bool full);
>>>>>>>>>>>
>>>>>>>>>>> # define fsa_register_cib_callback(id, flag, data, fn) do { \
>>>>>>>>>>> fsa_cib_conn->cmds->register_callback( \
>>>>>>>>>>> diff --git a/crmd/te_actions.c b/crmd/te_actions.c
>>>>>>>>>>> index f31d4ec..3bfce59 100644
>>>>>>>>>>> --- a/crmd/te_actions.c
>>>>>>>>>>> +++ b/crmd/te_actions.c
>>>>>>>>>>> @@ -80,11 +80,8 @@ send_stonith_update(crm_action_t * action, const char *target, const char *uuid)
>>>>>>>>>>> crm_info("Recording uuid '%s' for node '%s'", uuid, target);
>>>>>>>>>>> peer->uuid = strdup(uuid);
>>>>>>>>>>> }
>>>>>>>>>>> - crm_update_peer_proc(__FUNCTION__, peer, crm_proc_none, NULL);
>>>>>>>>>>> - crm_update_peer_state(__FUNCTION__, peer, CRM_NODE_LOST, 0);
>>>>>>>>>>> - crm_update_peer_expected(__FUNCTION__, peer, CRMD_JOINSTATE_DOWN);
>>>>>>>>>>> - crm_update_peer_join(__FUNCTION__, peer, crm_join_none);
>>>>>>>>>>>
>>>>>>>>>>> + crmd_peer_down(peer, TRUE);
>>>>>>>>>>> node_state =
>>>>>>>>>>> do_update_node_cib(peer,
>>>>>>>>>>> node_update_cluster | node_update_peer | node_update_join |
>>>>>>>>>>> diff --git a/crmd/te_utils.c b/crmd/te_utils.c
>>>>>>>>>>> index ad7e573..0c92e95 100644
>>>>>>>>>>> --- a/crmd/te_utils.c
>>>>>>>>>>> +++ b/crmd/te_utils.c
>>>>>>>>>>> @@ -247,10 +247,7 @@ tengine_stonith_notify(stonith_t * st, stonith_event_t * st_event)
>>>>>>>>>>>
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> - crm_update_peer_proc(__FUNCTION__, peer, crm_proc_none, NULL);
>>>>>>>>>>> - crm_update_peer_state(__FUNCTION__, peer, CRM_NODE_LOST, 0);
>>>>>>>>>>> - crm_update_peer_expected(__FUNCTION__, peer, CRMD_JOINSTATE_DOWN);
>>>>>>>>>>> - crm_update_peer_join(__FUNCTION__, peer, crm_join_none);
>>>>>>>>>>> + crmd_peer_down(peer, TRUE);
>>>>>>>>>>> }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/crmd/utils.c b/crmd/utils.c
>>>>>>>>>>> index 3988cfe..2df53ab 100644
>>>>>>>>>>> --- a/crmd/utils.c
>>>>>>>>>>> +++ b/crmd/utils.c
>>>>>>>>>>> @@ -1077,3 +1077,13 @@ update_attrd_remote_node_removed(const char *host, const char *user_name)
>>>>>>>>>>> crm_trace("telling attrd to clear attributes for remote host %s", host);
>>>>>>>>>>> update_attrd_helper(host, NULL, NULL, user_name, TRUE, 'C');
>>>>>>>>>>> }
>>>>>>>>>>> +
>>>>>>>>>>> +void crmd_peer_down(crm_node_t *peer, bool full)
>>>>>>>>>>> +{
>>>>>>>>>>> + if(full && peer->state == NULL) {
>>>>>>>>>>> + crm_update_peer_state(__FUNCTION__, peer, CRM_NODE_LOST, 0);
>>>>>>>>>>> + crm_update_peer_proc(__FUNCTION__, peer, crm_proc_none, NULL);
>>>>>>>>>>> + }
>>>>>>>>>>> + crm_update_peer_join(__FUNCTION__, peer, crm_join_none);
>>>>>>>>>>> + crm_update_peer_expected(__FUNCTION__, peer, CRMD_JOINSTATE_DOWN);
>>>>>>>>>>> +}
>>>>>>>>>>>
>>>>>>>>>>> On 16 Jan 2014, at 7:24 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>>> 16.01.2014, 01:30, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>>>>> On 16 Jan 2014, at 12:41 am, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>>>>> 15.01.2014, 02:53, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>>>>>>> On 15 Jan 2014, at 12:15 am, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>>>>>>> 14.01.2014, 10:00, "Andrey Groshev" <greenx at yandex.ru>:
>>>>>>>>>>>>>>>>> 14.01.2014, 07:47, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>>>>>>>>>> Ok, here's what happens:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. node2 is lost
>>>>>>>>>>>>>>>>>> 2. fencing of node2 starts
>>>>>>>>>>>>>>>>>> 3. node2 reboots (and cluster starts)
>>>>>>>>>>>>>>>>>> 4. node2 returns to the membership
>>>>>>>>>>>>>>>>>> 5. node2 is marked as a cluster member
>>>>>>>>>>>>>>>>>> 6. DC tries to bring it into the cluster, but needs to cancel the active transition first.
>>>>>>>>>>>>>>>>>> Which is a problem since the node2 fencing operation is part of that
>>>>>>>>>>>>>>>>>> 7. node2 is in a transition (pending) state until fencing passes or fails
>>>>>>>>>>>>>>>>>> 8a. fencing fails: transition completes and the node joins the cluster
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> That's in theory, except we automatically try again. Which isn't appropriate.
>>>>>>>>>>>>>>>>>> This should be relatively easy to fix.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 8b. fencing passes: the node is incorrectly marked as offline
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This I have no idea how to fix yet.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On another note, it doesn't look like this agent works at all.
>>>>>>>>>>>>>>>>>> The node has been back online for a long time and the agent is still timing out after 10 minutes.
>>>>>>>>>>>>>>>>>> So "Once the script makes sure that the victim will rebooted and again available via ssh - it exit with 0." does not seem true.
>>>>>>>>>>>>>>>>> Damn. Looks like you're right. At some point I broke my agent and didn't notice. Go figure.
>>>>>>>>>>>>>>>> I repaired my agent: after sending the reboot it was waiting on STDIN.
>>>>>>>>>>>>>>>> The "normal" behavior is back: the node hangs in "pending" until I manually send a reboot. :)
>>>>>>>>>>>>>>> Right. Now you're in case 8b.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you try this patch: http://paste.fedoraproject.org/68450/38973966
>>>>>>>>>>>>>> Spent the whole day on experiments.
>>>>>>>>>>>>>> Here is what it comes down to:
>>>>>>>>>>>>>> 1. Built the cluster.
>>>>>>>>>>>>>> 2. On node-2, sent signal (-4) - killed corosync.
>>>>>>>>>>>>>> 3. From node-1 (where the DC is), stonith sent a reboot.
>>>>>>>>>>>>>> 4. The node rebooted and the resources started.
>>>>>>>>>>>>>> 5. Again. On node-2, sent signal (-4) - killed corosync.
>>>>>>>>>>>>>> 6. Again. From node-1 (where the DC is), stonith sent a reboot.
>>>>>>>>>>>>>> 7. Node-2 rebooted and hangs in "pending".
>>>>>>>>>>>>>> 8. Waiting, waiting..... rebooted it manually.
>>>>>>>>>>>>>> 9. Node-2 rebooted and the resources started.
>>>>>>>>>>>>>> 10. GOTO step 2
>>>>>>>>>>>>> Logs?
>>>>>>>>>>>> Yesterday I sent a separate message explaining why I hadn't attached the logs.
>>>>>>>>>>>> Please read it; it contains a few more questions.
>>>>>>>>>>>> Today it began to hang again, continuing in the same cycle.
>>>>>>>>>>>> Logs here http://send2me.ru/crmrep2.tar.bz2
>>>>>>>>>>>>>>>> New logs: http://send2me.ru/crmrep1.tar.bz2
>>>>>>>>>>>>>>>>>> On 14 Jan 2014, at 1:19 pm, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>>>>>>>>>>>>>>>>> Apart from anything else, your timeout needs to be bigger:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Jan 13 12:21:36 [17223] dev-cluster2-node1.unix.tensor.ru stonith-ng: ( commands.c:1321 ) error: log_operation: Operation 'reboot' [11331] (call 2 from crmd.17227) for host 'dev-cluster2-node2.unix.tensor.ru' with device 'st1' returned: -62 (Timer expired)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 14 Jan 2014, at 7:18 am, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>>>>>>>>>>>>>>>>>> On 13 Jan 2014, at 8:31 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>>>>>>>>>>>> 13.01.2014, 02:51, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>>>>>>>>>>>>>> On 10 Jan 2014, at 9:55 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 10.01.2014, 14:31, "Andrey Groshev" <greenx at yandex.ru>:
>>>>>>>>>>>>>>>>>>>>>>>> 10.01.2014, 14:01, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>>>>>>>>>>>>>>>>> On 10 Jan 2014, at 5:03 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 10.01.2014, 05:29, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>>>>>>>>>>>>>>>>>>> On 9 Jan 2014, at 11:11 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 08.01.2014, 06:22, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 29 Nov 2013, at 7:17 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, ALL.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm still trying to cope with the fact that after the fence - node hangs in "pending".
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please define "pending". Where did you see this?
>>>>>>>>>>>>>>>>>>>>>>>>>>>> In crm_mon:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> ......
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Node dev-cluster2-node2 (172793105): pending
>>>>>>>>>>>>>>>>>>>>>>>>>>>> ......
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> The experiment was like this:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Four nodes in cluster.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On one of them, kill corosync or pacemakerd (signal 4 or 6 or 11).
>>>>>>>>>>>>>>>>>>>>>>>>>>>> After that, the remaining nodes keep rebooting it under various pretexts: "softly whistling", "fly low", "not a cluster member!" ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Then "Too many failures ...." appeared in the log.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> All this time the status in crm_mon is "pending".
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Depending on the wind direction, it changed to "UNCLEAN".
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Much time has passed and I can no longer describe the behavior accurately...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Now I am in the following state:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I tried to locate the problem, and this is what I came up with.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I set a big value in the property stonith-timeout="600s".
>>>>>>>>>>>>>>>>>>>>>>>>>>>> And got the following behavior:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. pkill -4 corosync
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. The node with the DC calls my fence agent "sshbykey".
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3. It sends a reboot to the victim and waits until it comes back to life.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hmmm.... what version of pacemaker?
>>>>>>>>>>>>>>>>>>>>>>>>>>> This sounds like a timing issue that we fixed a while back
>>>>>>>>>>>>>>>>>>>>>>>>>> It was version 1.1.11 from December 3.
>>>>>>>>>>>>>>>>>>>>>>>>>> I'll now do a full update and retest.
>>>>>>>>>>>>>>>>>>>>>>>>> That should be recent enough. Can you create a crm_report the next time you reproduce?
>>>>>>>>>>>>>>>>>>>>>>>> Of course yes. Little delay.... :)
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ......
>>>>>>>>>>>>>>>>>>>>>>>> cc1: warnings being treated as errors
>>>>>>>>>>>>>>>>>>>>>>>> upstart.c: In function ‘upstart_job_property’:
>>>>>>>>>>>>>>>>>>>>>>>> upstart.c:264: error: implicit declaration of function ‘g_variant_lookup_value’
>>>>>>>>>>>>>>>>>>>>>>>> upstart.c:264: error: nested extern declaration of ‘g_variant_lookup_value’
>>>>>>>>>>>>>>>>>>>>>>>> upstart.c:264: error: assignment makes pointer from integer without a cast
>>>>>>>>>>>>>>>>>>>>>>>> gmake[2]: *** [libcrmservice_la-upstart.lo] Error 1
>>>>>>>>>>>>>>>>>>>>>>>> gmake[2]: Leaving directory `/root/ha/pacemaker/lib/services'
>>>>>>>>>>>>>>>>>>>>>>>> make[1]: *** [all-recursive] Error 1
>>>>>>>>>>>>>>>>>>>>>>>> make[1]: Leaving directory `/root/ha/pacemaker/lib'
>>>>>>>>>>>>>>>>>>>>>>>> make: *** [core] Error 1
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I'm trying to solve this problem.
>>>>>>>>>>>>>>>>>>>>>>> It's not getting solved quickly...
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> https://developer.gnome.org/glib/2.28/glib-GVariant.html#g-variant-lookup-value
>>>>>>>>>>>>>>>>>>>>>>> g_variant_lookup_value () Since 2.28
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> # yum list installed glib2
>>>>>>>>>>>>>>>>>>>>>>> Loaded plugins: fastestmirror, rhnplugin, security
>>>>>>>>>>>>>>>>>>>>>>> This system is receiving updates from RHN Classic or Red Hat Satellite.
>>>>>>>>>>>>>>>>>>>>>>> Loading mirror speeds from cached hostfile
>>>>>>>>>>>>>>>>>>>>>>> Installed Packages
>>>>>>>>>>>>>>>>>>>>>>> glib2.x86_64 2.26.1-3.el6 installed
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> # cat /etc/issue
>>>>>>>>>>>>>>>>>>>>>>> CentOS release 6.5 (Final)
>>>>>>>>>>>>>>>>>>>>>>> Kernel \r on an \m
>>>>>>>>>>>>>>>>>>>>>> Can you try this patch?
>>>>>>>>>>>>>>>>>>>>>> Upstart jobs won't work, but the code will compile
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> diff --git a/lib/services/upstart.c b/lib/services/upstart.c
>>>>>>>>>>>>>>>>>>>>>> index 831e7cf..195c3a4 100644
>>>>>>>>>>>>>>>>>>>>>> --- a/lib/services/upstart.c
>>>>>>>>>>>>>>>>>>>>>> +++ b/lib/services/upstart.c
>>>>>>>>>>>>>>>>>>>>>> @@ -231,12 +231,21 @@ upstart_job_exists(const char *name)
>>>>>>>>>>>>>>>>>>>>>> static char *
>>>>>>>>>>>>>>>>>>>>>> upstart_job_property(const char *obj, const gchar * iface, const char *name)
>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>> + char *output = NULL;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +#if !GLIB_CHECK_VERSION(2,28,0)
>>>>>>>>>>>>>>>>>>>>>> + static bool err = TRUE;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> + if(err) {
>>>>>>>>>>>>>>>>>>>>>> + crm_err("This version of glib is too old to support upstart jobs");
>>>>>>>>>>>>>>>>>>>>>> + err = FALSE;
>>>>>>>>>>>>>>>>>>>>>> + }
>>>>>>>>>>>>>>>>>>>>>> +#else
>>>>>>>>>>>>>>>>>>>>>> GError *error = NULL;
>>>>>>>>>>>>>>>>>>>>>> GDBusProxy *proxy;
>>>>>>>>>>>>>>>>>>>>>> GVariant *asv = NULL;
>>>>>>>>>>>>>>>>>>>>>> GVariant *value = NULL;
>>>>>>>>>>>>>>>>>>>>>> GVariant *_ret = NULL;
>>>>>>>>>>>>>>>>>>>>>> - char *output = NULL;
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> crm_info("Calling GetAll on %s", obj);
>>>>>>>>>>>>>>>>>>>>>> proxy = get_proxy(obj, BUS_PROPERTY_IFACE);
>>>>>>>>>>>>>>>>>>>>>> @@ -272,6 +281,7 @@ upstart_job_property(const char *obj, const gchar * iface, const char *name)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> g_object_unref(proxy);
>>>>>>>>>>>>>>>>>>>>>> g_variant_unref(_ret);
>>>>>>>>>>>>>>>>>>>>>> +#endif
>>>>>>>>>>>>>>>>>>>>>> return output;
>>>>>>>>>>>>>>>>>>>>>> }
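The guard in the patch above is resolved by the preprocessor, so it costs nothing at run time. The small runtime function below is only a sketch of the same "at least MAJOR.MINOR.MICRO" comparison, to show why the installed glib2 2.26.1 falls on the wrong side of a 2.28.0 check. version_at_least() is a hypothetical helper, not a glib function.

```c
#include <assert.h>

/* Compare an installed version against a required one, most significant
 * component first. Returns 1 when have >= want, 0 otherwise. */
static int version_at_least(int have_major, int have_minor, int have_micro,
                            int want_major, int want_minor, int want_micro)
{
    assert(have_major >= 0 && want_major >= 0);
    if (have_major != want_major) {
        return have_major > want_major;
    }
    if (have_minor != want_minor) {
        return have_minor > want_minor;
    }
    return have_micro >= want_micro;
}
```

Here version_at_least(2, 26, 1, 2, 28, 0) is 0, which corresponds to the branch that only logs the "too old to support upstart jobs" error and returns NULL, while version_at_least(2, 28, 0, 2, 28, 0) is 1.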
>>>>>>>>>>>>>>>>>>>>> Ok :) I patched the source.
>>>>>>>>>>>>>>>>>>>>> Typed "make rc" - the same error.
>>>>>>>>>>>>>>>>>>>> Because it's not building your local changes
>>>>>>>>>>>>>>>>>>>>> Made a new copy via "fetch" - the same error.
>>>>>>>>>>>>>>>>>>>>> It seems that if ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz does not exist, it gets downloaded.
>>>>>>>>>>>>>>>>>>>>> Otherwise the existing archive is used.
>>>>>>>>>>>>>>>>>>>>> Trimmed log .......
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> # make rc
>>>>>>>>>>>>>>>>>>>>> make TAG=Pacemaker-1.1.11-rc3 rpm
>>>>>>>>>>>>>>>>>>>>> make[1]: Entering directory `/root/ha/pacemaker'
>>>>>>>>>>>>>>>>>>>>> rm -f pacemaker-dirty.tar.* pacemaker-tip.tar.* pacemaker-HEAD.tar.*
>>>>>>>>>>>>>>>>>>>>> if [ ! -f ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz ]; then \
>>>>>>>>>>>>>>>>>>>>> rm -f pacemaker.tar.*; \
>>>>>>>>>>>>>>>>>>>>> if [ Pacemaker-1.1.11-rc3 = dirty ]; then \
>>>>>>>>>>>>>>>>>>>>> git commit -m "DO-NOT-PUSH" -a; \
>>>>>>>>>>>>>>>>>>>>> git archive --prefix=ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3/ HEAD | gzip > ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
>>>>>>>>>>>>>>>>>>>>> git reset --mixed HEAD^; \
>>>>>>>>>>>>>>>>>>>>> else \
>>>>>>>>>>>>>>>>>>>>> git archive --prefix=ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3/ Pacemaker-1.1.11-rc3 | gzip > ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
>>>>>>>>>>>>>>>>>>>>> fi; \
>>>>>>>>>>>>>>>>>>>>> echo `date`: Rebuilt ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
>>>>>>>>>>>>>>>>>>>>> else \
>>>>>>>>>>>>>>>>>>>>> echo `date`: Using existing tarball: ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
>>>>>>>>>>>>>>>>>>>>> fi
>>>>>>>>>>>>>>>>>>>>> Mon Jan 13 13:23:21 MSK 2014: Using existing tarball: ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz
>>>>>>>>>>>>>>>>>>>>> .......
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Well, "make rpm" built the rpms and I created the cluster.
>>>>>>>>>>>>>>>>>>>>> I ran the same tests and confirmed the behavior.
>>>>>>>>>>>>>>>>>>>>> crm_report log here - http://send2me.ru/crmrep.tar.bz2
>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>>>>>>>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>>>>>>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>>>>>>>>>>>> Bugs: http://bugs.clusterlabs.org