[Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
renayama19661014 at ybb.ne.jp
Fri Oct 10 07:45:26 CEST 2014
Hi Andrew,
I applied the three corrections you made and checked the behavior.
To make sure, I also added abort() handling to every g_source_remove() call in services.c.
* I added the following abort() in the four places that call g_source_remove():
    if (g_source_remove(op->opaque->repeat_timer) == FALSE) {
        abort();
    }
As a result, the abort still occurred.
Your corrections do not seem to have resolved the problem yet.
(gdb) where
#0 0x00007fdd923e1f79 in __GI_raise (sig=sig at entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fdd923e5388 in __GI_abort () at abort.c:89
#2 0x00007fdd92b9fe77 in crm_abort (file=file at entry=0x7fdd92bd352b "logging.c",
function=function at entry=0x7fdd92bd48c0 <__FUNCTION__.23262> "crm_glib_handler", line=line at entry=73,
assert_condition=assert_condition at entry=0xe20b80 "Source ID 40 was not found when attempting to remove it", do_core=do_core at entry=1,
do_fork=<optimized out>, do_fork at entry=1) at utils.c:1195
#3 0x00007fdd92bc7ca7 in crm_glib_handler (log_domain=0x7fdd92130b6e "GLib", flags=<optimized out>,
message=0xe20b80 "Source ID 40 was not found when attempting to remove it", user_data=<optimized out>) at logging.c:73
#4 0x00007fdd920f2ae1 in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5 0x00007fdd920f2d72 in g_log () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6 0x00007fdd920eac5c in g_source_remove () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#7 0x00007fdd92984b55 in cancel_recurring_action (op=op at entry=0xe19b90) at services.c:365
#8 0x00007fdd92984bee in services_action_cancel (name=name at entry=0xe1d2d0 "dummy2", action=<optimized out>, interval=interval at entry=10000)
at services.c:387
#9 0x000000000040405a in cancel_op (rsc_id=rsc_id at entry=0xe1d2d0 "dummy2", action=action at entry=0xe10d90 "monitor", interval=10000)
at lrmd.c:1404
#10 0x000000000040614f in process_lrmd_rsc_cancel (client=0xe17290, id=74, request=0xe1be10) at lrmd.c:1468
#11 process_lrmd_message (client=client at entry=0xe17290, id=74, request=request at entry=0xe1be10) at lrmd.c:1507
#12 0x0000000000402bac in lrmd_ipc_dispatch (c=0xe169c0, data=<optimized out>, size=361) at main.c:148
#13 0x00007fdd91e4d4d9 in qb_ipcs_dispatch_connection_request () from /usr/lib/libqb.so.0
#14 0x00007fdd92bc409d in gio_read_socket (gio=<optimized out>, condition=G_IO_IN, data=0xe158a8) at mainloop.c:437
#15 0x00007fdd920ebce5 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#16 0x00007fdd920ec048 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#17 0x00007fdd920ec30a in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#18 0x0000000000402774 in main (argc=<optimized out>, argv=0x7fff22cac268) at main.c:344
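
For reference, the bookkeeping that the trace above points to can be sketched without GLib. This is only a minimal sketch under stated assumptions: toy_source_add(), toy_source_remove(), toy_timer_fired(), toy_cancel() and struct toy_op are hypothetical names invented for this illustration, and they are not GLib or Pacemaker functions. The toy table merely imitates the behavior discussed in this thread, where removing an ID that no longer exists is reported as an error, and it shows one way a stored timer ID such as op->opaque->repeat_timer could be reset to 0 once its source is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a main-loop source table. This is NOT GLib code. */
#define MAX_SOURCES 8
static bool live[MAX_SOURCES + 1];  /* index = source ID; 0 means "no source" */

/* Hypothetical: register a source and return its nonzero ID (0 if full). */
static unsigned toy_source_add(void)
{
    for (unsigned id = 1; id <= MAX_SOURCES; id++) {
        if (!live[id]) {
            live[id] = true;
            return id;
        }
    }
    return 0;
}

/* Hypothetical: returns false and complains when the ID is not registered,
 * imitating the failure reported in this thread. */
static bool toy_source_remove(unsigned id)
{
    if (id == 0 || id > MAX_SOURCES || !live[id]) {
        fprintf(stderr,
                "Source ID %u was not found when attempting to remove it\n", id);
        return false;
    }
    live[id] = false;
    return true;
}

/* Plays the role of op->opaque->repeat_timer in this sketch. */
struct toy_op {
    unsigned repeat_timer;
};

/* One-shot callback. Returning false means "destroy this source", so the
 * stored ID is cleared here to keep the cancel path from reusing it. */
static bool toy_timer_fired(struct toy_op *op)
{
    live[op->repeat_timer] = false;  /* simulates the main loop dropping the source */
    op->repeat_timer = 0;
    return false;
}

/* Cancel path: only remove a timer that is still recorded as pending. */
static bool toy_cancel(struct toy_op *op)
{
    bool ok = true;

    if (op->repeat_timer != 0) {
        ok = toy_source_remove(op->repeat_timer);
        op->repeat_timer = 0;
    }
    return ok;
}
```

With this kind of bookkeeping, calling toy_cancel() after the timer has already fired is a no-op, while calling toy_source_remove() twice on the same ID reports the failure on the second call. Whether the same pattern covers every path in services.c is not confirmed here.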
Best Regards,
Hideo Yamauchi.
----- Original Message -----
> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
> To: Andrew Beekhof <andrew at beekhof.net>; The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> Cc:
> Date: 2014/10/10, Fri 10:55
> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>
> Hi Andrew,
>
> Okay!
>
> I will test your patch
> and let you know the result.
>
> Many thanks!
> Hideo Yamauchi.
>
>
>
> ----- Original Message -----
>> From: Andrew Beekhof <andrew at beekhof.net>
>> To: renayama19661014 at ybb.ne.jp; The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>> Cc:
>> Date: 2014/10/10, Fri 10:47
>> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>>
>> Perfect!
>>
>> Can you try this:
>>
>> diff --git a/lib/services/services.c b/lib/services/services.c
>> index 8590b56..cb0f0ae 100644
>> --- a/lib/services/services.c
>> +++ b/lib/services/services.c
>> @@ -417,6 +417,7 @@ services_action_kick(const char *name, const char *action, int interval /* ms */
>> free(id);
>>
>> if (op == NULL) {
>> + op->opaque->repeat_timer = 0;
>> return FALSE;
>> }
>>
>> @@ -425,6 +426,7 @@ services_action_kick(const char *name, const char *action, int interval /* ms */
>> } else {
>> if (op->opaque->repeat_timer) {
>> g_source_remove(op->opaque->repeat_timer);
>> + op->opaque->repeat_timer = 0;
>> }
>> recurring_action_timer(op);
>> return TRUE;
>> @@ -459,6 +461,7 @@ handle_duplicate_recurring(svc_action_t * op, void (*action_callback) (svc_actio
>> if (dup->pid != 0) {
>> if (op->opaque->repeat_timer) {
>> g_source_remove(op->opaque->repeat_timer);
>> + op->opaque->repeat_timer = 0;
>> }
>> recurring_action_timer(dup);
>> }
>>
>>
>> On 10 Oct 2014, at 12:16 pm, renayama19661014 at ybb.ne.jp wrote:
>>
>>> Hi Andrew,
>>>
>>> My gdb setup in the Ubuntu environment is not working properly yet, so I cannot acquire a trace from lrmd.
>>> Please wait a little longer for this.
>>>
>>>
>>> However, I made lrmd terminate abnormally when g_source_remove() in cancel_recurring_action() returned FALSE.
>>> -----
>>> gboolean
>>> cancel_recurring_action(svc_action_t * op)
>>> {
>>>     crm_info("Cancelling operation %s", op->id);
>>>
>>>     if (recurring_actions) {
>>>         g_hash_table_remove(recurring_actions, op->id);
>>>     }
>>>
>>>     if (op->opaque->repeat_timer) {
>>>         if (g_source_remove(op->opaque->repeat_timer) == FALSE) {
>>>             abort();
>>>         }
>>> (snip)
>>> -------core----
>>> #0 0x00007f30aa60ff79 in __GI_raise (sig=sig at entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>>>
>>> 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>>> (gdb) where
>>> #0 0x00007f30aa60ff79 in __GI_raise (sig=sig at entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>>> #1 0x00007f30aa613388 in __GI_abort () at abort.c:89
>>> #2 0x00007f30aadcde77 in crm_abort (file=file at entry=0x7f30aae0152b "logging.c",
>>> function=function at entry=0x7f30aae028c0 <__FUNCTION__.23262> "crm_glib_handler", line=line at entry=73,
>>> assert_condition=assert_condition at entry=0x19d2ad0 "Source ID 63 was not found when attempting to remove it", do_core=do_core at entry=1,
>>> do_fork=<optimized out>, do_fork at entry=1) at utils.c:1195
>>> #3 0x00007f30aadf5ca7 in crm_glib_handler (log_domain=0x7f30aa35eb6e "GLib", flags=<optimized out>,
>>> message=0x19d2ad0 "Source ID 63 was not found when attempting to remove it", user_data=<optimized out>) at logging.c:73
>>> #4 0x00007f30aa320ae1 in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>> #5 0x00007f30aa320d72 in g_log () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>> #6 0x00007f30aa318c5c in g_source_remove () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>> #7 0x00007f30aabb2b55 in cancel_recurring_action (op=op at entry=0x19caa90) at services.c:363
>>> #8 0x00007f30aabb2bee in services_action_cancel (name=name at entry=0x19d0530 "dummy3", action=<optimized out>, interval=interval at entry=10000)
>>> at services.c:385
>>> #9 0x000000000040405a in cancel_op (rsc_id=rsc_id at entry=0x19d0530 "dummy3", action=action at entry=0x19cec10 "monitor", interval=10000)
>>> at lrmd.c:1404
>>> #10 0x000000000040614f in process_lrmd_rsc_cancel (client=0x19c8290, id=74, request=0x19ca8a0) at lrmd.c:1468
>>> #11 process_lrmd_message (client=client at entry=0x19c8290, id=74, request=request at entry=0x19ca8a0) at lrmd.c:1507
>>> #12 0x0000000000402bac in lrmd_ipc_dispatch (c=0x19c79c0, data=<optimized out>, size=361) at main.c:148
>>> #13 0x00007f30aa07b4d9 in qb_ipcs_dispatch_connection_request () from /usr/lib/libqb.so.0
>>> #14 0x00007f30aadf209d in gio_read_socket (gio=<optimized out>, condition=G_IO_IN, data=0x19c68a8) at mainloop.c:437
>>> #15 0x00007f30aa319ce5 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>> #16 0x00007f30aa31a048 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>> #17 0x00007f30aa31a30a in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>> #18 0x0000000000402774 in main (argc=<optimized out>, argv=0x7fffcdd90b88) at main.c:344
>>> ---------
>>>
>>> Best Regards,
>>> Hideo Yamauchi.
>>>
>>>
>>>
>>> ----- Original Message -----
>>>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>>>> To: Andrew Beekhof <andrew at beekhof.net>
>>>> Cc: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>>>> Date: 2014/10/7, Tue 11:15
>>>> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>>>>
>>>> Hi Andrew,
>>>>
>>>>> Not quite. Returning FALSE from the callback also removes the source from glib.
>>>>> So your test case effectively removes t1 twice: once implicitly by returning
>>>>> FALSE in timer_func1() and then again explicitly in timer_func3()
>>>>
>>>>
>>>> You are right.
>>>>
>>>>
>>>> If Pacemaker does not call g_source_remove() again on a source whose timer callback has already returned FALSE, glib does not report the error.
>>>>
>>>>
>>>> Many Thanks,
>>>> Hideo Yamauchi.
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: Andrew Beekhof <andrew at beekhof.net>
>>>>> To: renayama19661014 at ybb.ne.jp
>>>>> Cc: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>>>>> Date: 2014/10/7, Tue 11:06
>>>>> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>>>>>
>>>>>
>>>>> On 7 Oct 2014, at 1:03 pm, renayama19661014 at ybb.ne.jp wrote:
>>>>>
>>>>>> Hi Andrew,
>>>>>>
>>>>>>>> These problems seem to be caused by the following glib change.
>>>>>>>> * https://github.com/GNOME/glib/commit/393503ba5bdc7c09cd46b716aaf3d2c63a6c7f9c
>>>>>>>
>>>>>>> The glib behaviour on Ubuntu seems reasonable; removing a source multiple times IS a valid error.
>>>>>>> I need the stack trace to know where/how this situation can occur in pacemaker.
>>>>>>
>>>>>>
>>>>>> As far as I could confirm, Pacemaker does not remove a source more than once.
>>>>>> On Ubuntu (glib 2.40), the error occurs even on the first removal.
>>>>>
>>>>> Not quite. Returning FALSE from the callback also removes the source from glib.
>>>>> So your test case effectively removes t1 twice: once implicitly by returning
>>>>> FALSE in timer_func1() and then again explicitly in timer_func3()
>>>>>
>>>>>>
>>>>>> To avoid the error on Ubuntu, it seems necessary to check that the source exists before removing it.
>>>>>> This also works with the glib of RHEL6.x (and RHEL7.0).
>>>>>>
>>>>>>     if (g_main_context_find_source_by_id (NULL, t1) != NULL) {
>>>>>>         g_source_remove(t1);
>>>>>>     }
>>>>>>
>>>>>> I will send you the stack trace once I have acquired it.
>>>>>>
>>>>>> Many Thanks!
>>>>>> Hideo Yamauchi.
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: Andrew Beekhof <andrew at beekhof.net>
>>>>>>> To: renayama19661014 at ybb.ne.jp; The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>>>>>>> Cc:
>>>>>>> Date: 2014/10/7, Tue 09:44
>>>>>>> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>>>>>>>
>>>>>>>
>>>>>>> On 6 Oct 2014, at 4:09 pm, renayama19661014 at ybb.ne.jp wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> When I run the following sample on RHEL6.5(glib2-2.22.5-7.el6) and Ubuntu14.04(libglib2.0-0:amd64 2.40.0-2), the behavior differs.
>>>>>>>>
>>>>>>>> * Sample : test2.c
>>>>>>>> {{{
>>>>>>>> #include <stdio.h>
>>>>>>>> #include <stdlib.h>
>>>>>>>> #include <glib.h>
>>>>>>>> #include <sys/times.h>
>>>>>>>> guint t1, t2, t3;
>>>>>>>> gboolean timer_func2(gpointer data){
>>>>>>>>         printf("TIMER EXPIRE!2\n");
>>>>>>>>         fflush(stdout);
>>>>>>>>         return FALSE;
>>>>>>>> }
>>>>>>>> gboolean timer_func1(gpointer data){
>>>>>>>>         clock_t ret;
>>>>>>>>         struct tms buff;
>>>>>>>>
>>>>>>>>         ret = times(&buff);
>>>>>>>>         printf("TIMER EXPIRE!1 %d\n", (int)ret);
>>>>>>>>         fflush(stdout);
>>>>>>>>         return FALSE;
>>>>>>>> }
>>>>>>>> gboolean timer_func3(gpointer data){
>>>>>>>>         printf("TIMER EXPIRE 3!\n");
>>>>>>>>         fflush(stdout);
>>>>>>>>         printf("remove timer1!\n");
>>>>>>>>
>>>>>>>>         fflush(stdout);
>>>>>>>>         g_source_remove(t1);
>>>>>>>>         printf("remove timer2!\n");
>>>>>>>>         fflush(stdout);
>>>>>>>>         g_source_remove(t2);
>>>>>>>>         printf("remove timer3!\n");
>>>>>>>>         fflush(stdout);
>>>>>>>>         g_source_remove(t3);
>>>>>>>>         return FALSE;
>>>>>>>> }
>>>>>>>> int main(int argc, char** argv){
>>>>>>>>         GMainLoop *m;
>>>>>>>>         clock_t ret;
>>>>>>>>         struct tms buff;
>>>>>>>>         gint64 t;
>>>>>>>>         m = g_main_new(FALSE);
>>>>>>>>         t1 = g_timeout_add(1000, timer_func1, NULL);
>>>>>>>>         t2 = g_timeout_add(60000, timer_func2, NULL);
>>>>>>>>         t3 = g_timeout_add(5000, timer_func3, NULL);
>>>>>>>>         ret = times(&buff);
>>>>>>>>         printf("START! %d\n", (int)ret);
>>>>>>>>         g_main_run(m);
>>>>>>>> }
>>>>>>>>
>>>>>>>> }}}
>>>>>>>> * Result
>>>>>>>> ---- RHEL6.5(glib2-2.22.5-7.el6) ----
>>>>>>>> [root at snmp1 ~]# ./test2
>>>>>>>> START! 429576012
>>>>>>>> TIMER EXPIRE!1 429576112
>>>>>>>> TIMER EXPIRE 3!
>>>>>>>> remove timer1!
>>>>>>>> remove timer2!
>>>>>>>> remove timer3!
>>>>>>>>
>>>>>>>> ---- Ubuntu14.04(libglib2.0-0:amd64 2.40.0-2) ----
>>>>>>>> root at a1be102:~# ./test2
>>>>>>>> START! 1718163089
>>>>>>>> TIMER EXPIRE!1 1718163189
>>>>>>>> TIMER EXPIRE 3!
>>>>>>>> remove timer1!
>>>>>>>>
>>>>>>>> (process:1410): GLib-CRITICAL **: Source ID 1 was not found when attempting to remove it
>>>>>>>> remove timer2!
>>>>>>>> remove timer3!
>>>>>>>>
>>>>>>>>
>>>>>>>> These problems seem to be caused by the following glib change.
>>>>>>>> * https://github.com/GNOME/glib/commit/393503ba5bdc7c09cd46b716aaf3d2c63a6c7f9c
>>>>>>>
>>>>>>> The glib behaviour on Ubuntu seems reasonable; removing a source multiple times IS a valid error.
>>>>>>> I need the stack trace to know where/how this situation can occur in pacemaker.
>>>>>>>
>>>>>>>>
>>>>>>>> Before the change, g_source_remove() could be called on a timer that had already completed, but after the change g_source_remove() reports an error in that case.
>>>>>>>>
>>>>>>>> Because of this, we get the following crit error in a Pacemaker environment that uses a new version of glib.
>>>>>>>>
>>>>>>>> lrmd[1632]: error: crm_abort: crm_glib_handler: Forked child 1840 to record non-fatal assert at logging.c:73 : Source ID 51 was not found when attempting to remove it
>>>>>>>> lrmd[1632]: crit: crm_glib_handler: GLib: Source ID 51 was not found when attempting to remove it
>>>>>>>>
>>>>>>>> Some handling seems to be necessary in Pacemaker, considering the following:
>>>>>>>> * Distributions that use a new version of glib, including Ubuntu.
>>>>>>>> * Future glib version upgrades in RHEL.
>>>>>>>>
>>>>>>>> A similar problem has been reported on the ML.
>>>>>>>> * http://www.gossamer-threads.com/lists/linuxha/pacemaker/91333#91333
>>>>>>>> * http://www.gossamer-threads.com/lists/linuxha/pacemaker/92408
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Hideo Yamauchi.
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>>
>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
>