[Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.

renayama19661014 at ybb.ne.jp renayama19661014 at ybb.ne.jp
Fri Oct 10 07:45:26 CEST 2014


Hi Andrew,

I applied the three corrections you made and checked the behavior.
To be sure, I added an abort() after every g_source_remove() call in services.c.
 * I added the following abort() check at the four places that call g_source_remove():

         if (g_source_remove(op->opaque->repeat_timer) == FALSE) {
                 abort();
         }


As a result, the abort still occurred.

The problem does not seem to be resolved by your correction yet.


(gdb) where
#0  0x00007fdd923e1f79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007fdd923e5388 in __GI_abort () at abort.c:89
#2  0x00007fdd92b9fe77 in crm_abort (file=file@entry=0x7fdd92bd352b "logging.c",
    function=function@entry=0x7fdd92bd48c0 <__FUNCTION__.23262> "crm_glib_handler", line=line@entry=73,
    assert_condition=assert_condition@entry=0xe20b80 "Source ID 40 was not found when attempting to remove it", do_core=do_core@entry=1,
    do_fork=<optimized out>, do_fork@entry=1) at utils.c:1195
#3  0x00007fdd92bc7ca7 in crm_glib_handler (log_domain=0x7fdd92130b6e "GLib", flags=<optimized out>,
    message=0xe20b80 "Source ID 40 was not found when attempting to remove it", user_data=<optimized out>) at logging.c:73
#4  0x00007fdd920f2ae1 in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x00007fdd920f2d72 in g_log () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6  0x00007fdd920eac5c in g_source_remove () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#7  0x00007fdd92984b55 in cancel_recurring_action (op=op@entry=0xe19b90) at services.c:365
#8  0x00007fdd92984bee in services_action_cancel (name=name@entry=0xe1d2d0 "dummy2", action=<optimized out>, interval=interval@entry=10000)
    at services.c:387
#9  0x000000000040405a in cancel_op (rsc_id=rsc_id@entry=0xe1d2d0 "dummy2", action=action@entry=0xe10d90 "monitor", interval=10000)
    at lrmd.c:1404
#10 0x000000000040614f in process_lrmd_rsc_cancel (client=0xe17290, id=74, request=0xe1be10) at lrmd.c:1468
#11 process_lrmd_message (client=client@entry=0xe17290, id=74, request=request@entry=0xe1be10) at lrmd.c:1507
#12 0x0000000000402bac in lrmd_ipc_dispatch (c=0xe169c0, data=<optimized out>, size=361) at main.c:148
#13 0x00007fdd91e4d4d9 in qb_ipcs_dispatch_connection_request () from /usr/lib/libqb.so.0
#14 0x00007fdd92bc409d in gio_read_socket (gio=<optimized out>, condition=G_IO_IN, data=0xe158a8) at mainloop.c:437
#15 0x00007fdd920ebce5 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
---Type <return> to continue, or q <return> to quit---
#16 0x00007fdd920ec048 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#17 0x00007fdd920ec30a in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#18 0x0000000000402774 in main (argc=<optimized out>, argv=0x7fff22cac268) at main.c:344
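
For illustration only, the failure mode discussed in this thread can be modelled without glib. The sketch below is a toy under stated assumptions, not glib or Pacemaker code: every toy_* name and the fixed-size table are hypothetical. It models a remove call that fails for an ID that is no longer registered, a callback whose FALSE return removes the source implicitly, and an owner that avoids a second removal by resetting its stored ID to 0.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_SOURCES 8

/* Hypothetical stand-in for a glib source callback. */
typedef bool (*toy_callback)(void *data);

struct toy_source {
    bool active;
    toy_callback cb;
    void *data;
};

/* IDs are handed out as 1, 2, 3, ... and are never reused in this model. */
static struct toy_source toy_sources[TOY_MAX_SOURCES];
static unsigned toy_count;

/* Register a callback; returns its ID, or 0 when the table is full. */
unsigned toy_timeout_add(toy_callback cb, void *data)
{
    if (toy_count == TOY_MAX_SOURCES) {
        return 0;
    }
    toy_sources[toy_count] = (struct toy_source){ true, cb, data };
    return ++toy_count;
}

/* Removing an unknown or already removed ID fails, which is the
 * behaviour the thread reports for the newer glib. */
bool toy_source_remove(unsigned id)
{
    if (id == 0 || id > toy_count || !toy_sources[id - 1].active) {
        return false;
    }
    toy_sources[id - 1].active = false;
    return true;
}

/* Fire a source once. A callback that returns false is removed
 * implicitly, without any call to toy_source_remove(). */
void toy_dispatch(unsigned id)
{
    assert(id >= 1 && id <= toy_count);
    struct toy_source *s = &toy_sources[id - 1];
    if (s->active && !s->cb(s->data)) {
        s->active = false;
    }
}

/* Hypothetical owner of a timer ID, loosely analogous to
 * op->opaque->repeat_timer in the patch quoted in this thread. */
struct toy_op {
    unsigned repeat_timer;
};

/* One-shot callback that leaves the stale ID behind in its owner. */
bool toy_one_shot_stale(void *data)
{
    (void)data;
    return false;
}

/* One-shot callback that forgets its own ID before returning false. */
bool toy_one_shot_clearing(void *data)
{
    struct toy_op *op = data;
    op->repeat_timer = 0;
    return false;
}

/* Cancel: remove only if an ID is stored, then always zero it.
 * Returns false when the stored ID turned out to be stale. */
bool toy_cancel(struct toy_op *op)
{
    bool ok = true;
    if (op->repeat_timer != 0) {
        ok = toy_source_remove(op->repeat_timer);
        op->repeat_timer = 0;
    }
    return ok;
}
```

In this model, cancelling an operation whose one-shot callback left the old ID behind makes toy_source_remove() fail once, while an operation whose callback cleared the ID cancels cleanly. Whether lrmd reaches cancel_recurring_action() through exactly this path is not confirmed by the trace above.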

Best Regards,
Hideo Yamauchi.


----- Original Message -----
> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
> To: Andrew Beekhof <andrew at beekhof.net>; The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> Cc: 
> Date: 2014/10/10, Fri 10:55
> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
> 
> Hi Andrew,
> 
> Okay!
> 
> I will test your patch and let you know the result.
> 
> Many thanks!
> Hideo Yamauchi.
> 
> 
> 
> ----- Original Message -----
>>  From: Andrew Beekhof <andrew at beekhof.net>
>>  To: renayama19661014 at ybb.ne.jp; The Pacemaker cluster resource manager 
> <pacemaker at oss.clusterlabs.org>
>>  Cc: 
>>  Date: 2014/10/10, Fri 10:47
>>  Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of 
> glib, g_source_remove fails.
>> 
>>  Perfect!
>> 
>>  Can you try this:
>> 
>>  diff --git a/lib/services/services.c b/lib/services/services.c
>>  index 8590b56..cb0f0ae 100644
>>  --- a/lib/services/services.c
>>  +++ b/lib/services/services.c
>>  @@ -417,6 +417,7 @@ services_action_kick(const char *name, const char *action, int interval /* ms */
>>       free(id);
>> 
>>       if (op == NULL) {
>>  +        op->opaque->repeat_timer = 0;
>>           return FALSE;
>>       }
>> 
>>  @@ -425,6 +426,7 @@ services_action_kick(const char *name, const char *action, int interval /* ms */
>>       } else {
>>           if (op->opaque->repeat_timer) {
>>               g_source_remove(op->opaque->repeat_timer);
>>  +            op->opaque->repeat_timer = 0;
>>           }
>>           recurring_action_timer(op);
>>           return TRUE;
>>  @@ -459,6 +461,7 @@ handle_duplicate_recurring(svc_action_t * op, void (*action_callback) (svc_actio
>>           if (dup->pid != 0) {
>>               if (op->opaque->repeat_timer) {
>>                   g_source_remove(op->opaque->repeat_timer);
>>  +                op->opaque->repeat_timer = 0;
>>               }
>>               recurring_action_timer(dup);
>>           }
>> 
>> 
>>  On 10 Oct 2014, at 12:16 pm, renayama19661014 at ybb.ne.jp wrote:
>> 
>>>   Hi Andrew,
>>> 
>>>   I have not yet been able to set up gdb in the Ubuntu environment, so I cannot attach to lrmd and acquire a trace.
>>>   Please wait a little longer for this.
>>> 
>>> 
>>>   However, I made lrmd terminate abnormally when g_source_remove() in cancel_recurring_action() returned FALSE.
>>>   -----
>>>   gboolean
>>>   cancel_recurring_action(svc_action_t * op)
>>>   {
>>>       crm_info("Cancelling operation %s", op->id);
>>> 
>>>       if (recurring_actions) {
>>>           g_hash_table_remove(recurring_actions, op->id);
>>>       }
>>> 
>>>       if (op->opaque->repeat_timer) {
>>>           if (g_source_remove(op->opaque->repeat_timer) == FALSE) {
>>>                   abort();
>>>           }
>>>   (snip)
>>>   -------core----
>>>   #0  0x00007f30aa60ff79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>>> 
>>>   56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>>>   (gdb) where
>>>   #0  0x00007f30aa60ff79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>>>   #1  0x00007f30aa613388 in __GI_abort () at abort.c:89
>>>   #2  0x00007f30aadcde77 in crm_abort (file=file@entry=0x7f30aae0152b "logging.c",
>>>       function=function@entry=0x7f30aae028c0 <__FUNCTION__.23262> "crm_glib_handler", line=line@entry=73,
>>>       assert_condition=assert_condition@entry=0x19d2ad0 "Source ID 63 was not found when attempting to remove it", do_core=do_core@entry=1,
>>>       do_fork=<optimized out>, do_fork@entry=1) at utils.c:1195
>>>   #3  0x00007f30aadf5ca7 in crm_glib_handler (log_domain=0x7f30aa35eb6e "GLib", flags=<optimized out>,
>>>       message=0x19d2ad0 "Source ID 63 was not found when attempting to remove it", user_data=<optimized out>) at logging.c:73
>>>   #4  0x00007f30aa320ae1 in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>>   #5  0x00007f30aa320d72 in g_log () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>>   #6  0x00007f30aa318c5c in g_source_remove () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>>   #7  0x00007f30aabb2b55 in cancel_recurring_action (op=op@entry=0x19caa90) at services.c:363
>>>   #8  0x00007f30aabb2bee in services_action_cancel (name=name@entry=0x19d0530 "dummy3", action=<optimized out>, interval=interval@entry=10000)
>>>       at services.c:385
>>>   #9  0x000000000040405a in cancel_op (rsc_id=rsc_id@entry=0x19d0530 "dummy3", action=action@entry=0x19cec10 "monitor", interval=10000)
>>>       at lrmd.c:1404
>>>   #10 0x000000000040614f in process_lrmd_rsc_cancel (client=0x19c8290, id=74, request=0x19ca8a0) at lrmd.c:1468
>>>   #11 process_lrmd_message (client=client@entry=0x19c8290, id=74, request=request@entry=0x19ca8a0) at lrmd.c:1507
>>>   #12 0x0000000000402bac in lrmd_ipc_dispatch (c=0x19c79c0, data=<optimized out>, size=361) at main.c:148
>>>   #13 0x00007f30aa07b4d9 in qb_ipcs_dispatch_connection_request () from /usr/lib/libqb.so.0
>>>   #14 0x00007f30aadf209d in gio_read_socket (gio=<optimized out>, condition=G_IO_IN, data=0x19c68a8) at mainloop.c:437
>>>   #15 0x00007f30aa319ce5 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>>   ---Type <return> to continue, or q <return> to quit---
>>>   #16 0x00007f30aa31a048 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>>   #17 0x00007f30aa31a30a in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>>   #18 0x0000000000402774 in main (argc=<optimized out>, argv=0x7fffcdd90b88) at main.c:344
>>>   ---------
>>> 
>>>   Best Regards,
>>>   Hideo Yamauchi.
>>> 
>>> 
>>> 
>>>   ----- Original Message -----
>>>>   From: "renayama19661014 at ybb.ne.jp" 
>>  <renayama19661014 at ybb.ne.jp>
>>>>   To: Andrew Beekhof <andrew at beekhof.net>
>>>>   Cc: The Pacemaker cluster resource manager 
>>  <pacemaker at oss.clusterlabs.org>
>>>>   Date: 2014/10/7, Tue 11:15
>>>>   Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new 
> version of 
>>  glib, g_source_remove fails.
>>>> 
>>>>   Hi Andrew,
>>>> 
>>>>>   Not quite. Returning FALSE from the callback also removes the source from glib.
>>>>>   So your test case effectively removes t1 twice: once implicitly by returning 
>>>>>   FALSE in timer_func1() and then again explicitly in timer_func3()
>>>> 
>>>> 
>>>>   You are right.
>>>> 
>>>> 
>>>>   If Pacemaker does not try again to remove a source whose timer callback has already returned FALSE, glib does not report the error.
>>>> 
>>>> 
>>>>   Many Thanks,
>>>>   Hideo Yamauchi.
>>>> 
>>>> 
>>>>   ----- Original Message -----
>>>>>   From: Andrew Beekhof <andrew at beekhof.net>
>>>>>   To: renayama19661014 at ybb.ne.jp
>>>>>   Cc: The Pacemaker cluster resource manager 
>>>>   <pacemaker at oss.clusterlabs.org>
>>>>>   Date: 2014/10/7, Tue 11:06
>>>>>   Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new 
> version 
>>  of 
>>>>   glib, g_source_remove fails.
>>>>> 
>>>>> 
>>>>>   On 7 Oct 2014, at 1:03 pm, renayama19661014 at ybb.ne.jp wrote:
>>>>> 
>>>>>>     Hi Andrew,
>>>>>> 
>>>>>>>>     These problems seem to be caused by the following glib change.
>>>>>>>>      * https://github.com/GNOME/glib/commit/393503ba5bdc7c09cd46b716aaf3d2c63a6c7f9c
>>>>>>>    
>>>>>>>     The glib behaviour on Ubuntu seems reasonable, removing a source multiple times 
>>>>>>>     IS a valid error.
>>>>>>>     I need the stack trace to know where/how this situation can occur in pacemaker.
>>>>>> 
>>>>>> 
>>>>>>     As far as I could confirm, Pacemaker does not remove a source several times.
>>>>>>     On Ubuntu (glib 2.40), an error occurs even on the first removal.
>>>>> 
>>>>>   Not quite. Returning FALSE from the callback also removes the source from glib.
>>>>>   So your test case effectively removes t1 twice: once implicitly by returning 
>>>>>   FALSE in timer_func1() and then again explicitly in timer_func3()
>>>>> 
>>>>>> 
>>>>>>     To avoid the error on Ubuntu, it seems necessary to check that the source still exists before removing it.
>>>>>>     This also works well with the glib of RHEL6.x (and RHEL7.0).
>>>>>> 
>>>>>>            if (g_main_context_find_source_by_id (NULL, t1) != NULL) {
>>>>>>                    g_source_remove(t1);
>>>>>>            }
>>>>>> 
>>>>>>     I will send you the stack trace once I have acquired it.
>>>>>> 
>>>>>>     Many Thanks!
>>>>>>     Hideo Yamauchi.
>>>>>> 
>>>>>>     ----- Original Message -----
>>>>>>>     From: Andrew Beekhof <andrew at beekhof.net>
>>>>>>>     To: renayama19661014 at ybb.ne.jp; The Pacemaker 
> cluster 
>>  resource 
>>>>   manager 
>>>>>   <pacemaker at oss.clusterlabs.org>
>>>>>>>     Cc: 
>>>>>>>     Date: 2014/10/7, Tue 09:44
>>>>>>>     Subject: Re: [Pacemaker] [Problem]When Pacemaker 
> uses a 
>>  new 
>>>>   version of 
>>>>>   glib, g_source_remove fails.
>>>>>>> 
>>>>>>> 
>>>>>>>     On 6 Oct 2014, at 4:09 pm, 
> renayama19661014 at ybb.ne.jp 
>>  wrote:
>>>>>>> 
>>>>>>>>     Hi All,
>>>>>>>> 
>>>>>>>>     When I run the following sample on RHEL6.5(glib2-2.22.5-7.el6) and 
>>>>>>>>     Ubuntu14.04(libglib2.0-0:amd64 2.40.0-2), the behavior is different.
>>>>>>>> 
>>>>>>>>      * Sample : test2.c
>>>>>>>>     {{{
>>>>>>>>     #include <stdio.h>
>>>>>>>>     #include <stdlib.h>
>>>>>>>>     #include <glib.h>
>>>>>>>>     #include <sys/times.h>
>>>>>>>>     guint t1, t2, t3;
>>>>>>>>     gboolean timer_func2(gpointer data){
>>>>>>>>             printf("TIMER EXPIRE!2\n");
>>>>>>>>             fflush(stdout);
>>>>>>>>             return FALSE;
>>>>>>>>     }
>>>>>>>>     gboolean timer_func1(gpointer data){
>>>>>>>>             clock_t         ret;
>>>>>>>>             struct tms buff;
>>>>>>>> 
>>>>>>>>             ret = times(&buff);
>>>>>>>>             printf("TIMER EXPIRE!1 %d\n", (int)ret);
>>>>>>>>             fflush(stdout);
>>>>>>>>             return FALSE;
>>>>>>>>     }
>>>>>>>>     gboolean timer_func3(gpointer data){
>>>>>>>>             printf("TIMER EXPIRE 3!\n");
>>>>>>>>             fflush(stdout);
>>>>>>>>             printf("remove timer1!\n");
>>>>>>>> 
>>>>>>>>             fflush(stdout);
>>>>>>>>             g_source_remove(t1);
>>>>>>>>             printf("remove timer2!\n");
>>>>>>>>             fflush(stdout);
>>>>>>>>             g_source_remove(t2);
>>>>>>>>             printf("remove timer3!\n");
>>>>>>>>             fflush(stdout);
>>>>>>>>             g_source_remove(t3);
>>>>>>>>             return FALSE;
>>>>>>>>     }
>>>>>>>>     int main(int argc, char** argv){
>>>>>>>>             GMainLoop *m;
>>>>>>>>             clock_t         ret;
>>>>>>>>             struct tms buff;
>>>>>>>>             gint64 t;
>>>>>>>>             m = g_main_new(FALSE);
>>>>>>>>             t1 = g_timeout_add(1000, timer_func1, NULL);
>>>>>>>>             t2 = g_timeout_add(60000, timer_func2, NULL);
>>>>>>>>             t3 = g_timeout_add(5000, timer_func3, NULL);
>>>>>>>>             ret = times(&buff);
>>>>>>>>             printf("START! %d\n", (int)ret);
>>>>>>>>             g_main_run(m);
>>>>>>>>     }
>>>>>>>> 
>>>>>>>>     }}}
>>>>>>>>      * Result
>>>>>>>>     ---- RHEL6.5(glib2-2.22.5-7.el6) ----
>>>>>>>>     [root@snmp1 ~]# ./test2
>>>>>>>>     START! 429576012
>>>>>>>>     TIMER EXPIRE!1 429576112
>>>>>>>>     TIMER EXPIRE 3!
>>>>>>>>     remove timer1!
>>>>>>>>     remove timer2!
>>>>>>>>     remove timer3!
>>>>>>>> 
>>>>>>>>     ---- Ubuntu14.04(libglib2.0-0:amd64 2.40.0-2) ----
>>>>>>>>     root@a1be102:~# ./test2
>>>>>>>>     START! 1718163089
>>>>>>>>     TIMER EXPIRE!1 1718163189
>>>>>>>>     TIMER EXPIRE 3!
>>>>>>>>     remove timer1!
>>>>>>>> 
>>>>>>>>     (process:1410): GLib-CRITICAL **: Source ID 1 was not found when attempting to remove it
>>>>>>>>     remove timer2!
>>>>>>>>     remove timer3!
>>>>>>>> 
>>>>>>>> 
>>>>>>>>     These problems seem to be caused by the following glib change.
>>>>>>>>      * https://github.com/GNOME/glib/commit/393503ba5bdc7c09cd46b716aaf3d2c63a6c7f9c
>>>>>>> 
>>>>>>>     The glib behaviour on Ubuntu seems reasonable, removing a source multiple times 
>>>>>>>     IS a valid error.
>>>>>>>     I need the stack trace to know where/how this situation can occur in pacemaker.
>>>>>>> 
>>>>>>>> 
>>>>>>>>     With g_source_remove() before the change, removing a timer that had already 
>>>>>>>>     completed was possible, but g_source_remove() after the change reports an error.
>>>>>>>> 
>>>>>>>>     Because of this, we get the following crit error in a Pacemaker 
>>>>>>>>     environment that uses a new version of glib.
>>>>>>>> 
>>>>>>>>     lrmd[1632]:    error: crm_abort: crm_glib_handler: Forked child 1840 to 
>>>>>>>>     record non-fatal assert at logging.c:73 : Source ID 51 was not found when 
>>>>>>>>     attempting to remove it
>>>>>>>>     lrmd[1632]:    crit: crm_glib_handler: GLib: Source ID 51 was not found 
>>>>>>>>     when attempting to remove it
>>>>>>>> 
>>>>>>>>     It seems that Pacemaker needs to handle this in some way, considering the following:
>>>>>>>>      * Distributions that use a new version of glib, including Ubuntu.
>>>>>>>>      * Future glib version upgrades in RHEL.
>>>>>>>> 
>>>>>>>>     A similar problem is reported in the ML.
>>>>>>>>      * http://www.gossamer-threads.com/lists/linuxha/pacemaker/91333#91333
>>>>>>>>      * http://www.gossamer-threads.com/lists/linuxha/pacemaker/92408
>>>>>>>> 
>>>>>>>>     Best Regards,
>>>>>>>>     Hideo Yamauchi.
>>>>>>>> 
>>>>>>>>     _______________________________________________
>>>>>>>>     Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>>>     http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>> 
>>>>>>>>     Project Home: http://www.clusterlabs.org
>>>>>>>>     Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>>     Bugs: http://bugs.clusterlabs.org
>>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 


