[Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.

renayama19661014 at ybb.ne.jp
Fri Oct 10 03:55:00 CEST 2014


Hi Andrew,

Okay!

I will test your patch and let you know the result.

Many thanks!
Hideo Yamauchi.



----- Original Message -----
> From: Andrew Beekhof <andrew at beekhof.net>
> To: renayama19661014 at ybb.ne.jp; The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> Cc: 
> Date: 2014/10/10, Fri 10:47
> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
> 
> Perfect!
> 
> Can you try this:
> 
> diff --git a/lib/services/services.c b/lib/services/services.c
> index 8590b56..cb0f0ae 100644
> --- a/lib/services/services.c
> +++ b/lib/services/services.c
> @@ -425,6 +425,7 @@ services_action_kick(const char *name, const char *action, int interval /* ms */
>      } else {
>          if (op->opaque->repeat_timer) {
>              g_source_remove(op->opaque->repeat_timer);
> +            op->opaque->repeat_timer = 0;
>          }
>          recurring_action_timer(op);
>          return TRUE;
> @@ -459,6 +460,7 @@ handle_duplicate_recurring(svc_action_t * op, void (*action_callback) (svc_actio
>          if (dup->pid != 0) {
>              if (op->opaque->repeat_timer) {
>                  g_source_remove(op->opaque->repeat_timer);
> +                op->opaque->repeat_timer = 0;
>              }
>              recurring_action_timer(dup);
>          }
> 
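For illustration, the idea behind the hunks above can be sketched outside Pacemaker: once a stored source ID has been handed to g_source_remove(), the stored copy is reset to 0, so a later cancel sees "no timer" rather than a stale ID. This is only a minimal model under stated assumptions and does not link against glib: `fake_source_add()`, `fake_source_remove()`, `struct action` and `action_stop_timer()` are hypothetical stand-ins for g_timeout_add(), g_source_remove(), svc_action_t and the Pacemaker call sites, and the registry merely imitates the newer glib behaviour of rejecting unknown IDs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy registry standing in for glib's main context (hypothetical, not glib API). */
#define MAX_SOURCES 16
static bool live[MAX_SOURCES + 1];   /* live[id] is true while source id exists */
static unsigned next_id = 1;         /* as in glib, 0 is never a valid ID */

/* Stand-in for g_timeout_add(): returns a fresh non-zero ID, or 0 if full. */
static unsigned fake_source_add(void)
{
    if (next_id > MAX_SOURCES) {
        return 0;
    }
    live[next_id] = true;
    return next_id++;
}

/* Stand-in for g_source_remove(): fails for IDs that are not registered. */
static bool fake_source_remove(unsigned id)
{
    if (id == 0 || id > MAX_SOURCES || !live[id]) {
        return false;                /* newer glib logs a CRITICAL here */
    }
    live[id] = false;
    return true;
}

/* Stand-in for the part of svc_action_t that holds the timer ID. */
struct action {
    unsigned repeat_timer;           /* 0 means "no timer armed" */
};

/* Remove the timer at most once by clearing the ID right after removal.
 * Returns false only if a stale ID reached the registry. */
static bool action_stop_timer(struct action *op)
{
    bool ok = true;

    if (op == NULL) {
        return true;                 /* nothing to stop, nothing to dereference */
    }
    if (op->repeat_timer) {
        ok = fake_source_remove(op->repeat_timer);
        op->repeat_timer = 0;        /* forget the ID so it cannot be removed twice */
    }
    return ok;
}
```

Calling `action_stop_timer()` twice on the same action is harmless in this model, because the second call finds `repeat_timer == 0` and never reaches the registry. Note that the field can only be cleared when `op` is non-NULL.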
> 
> On 10 Oct 2014, at 12:16 pm, renayama19661014 at ybb.ne.jp wrote:
> 
>>  Hi Andrew,
>> 
>>  I have not yet got gdb working properly in the Ubuntu environment, so I cannot attach to lrmd and capture a trace.
>>  Please wait a little longer for this.
>> 
>> 
>>  However, I made lrmd terminate abnormally when g_source_remove() in cancel_recurring_action() returned FALSE.
>>  -----
>>  gboolean
>>  cancel_recurring_action(svc_action_t * op)
>>  {
>>      crm_info("Cancelling operation %s", op->id);
>> 
>>      if (recurring_actions) {
>>          g_hash_table_remove(recurring_actions, op->id);
>>      }
>> 
>>      if (op->opaque->repeat_timer) {
>>          if (g_source_remove(op->opaque->repeat_timer) == FALSE)  {
>>                  abort();
>>          }
>>  (snip)
>>  -------core----
>>  #0  0x00007f30aa60ff79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>> 
>>  56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>>  (gdb) where
>>  #0  0x00007f30aa60ff79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>>  #1  0x00007f30aa613388 in __GI_abort () at abort.c:89
>>  #2  0x00007f30aadcde77 in crm_abort (file=file@entry=0x7f30aae0152b "logging.c",
>>      function=function@entry=0x7f30aae028c0 <__FUNCTION__.23262> "crm_glib_handler", line=line@entry=73,
>>      assert_condition=assert_condition@entry=0x19d2ad0 "Source ID 63 was not found when attempting to remove it", do_core=do_core@entry=1,
>>      do_fork=<optimized out>, do_fork@entry=1) at utils.c:1195
>>  #3  0x00007f30aadf5ca7 in crm_glib_handler (log_domain=0x7f30aa35eb6e "GLib", flags=<optimized out>,
>>      message=0x19d2ad0 "Source ID 63 was not found when attempting to remove it", user_data=<optimized out>) at logging.c:73
>>  #4  0x00007f30aa320ae1 in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>  #5  0x00007f30aa320d72 in g_log () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>  #6  0x00007f30aa318c5c in g_source_remove () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>  #7  0x00007f30aabb2b55 in cancel_recurring_action (op=op@entry=0x19caa90) at services.c:363
>>  #8  0x00007f30aabb2bee in services_action_cancel (name=name@entry=0x19d0530 "dummy3", action=<optimized out>, interval=interval@entry=10000)
>>      at services.c:385
>>  #9  0x000000000040405a in cancel_op (rsc_id=rsc_id@entry=0x19d0530 "dummy3", action=action@entry=0x19cec10 "monitor", interval=10000)
>>      at lrmd.c:1404
>>  #10 0x000000000040614f in process_lrmd_rsc_cancel (client=0x19c8290, id=74, request=0x19ca8a0) at lrmd.c:1468
>>  #11 process_lrmd_message (client=client@entry=0x19c8290, id=74, request=request@entry=0x19ca8a0) at lrmd.c:1507
>>  #12 0x0000000000402bac in lrmd_ipc_dispatch (c=0x19c79c0, data=<optimized out>, size=361) at main.c:148
>>  #13 0x00007f30aa07b4d9 in qb_ipcs_dispatch_connection_request () from /usr/lib/libqb.so.0
>>  #14 0x00007f30aadf209d in gio_read_socket (gio=<optimized out>, condition=G_IO_IN, data=0x19c68a8) at mainloop.c:437
>>  #15 0x00007f30aa319ce5 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>  #16 0x00007f30aa31a048 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>  #17 0x00007f30aa31a30a in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>  #18 0x0000000000402774 in main (argc=<optimized out>, argv=0x7fffcdd90b88) at main.c:344
>>  ---------
>> 
>>  Best Regards,
>>  Hideo Yamauchi.
>> 
>> 
>> 
>>  ----- Original Message -----
>>>  From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>>>  To: Andrew Beekhof <andrew at beekhof.net>
>>>  Cc: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>>>  Date: 2014/10/7, Tue 11:15
>>>  Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>>> 
>>>  Hi Andrew,
>>> 
>>>>  Not quite. Returning FALSE from the callback also removes the source from glib.
>>>>  So your test case effectively removes t1 twice: once implicitly by returning
>>>>  FALSE in timer_func1() and then again explicitly in timer_func3()
>>> 
>>> 
>>>  You are right.
>>> 
>>> 
>>>  If Pacemaker does not try again to remove a source whose timer callback has already returned FALSE, glib does not report the error.
>>> 
>>> 
>>>  Many Thanks,
>>>  Hideo Yamauchi.
>>> 
>>> 
>>>  ----- Original Message -----
>>>>  From: Andrew Beekhof <andrew at beekhof.net>
>>>>  To: renayama19661014 at ybb.ne.jp
>>>>  Cc: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>>>>  Date: 2014/10/7, Tue 11:06
>>>>  Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>>>> 
>>>> 
>>>>  On 7 Oct 2014, at 1:03 pm, renayama19661014 at ybb.ne.jp wrote:
>>>> 
>>>>>    Hi Andrew,
>>>>> 
>>>>>>>    This difference seems to be caused by the following glib change.
>>>>>>>     * https://github.com/GNOME/glib/commit/393503ba5bdc7c09cd46b716aaf3d2c63a6c7f9c
>>>>>> 
>>>>>>    The glib behaviour on Ubuntu seems reasonable, removing a source multiple times
>>>>>>    IS a valid error.
>>>>>>    I need the stack trace to know where/how this situation can occur in pacemaker.
>>>>> 
>>>>> 
>>>>>    As far as I could confirm, Pacemaker does not remove a source more than once.
>>>>>    On Ubuntu (glib 2.40), the error occurs on the very first removal.
>>>> 
>>>>  Not quite. Returning FALSE from the callback also removes the source from glib.
>>>>  So your test case effectively removes t1 twice: once implicitly by returning
>>>>  FALSE in timer_func1() and then again explicitly in timer_func3()
>>>> 
>>>>> 
>>>>>    To avoid the error on Ubuntu, it seems necessary to check that the source still exists before removing it.
>>>>>    This also works with the glib in RHEL6.x (and RHEL7.0).
>>>>> 
>>>>>           if (g_main_context_find_source_by_id (NULL, t1) != NULL) {
>>>>>                   g_source_remove(t1);
>>>>>           }
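The two approaches discussed here can be compared in a small model: a one-shot callback that returns FALSE drops its own source, so the caller's stored ID goes stale unless the callback clears it or the caller looks the ID up before removing it. The sketch below is self-contained and does not use glib; `toy_add()`, `toy_find()`, `toy_remove()` and `toy_dispatch()` are hypothetical stand-ins for g_timeout_add(), g_main_context_find_source_by_id(), g_source_remove() and one main-loop dispatch, and they only imitate the behaviour described in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a glib-like source table (hypothetical names, not glib API). */
typedef bool (*toy_callback)(void *data);

struct toy_source {
    bool         in_use;
    toy_callback func;
    void        *data;
};

#define TOY_MAX 8
static struct toy_source toy_table[TOY_MAX + 1];  /* slot 0 unused: ID 0 is invalid */

/* Stand-in for g_timeout_add(): register a callback, return its ID (0 if full). */
static unsigned toy_add(toy_callback func, void *data)
{
    for (unsigned id = 1; id <= TOY_MAX; id++) {
        if (!toy_table[id].in_use) {
            toy_table[id] = (struct toy_source){ true, func, data };
            return id;
        }
    }
    return 0;
}

/* Stand-in for g_main_context_find_source_by_id(): is this ID registered? */
static bool toy_find(unsigned id)
{
    return id >= 1 && id <= TOY_MAX && toy_table[id].in_use;
}

/* Stand-in for g_source_remove(): false when the ID is unknown. */
static bool toy_remove(unsigned id)
{
    if (!toy_find(id)) {
        return false;
    }
    toy_table[id].in_use = false;
    return true;
}

/* Stand-in for one main-loop dispatch: a callback returning false
 * causes its own source to be dropped from the table. */
static void toy_dispatch(unsigned id)
{
    if (toy_find(id) && !toy_table[id].func(toy_table[id].data)) {
        toy_table[id].in_use = false;
    }
}

/* One-shot callback that only returns false; the caller's stored ID goes stale. */
static bool one_shot(void *data)
{
    (void)data;
    return false;
}

/* One-shot callback that also clears the caller's stored ID before finishing. */
static bool one_shot_clearing(void *data)
{
    *(unsigned *)data = 0;
    return false;
}

/* Guarded removal in the style of the check above: look the ID up first. */
static bool toy_remove_if_present(unsigned id)
{
    return toy_find(id) ? toy_remove(id) : true;
}
```

In this model, removing the ID of a source whose `one_shot()` callback has already run fails, while `toy_remove_if_present()` silently skips it. With `one_shot_clearing()` the stored ID is already 0 by the time anyone tries to cancel, so no lookup is needed.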
>>>>> 
>>>>>    I will send it to you once I have captured the stack trace.
>>>>> 
>>>>>    Many Thanks!
>>>>>    Hideo Yamauchi.
>>>>> 
>>>>>    ----- Original Message -----
>>>>>>    From: Andrew Beekhof <andrew at beekhof.net>
>>>>>>    To: renayama19661014 at ybb.ne.jp; The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>>>>>>    Cc: 
>>>>>>    Date: 2014/10/7, Tue 09:44
>>>>>>    Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>>>>>> 
>>>>>> 
>>>>>>    On 6 Oct 2014, at 4:09 pm, renayama19661014 at ybb.ne.jp wrote:
>>>>>> 
>>>>>>>    Hi All,
>>>>>>> 
>>>>>>>    When I run the following sample on RHEL6.5 (glib2-2.22.5-7.el6) and
>>>>>>>    Ubuntu14.04 (libglib2.0-0:amd64 2.40.0-2), the behaviour differs.
>>>>>>> 
>>>>>>>     * Sample : test2.c
>>>>>>>    {{{
>>>>>>>    #include <stdio.h>
>>>>>>>    #include <stdlib.h>
>>>>>>>    #include <glib.h>
>>>>>>>    #include <sys/times.h>
>>>>>>>    guint t1, t2, t3;
>>>>>>>    gboolean timer_func2(gpointer data){
>>>>>>>            printf("TIMER EXPIRE!2\n");
>>>>>>>            fflush(stdout);
>>>>>>>            return FALSE;
>>>>>>>    }
>>>>>>>    gboolean timer_func1(gpointer data){
>>>>>>>            clock_t         ret;
>>>>>>>            struct tms buff;
>>>>>>> 
>>>>>>>            ret = times(&buff);
>>>>>>>            printf("TIMER EXPIRE!1 %d\n", (int)ret);
>>>>>>>            fflush(stdout);
>>>>>>>            return FALSE;
>>>>>>>    }
>>>>>>>    gboolean timer_func3(gpointer data){
>>>>>>>            printf("TIMER EXPIRE 3!\n");
>>>>>>>            fflush(stdout);
>>>>>>>            printf("remove timer1!\n");
>>>>>>> 
>>>>>>>            fflush(stdout);
>>>>>>>            g_source_remove(t1);
>>>>>>>            printf("remove timer2!\n");
>>>>>>>            fflush(stdout);
>>>>>>>            g_source_remove(t2);
>>>>>>>            printf("remove timer3!\n");
>>>>>>>            fflush(stdout);
>>>>>>>            g_source_remove(t3);
>>>>>>>            return FALSE;
>>>>>>>    }
>>>>>>>    int main(int argc, char** argv){
>>>>>>>            GMainLoop *m;
>>>>>>>            clock_t         ret;
>>>>>>>            struct tms buff;
>>>>>>>            gint64 t;
>>>>>>>            m = g_main_new(FALSE);
>>>>>>>            t1 = g_timeout_add(1000, timer_func1, NULL);
>>>>>>>            t2 = g_timeout_add(60000, timer_func2, NULL);
>>>>>>>            t3 = g_timeout_add(5000, timer_func3, NULL);
>>>>>>>            ret = times(&buff);
>>>>>>>            printf("START! %d\n", (int)ret);
>>>>>>>            g_main_run(m);
>>>>>>>    }
>>>>>>> 
>>>>>>>    }}}
>>>>>>>     * Result
>>>>>>>    ---- RHEL6.5(glib2-2.22.5-7.el6) ---- 
>>>>>>>    [root at snmp1 ~]# ./test2
>>>>>>>    START! 429576012
>>>>>>>    TIMER EXPIRE!1 429576112
>>>>>>>    TIMER EXPIRE 3!
>>>>>>>    remove timer1!
>>>>>>>    remove timer2!
>>>>>>>    remove timer3!
>>>>>>> 
>>>>>>>    ---- Ubuntu14.04(libglib2.0-0:amd64 2.40.0-2) ----
>>>>>>>    root at a1be102:~# ./test2
>>>>>>>    START! 1718163089
>>>>>>>    TIMER EXPIRE!1 1718163189
>>>>>>>    TIMER EXPIRE 3!
>>>>>>>    remove timer1!
>>>>>>> 
>>>>>>>    (process:1410): GLib-CRITICAL **: Source ID 1 was not found when attempting to remove it
>>>>>>>    remove timer2!
>>>>>>>    remove timer3!
>>>>>>> 
>>>>>>> 
>>>>>>>    This difference seems to be caused by the following glib change.
>>>>>>>     * https://github.com/GNOME/glib/commit/393503ba5bdc7c09cd46b716aaf3d2c63a6c7f9c
>>>>>> 
>>>>>>    The glib behaviour on Ubuntu seems reasonable, removing a source multiple times
>>>>>>    IS a valid error.
>>>>>>    I need the stack trace to know where/how this situation can occur in pacemaker.
>>>>>> 
>>>>>>> 
>>>>>>>    Before this change, g_source_remove() could be called on a timer that had
>>>>>>>    already finished running; after the change, the same call causes an error.
>>>>>>> 
>>>>>>>    As a result, Pacemaker logs the following crit error in environments that
>>>>>>>    use a new version of glib.
>>>>>>> 
>>>>>>>    lrmd[1632]:    error: crm_abort: crm_glib_handler: Forked child 1840 to record non-fatal assert at logging.c:73 : Source ID 51 was not found when attempting to remove it
>>>>>>>    lrmd[1632]:    crit: crm_glib_handler: GLib: Source ID 51 was not found when attempting to remove it
>>>>>>> 
>>>>>>>    It seems that Pacemaker needs some kind of workaround, considering the following.
>>>>>>>     * Distributions that use a new version of glib, including Ubuntu.
>>>>>>>     * Future glib upgrades in RHEL.
>>>>>>> 
>>>>>>>    Similar problems have been reported on the ML.
>>>>>>>     * http://www.gossamer-threads.com/lists/linuxha/pacemaker/91333#91333
>>>>>>>     * http://www.gossamer-threads.com/lists/linuxha/pacemaker/92408
>>>>>>> 
>>>>>>>    Best Regards,
>>>>>>>    Hideo Yamauchi.
>>>>>>> 
>>>>>>>    _______________________________________________
>>>>>>>    Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>>   http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>> 
>>>>>>>    Project Home: http://www.clusterlabs.org
>>>>>>>    Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>    Bugs: http://bugs.clusterlabs.org
>>>>>> 
>>>> 
>>> 
>>> 
>> 
> 


