[Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.

Andrew Beekhof andrew at beekhof.net
Fri Oct 10 03:47:29 CEST 2014


Perfect!

Can you try this:

diff --git a/lib/services/services.c b/lib/services/services.c
index 8590b56..cb0f0ae 100644
--- a/lib/services/services.c
+++ b/lib/services/services.c
@@ -425,6 +425,7 @@ services_action_kick(const char *name, const char *action, int interval /* ms */
     } else {
         if (op->opaque->repeat_timer) {
             g_source_remove(op->opaque->repeat_timer);
+            op->opaque->repeat_timer = 0;
         }
         recurring_action_timer(op);
         return TRUE;
@@ -459,6 +460,7 @@ handle_duplicate_recurring(svc_action_t * op, void (*action_callback) (svc_actio
         if (dup->pid != 0) {
             if (op->opaque->repeat_timer) {
                 g_source_remove(op->opaque->repeat_timer);
+                op->opaque->repeat_timer = 0;
             }
             recurring_action_timer(dup);
         }
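
To show why clearing repeat_timer right after g_source_remove() matters, here is a minimal sketch. It is a glib-free toy model, not Pacemaker code: model_source_add(), model_source_remove(), struct model_action and both cancel helpers are hypothetical names invented for this illustration. The only behaviour taken from this thread is that removing an ID that is no longer attached is reported as a failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model of glib's source table (hypothetical; it does not call glib).
 * attached[id] is true while source `id` is still registered. */
#define MAX_SOURCES 8
static bool attached[MAX_SOURCES + 1];
static unsigned next_id = 1;

/* Stands in for g_timeout_add(): returns a non-zero ID, or 0 when full. */
static unsigned model_source_add(void)
{
    if (next_id > MAX_SOURCES) {
        return 0;
    }
    attached[next_id] = true;
    return next_id++;
}

/* Stands in for g_source_remove() as it behaves in newer glib:
 * removing an ID that is no longer attached is reported as a failure. */
static bool model_source_remove(unsigned id)
{
    if (id == 0 || id > MAX_SOURCES || !attached[id]) {
        fprintf(stderr, "Source ID %u was not found when attempting to remove it\n", id);
        return false;
    }
    attached[id] = false;
    return true;
}

/* Hypothetical stand-in for op->opaque: 0 means "no timer armed". */
struct model_action {
    unsigned repeat_timer;
};

/* Cancel that leaves the stale ID behind; a second call fails. */
static bool cancel_keep_id(struct model_action *a)
{
    if (a->repeat_timer) {
        return model_source_remove(a->repeat_timer);
    }
    return true;
}

/* Cancel that clears the ID, as in the patch; a second call is a no-op. */
static bool cancel_clear_id(struct model_action *a)
{
    bool ok = true;

    if (a->repeat_timer) {
        ok = model_source_remove(a->repeat_timer);
        a->repeat_timer = 0;
    }
    return ok;
}

/* Walks through both variants; the asserts document the expected results. */
static void demo(void)
{
    struct model_action stale = { model_source_add() };
    struct model_action cleared = { model_source_add() };

    assert(cancel_keep_id(&stale));     /* first removal succeeds */
    assert(!cancel_keep_id(&stale));    /* stale ID: removal fails */

    assert(cancel_clear_id(&cleared));  /* first removal succeeds */
    assert(cancel_clear_id(&cleared));  /* ID is 0: nothing to remove */
}
```

Calling demo() from a main() writes a single "was not found" line to stderr, from the second cancel_keep_id() call. Note that this only covers the case where the code removes the source explicitly. If the timer callback itself returns FALSE, the stored ID would also have to be cleared at that point.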


On 10 Oct 2014, at 12:16 pm, renayama19661014 at ybb.ne.jp wrote:

> Hi Andrew,
> 
> I have not yet managed to set up gdb properly in the Ubuntu environment, so I cannot attach to lrmd and capture a trace.
> Please give me a little more time.
> 
> 
> However, I made lrmd abort when g_source_remove() in cancel_recurring_action() returned FALSE.
> -----
> gboolean
> cancel_recurring_action(svc_action_t * op)
> {
>     crm_info("Cancelling operation %s", op->id);
> 
>     if (recurring_actions) {
>         g_hash_table_remove(recurring_actions, op->id);
>     }
> 
>     if (op->opaque->repeat_timer) {
>         if (g_source_remove(op->opaque->repeat_timer) == FALSE)  {
>                 abort();
>         }
> (snip)
> -------core----
> #0  0x00007f30aa60ff79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> 
> 56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) where
> #0  0x00007f30aa60ff79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1  0x00007f30aa613388 in __GI_abort () at abort.c:89
> #2  0x00007f30aadcde77 in crm_abort (file=file@entry=0x7f30aae0152b "logging.c", 
>     function=function@entry=0x7f30aae028c0 <__FUNCTION__.23262> "crm_glib_handler", line=line@entry=73, 
>     assert_condition=assert_condition@entry=0x19d2ad0 "Source ID 63 was not found when attempting to remove it", do_core=do_core@entry=1, 
>     do_fork=<optimized out>, do_fork@entry=1) at utils.c:1195
> #3  0x00007f30aadf5ca7 in crm_glib_handler (log_domain=0x7f30aa35eb6e "GLib", flags=<optimized out>, 
>     message=0x19d2ad0 "Source ID 63 was not found when attempting to remove it", user_data=<optimized out>) at logging.c:73
> #4  0x00007f30aa320ae1 in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #5  0x00007f30aa320d72 in g_log () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #6  0x00007f30aa318c5c in g_source_remove () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #7  0x00007f30aabb2b55 in cancel_recurring_action (op=op@entry=0x19caa90) at services.c:363
> #8  0x00007f30aabb2bee in services_action_cancel (name=name@entry=0x19d0530 "dummy3", action=<optimized out>, interval=interval@entry=10000)
>     at services.c:385
> #9  0x000000000040405a in cancel_op (rsc_id=rsc_id@entry=0x19d0530 "dummy3", action=action@entry=0x19cec10 "monitor", interval=10000)
>     at lrmd.c:1404
> #10 0x000000000040614f in process_lrmd_rsc_cancel (client=0x19c8290, id=74, request=0x19ca8a0) at lrmd.c:1468
> #11 process_lrmd_message (client=client@entry=0x19c8290, id=74, request=request@entry=0x19ca8a0) at lrmd.c:1507
> #12 0x0000000000402bac in lrmd_ipc_dispatch (c=0x19c79c0, data=<optimized out>, size=361) at main.c:148
> #13 0x00007f30aa07b4d9 in qb_ipcs_dispatch_connection_request () from /usr/lib/libqb.so.0
> #14 0x00007f30aadf209d in gio_read_socket (gio=<optimized out>, condition=G_IO_IN, data=0x19c68a8) at mainloop.c:437
> #15 0x00007f30aa319ce5 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #16 0x00007f30aa31a048 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #17 0x00007f30aa31a30a in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #18 0x0000000000402774 in main (argc=<optimized out>, argv=0x7fffcdd90b88) at main.c:344
> ---------
> 
> Best Regards,
> Hideo Yamauchi.
> 
> 
> 
> ----- Original Message -----
>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>> To: Andrew Beekhof <andrew at beekhof.net>
>> Cc: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>> Date: 2014/10/7, Tue 11:15
>> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>> 
>> Hi Andrew,
>> 
>>> Not quite. Returning FALSE from the callback also removes the source from glib.
>>> So your test case effectively removes t1 twice: once implicitly by returning 
>>> FALSE in timer_func1() and then again explicitly in timer_func3()
>> 
>> 
>> You are right.
>> 
>> 
>> If Pacemaker does not call g_source_remove() again on a source whose timer callback has already returned FALSE, glib does not report the error.
>> 
>> 
>> Many Thanks,
>> Hideo Yamauchi.
>> 
>> 
>> ----- Original Message -----
>>> From: Andrew Beekhof <andrew at beekhof.net>
>>> To: renayama19661014 at ybb.ne.jp
>>> Cc: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>>> Date: 2014/10/7, Tue 11:06
>>> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>>> 
>>> 
>>> On 7 Oct 2014, at 1:03 pm, renayama19661014 at ybb.ne.jp wrote:
>>> 
>>>>   Hi Andrew,
>>>> 
>>>>>>   These problems seem to be caused by the following glib change.
>>>>>>    * https://github.com/GNOME/glib/commit/393503ba5bdc7c09cd46b716aaf3d2c63a6c7f9c
>>>>>   
>>>>>   The glib behaviour on Ubuntu seems reasonable, removing a source multiple times 
>>>>>   IS a valid error.
>>>>>   I need the stack trace to know where/how this situation can occur in pacemaker.
>>>> 
>>>> 
>>>>   As far as I can confirm, Pacemaker does not remove a source more than once.
>>>>   On Ubuntu (glib 2.40), the error occurs the first time the source is removed.
>>> 
>>> Not quite. Returning FALSE from the callback also removes the source from glib.
>>> So your test case effectively removes t1 twice: once implicitly by returning 
>>> FALSE in timer_func1() and then again explicitly in timer_func3()
>>> 
>>>> 
>>>>   To avoid the error on Ubuntu, it seems necessary to check that the source still exists before removing it, as below.
>>>>   This also works with the glib in RHEL 6.x (and RHEL 7.0).
>>>> 
>>>>          if (g_main_context_find_source_by_id (NULL, t1) != NULL) {
>>>>                  g_source_remove(t1);
>>>>          }
>>>> 
>>>>   I will send you the stack trace once I have captured it.
>>>> 
>>>>   Many Thanks!
>>>>   Hideo Yamauchi.
>>>> 
>>>>   ----- Original Message -----
>>>>>   From: Andrew Beekhof <andrew at beekhof.net>
>>>>>   To: renayama19661014 at ybb.ne.jp; The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>>>>>   Cc: 
>>>>>   Date: 2014/10/7, Tue 09:44
>>>>>   Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>>>>> 
>>>>> 
>>>>>   On 6 Oct 2014, at 4:09 pm, renayama19661014 at ybb.ne.jp wrote:
>>>>> 
>>>>>>   Hi All,
>>>>>> 
>>>>>>   When I run the following sample on RHEL 6.5 (glib2-2.22.5-7.el6) and Ubuntu 14.04 (libglib2.0-0:amd64 2.40.0-2), the behaviour differs.
>>>>>> 
>>>>>>    * Sample : test2.c
>>>>>>   {{{
>>>>>>   #include <stdio.h>
>>>>>>   #include <stdlib.h>
>>>>>>   #include <glib.h>
>>>>>>   #include <sys/times.h>
>>>>>>   guint t1, t2, t3;
>>>>>>   /* Never fires: timer_func3 removes t2 after 5 seconds. */
>>>>>>   gboolean timer_func2(gpointer data){
>>>>>>           printf("TIMER EXPIRE!2\n");
>>>>>>           fflush(stdout);
>>>>>>           return FALSE;
>>>>>>   }
>>>>>>   /* Returning FALSE makes glib remove source t1 by itself. */
>>>>>>   gboolean timer_func1(gpointer data){
>>>>>>           clock_t         ret;
>>>>>>           struct tms buff;
>>>>>> 
>>>>>>           ret = times(&buff);
>>>>>>           printf("TIMER EXPIRE!1 %d\n", (int)ret);
>>>>>>           fflush(stdout);
>>>>>>           return FALSE;
>>>>>>   }
>>>>>>   /* t1 is already gone here, so g_source_remove(t1) fails on newer glib. */
>>>>>>   gboolean timer_func3(gpointer data){
>>>>>>           printf("TIMER EXPIRE 3!\n");
>>>>>>           fflush(stdout);
>>>>>>           printf("remove timer1!\n");
>>>>>> 
>>>>>>           fflush(stdout);
>>>>>>           g_source_remove(t1);
>>>>>>           printf("remove timer2!\n");
>>>>>>           fflush(stdout);
>>>>>>           g_source_remove(t2);
>>>>>>           printf("remove timer3!\n");
>>>>>>           fflush(stdout);
>>>>>>           g_source_remove(t3);
>>>>>>           return FALSE;
>>>>>>   }
>>>>>>   int main(int argc, char** argv){
>>>>>>           GMainLoop *m;
>>>>>>           clock_t         ret;
>>>>>>           struct tms buff;
>>>>>>           m = g_main_loop_new(NULL, FALSE);
>>>>>>           t1 = g_timeout_add(1000, timer_func1, NULL);
>>>>>>           t2 = g_timeout_add(60000, timer_func2, NULL);
>>>>>>           t3 = g_timeout_add(5000, timer_func3, NULL);
>>>>>>           ret = times(&buff);
>>>>>>           printf("START! %d\n", (int)ret);
>>>>>>           g_main_loop_run(m);
>>>>>>           return 0;
>>>>>>   }
>>>>>> 
>>>>>>   }}}
>>>>>>    * Result
>>>>>>   ---- RHEL6.5(glib2-2.22.5-7.el6) ---- 
>>>>>>   [root at snmp1 ~]# ./test2
>>>>>>   START! 429576012
>>>>>>   TIMER EXPIRE!1 429576112
>>>>>>   TIMER EXPIRE 3!
>>>>>>   remove timer1!
>>>>>>   remove timer2!
>>>>>>   remove timer3!
>>>>>> 
>>>>>>   ---- Ubuntu14.04(libglib2.0-0:amd64 2.40.0-2) ----
>>>>>>   root at a1be102:~# ./test2
>>>>>>   START! 1718163089
>>>>>>   TIMER EXPIRE!1 1718163189
>>>>>>   TIMER EXPIRE 3!
>>>>>>   remove timer1!
>>>>>> 
>>>>>>   (process:1410): GLib-CRITICAL **: Source ID 1 was not found when attempting to remove it
>>>>>>   remove timer2!
>>>>>>   remove timer3!
>>>>>> 
>>>>>> 
>>>>>>   These problems seem to be caused by the following glib change.
>>>>>>    * https://github.com/GNOME/glib/commit/393503ba5bdc7c09cd46b716aaf3d2c63a6c7f9c
>>>>> 
>>>>>   The glib behaviour on Ubuntu seems reasonable, removing a source multiple times 
>>>>>   IS a valid error.
>>>>>   I need the stack trace to know where/how this situation can occur in pacemaker.
>>>>> 
>>>>>> 
>>>>>>   Before this change, g_source_remove() could be called on a timer whose callback had already completed; after the change, the same call produces an error.
>>>>>> 
>>>>>>   As a result, we get the following crit error when Pacemaker runs with a new version of glib.
>>>>>> 
>>>>>>   lrmd[1632]:    error: crm_abort: crm_glib_handler: Forked child 1840 to record non-fatal assert at logging.c:73 : Source ID 51 was not found when attempting to remove it
>>>>>>   lrmd[1632]:    crit: crm_glib_handler: GLib: Source ID 51 was not found when attempting to remove it
>>>>>> 
>>>>>>   Some kind of workaround seems necessary in Pacemaker, considering the following.
>>>>>>    * Distributions that use a new version of glib, including Ubuntu.
>>>>>>    * Future glib upgrades in RHEL.
>>>>>> 
>>>>>>   A similar problem has been reported on the ML.
>>>>>>    * http://www.gossamer-threads.com/lists/linuxha/pacemaker/91333#91333
>>>>>>    * http://www.gossamer-threads.com/lists/linuxha/pacemaker/92408
>>>>>> 
>>>>>>   Best Regards,
>>>>>>   Hideo Yamauchi.
>>>>>> 
>>>>>>   _______________________________________________
>>>>>>   Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>   http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>> 
>>>>>>   Project Home: http://www.clusterlabs.org
>>>>>>   Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>   Bugs: http://bugs.clusterlabs.org
>>>>> 
>>> 
>> 
>> 
> 
