[Pacemaker] lrmd segfault at pacemaker 1.1.11-rc1
Kazunori INOUE
kazunori.inoue3 at gmail.com
Fri Jan 10 11:23:04 UTC 2014
2014/1/9 Andrew Beekhof <andrew at beekhof.net>:
>
> On 8 Jan 2014, at 9:15 pm, Kazunori INOUE <kazunori.inoue3 at gmail.com> wrote:
>
>> 2014/1/8 Andrew Beekhof <andrew at beekhof.net>:
>>>
>>> On 18 Dec 2013, at 9:50 pm, Kazunori INOUE <kazunori.inoue3 at gmail.com> wrote:
>>>
>>>> Hi David,
>>>>
>>>> 2013/12/18 David Vossel <dvossel at redhat.com>:
>>>>>
>>>>> That's a really weird one... I don't see how it is possible for op->id to be NULL there. You might need to give valgrind a shot to detect whatever is really going on here.
>>>>>
>>>>> -- Vossel
>>>>>
>>>> Thank you for advice. I try it.
>>>
>>> Any update on this?
>>>
>>
>> We are still investigating a cause. It was not reproduced when I gave valgrind..
>> And it was reproduced in RC3.
>
> So it happened RC3 - valgrind, but not RC3 + valgrind?
> Thats concerning.
>
> Nothing in the valgrind output?
>
The cause was found.
230 gboolean
231 operation_finalize(svc_action_t * op)
232 {
233 int recurring = 0;
234
235 if (op->interval) {
236 if (op->cancel) {
237 op->status = PCMK_LRM_OP_CANCELLED;
238 cancel_recurring_action(op);
239 } else {
240 recurring = 1;
241 op->opaque->repeat_timer = g_timeout_add(op->interval,
242
recurring_action_timer, (void *)op);
243 }
244 }
245
246 if (op->opaque->callback) {
247 op->opaque->callback(op);
248 }
249
250 op->pid = 0;
251
252 if (!recurring) {
253 /*
254 * If this is a recurring action, do not free explicitly.
255 * It will get freed whenever the action gets cancelled.
256 */
257 services_action_free(op);
258 return TRUE;
259 }
260 return FALSE;
261 }
When op->id is not 0, in cancel_recurring_action function (l.238), op
is not removed from hash table.
However, op is freed in services_action_free function (l.257). That
is, the freed data remains in hash table.
Then, g_hash_table_lookup function may look up the freed data.
Therefore, when g_hash_table_replace function was called (in
services_action_async function), I added change so that
g_hash_table_remove function might certainly be called.
As of now, segfault has not happened.
diff --git a/lib/services/services.c b/lib/services/services.c
index cb02511..73d0492 100644
--- a/lib/services/services.c
+++ b/lib/services/services.c
@@ -347,9 +347,9 @@ services_action_free(svc_action_t * op)
}
gboolean
-cancel_recurring_action(svc_action_t * op)
+cancel_recurring_action(svc_action_t * op, gboolean force_remove)
{
- if (op->pid) {
+ if (force_remove == FALSE && op->pid) {
return FALSE;
}
@@ -378,7 +378,7 @@ services_action_cancel(const char *name, const
char *action, int interval)
return FALSE;
}
- if (cancel_recurring_action(op)) {
+ if (cancel_recurring_action(op, FALSE)) {
op->status = PCMK_LRM_OP_CANCELLED;
if (op->opaque->callback) {
op->opaque->callback(op);
diff --git a/lib/services/services_linux.c b/lib/services/services_linux.c
index 7060be0..a02f8d9 100644
--- a/lib/services/services_linux.c
+++ b/lib/services/services_linux.c
@@ -235,7 +235,7 @@ operation_finalize(svc_action_t * op)
if (op->interval) {
if (op->cancel) {
op->status = PCMK_LRM_OP_CANCELLED;
- cancel_recurring_action(op);
+ cancel_recurring_action(op, TRUE);
} else {
recurring = 1;
op->opaque->repeat_timer = g_timeout_add(op->interval,
diff --git a/lib/services/services_private.h b/lib/services/services_private.h
index dd759e3..72dc1ba 100644
--- a/lib/services/services_private.h
+++ b/lib/services/services_private.h
@@ -45,7 +45,7 @@ GList *resources_os_list_ocf_agents(const char *provider);
GList *resources_os_list_nagios_agents(void);
-gboolean cancel_recurring_action(svc_action_t * op);
+gboolean cancel_recurring_action(svc_action_t * op, gboolean force_remove);
gboolean recurring_action_timer(gpointer data);
gboolean operation_finalize(svc_action_t * op);
>>
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
More information about the Pacemaker
mailing list