[Pacemaker] Action from a different CRMD transition results in restarting services
Latrous, Youssef
YLatrous at BroadViewNet.com
Mon Dec 17 14:39:18 UTC 2012
Hi Andrew,
Thank you for following up.
I still don't see what went wrong. From the logs, RabbitMQ was working
just fine around that time until it was ordered to shut down by CRM (for
the failed monitor?).
Moreover, I assume that transitions are ordered monotonically, which
means that Transition ID 16048 happened before Transition ID 18014:
16048 << 18014
According to the logs, Transition ID 16048 does not appear anywhere in the
several days of logs preceding Transition ID 18014. I will therefore assume
that it was generated several days before that (if that is not true, please
give me a way of finding out when this transition happened; I still
believe that time is of the essence in this case). Our monitor command
timers are expressed in seconds.
In that case, how can we say:
"It hasn't only just acted now. It's been repeating over and over for
the last few weeks or so."
My understanding is that a transition happens once and only once: it
succeeds, fails or is aborted altogether. Corresponding events can
repeat over and over, but each time can only be part of a new transition.
Am I missing something fundamental here?
Sorry to insist, but I have to answer this very simple question: "What
happened here?"
I'm sure you can understand my situation here.
Thank you in advance for your help,
Regards,
Youssef
-----Original Message-----
From: pacemaker-request at oss.clusterlabs.org
[mailto:pacemaker-request at oss.clusterlabs.org]
Sent: Friday, December 14, 2012 5:37 AM
To: pacemaker at oss.clusterlabs.org
Subject: Pacemaker Digest, Vol 61, Issue 37
Today's Topics:
1. Re: Action from a different CRMD transition results in
restarting services (Andrew Beekhof)
2. Re: problem with float IP with pacemaker (Andrew Beekhof)
3. cman+qdisk+pacemaker - pacemaker qdisk node offline (Rob)
4. Re: booth is the state of "started" on pacemaker before booth
write ticket info in cib. (Jiaju Zhang)
5. Pacemaker stop behaviour when underlying resource is
unavailable (pavan tc)
----------------------------------------------------------------------
Message: 1
Date: Fri, 14 Dec 2012 13:32:32 +1100
From: Andrew Beekhof <andrew at beekhof.net>
To: The Pacemaker cluster resource manager
<pacemaker at oss.clusterlabs.org>
Subject: Re: [Pacemaker] Action from a different CRMD transition
results in restarting services
Message-ID:
<CAEDLWG0gzrt0w__tsZKbeELXwdaOHi9KGj_Oxm0877kMxgP=BA at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Fri, Dec 14, 2012 at 1:33 AM, Latrous, Youssef
<YLatrous at broadviewnet.com> wrote:
>
> Andrew Beekhof <andrew at beekhof.net> wrote:
>> 18014 is where we're up to now, 16048 is the (old) one that scheduled
>> the recurring monitor operation.
>> I suspect you'll find the action failed earlier in the logs and that's
>> why it needed to be restarted.
>>
>> Not the best log message though :(
>
> Thanks Andrew for the quick answer. I still need more info if possible.
>
> I searched everywhere for transition 16048 and I couldn't find a
> trace of it (I looked through up to 5 days of logs prior to transition
> 18014). It would have been good if we had timestamps for each transition
> involved in this situation :-)
>
> Is there a way to find out about this old transition in any other logs (I
> looked into /var/log/messages on both nodes involved in this cluster)?
It's not really relevant.
The only important thing is that it's not one we're currently executing.
What you should care about is any logs that hopefully show you why the
resource failed at around Dec 6 22:55:05.
>
> To give you an idea of how many transitions happened during this
> period:
> TR_ID 18010 @ 21:52:16
> ...
> TR_ID 18018 @ 22:55:25
>
> Over an hour between these two events.
>
> Given this, how come such a (very) old transition (~2000 transitions
> before the current one) only acts now? Could it be stale information in
> pacemaker?
No. It hasn't only just acted now. It's been repeating over and over for
the last few weeks or so.
The difference is that this time it failed.
>
> Thanks in advance.
>
> Youssef
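
To make the point in Andrew's reply concrete, here is a toy model of a recurring monitor. It is only a sketch and is not Pacemaker code: the class, the function, the resource name "rabbitmq", the 10-second interval and the use of return code 7 for "not running" are all assumptions chosen for illustration. The idea it shows is the one described above: the monitor is scheduled once by an old transition (16048), keeps repeating on its own, and every result it reports still carries that old transition ID. Only when a result is a failure does the ID mismatch with the current transition (18014) become visible.

```python
from dataclasses import dataclass


@dataclass
class RecurringMonitor:
    """Toy stand-in for a recurring monitor operation (not Pacemaker's API)."""
    resource: str
    interval_s: int
    scheduled_in: int  # ID of the transition that originally scheduled it

    def report(self, rc: int) -> dict:
        # Every repeat reports under the transition that scheduled the
        # operation, not under whatever transition is current by then.
        return {"resource": self.resource, "rc": rc,
                "transition": self.scheduled_in}


def handle_result(result: dict, current_transition: int,
                  expected_rc: int = 0) -> str:
    """Classify one monitor result the way the thread describes it."""
    if result["rc"] == expected_rc:
        # Successful repeats need no new transition and go unnoticed.
        return "ok"
    if result["transition"] != current_transition:
        # The failed action belongs to an older transition: recovery
        # (for example a restart) has to be scheduled in a new one.
        return "failed: from a different transition ({} vs. {})".format(
            result["transition"], current_transition)
    return "failed: in the current transition"


# Hypothetical values: resource name and interval are illustrative only.
monitor = RecurringMonitor(resource="rabbitmq", interval_s=10,
                           scheduled_in=16048)

# Weeks of successful repeats while the cluster moves on to transition 18014.
healthy = [handle_result(monitor.report(rc=0), 18014) for _ in range(3)]

# One repeat finally fails (7 is used here to mean "not running").
failure = handle_result(monitor.report(rc=7), 18014)
print(healthy, failure)
```

In this model the old transition ID says nothing about when the failure happened. It only records which transition started the recurring operation, which is why the logs around the failure time are the ones worth reading.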