[Pacemaker] Action from a different CRMD transition results in restarting services
Latrous, Youssef
YLatrous at BroadViewNet.com
Thu Dec 13 14:33:56 UTC 2012
Andrew Beekhof <andrew at beekhof.net> wrote:
> 18014 is where we're up to now, 16048 is the (old) one that scheduled
> the recurring monitor operation.
> I suspect you'll find the action failed earlier in the logs and that's
> why it needed to be restarted.
>
> Not the best log message though :(
Thanks Andrew for the quick answer. I still need more info if possible.
I searched everywhere for transition 16048 and I couldn't find a trace
of it (I looked through up to 5 days of logs prior to transition 18014).
It would have been good if we had a timestamp for each transition
involved in this situation :-)
Is there a way to find out about this old transition in any other logs?
(I looked in /var/log/messages on both nodes involved in this cluster.)
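For reference, a rough sketch of how the older transition might be
tracked down, assuming the default Pacemaker 1.1 locations
(/var/log/messages and /var/lib/pengine) and that rotated logs are
still on disk:

# search plain and rotated/compressed syslogs on both nodes; zgrep also
# reads .gz rotations that plain grep would skip
zgrep -ih "transition 16048" /var/log/messages*

# every "process_pe_message: Transition NNNNN: PEngine Input stored in: ..."
# line ties a transition number to a pe-input file; the file itself is
# just bzip2-compressed XML and can be inspected directly
bzcat /var/lib/pengine/pe-input-3270.bz2 | less

If nothing matches, the pe-input file for 16048 has most likely been
rotated away already (pe-input-series-max), and the number then only
survives in the CIB (see below).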
To give you an idea of how many transitions happened during this
period:
TR_ID 18010 @ 21:52:16
...
TR_ID 18018 @ 22:55:25
Over an hour between these two events.
Given this, how can such a (very) old transition (~2000 transitions
before the current one) only act now? Could it be stale information in
Pacemaker?
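For what it's worth, a sketch of where that old number might still be
visible, assuming a standard CIB layout: the recurring monitor's last
recorded result in the status section keeps the transition-key of the
transition that originally scheduled it, which is what the crmd
compares against the current transition when a new result arrives.

# dump the live CIB and look at the recorded operation; the transition-key
# (104:16048:0:5fb57f01-...) is the "magic" string from the log message
cibadmin --query | grep -B1 -A3 "msgbroker_monitor_10000"

Cleaning up the resource's history, e.g. with
crm_resource --cleanup --resource msgbroker, rewrites that entry, after
which the recorded transition number moves forward again.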
Thanks in advance.
Youssef
Message: 4 from Pacemaker Digest, Vol 61, Issue 34
---------------------------------------------------------------
Date: Thu, 13 Dec 2012 10:52:42 +1100
From: Andrew Beekhof <andrew at beekhof.net>
To: The Pacemaker cluster resource manager
<pacemaker at oss.clusterlabs.org>
Subject: Re: [Pacemaker] Action from a different CRMD transition
results in restarting services
Message-ID:
<CAEDLWG2LtrPuxTRrd=JbV1SxTiLbG3SB0nu0fEyF3yRGrNc9BA at mail.gmail.com>
Content-Type: text/plain; charset=windows-1252
On Thu, Dec 13, 2012 at 6:31 AM, Latrous, Youssef
<YLatrous at broadviewnet.com> wrote:
> Hi,
>
>
>
> I ran into the following issue and I couldn't find what it really
> means:
>
>
>
> Detected action msgbroker_monitor_10000 from a different transition:
> 16048 vs. 18014
18014 is where we're up to now, 16048 is the (old) one that scheduled
the recurring monitor operation.
I suspect you'll find the action failed earlier in the logs and that's
why it needed to be restarted.
Not the best log message though :(
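One quick way to check for that earlier failure, assuming standard
Pacemaker command-line tools, is to look at the fail counts the cluster
has already accumulated:

# one-shot cluster status including per-resource fail counts
crm_mon -1 -f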
>
>
>
> I can see that its impact is to stop/start a service but I'd like to
> understand it a bit more.
>
>
>
> Thank you in advance for any information.
>
>
>
>
>
> Logs about this issue:
>
> …
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: process_graph_event:
> Detected action msgbroker_monitor_10000 from a different transition:
> 16048 vs. 18014
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: abort_transition_graph:
> process_graph_event:477 - Triggered transition abort (complete=1,
> tag=lrm_rsc_op, id=msgbroker_monitor_10000,
> magic=0:7;104:16048:0:5fb57f01-3397-45a8-905f-c48cecdc8692,
> cib=0.971.5) :
> Old event
>
> Dec 6 22:55:05 Node1 crmd: [5235]: WARN: update_failcount: Updating
> failcount for msgbroker on Node0 after failed monitor: rc=7
> (update=value++,
> time=1354852505)
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: do_state_transition: All 2
> cluster nodes are eligible to run resources.
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: do_pe_invoke: Query 28069:
> Requesting the current CIB: S_POLICY_ENGINE
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: abort_transition_graph:
> te_update_diff:142 - Triggered transition abort (complete=1,
> tag=nvpair, id=status-Node0-fail-count-msgbroker, magic=NA,
> cib=0.971.6) : Transient
> attribute: update
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: do_pe_invoke: Query 28070:
> Requesting the current CIB: S_POLICY_ENGINE
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: abort_transition_graph:
> te_update_diff:142 - Triggered transition abort (complete=1,
> tag=nvpair, id=status-Node0-last-failure-msgbroker, magic=NA,
> cib=0.971.7) : Transient
> attribute: update
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: do_pe_invoke: Query 28071:
> Requesting the current CIB: S_POLICY_ENGINE
>
> Dec 6 22:55:05 Node1 attrd: [5232]: info: find_hash_entry: Creating
> hash entry for last-failure-msgbroker
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: do_pe_invoke_callback:
> Invoking the PE: query=28071, ref=pe_calc-dc-1354852505-39407, seq=12,
> quorate=1
>
> Dec 6 22:55:05 Node1 pengine: [5233]: notice: unpack_config: On loss
> of CCM
> Quorum: Ignore
>
> Dec 6 22:55:05 Node1 pengine: [5233]: notice: unpack_rsc_op:
> Operation
> txpublisher_monitor_0 found resource txpublisher active on Node1
>
> Dec 6 22:55:05 Node1 pengine: [5233]: WARN: unpack_rsc_op: Processing
> failed op msgbroker_monitor_10000 on Node0: not running (7)
>
> …
>
> Dec 6 22:55:05 Node1 pengine: [5233]: notice:
> common_apply_stickiness:
> msgbroker can fail 999999 more times on Node0 before being forced off
>
> …
>
> Dec 6 22:55:05 Node1 pengine: [5233]: notice: RecurringOp: Start
> recurring monitor (10s) for msgbroker on Node0
>
> …
>
> Dec 6 22:55:05 Node1 pengine: [5233]: notice: LogActions: Recover
> msgbroker (Started Node0)
>
> …
>
> Dec 6 22:55:05 Node1 crmd: [5235]: info: te_rsc_command: Initiating
> action
> 37: stop msgbroker_stop_0 on Node0
>
>
>
>
>
> Transition 18014 details:
>
>
>
> Dec 6 22:52:18 Node1 pengine: [5233]: notice: process_pe_message:
> Transition 18014: PEngine Input stored in:
> /var/lib/pengine/pe-input-3270.bz2
>
> Dec 6 22:52:18 Node1 crmd: [5235]: info: do_state_transition: State
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
>
> Dec 6 22:52:18 Node1 crmd: [5235]: info: unpack_graph: Unpacked
> transition
> 18014: 0 actions in 0 synapses
>
> Dec 6 22:52:18 Node1 crmd: [5235]: info: do_te_invoke: Processing
> graph
> 18014 (ref=pe_calc-dc-1354852338-39406) derived from
> /var/lib/pengine/pe-input-3270.bz2
>
> Dec 6 22:52:18 Node1 crmd: [5235]: info: run_graph:
> ====================================================
>
> Dec 6 22:52:18 Node1 crmd: [5235]: notice: run_graph: Transition
> 18014 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pengine/pe-input-3270.bz2): Complete
>
> Dec 6 22:52:18 Node1 crmd: [5235]: info: te_graph_trigger: Transition
> 18014 is now complete
>
> Dec 6 22:52:18 Node1 crmd: [5235]: info: notify_crmd: Transition
> 18014
> status: done - <null>
>
> Dec 6 22:52:18 Node1 crmd: [5235]: info: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
>
> Dec 6 22:52:18 Node1 crmd: [5235]: info: do_state_transition:
> Starting PEngine Recheck Timer
>
>
>
>
>
> Youssef
>
>
>
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
------------------------------
Message: 5
Date: Thu, 13 Dec 2012 01:17:17 +0000
From: Xavier Lashmar <xlashmar at uottawa.ca>
To: The Pacemaker cluster resource manager
<pacemaker at oss.clusterlabs.org>
Subject: Re: [Pacemaker] gfs2 / dlm on centos 6.2
Message-ID:
<CC445C0CEB8B8A4C87297D880D8F903BBCC0FA95 at CMS-P04.uottawa.o.univ>
Content-Type: text/plain; charset="windows-1252"
I see, thanks very much for pointing me in the right direction!
Xavier Lashmar
Université d'Ottawa / University of Ottawa
Analyste de Systèmes | Systems Analyst
Service étudiants, service de l'informatique et des communications |
Student services, computing and communications services.
1 Nicholas Street (810)
Ottawa ON K1N 7B7
Tél. | Tel. 613-562-5800 (2120)
________________________________
From: Andrew Beekhof [andrew at beekhof.net]
Sent: Tuesday, December 11, 2012 9:30 PM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] gfs2 / dlm on centos 6.2
On Wed, Dec 12, 2012 at 1:29 AM, Xavier Lashmar
<xlashmar at uottawa.ca> wrote:
Hello,
We are attempting to mount gfs2 partitions on CentOS using DRBD +
COROSYNC + PACEMAKER. Unfortunately we consistently get the following
error:
You'll need to configure pacemaker to use cman for this.
See:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/ch08s02.html
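On CentOS 6.x that means the cman-based stack; dlm_controld and
gfs_controld come with cman there, so there is no separate "dlm"
package to install (a sketch, assuming the stock CentOS 6
repositories):

# install the cman stack plus the gfs2 userspace tools
yum install -y cman gfs2-utils

# describe the cluster in /etc/cluster/cluster.conf, then start cman
# before pacemaker so gfs_controld/dlm_controld are running at mount time
service cman start
service pacemaker start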
# mount /dev/vg_data/lv_data /webdata/ -t gfs2 -v
mount /dev/dm-2 /webdata
parse_opts: opts = "rw"
clear flag 1 for "rw", flags = 0
parse_opts: flags = 0
parse_opts: extra = ""
parse_opts: hostdata = ""
parse_opts: lockproto = ""
parse_opts: locktable = ""
gfs_controld join connect error: Connection refused
error mounting lockproto lock_dlm
We are trying to find out where to get the lock_dlm libraries and
packages for CentOS 6.2 and 6.3.
Also, I found that the Fedora 17 version of the document "Pacemaker 1.1
- Clusters from Scratch" is a bit problematic. I'm also running a
Fedora 17 system and found no package "dlm" as per the instructions in
section 8.1.1:
yum install -y gfs2-utils dlm kernel-modules-extra
Any idea if an external repository is needed? If so, which one, and
which package do we need to install for CentOS 6+?
Thanks very much
Xavier Lashmar
Analyste de Systèmes | Systems Analyst
Service étudiants, service de l'informatique et des communications |
Student services, computing and communications services.
1 Nicholas Street (810)
Ottawa ON K1N 7B7
Tél. | Tel. 613-562-5800 (2120)