[Pacemaker] "Election Timeout" and node became the "Pending" state.
renayama19661014 at ybb.ne.jp
Tue Oct 5 04:44:11 UTC 2010
Hi,
We tested a complicated node failure scenario.
An "Election Timeout" error occurred during the test.
* Pacemaker:pacemaker-1.0.9.1
* heartbeat-3.0.3-2.3.el5
* cluster-glue:cluster-glue-1.0.6-1.6.el5
* resource-agents-1.0.3-1.0.dev.b7a3b1973ba7
We tested with the following procedure.
Step1) Start all nodes.
Step2) On node cgl49, we cause a monitor error of prmApPostgreSQLDB1.
Step3) Node cgl49 is fenced (STONITH) by node cgl54.
Step4) During Step3, we kill the master process on node cgl54.
Step5) Node cgl54 reboots.
Step6) Node cgl49 is fenced (STONITH).
Step7) Node cgl53 is promoted to DC.
Step8) STONITH of node cgl49 is attempted again.
However, because only node cgl54 can perform STONITH on node cgl49, the STONITH operation times out
and loops.
============
Last updated: Mon Aug 30 14:40:58 2010
Stack: Heartbeat
Current DC: cgl53 (a07bcfc0-7aee-4382-9a2b-711b9c93e7e9) - partition WITHOUT quorum
Version: 1.0.9-74392a28b7f3 stable-1.0 tip
4 Nodes configured, unknown expected votes
16 Resources configured.
============
Node cgl49 (979c05ea-442b-4f53-9ba7-6cb7e82f30ac): UNCLEAN (offline)
Node cgl54 (9bea1025-3cbe-481f-830d-a24dfc7f0374): UNCLEAN (offline)
Online: [ cgl50 cgl53 ]
Step9) When node cgl54 recovers, a DC election is performed, but an error occurs here.
* cgl50 node
crmd: [32110]: info: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION
cause=C_FSA_INTERNAL origin=do_election_count_vote ]
crmd: [32110]: info: update_dc: Unset DC cgl53
(snip)
cgl50 crmd: [32110]: ERROR: crm_timer_popped: Election Timeout (I_ELECTION_DC) just popped!
* cgl53 node
crmd: [1325]: info: do_state_transition: State transition S_INTEGRATION -> S_ELECTION [
input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]
cgl53 crmd: [1325]: info: update_dc: Unset DC cgl53
(snip)
crmd: [1325]: ERROR: crm_timer_popped: Election Timeout (I_ELECTION_DC) just popped!
(snip)
crmd: [1325]: ERROR: crm_timer_popped: Election Timeout (I_ELECTION_DC) just popped!
(snip)
crmd: [1325]: info: crmd_ha_msg_filter: Another DC detected: cgl50 (op=join_offer)
Step10) Node cgl53 enters the "Pending" state.
Node cgl53 then returns to the "online" state after the waiting STONITH operation times out.
Why did the "Election Timeout" occur?
Why did node cgl53 enter the "Pending" state?
This may be a problem in ccm.
It is also possible that the same problem has already been reported.
* Because the log file was large, I registered the same contents in Bugzilla.
* http://developerbugs.linux-foundation.org/show_bug.cgi?id=2502
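For anyone going through the large log, the following is a minimal sketch that counts the "Election Timeout" messages per crmd process. It assumes the log lines have the same shape as the excerpts in Step9; the function name count_election_timeouts and the SAMPLE list are only illustrative and are not part of Pacemaker.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpts in Step9:
#   [hostname] crmd: [PID]: LEVEL: function: message
# Anything before "crmd:" (hostname, timestamp) is ignored.
CRMD_RE = re.compile(r"crmd: \[(?P<pid>\d+)\]: (?P<level>\w+): (?P<func>\w+): (?P<msg>.*)")


def count_election_timeouts(lines):
    """Count 'Election Timeout' messages from crm_timer_popped, keyed by crmd PID."""
    counts = Counter()
    for line in lines:
        m = CRMD_RE.search(line)
        if m is None:
            continue  # not a crmd line in the assumed format
        if m.group("func") == "crm_timer_popped" and "Election Timeout" in m.group("msg"):
            counts[m.group("pid")] += 1
    return counts


# Hypothetical sample built from the lines quoted in this mail.
SAMPLE = [
    "cgl50 crmd: [32110]: ERROR: crm_timer_popped: Election Timeout (I_ELECTION_DC) just popped!",
    "crmd: [1325]: ERROR: crm_timer_popped: Election Timeout (I_ELECTION_DC) just popped!",
    "crmd: [1325]: ERROR: crm_timer_popped: Election Timeout (I_ELECTION_DC) just popped!",
    "crmd: [1325]: info: crmd_ha_msg_filter: Another DC detected: cgl50 (op=join_offer)",
]

if __name__ == "__main__":
    for pid, n in count_election_timeouts(SAMPLE).items():
        print(f"crmd[{pid}]: {n} election timeout(s)")
```

To use it on a real log, pass an open file object instead of SAMPLE; the pattern may need adjusting if the syslog format on the node differs from the excerpts.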
Best Regards,
Hideo Yamauchi.