[Pacemaker] pre_notify_demote is issued twice

Keisuke MORI keisuke.mori+ha at gmail.com
Thu Feb 6 03:45:49 EST 2014


Hi,

I observed that pre_notify_demote is issued twice when a master
resource is migrating.
I'm wondering if this is the correct behavior.

Steps to reproduce:

 - Start up 2 nodes cluster configured for the PostgreSQL streaming
replication using pgsql RA as  a master/slave resource.
 - kill the postgresql process on the master node to induce a fail-over.
 - The fail-over succeeds as expected, but pre_notify_demote was
executed twice on each node before demoting on the master resource.

100% reproducible on my cluster.

Pacemaker version: 1.1.11-rc4 (source build from the repo)
OS: RHEL6.4

I have never seen this on Pacemaker-1.0.* cluster with the same configuration.

The relevant logs and pe-inputs are attached.


Diagnostics:

(1) The first transition caused by the process failure (pe-input-160)
initiates pre_notify_demote on both nodes and cancelling slave monitor
on the slave node.
{{{
171 Jan 30 16:08:59 rhel64-1 crmd[8143]:   notice: te_rsc_command:
Initiating action 9: cancel prmPostgresql_cancel_10000 on rhel64-2
172 Jan 30 16:08:59 rhel64-1 crmd[8143]:   notice: te_rsc_command:
Initiating action 79: notify prmPostgresql_pre_notify_demote_0 on
rhel64-1 (local)

175 Jan 30 16:08:59 rhel64-1 crmd[8143]:   notice: te_rsc_command:
Initiating action 81: notify prmPostgresql_pre_notify_demote_0 on
rhel64-2
}}}

(2) When cancelling slave monitor completes, the transition is aborted
by Resource op removal.
{{{
176 Jan 30 16:08:59 rhel64-1 crmd[8143]:     info: match_graph_event:
Action prmPostgresql_monitor_10000 (9) confirmed on rhel64-2 (rc=0)
177 Jan 30 16:08:59 rhel64-1 cib[8138]:     info: cib_process_request:
Completed cib_delete operation for section status: OK (rc=0,
origin=rhel64-2/crmd/21, version=0.37.9)
178 Jan 30 16:08:59 rhel64-1 crmd[8143]:     info:
abort_transition_graph: te_update_diff:258 - Triggered transition
abort (complete=0, node=rhel64-2, tag=lrm_rsc_op,
id=prmPostgresql_monitor_10000,
magic=0:0;26:12:0:acf9a2a3-307c-460b-b786-fc20e6b8aad5, cib=0.37.9) :
Resource op removal
}}}

(3) The second transition is calculated by the abort (pe-input-161)
which results initiating pre_notify_demote again.
{{{
227 Jan 30 16:09:01 rhel64-1 pengine[8142]:   notice:
process_pe_message: Calculated Transition 15:
/var/lib/pacemaker/pengine/pe-input-161.bz2
229 Jan 30 16:09:01 rhel64-1 crmd[8143]:   notice: te_rsc_command:
Initiating action 78: notify prmPostgresql_pre_notify_demote_0 on
rhel64-1 (local)
232 Jan 30 16:09:01 rhel64-1 crmd[8143]:   notice: te_rsc_command:
Initiating action 80: notify prmPostgresql_pre_notify_demote_0 on
rhel64-2
}}}

I think that the transition abort at (2) should not happen.

Regards,
-- 
Keisuke MORI
-------------- next part --------------
A non-text attachment was scrubbed...
Name: logs-pre-notify-20140206.tar.bz2
Type: application/x-bzip2
Size: 45972 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20140206/63abe1d1/attachment-0002.bz2>


More information about the Pacemaker mailing list