[Pacemaker] [Problem] The timer which does not stop is discarded.

Thu Feb 20 00:39:54 EST 2014

Hi All,

When the monitor of a master/slave resource on the local node is cancelled, the timer for that action is not stopped and keeps running.
As a result, crmd logs a warning that it is cancelling the timer when it handles the next transition.

I confirmed this with the following procedure.

Step1) Build a cluster.

[root@srv01 ~]# crm_mon -1 -Af
Last updated: Thu Feb 20 22:57:09 2014
Last change: Thu Feb 20 22:56:32 2014 via cibadmin on srv01
Stack: corosync
Current DC: srv01 (3232238180) - partition with quorum
Version: 1.1.10-c1a326d
2 Nodes configured
6 Resources configured

Online: [ srv01 srv02 ]

 vip-master     (ocf::heartbeat:Dummy2):        Started srv01 
 vip-rep        (ocf::heartbeat:Dummy): Started srv01 
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ srv01 ]
     Slaves: [ srv02 ]
 Clone Set: clnPingd [prmPingd]
     Started: [ srv01 srv02 ]

Node Attributes:
* Node srv01:
    + default_ping_set                  : 100       
    + master-pgsql                      : 10        
* Node srv02:
    + default_ping_set                  : 100       
    + master-pgsql                      : 5         

Migration summary:
* Node srv01: 
* Node srv02: 

Step2) Cause a resource failure by removing the state file of vip-master.
[root@srv01 ~]# rm -rf /var/run/resource-agents/Dummy-vip-master.state

Step3) A warning appears in the log.
(snip)
Feb 20 22:57:46 srv01 crmd[12107]:   notice: te_rsc_command: Initiating action 5: cancel pgsql_cancel_9000 on srv01 (local)
Feb 20 22:57:46 srv01 lrmd[12104]:     info: cancel_recurring_action: Cancelling operation pgsql_monitor_9000
Feb 20 22:57:46 srv01 crmd[12107]:     info: match_graph_event: Action pgsql_monitor_9000 (5) confirmed on srv01 (rc=0)
(snip)
Feb 20 22:57:46 srv01 pengine[12106]:     info: LogActions: Leave   prmPingd:1#011(Started srv02)
Feb 20 22:57:46 srv01 crmd[12107]:     info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Feb 20 22:57:46 srv01 crmd[12107]:  warning: destroy_action: Cancelling timer for action 5 (src=139)
(snip)
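
To check whether a node has hit the same condition, a small filter like the one below may help. This is only a sketch: the function name count_timer_warnings is hypothetical, the matched text is copied from the warning line in the excerpt above, and the log path in the usage comment is an assumption, since the location depends on the syslog configuration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker): count the crmd
# "Cancelling timer" warnings in a given log file.
# The fixed string is taken from the log excerpt in this post.
count_timer_warnings() {
    # -c prints the number of matching lines (0 when there are none);
    # -F treats the pattern as a fixed string, not a regular expression.
    grep -cF 'warning: destroy_action: Cancelling timer for action' "$1"
}

# Example usage (the path is an assumption; adjust to your syslog setup):
# count_timer_warnings /var/log/messages
```

A count above zero after a cancel of pgsql_monitor_9000 would suggest the same behaviour as reported here.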

I think timeout monitoring by this timer is unnecessary when the monitor of a master/slave resource on the local node is cancelled.

I registered this problem in Bugzilla.

 * http://bugs.clusterlabs.org/show_bug.cgi?id=5199

Best Regards,
Hideo Yamauchi.