[Pacemaker] Problem: monitor timeout causes cluster resource unmanaged and stopped on both nodes.

Thu Dec 17 03:18:20 EST 2009

On Wed, Dec 16, 2009 at 5:55 PM, Oscar Remírez de Ganuza Satrústegui
<oscarrdg at unav.es> wrote:

[snip]

> 2. The CRM decided to stop the service.
> Dec 15 20:12:55 herculespre crmd: [8562]: info: do_lrm_rsc_op: Performing
> key=4:1379:0:ae99a943-f4b7-4979-b0c9-09c7f9dd0f9f
> op=mysql-horde-service_stop_0 )
> Dec 15 20:12:55 herculespre lrmd: [8559]: info: rsc:mysql-horde-service:38:
> stop
>
> 3. The MySQL service received the order and shut down properly. From
> mysql.log:
> 091215 20:13:14 [Note] /usr/local/etc2/mysql-horde/libexec/mysqld: Normal
> shutdown
> ...
> 091215 20:13:17 [Note] /usr/local/etc2/mysql-horde/libexec/mysqld: Shutdown
> complete
>
> 4. Here comes the problem: the cluster did not receive confirmation
> that the service had shut down properly:
> Dec 15 20:13:17 herculespre lrmd: [8559]: WARN: mysql-horde-service:stop
> process (PID 12270) timed out (try 1). Killing with signal SIGTERM (15).
> Dec 15 20:13:17 herculespre lrmd: [8559]: WARN: operation stop[38] on
> lsb::mysql-horde::mysql-horde-service for client 8562, its parameters:
> CRM_meta_timeout=[20000] crm_feature_set=[3.0.1] : pid [12270] timed out
> Dec 15 20:13:17 herculespre crmd: [8562]: ERROR: process_lrm_event: LRM
> operation mysql-horde-service_stop_0 (38) Timed Out (timeout=20000ms)
>
> What is happening here? According to the log, the timeout is supposed to
> be 20s (20000ms), and the service took just 3s to shut down.
> Is it a problem with lrmd?

Looks like it.
Given the time of year, it would probably be a good idea to create a
bugzilla entry so that this doesn't get lost.
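
For the bug report, it may help to line up the timestamps from the quoted
logs. The sketch below only does arithmetic on the times already shown above;
the variable names are illustrative, and it assumes syslog and mysql.log share
the same clock and that one-second log resolution is good enough.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Timestamps copied from the quoted log excerpts (all on Dec 15).
stop_requested = datetime.strptime("20:12:55", FMT)        # crmd/lrmd: stop issued
mysql_shutdown_start = datetime.strptime("20:13:14", FMT)  # mysqld: "Normal shutdown"
mysql_shutdown_done = datetime.strptime("20:13:17", FMT)   # mysqld: "Shutdown complete"
lrmd_timed_out = datetime.strptime("20:13:17", FMT)        # lrmd: stop "timed out"

# CRM_meta_timeout=[20000] from the lrmd warning, converted to seconds.
configured_timeout_s = 20000 // 1000


def seconds_between(start, end):
    """Whole seconds elapsed between two same-day timestamps."""
    return int((end - start).total_seconds())


# Gap between the stop request and mysqld logging the start of shutdown.
delay_before_shutdown = seconds_between(stop_requested, mysql_shutdown_start)
# Time mysqld itself spent shutting down.
shutdown_duration = seconds_between(mysql_shutdown_start, mysql_shutdown_done)
# Time from the stop request until lrmd reported the timeout.
stop_op_elapsed = seconds_between(stop_requested, lrmd_timed_out)

print(f"stop requested -> mysqld begins shutdown: {delay_before_shutdown}s")
print(f"mysqld shutdown duration:                 {shutdown_duration}s")
print(f"stop requested -> lrmd timeout:           {stop_op_elapsed}s")
print(f"configured stop timeout:                  {configured_timeout_s}s")
```

This does not say where the time between the stop request and mysqld's
"Normal shutdown" message went; it only puts the numbers side by side so they
can be included in the report.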