[Pacemaker] 1.1.10 problems on CentOS 6.5

David Vossel dvossel at redhat.com
Thu Dec 12 09:48:56 EST 2013


----- Original Message -----
> From: "Diego Remolina" <diego.remolina at physics.gatech.edu>
> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> Sent: Thursday, December 12, 2013 7:39:37 AM
> Subject: [Pacemaker] 1.1.10 problems on CentOS 6.5
> 
> I was successfully running 1.1.8 on a pair of CentOS 6.4 servers and
> after updating to CentOS 6.5 and 1.1.10, pacemaker misbehaves.
> 
> The first symptoms appeared with the 1.1.10-14.el6 packages. About 20
> hours after the upgrade, the first drbd_monitor issues came out.
> 
> Dec 09 18:50:12 Updated: pacemaker-libs-1.1.10-14.el6.x86_64
> Dec 09 18:50:13 Updated: pacemaker-cli-1.1.10-14.el6.x86_64
> Dec 09 18:50:13 Updated: pacemaker-cluster-libs-1.1.10-14.el6.x86_64
> Dec 09 18:50:13 Updated: pacemaker-1.1.10-14.el6.x86_64
> 
> Dec 10 15:27:55 ysmha01 lrmd[3076]:  warning: child_timeout_callback:
> drbd_export_monitor_29000 process (PID 19608) timed out
> Dec 10 15:27:55 ysmha01 lrmd[3076]:  warning: operation_finished:
> drbd_export_monitor_29000:19608 - timed out after 20000ms
> Dec 10 15:27:55 ysmha01 crmd[3079]:    error: process_lrm_event: LRM
> operation drbd_export_monitor_29000 (77) Timed Out (timeout=20000ms)
> Dec 10 15:27:56 ysmha01 crmd[3079]:   notice: process_lrm_event: LRM
> operation drbd_export_notify_0 (call=99, rc=0, cib-update=0,
> confirmed=true) ok

These errors look like a real resource failure, and Pacemaker appears to be doing its job here. The drbd script is being called but never exits, which is what produces the timeout. Your Pacemaker update likely has nothing to do with this; an update of anything DRBD-related would be a more plausible cause.
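
To illustrate the failure mode described above (an agent script that is started but never returns, so the caller eventually gives up), here is a minimal Python sketch. It is not how lrmd is implemented; `run_monitor` is a hypothetical helper and the "hung" command is a stand-in for a resource agent that blocks.

```python
import subprocess
import sys


def run_monitor(cmd, timeout_ms):
    """Run cmd and return its exit code, or None if it did not exit in time.

    Hypothetical helper for illustration only; it mimics the idea of a
    monitor operation with a timeout, not Pacemaker's actual code.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_ms / 1000.0)
        return completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising, so nothing
        # is left running; we only report that the deadline was missed.
        return None


if __name__ == "__main__":
    # Stand-in for an agent that hangs: sleeps far longer than the timeout.
    hung = [sys.executable, "-c", "import time; time.sleep(5)"]
    result = run_monitor(hung, 200)
    print("timed out" if result is None else "exit code %d" % result)
```

The point is that the timeout says nothing about why the script blocked; it only says the script did not return, which is why the agent itself (or whatever it is waiting on) is the place to look.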

> At this point, I tried taking the node to standby and back to online and
> cleaning the resources to no avail. I tried stopping pacemaker without
> luck. I rebooted both servers and on Dec 11, the failure started with
> failure to monitor pingd, then drbd_monitor.
> 
> Dec 11 16:12:10 ysmha01 lrmd[3060]:  warning: child_timeout_callback:
> pingd_monitor_20000 process (PID 26237) timed out
> Dec 11 16:12:10 ysmha01 lrmd[3060]:  warning: operation_finished:
> pingd_monitor_20000:26237 - timed out after 15000ms
> Dec 11 16:12:10 ysmha01 crmd[3063]:    error: process_lrm_event: LRM
> operation pingd_monitor_20000 (35) Timed Out (timeout=15000ms)
> 
> Dec 11 16:12:19 ysmha01 lrmd[3060]:  warning: child_timeout_callback:
> drbd_export_monitor_29000 process (PID 26268) timed out
> Dec 11 16:12:19 ysmha01 lrmd[3060]:  warning: operation_finished:
> drbd_export_monitor_29000:26268 - timed out after 20000ms
> Dec 11 16:12:19 ysmha01 crmd[3063]:    error: process_lrm_event: LRM
> operation drbd_export_monitor_29000 (62) Timed Out (timeout=20000ms)
>
> I upgraded to the latest rpms yesterday afternoon (1.1.10-14.el6_5.1).
> Right before 1 am, there were issues again.
> 
> Dec 12 00:49:39 ysmha01 pengine[3149]:   notice: process_pe_message:
> Calculated Transition 41: /var/lib/pacemaker/pengine/pe-input-173.bz2
> Dec 12 00:50:03 ysmha01 lrmd[3147]:  warning: child_timeout_callback:
> drbd_export_monitor_29000 process (PID 18496) timed out
> Dec 12 00:50:03 ysmha01 lrmd[3147]:  warning: operation_finished:
> drbd_export_monitor_29000:18496 - timed out after 20000ms
> Dec 12 00:50:03 ysmha01 crmd[3150]:    error: process_lrm_event: LRM
> operation drbd_export_monitor_29000 (60) Timed Out (timeout=20000ms)
> 
> I am for now manually running the machines without pacemaker. What
> suggestions do you have for me? What should I try first?

Does running the commands manually work? If so, something weird is going on.
> 
> - Revert to 1.1.8?
> - Could be something related to drbd in the new kernel? Downgrade kernel
> rpm?
> 
> I can post logs on request, where would be a good place to do that?

Create a crm_report and attach the resulting file here.
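
As a rough sketch of what that invocation could look like, the Python snippet below only builds and prints a `crm_report` command line using its `-f` (from) and `-t` (to) time options. The time window and the destination name `pcmk-report` are assumptions chosen to cover the log excerpts quoted in this thread; adjust them to your own incident, and `crm_report_command` is a hypothetical helper, not part of Pacemaker.

```python
import shlex


def crm_report_command(start, end=None, dest="pcmk-report"):
    """Build an argv list for crm_report covering the given time window.

    start/end are "YYYY-MM-DD HH:MM:SS" strings; dest is the report name.
    The default dest is an assumed placeholder, not a required value.
    """
    argv = ["crm_report", "-f", start]
    if end is not None:
        argv += ["-t", end]
    argv.append(dest)
    return argv


# Assumed window: from shortly before the first timeout on Dec 10
# to shortly after the last one on Dec 12.
cmd = crm_report_command("2013-12-10 15:00:00", "2013-12-12 02:00:00")

# Print a shell-safe form that can be pasted into a terminal on a cluster node.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The snippet does not run `crm_report` itself, so it is safe to try anywhere; run the printed command on one of the cluster nodes.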

> 
> Thanks,
> 
> Diego
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 



