[Pacemaker] 1.1.10 problems on CentOS 6.5
Diego Remolina
diego.remolina at physics.gatech.edu
Thu Dec 12 13:39:37 UTC 2013
I was successfully running Pacemaker 1.1.8 on a pair of CentOS 6.4 servers,
but since updating to CentOS 6.5 and 1.1.10, pacemaker has been misbehaving.
The first symptoms appeared with the 1.1.10-14.el6 packages. About 20
hours after the upgrade, the first drbd_monitor timeouts showed up.
Dec 09 18:50:12 Updated: pacemaker-libs-1.1.10-14.el6.x86_64
Dec 09 18:50:13 Updated: pacemaker-cli-1.1.10-14.el6.x86_64
Dec 09 18:50:13 Updated: pacemaker-cluster-libs-1.1.10-14.el6.x86_64
Dec 09 18:50:13 Updated: pacemaker-1.1.10-14.el6.x86_64
Dec 10 15:27:55 ysmha01 lrmd[3076]: warning: child_timeout_callback: drbd_export_monitor_29000 process (PID 19608) timed out
Dec 10 15:27:55 ysmha01 lrmd[3076]: warning: operation_finished: drbd_export_monitor_29000:19608 - timed out after 20000ms
Dec 10 15:27:55 ysmha01 crmd[3079]: error: process_lrm_event: LRM operation drbd_export_monitor_29000 (77) Timed Out (timeout=20000ms)
Dec 10 15:27:56 ysmha01 crmd[3079]: notice: process_lrm_event: LRM operation drbd_export_notify_0 (call=99, rc=0, cib-update=0, confirmed=true) ok
At this point, I tried putting the node into standby and back online, and
cleaning up the resources, to no avail. I also tried stopping pacemaker,
without luck. I rebooted both servers, and on Dec 11 the failures started
again: first the pingd monitor timed out, then drbd_monitor.
Dec 11 16:12:10 ysmha01 lrmd[3060]: warning: child_timeout_callback: pingd_monitor_20000 process (PID 26237) timed out
Dec 11 16:12:10 ysmha01 lrmd[3060]: warning: operation_finished: pingd_monitor_20000:26237 - timed out after 15000ms
Dec 11 16:12:10 ysmha01 crmd[3063]: error: process_lrm_event: LRM operation pingd_monitor_20000 (35) Timed Out (timeout=15000ms)
Dec 11 16:12:19 ysmha01 lrmd[3060]: warning: child_timeout_callback: drbd_export_monitor_29000 process (PID 26268) timed out
Dec 11 16:12:19 ysmha01 lrmd[3060]: warning: operation_finished: drbd_export_monitor_29000:26268 - timed out after 20000ms
Dec 11 16:12:19 ysmha01 crmd[3063]: error: process_lrm_event: LRM operation drbd_export_monitor_29000 (62) Timed Out (timeout=20000ms)
I upgraded to the latest rpms yesterday afternoon (1.1.10-14.el6_5.1).
Right before 1 am, there were issues again.
Dec 12 00:49:39 ysmha01 pengine[3149]: notice: process_pe_message: Calculated Transition 41: /var/lib/pacemaker/pengine/pe-input-173.bz2
Dec 12 00:50:03 ysmha01 lrmd[3147]: warning: child_timeout_callback: drbd_export_monitor_29000 process (PID 18496) timed out
Dec 12 00:50:03 ysmha01 lrmd[3147]: warning: operation_finished: drbd_export_monitor_29000:18496 - timed out after 20000ms
Dec 12 00:50:03 ysmha01 crmd[3150]: error: process_lrm_event: LRM operation drbd_export_monitor_29000 (60) Timed Out (timeout=20000ms)
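To see the pattern across all three days at a glance, here is a minimal sketch that pulls every lrmd timeout out of a syslog file. It assumes each wrapped log entry has been joined back onto a single line and that the lrmd "operation_finished" message format matches the excerpts above. The regex and the `parse_timeouts` helper are my own illustration, not part of Pacemaker. It relies on Pacemaker's operation-key convention `<resource>_<action>_<interval in ms>`, so `drbd_export_monitor_29000` is a monitor of `drbd_export` every 29000 ms.

```python
import re
from collections import Counter

# Matches the lrmd "operation_finished" timeout warning (hypothetical pattern,
# written against the log excerpts above). Syslog pads single-digit days with
# an extra space ("Dec  9"), hence " +\d{1,2}".
TIMEOUT_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) lrmd\[\d+\]: "
    r"warning: operation_finished: (?P<key>\S+):(?P<pid>\d+) - "
    r"timed out after (?P<timeout>\d+)ms"
)


def parse_timeouts(lines):
    """Return one dict per lrmd operation timeout found in `lines`."""
    events = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if not m:
            continue
        # Operation keys look like <resource>_<action>_<interval-ms>; split
        # from the right because resource names may themselves contain '_'.
        rsc, action, interval = m.group("key").rsplit("_", 2)
        events.append({
            "time": m.group("stamp"),
            "host": m.group("host"),
            "resource": rsc,
            "action": action,
            "interval_ms": int(interval),
            "pid": int(m.group("pid")),
            "timeout_ms": int(m.group("timeout")),
        })
    return events


# The lrmd lines quoted above, joined onto single lines.
SAMPLE = [
    "Dec 10 15:27:55 ysmha01 lrmd[3076]: warning: operation_finished: drbd_export_monitor_29000:19608 - timed out after 20000ms",
    "Dec 11 16:12:10 ysmha01 lrmd[3060]: warning: operation_finished: pingd_monitor_20000:26237 - timed out after 15000ms",
    "Dec 11 16:12:19 ysmha01 lrmd[3060]: warning: operation_finished: drbd_export_monitor_29000:26268 - timed out after 20000ms",
    "Dec 12 00:50:03 ysmha01 crmd[3150]: error: process_lrm_event: LRM operation drbd_export_monitor_29000 (60) Timed Out (timeout=20000ms)",
    "Dec 12 00:50:03 ysmha01 lrmd[3147]: warning: operation_finished: drbd_export_monitor_29000:18496 - timed out after 20000ms",
]

if __name__ == "__main__":
    events = parse_timeouts(SAMPLE)
    for e in events:
        print(f"{e['time']} {e['resource']} {e['action']} "
              f"(every {e['interval_ms']}ms) timed out after {e['timeout_ms']}ms")
    # How many timeouts each resource accumulated.
    print(Counter(e["resource"] for e in events))
```

On a real node one would feed it the lines of /var/log/messages instead of `SAMPLE`. Only the lrmd lines are counted, so the matching crmd "Timed Out" entry for the same operation is not double-counted.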
For now I am running the machines manually, without pacemaker. What
would you suggest I try first?
- Revert to 1.1.8?
- Could this be related to DRBD in the new kernel? Should I downgrade the
kernel rpm?
I can post logs on request; where would be a good place to do that?
Thanks,
Diego
More information about the Pacemaker mailing list