[Pacemaker] Resource "ping" fails on passive node after upgrading to second nic

Mon Jan 9 11:59:55 UTC 2012

Stefan,

sorry, your report triggers a complete -EPARSE in my brain.

On Mon, Jan 9, 2012 at 10:38 AM, Senftleben, Stefan (itsc)
<Stefan.Senftleben at itsc.de> wrote:
> Hello everybody,
>
> last week I installed and configured a second network interface in each cluster node.
> After I configured corosync.conf, the passive node stops the primitive ping (three ping targets).

The Corosync config shouldn't affect the ping resource at all.

> The following errors appear in corosync.log:
>
> Jan 09 10:12:28 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Jan 09 10:12:28 corosync [MAIN  ] Completed service synchronization, ready to provide service.
> Jan 09 10:12:30 corosync [TOTEM ] ring 1 active with no faults
> Jan 09 10:12:37 lxds05 crmd: [1347]: info: process_lrm_event: LRM operation pri_ping:1_start_0 (call=11, rc=0, cib-update=17, confirmed=true) ok
> Jan 09 10:12:42 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending flush op to all hosts for: pingd (3000)
> Jan 09 10:13:37 lxds05 crmd: [1347]: WARN: cib_rsc_callback: Resource update 17 failed: (rc=-41) Remote node did not respond
> Jan 09 10:17:25 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending flush op to all hosts for: master-pri_drbd_omd:0 (10000)
> Jan 09 10:17:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 22: master-pri_drbd_omd:0=10000
> Jan 09 10:19:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 22 for master-pri_drbd_omd:0=10000 failed: Remote node did not respond
> Jan 09 10:22:08 lxds05 cib: [1343]: info: cib_stats: Processed 67 operations (1044.00us average, 0% utilization) in the last 10min
> Jan 09 10:22:25 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending flush op to all hosts for: master-pri_drbd_omd:0 (10000)
> Jan 09 10:22:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 24: master-pri_drbd_omd:0=10000
> Jan 09 10:24:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 24 for master-pri_drbd_omd:0=10000 failed: Remote node did not respond
> Jan 09 10:27:25 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending flush op to all hosts for: master-pri_drbd_omd:0 (10000)
> Jan 09 10:27:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 26: master-pri_drbd_omd:0=10000
> Jan 09 10:29:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 26 for master-pri_drbd_omd:0=10000 failed: Remote node did not respond
> Jan 09 10:32:08 lxds05 cib: [1343]: info: cib_stats: Processed 6 operations (1666.00us average, 0% utilization) in the last 10min
> Jan 09 10:32:25 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending flush op to all hosts for: master-pri_drbd_omd:0 (10000)
> Jan 09 10:32:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 28: master-pri_drbd_omd:0=10000
> Jan 09 10:34:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 28 for master-pri_drbd_omd:0=10000 failed: Remote node did not respond

Not a single message from any ping resource here.
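
One way to triage an excerpt like the one quoted above is to drop the info lines and count the warnings per daemon and function. The following is only a sketch: `summarize_warnings` is a hypothetical helper, and it assumes the line layout seen in the quoted log (`<date> <host> <daemon>: [pid]: WARN: <function>: <message>`). The sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Assumed layout, taken from the quoted excerpt:
#   Jan 09 10:13:37 lxds05 crmd: [1347]: WARN: cib_rsc_callback: <message>
WARN_RE = re.compile(r"(?P<daemon>\w+): \[\d+\]: WARN: (?P<func>\w+): (?P<msg>.*)$")

def summarize_warnings(lines):
    """Count WARN lines per (daemon, function). Hypothetical helper."""
    counts = Counter()
    for line in lines:
        match = WARN_RE.search(line)
        if match:
            counts[(match.group("daemon"), match.group("func"))] += 1
    return counts

# A few lines copied from the log excerpt above.
sample = [
    "Jan 09 10:12:37 lxds05 crmd: [1347]: info: process_lrm_event: LRM operation pri_ping:1_start_0 (call=11, rc=0, cib-update=17, confirmed=true) ok",
    "Jan 09 10:13:37 lxds05 crmd: [1347]: WARN: cib_rsc_callback: Resource update 17 failed: (rc=-41) Remote node did not respond",
    "Jan 09 10:19:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 22 for master-pri_drbd_omd:0=10000 failed: Remote node did not respond",
    "Jan 09 10:24:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 24 for master-pri_drbd_omd:0=10000 failed: Remote node did not respond",
]

for (daemon, func), n in sorted(summarize_warnings(sample).items()):
    print(f"{daemon} {func}: {n}")
```

In the excerpt, every warning is a CIB update that failed with "Remote node did not respond"; none of them comes from the ping resource agent.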

> The check with corosync-cfgtool -s runs without errors on both nodes.

Does "corosync-objctl | grep member" yield two members or one?
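
The membership check can be scripted roughly as follows. This is a sketch under assumptions, not a verified tool: `count_members` is a hypothetical helper, and the key layout `runtime.totem.pg.mrp.srp.members.<nodeid>.<field>` is an assumption about what corosync-objctl prints on Corosync 1.x, so compare it with the real output first. The sample text is made up for illustration.

```python
import re

# ASSUMPTION: member keys in the corosync-objctl output look like
#   runtime.totem.pg.mrp.srp.members.<nodeid>.<field>=<value>
MEMBER_RE = re.compile(r"\.members\.(\d+)\.")

def count_members(objctl_output):
    """Return the number of distinct node IDs under '.members.'. Hypothetical helper."""
    node_ids = {m.group(1) for m in MEMBER_RE.finditer(objctl_output)}
    return len(node_ids)

# Illustrative input only; addresses and node IDs are invented.
sample = """\
runtime.totem.pg.mrp.srp.members.1.ip=r(0) ip(192.0.2.1) r(1) ip(198.51.100.1)
runtime.totem.pg.mrp.srp.members.1.status=joined
runtime.totem.pg.mrp.srp.members.2.ip=r(0) ip(192.0.2.2) r(1) ip(198.51.100.2)
runtime.totem.pg.mrp.srp.members.2.status=joined
"""

print(count_members(sample))
```

The function counts every node ID it sees, whatever its status, so a node listed as having left would still be counted. A result of 1 on a two-node cluster would mean the nodes do not see each other.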

> I do not know what is wrong, because the targets used in the crm config can be pinged successfully.
> Can someone help me, please? Thanks in advance.

Unlikely. You didn't give much useful information; even your
resource config is missing. A "cibadmin -Q" dump posted to
pastebin, with the URL shared here, might help.

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now