[Pacemaker] Cannot use ocf::heartbeat:IPsrcaddr (RTNETLINK answers: No such process)
David Vossel
dvossel at redhat.com
Wed Nov 6 23:46:41 UTC 2013
----- Original Message -----
> From: "Mathieu Peltier" <mathieu.peltier at gmail.com>
> To: pacemaker at oss.clusterlabs.org
> Sent: Wednesday, November 6, 2013 11:27:50 AM
> Subject: [Pacemaker] Cannot use ocf::heartbeat:IPsrcaddr (RTNETLINK answers: No such process)
>
> Hi,
> I am trying to set up a simple cluster of 2 machines on CentOS 6.4:
> pacemaker-cli-1.1.10-1.el6_4.4.x86_64
> pacemaker-1.1.10-1.el6_4.4.x86_64
> pacemaker-libs-1.1.10-1.el6_4.4.x86_64
> pacemaker-cluster-libs-1.1.10-1.el6_4.4.x86_64
> corosync-1.4.1-15.el6_4.1.x86_64
> corosynclib-1.4.1-15.el6_4.1.x86_64
> pcs-0.9.90-1.el6_4.noarch
> cman-3.0.12.1-49.el6_4.2.x86_64
> resource-agents-3.9.2-21.el6_4.8.x86_64
>
> I am using the following script to configure the cluster:
> --------------------------------------------------
> #!/bin/bash
>
> CLUSTER_NAME=test
> CONFIG_FILE=/etc/cluster/cluster.conf
> NODE1_EM1=node1
> NODE2_EM1=node2
> NODE1_EM2=node1-priv
> NODE2_EM2=node2-priv
> VIP=192.168.0.6
> MONITOR_INTERVAL=60s
>
> # Make sure that pacemaker is stopped on both nodes
> # NOT INCLUDED HERE
>
> # Delete existing configuration
> rm -rf /var/log/cluster/*
> ssh root@$NODE2_EM2 'rm -rf /var/log/cluster/*'
> rm -rf /var/lib/pacemaker/cib/* /var/lib/pacemaker/cores/* \
>     /var/lib/pacemaker/pengine/* /var/lib/corosync/* /var/lib/cluster/*
> ssh root@$NODE2_EM2 'rm -rf /var/lib/pacemaker/cib/* \
>     /var/lib/pacemaker/cores/* /var/lib/pacemaker/pengine/* \
>     /var/lib/corosync/* /var/lib/cluster/*'
>
> # Create the cluster
> ccs -f $CONFIG_FILE --createcluster $CLUSTER_NAME
>
> # Add nodes to the cluster
> ccs -f $CONFIG_FILE --addnode $NODE1_EM1
> ccs -f $CONFIG_FILE --addnode $NODE2_EM1
> ccs -f $CONFIG_FILE --setcman two_node="1" expected_votes="1"
>
> # Add alternative node names so that both network interfaces are used
> ccs -f $CONFIG_FILE --addalt $NODE1_EM1 $NODE1_EM2
> ccs -f $CONFIG_FILE --addalt $NODE2_EM1 $NODE2_EM2
> ccs -f $CONFIG_FILE --setdlm protocol="sctp"
>
> # Teach CMAN how to send its fencing requests to Pacemaker
> ccs -f $CONFIG_FILE --addfencedev pcmk agent=fence_pcmk
> ccs -f $CONFIG_FILE --addmethod pcmk-redirect $NODE1_EM1
> ccs -f $CONFIG_FILE --addmethod pcmk-redirect $NODE2_EM1
> ccs -f $CONFIG_FILE --addfenceinst pcmk $NODE1_EM1 pcmk-redirect \
>     port=$NODE1_EM1
> ccs -f $CONFIG_FILE --addfenceinst pcmk $NODE2_EM1 pcmk-redirect \
>     port=$NODE2_EM1
>
> # Deploy configuration to node2
> scp /etc/cluster/cluster.conf root@$NODE2_EM2:/etc/cluster/cluster.conf
>
> # Start pacemaker on main node
> /etc/init.d/pacemaker start
> sleep 30
>
> # Disable stonith
> pcs property set stonith-enabled=false
>
> # Disable quorum
> pcs property set no-quorum-policy=ignore
>
> # Define resources (the script only defines VIP, so use it here)
> pcs resource create VIP_EM1 ocf:heartbeat:IPaddr params nic=em1 \
>     ip=$VIP cidr_netmask=24 op monitor interval=$MONITOR_INTERVAL
> pcs resource create PREFERRED_SRC_IP ocf:heartbeat:IPsrcaddr params \
>     ipaddress=$VIP op monitor interval=$MONITOR_INTERVAL
>
> # Define the initial location and prevent resources from moving back to
> # the initial server after a failure
> pcs resource defaults resource-stickiness=100
> pcs constraint location VIP_EM1 prefers $NODE1_EM1=50
> --------------------------------------------------
>
> After running this script from node1:
>
> root@node1# pcs status
> Cluster name:
> Last updated: Wed Nov 6 17:17:30 2013
> Last change: Wed Nov 6 17:06:20 2013 via crm_attribute on node1
> Stack: cman
> Current DC: node1 - partition with quorum
> Version: 1.1.10-1.el6_4.4-368c726
> 2 Nodes configured
> 2 Resources configured
>
> Online: [ node1 ]
> OFFLINE: [ node2 ]
>
> Full list of resources:
>
> VIP_EM1 (ocf::heartbeat:IPaddr): Stopped
> PREFERRED_SRC_IP (ocf::heartbeat:IPsrcaddr): Stopped
>
> Failed actions:
> PREFERRED_SRC_IP_start_0 on node1 'unknown error' (1): call=19,
> status=complete, last-rc-change='Wed Nov 6 17:06:20 2013',
> queued=67ms, exec=0ms
>
> PCSD Status:
> Error: no nodes found in corosync.conf
>
> root@node1# ip route show
> 192.168.8.0/24 dev em2 proto kernel scope link src 192.168.8.1
> default via 192.168.0.1 dev em1
>
> Error in /var/log/cluster/corosync.log:
> ...
> IPsrcaddr(PREFERRED_SRC_IP)[638]: 2013/11/06_16:50:32 ERROR:
> command 'ip route change to default via 192.168.0.1 dev em1 src
> 192.168.0.6' failed
> Nov 06 16:50:32 [32461] node1.domain.org lrmd: notice:
> operation_finished: PREFERRED_SRC_IP_start_0:638:stderr [
> RTNETLINK answers: No such process ]
> ...
>
> If I run the command manually when pacemaker is not started (after
> rebooting the machine, for example), the default route is modified as
> expected. I use 192.168.0.106 here because the alias 192.168.0.6 is not
> started:
>
> # ip route show
> 192.168.0.0/24 dev em1 proto kernel scope link src 192.168.0.106
> 192.168.8.0/24 dev em3 proto kernel scope link src 192.168.8.1
> default via 192.168.0.1 dev em1
>
> # ip route change to default via 192.168.0.1 dev em1 src 192.168.0.106
>
> # ip route show
> 192.168.0.0/24 dev em1 proto kernel scope link src 192.168.0.106
> 192.168.8.0/24 dev em3 proto kernel scope link src 192.168.8.1
> default via 192.168.0.1 dev em1 src 192.168.0.106
>
> If I run the same configuration script without defining the
> PREFERRED_SRC_IP resource, I can verify that the VIP_EM1 resource
> starts as expected:
>
> # pcs status
> ...
> Online: [ node1 ]
> OFFLINE: [ node2 ]
>
> Full list of resources:
> VIP_EM1 (ocf::heartbeat:IPaddr): Started node1
> ...
>
> # ip addr show em1
> 6: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen
> 1000
> link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
> inet 192.168.0.106/24 brd 192.168.0.255 scope global em1
> inet 192.168.0.6/24 brd 192.168.0.255 scope global secondary em1
>
> But when I create the PREFERRED_SRC_IP resource, I get the same error:
>
> # pcs resource create PREFERRED_SRC_IP ocf:heartbeat:IPsrcaddr params \
>     ipaddress=192.168.0.6 op monitor interval=60s
I noticed you didn't create an order constraint between the IPaddr and IPsrcaddr resources. You'll want to guarantee that the IP address starts before it is set as the source address:
pcs constraint order VIP_EM1 then PREFERRED_SRC_IP
If that doesn't help, we'll need some debug information. After defining the source IP resource and watching it fail, run the following and post the output:
crm_resource -r PREFERRED_SRC_IP --force-start -VV
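On the ordering point, here is a rough sketch of the precondition the constraint is meant to guarantee: the address should already be configured on the interface before anything tries to use it as a source address. This is only an illustration, not what the IPsrcaddr agent runs. The helper name addr_present and the two sample variables are made up for this example; the sample text is the `ip addr show em1` output quoted in this thread.

```shell
#!/bin/bash
# Hypothetical helper (not part of pcs or the resource agents): succeeds if
# address $1 appears as an "inet" entry in `ip addr show` output on stdin.
addr_present() {
    grep -qF "inet $1/"
}

# Sample `ip addr show em1` lines from this thread: before and after
# the VIP_EM1 resource has been started.
BEFORE_VIP='    inet 192.168.0.106/24 brd 192.168.0.255 scope global em1'
AFTER_VIP='    inet 192.168.0.106/24 brd 192.168.0.255 scope global em1
    inet 192.168.0.6/24 brd 192.168.0.255 scope global secondary em1'

for sample in "$BEFORE_VIP" "$AFTER_VIP"; do
    if printf '%s\n' "$sample" | addr_present 192.168.0.6; then
        echo "192.168.0.6 is configured on em1"
    else
        echo "192.168.0.6 is not configured yet (VIP_EM1 not started)"
    fi
done
```

On a live node the same check would be `ip addr show em1 | addr_present 192.168.0.6`. With the order constraint in place, the cluster should only attempt to start PREFERRED_SRC_IP once that check would pass.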
Thanks,
-- Vossel
>
> # pcs status
> ...
> Online: [ node1 ]
> OFFLINE: [ node2 ]
>
> Full list of resources:
> VIP_EM1 (ocf::heartbeat:IPaddr): Started node1
> PREFERRED_SRC_IP (ocf::heartbeat:IPsrcaddr): Stopped
>
> Failed actions:
> PREFERRED_SRC_IP_start_0 on node1 'unknown error' (1): call=24,
> status=complete, last-rc-change='Wed Nov 6 18:00:09 2013',
> queued=47ms, exec=0ms
>
> Error in corosync.log:
>
> IPsrcaddr(PREFERRED_SRC_IP)[10035]: 2013/11/06_18:00:09 ERROR:
> command 'ip route change to default via 192.168.0.1 dev em1 src
> 192.168.0.6' failed
> Nov 06 18:00:09 [9172] node1.domain.org lrmd: notice:
> operation_finished: PREFERRED_SRC_IP_start_0:10035:stderr [
> RTNETLINK answers: No such process ]
>
> Thanks in advance,
> Mathieu