[Pacemaker] Cannot use ocf::heartbeat:IPsrcaddr (RTNETLINK answers: No such process)

Wed Nov 6 23:46:41 UTC 2013

----- Original Message -----
> From: "Mathieu Peltier" <mathieu.peltier at gmail.com>
> To: pacemaker at oss.clusterlabs.org
> Sent: Wednesday, November 6, 2013 11:27:50 AM
> Subject: [Pacemaker] Cannot use ocf::heartbeat:IPsrcaddr (RTNETLINK answers: No such process)
> 
> Hi,
> I am trying to set up a simple cluster of 2 machines on CentOS 6.4:
>  pacemaker-cli-1.1.10-1.el6_4.4.x86_64
>  pacemaker-1.1.10-1.el6_4.4.x86_64
>  pacemaker-libs-1.1.10-1.el6_4.4.x86_64
>  pacemaker-cluster-libs-1.1.10-1.el6_4.4.x86_64
>  corosync-1.4.1-15.el6_4.1.x86_64
>  corosynclib-1.4.1-15.el6_4.1.x86_64
>  pcs-0.9.90-1.el6_4.noarch
>  cman-3.0.12.1-49.el6_4.2.x86_64
>  resource-agents-3.9.2-21.el6_4.8.x86_64
> 
> I am using the following script to configure the cluster:
> --------------------------------------------------
> #!/bin/bash
> 
> CLUSTER_NAME=test
> CONFIG_FILE=/etc/cluster/cluster.conf
> NODE1_EM1=node1
> NODE2_EM1=node2
> NODE1_EM2=node1-priv
> NODE2_EM2=node2-priv
> VIP=192.168.0.6
> MONITOR_INTERVAL=60s
> 
> # Make sure that pacemaker is stopped on both nodes
> # NOT INCLUDED HERE
> 
> # Delete existing configuration
> rm -rf /var/log/cluster/*
> ssh root@$NODE2_EM2 'rm -rf /var/log/cluster/*'
> rm -rf /var/lib/pacemaker/cib/* /var/lib/pacemaker/cores/* \
>     /var/lib/pacemaker/pengine/* /var/lib/corosync/* /var/lib/cluster/*
> ssh root@$NODE2_EM2 'rm -rf /var/lib/pacemaker/cib/* \
>     /var/lib/pacemaker/cores/* /var/lib/pacemaker/pengine/* \
>     /var/lib/corosync/* /var/lib/cluster/*'
> 
> # Create the cluster
> ccs -f $CONFIG_FILE --createcluster $CLUSTER_NAME
> 
> # Add nodes to the cluster
> ccs -f $CONFIG_FILE --addnode $NODE1_EM1
> ccs -f $CONFIG_FILE --addnode $NODE2_EM1
> ccs -f $CONFIG_FILE --setcman two_node="1" expected_votes="1"
> 
> # Add alternate node names so that both network interfaces are used
> ccs -f $CONFIG_FILE --addalt $NODE1_EM1 $NODE1_EM2
> ccs -f $CONFIG_FILE --addalt $NODE2_EM1 $NODE2_EM2
> ccs -f $CONFIG_FILE --setdlm protocol="sctp"
> 
> # Teach CMAN how to send its fencing requests to Pacemaker
> ccs -f $CONFIG_FILE --addfencedev pcmk agent=fence_pcmk
> ccs -f $CONFIG_FILE --addmethod pcmk-redirect $NODE1_EM1
> ccs -f $CONFIG_FILE --addmethod pcmk-redirect $NODE2_EM1
> ccs -f $CONFIG_FILE --addfenceinst pcmk $NODE1_EM1 pcmk-redirect \
>     port=$NODE1_EM1
> ccs -f $CONFIG_FILE --addfenceinst pcmk $NODE2_EM1 pcmk-redirect \
>     port=$NODE2_EM1
> 
> # Deploy configuration to node2
> scp /etc/cluster/cluster.conf root@$NODE2_EM2:/etc/cluster/cluster.conf
> 
> # Start pacemaker on main node
> /etc/init.d/pacemaker start
> sleep 30
> 
> # Disable stonith
> pcs property set stonith-enabled=false
> 
> # Disable quorum
> pcs property set no-quorum-policy=ignore
> 
> # Define resources
> pcs resource create VIP_EM1 ocf:heartbeat:IPaddr params nic=em1 \
>     ip=$VIP cidr_netmask=24 op monitor interval=$MONITOR_INTERVAL
> pcs resource create PREFERRED_SRC_IP ocf:heartbeat:IPsrcaddr params \
>     ipaddress=$VIP op monitor interval=$MONITOR_INTERVAL
> 
> # Define initial location and prevent resources from moving back to the
> # initial server after a failure
> pcs resource defaults resource-stickiness=100
> pcs constraint location VIP_EM1 prefers $NODE1_EM1=50
> --------------------------------------------------
> 
> After running this script from node1:
> 
> root@node1# pcs status
> Cluster name:
> Last updated: Wed Nov  6 17:17:30 2013
> Last change: Wed Nov  6 17:06:20 2013 via crm_attribute on node1
> Stack: cman
> Current DC: node1 - partition with quorum
> Version: 1.1.10-1.el6_4.4-368c726
> 2 Nodes configured
> 2 Resources configured
> 
> Online: [ node1 ]
> OFFLINE: [ node2 ]
> 
> Full list of resources:
> 
>  VIP_EM1    (ocf::heartbeat:IPaddr):    Stopped
>  PREFERRED_SRC_IP    (ocf::heartbeat:IPsrcaddr):    Stopped
> 
> Failed actions:
>     PREFERRED_SRC_IP_start_0 on node1 'unknown error' (1): call=19, status=complete, last-rc-change='Wed Nov  6 17:06:20 2013', queued=67ms, exec=0ms
> 
> PCSD Status:
> Error: no nodes found in corosync.conf
> 
> root@node1# ip route show
> 192.168.8.0/24 dev em2  proto kernel  scope link  src 192.168.8.1
> default via 192.168.0.1 dev em1
> 
> Error in /var/log/cluster/corosync.log:
> ...
> IPsrcaddr(PREFERRED_SRC_IP)[638]:       2013/11/06_16:50:32 ERROR: command 'ip route change to  default via 192.168.0.1 dev em1 src 192.168.0.6' failed
> Nov 06 16:50:32 [32461] node1.domain.org       lrmd:   notice: operation_finished:       PREFERRED_SRC_IP_start_0:638:stderr [ RTNETLINK answers: No such process ]
> ...
> 
> If I run the command manually when pacemaker is not started (after
> rebooting the machine for example), the default route is modified as
> expected (I use 192.168.0.106 because the alias 192.168.0.6 is not
> started)
> 
> # ip route show
> 192.168.0.0/24 dev em1  proto kernel  scope link  src 192.168.0.106
> 192.168.8.0/24 dev em3  proto kernel  scope link  src 192.168.8.1
> default via 192.168.0.1 dev em1
> 
> # ip route change to  default via 192.168.0.1 dev em1 src 192.168.0.106
> 
> # ip route show
> 192.168.0.0/24 dev em1  proto kernel  scope link  src 192.168.0.106
> 192.168.8.0/24 dev em3  proto kernel  scope link  src 192.168.8.1
> default via 192.168.0.1 dev em1 src 192.168.0.106
> 
> If I run the same configure script without defining the
> PREFERRED_SRC_IP resource, I can check that the resource is started as
> expected:
> 
> # pcs status
> ...
> Online: [ node1 ]
> OFFLINE: [ node2 ]
> 
> Full list of resources:
>  VIP_EM1    (ocf::heartbeat:IPaddr):    Started node1
> ...
> 
> # ip addr show em1
> 6: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
>     link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
>     inet 192.168.0.106/24 brd 192.168.0.255 scope global em1
>     inet 192.168.0.6/24 brd 192.168.0.255 scope global secondary em1
> 
> But when I create the  PREFERRED_SRC_IP resource, I get the same error:
> 
> # pcs resource create PREFERRED_SRC_IP ocf:heartbeat:IPsrcaddr params \
>     ipaddress=192.168.0.6 op monitor interval=60s

I noticed you didn't create an order constraint between the IPaddr and IPsrcaddr resources. You'll want to guarantee that the IP address is started before it is set as the preferred source address.

pcs constraint order VIP_EM1 then PREFERRED_SRC_IP
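
An order constraint only controls start order; it does not by itself keep the two resources on the same node. A minimal sketch of adding both an order and a colocation constraint follows. The run() wrapper, the DRY_RUN variable and the constrain_src_to_vip function are hypothetical helpers introduced here for illustration; with DRY_RUN=1 (the default in this sketch) the pcs commands are only printed. Whether colocation is wanted in this setup is an assumption.

```shell
#!/bin/bash
# Sketch only: tie the IPsrcaddr resource to the IPaddr resource.
# run(), DRY_RUN and constrain_src_to_vip are hypothetical helpers,
# not part of pcs. DRY_RUN=1 prints the commands instead of running them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

constrain_src_to_vip() {
    local vip_res=$1 src_res=$2
    # Start the address resource first, then the source-address resource.
    run pcs constraint order "$vip_res" then "$src_res"
    # Keep the source-address resource on the node that holds the address.
    run pcs constraint colocation add "$src_res" with "$vip_res" INFINITY
}

constrain_src_to_vip VIP_EM1 PREFERRED_SRC_IP
```

Run it with DRY_RUN=0 on a cluster node to apply the constraints, then check the result with `pcs constraint`.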

If that doesn't help, we'll need some debug information. After defining the source IP resource and watching it fail, run the following and post its output.
crm_resource -r PREFERRED_SRC_IP --force-start -VV
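
Before the forced start, it may also be worth confirming that the address is actually configured on the interface at that moment. The sketch below is one way to check; has_addr is a hypothetical helper that reads `ip addr show` output on stdin, and the sample text is the `ip addr show em1` output quoted earlier in this thread. This check does not diagnose the RTNETLINK error by itself; it only rules out one possible precondition.

```shell
#!/bin/bash
# has_addr ADDRESS: read `ip addr show` output on stdin and succeed if
# ADDRESS appears on an "inet" line. has_addr is a hypothetical helper.
has_addr() {
    awk -v want="$1" '
        $1 == "inet" { split($2, a, "/"); if (a[1] == want) found = 1 }
        END { exit !found }
    '
}

# Sample input: the `ip addr show em1` output quoted earlier in the thread.
sample='6: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.106/24 brd 192.168.0.255 scope global em1
    inet 192.168.0.6/24 brd 192.168.0.255 scope global secondary em1'

if printf '%s\n' "$sample" | has_addr 192.168.0.6; then
    echo "192.168.0.6 is configured"
else
    echo "192.168.0.6 is missing"
fi
```

On the node itself the live output can be piped in instead of the sample, for example `ip addr show em1 | has_addr 192.168.0.6`, and compared with `ip route show` taken at the same time.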

Thanks,
-- Vossel

> 
> # pcs status
> ...
> Online: [ node1 ]
> OFFLINE: [ node2 ]
> 
> Full list of resources:
>  VIP_EM1    (ocf::heartbeat:IPaddr):    Started node1
>  PREFERRED_SRC_IP    (ocf::heartbeat:IPsrcaddr):    Stopped
> 
> Failed actions:
>     PREFERRED_SRC_IP_start_0 on node1 'unknown error' (1): call=24, status=complete, last-rc-change='Wed Nov  6 18:00:09 2013', queued=47ms, exec=0ms
> 
> Error in corosync.log:
> 
>  IPsrcaddr(PREFERRED_SRC_IP)[10035]:     2013/11/06_18:00:09 ERROR: command 'ip route change to  default via 192.168.0.1 dev em1 src 192.168.0.6' failed
>  Nov 06 18:00:09 [9172] node1.domain.org      lrmd:   notice: operation_finished:        PREFERRED_SRC_IP_start_0:10035:stderr [ RTNETLINK answers: No such process ]
> 
> Thanks in advance,
> Mathieu