[ClusterLabs] pacemaker remote configuration on ubuntu 14.04

Сергей Филатов filatecs at gmail.com
Sun Mar 20 04:40:49 CET 2016


I’m fairly new to Pacemaker. Could you tell me what the blocker could be?
root@controller-1:~# pcs constraint
Location Constraints:
  Resource: clone_p_dns
    Enabled on: controller-1.domain.com (score:100)
  Resource: clone_p_haproxy
    Enabled on: controller-1.domain.com (score:100)
  Resource: clone_p_heat-engine
    Enabled on: controller-1.domain.com (score:100)
  Resource: clone_p_mysql
    Enabled on: controller-1.domain.com (score:100)
  Resource: clone_p_neutron-dhcp-agent
    Enabled on: controller-1.domain.com (score:100)
  Resource: clone_p_neutron-l3-agent
    Enabled on: controller-1.domain.com (score:100)
  Resource: clone_p_neutron-metadata-agent
    Enabled on: controller-1.domain.com (score:100)
  Resource: clone_p_neutron-plugin-openvswitch-agent
    Enabled on: controller-1.domain.com (score:100)
  Resource: clone_p_ntp
    Enabled on: controller-1.domain.com (score:100)
  Resource: clone_p_vrouter
    Enabled on: controller-1.domain.com (score:100)
  Resource: clone_ping_vip__public
    Enabled on: controller-1.domain.com (score:100)
  Resource: master_p_conntrackd
    Enabled on: controller-1.domain.com (score:100)
  Resource: master_p_rabbitmq-server
    Enabled on: controller-1.domain.com (score:100)
  Resource: vip__management
    Enabled on: controller-1.domain.com (score:100)
  Resource: vip__public
    Enabled on: controller-1.domain.com (score:100)
    Constraint: loc_ping_vip__public
      Rule: score=-INFINITY boolean-op=or
        Expression: not_defined pingd
        Expression: pingd lte 0
  Resource: vip__vrouter
    Enabled on: controller-1.domain.com (score:100)
  Resource: vip__vrouter_pub
    Enabled on: controller-1.domain.com (score:100)
Ordering Constraints:
Colocation Constraints:
  vip__vrouter with vip__vrouter_pub
  vip__management with clone_p_haproxy
  vip__public with clone_p_haproxy
  clone_p_dns with clone_p_vrouter
  vip__vrouter_pub with master_p_conntrackd (rsc-role:Started) (with-rsc-role:Master)


crm configure show:

node 14: controller-1.domain.com
primitive compute-1 ocf:pacemaker:remote \
        op monitor interval=60
primitive p_conntrackd ocf:fuel:ns_conntrackd \
        op monitor interval=30 timeout=60 \
        op monitor interval=27 role=Master timeout=60 \
        meta migration-threshold=INFINITY failure-timeout=180s
primitive p_dns ocf:fuel:ns_dns \
        op monitor interval=20 timeout=10 \
        op start interval=0 timeout=30 \
        op stop interval=0 timeout=30 \
        params ns=vrouter \
        meta migration-threshold=3 failure-timeout=120
primitive p_haproxy ocf:fuel:ns_haproxy \
        op monitor interval=30 timeout=60 \
        op start interval=0 timeout=60 \
        op stop interval=0 timeout=60 \
        params ns=haproxy debug=false other_networks="172.21.1.0/24 192.168.33.0/24 192.168.31.0/24 192.168.32.0/24 10.2.55.0/24" \
        meta migration-threshold=3 failure-timeout=120
primitive p_heat-engine ocf:fuel:heat-engine \
        op monitor interval=20 timeout=30 \
        op start interval=0 timeout=60 \
        op stop interval=0 timeout=60 \
        meta resource-stickiness=1 migration-threshold=3
primitive p_mysql ocf:fuel:mysql-wss \
        op monitor interval=60 timeout=55 \
        op start interval=0 timeout=300 \
        op stop interval=0 timeout=120 \
        params test_user=wsrep_sst test_passwd=mlNsGR89 socket="/var/run/mysqld/mysqld.sock"
primitive p_neutron-dhcp-agent ocf:fuel:ocf-neutron-dhcp-agent \
        op monitor interval=20 timeout=10 \
        op start interval=0 timeout=60 \
        op stop interval=0 timeout=60 \
        params plugin_config="/etc/neutron/dhcp_agent.ini" remove_artifacts_on_stop_start=true
primitive p_neutron-l3-agent ocf:fuel:ocf-neutron-l3-agent \
        op monitor interval=20 timeout=10 \
        op start interval=0 timeout=60 \
        op stop interval=0 timeout=60 \
        params plugin_config="/etc/neutron/l3_agent.ini" remove_artifacts_on_stop_start=true
primitive p_neutron-metadata-agent ocf:fuel:ocf-neutron-metadata-agent \
        op monitor interval=60 timeout=10 \
        op start interval=0 timeout=30 \
        op stop interval=0 timeout=30
primitive p_neutron-plugin-openvswitch-agent ocf:fuel:ocf-neutron-ovs-agent \
        op monitor interval=20 timeout=10 \

> On 11 Mar 2016, at 14:11, Ken Gaillot <kgaillot at redhat.com> wrote:
> 
> On 03/10/2016 11:36 PM, Сергей Филатов wrote:
>> This one is the right log
> 
> Something in the cluster configuration and state (for example, an
> unsatisfied constraint) is preventing the cluster from starting the
> resource:
> 
> Mar 10 04:00:53 [11785] controller-1.domain.com    pengine:     info: native_print:     compute-1       (ocf::pacemaker:remote):        Stopped
> Mar 10 04:00:53 [11785] controller-1.domain.com    pengine:     info: native_color:     Resource compute-1 cannot run anywhere
> 
> 
>> 
>> 
>> 
>>> On 10 Mar 2016, at 08:17, Сергей Филатов <filatecs at gmail.com> wrote:
>>> 
>>> pcs resource show compute-1
>>> 
>>> Resource: compute-1 (class=ocf provider=pacemaker type=remote)
>>> Operations: monitor interval=60s (compute-1-monitor-interval-60s)
>>> 
>>> I can’t find the _start_0 pattern in the pacemaker logs.
>>> I don’t have an IPv6 address for the remote node, but I guess it should be
>>> listening on both.
>>> 
>>> I’ve attached pacemaker.log for the cluster node:
>>> <pacemaker.log.tar.gz>
>>> 
>>> 
>>>> On 09 Mar 2016, at 10:23, Ken Gaillot <kgaillot at redhat.com> wrote:
>>>> 
>>>> On 03/08/2016 11:38 PM, Сергей Филатов wrote:
>>>>> ssh -p 3121 compute-1
>>>>> ssh_exchange_identification: read: Connection reset by peer
>>>>> 
>>>>> That’s what I get in /var/log/pacemaker.log after restarting pacemaker_remote:
>>>>> Mar 09 05:30:27 [28031] compute-1.domain.com      lrmd:     info: crm_signal_dispatch:  Invoking handler for signal 15: Terminated
>>>>> Mar 09 05:30:27 [28031] compute-1.domain.com      lrmd:     info: lrmd_shutdown:        Terminating with  0 clients
>>>>> Mar 09 05:30:27 [28031] compute-1.domain.com      lrmd:     info: qb_ipcs_us_withdraw:  withdrawing server sockets
>>>>> Mar 09 05:30:27 [28031] compute-1.domain.com      lrmd:     info: crm_xml_cleanup:      Cleaning up memory from libxml2
>>>>> Mar 09 05:30:27 [28193] compute-1.domain.com      lrmd:     info: crm_log_init:         Changed active directory to /var/lib/heartbeat/cores/root
>>>>> Mar 09 05:30:27 [28193] compute-1.domain.com      lrmd:     info: qb_ipcs_us_publish:   server name: lrmd
>>>>> Mar 09 05:30:27 [28193] compute-1.domain.com      lrmd:   notice: lrmd_init_remote_tls_server:  Starting a tls listener on port 3121.
>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com      lrmd:   notice: bind_and_listen:      Listening on address ::
>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com      lrmd:     info: qb_ipcs_us_publish:   server name: cib_ro
>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com      lrmd:     info: qb_ipcs_us_publish:   server name: cib_rw
>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com      lrmd:     info: qb_ipcs_us_publish:   server name: cib_shm
>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com      lrmd:     info: qb_ipcs_us_publish:   server name: attrd
>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com      lrmd:     info: qb_ipcs_us_publish:   server name: stonith-ng
>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com      lrmd:     info: qb_ipcs_us_publish:   server name: crmd
>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com      lrmd:     info: main:         Starting
>>>> 
>>>> It looks like the cluster is not even trying to connect to the remote
>>>> node. pacemaker_remote here is binding only to IPv6, so the cluster will
>>>> need to contact it on that address.
>>>> 
>>>> What is your ocf:pacemaker:remote resource configuration?
>>>> 
>>>> Check your cluster node logs for the start action -- if your resource is
>>>> named R, the start action will be R_start_0. There will be two nodes of
>>>> interest: the node assigned the remote node resource, and the DC.
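The log search described above can be sketched in a few lines of Python. This is only an illustration: the helper name find_start_actions is hypothetical (not part of Pacemaker), the resource name compute-1 is taken from this thread, and the first sample line is synthetic rather than real Pacemaker output.

```python
import re


def find_start_actions(log_lines, resource):
    """Return the log lines that mention the start action <resource>_start_0."""
    pattern = re.compile(r"\b" + re.escape(resource) + r"_start_0\b")
    return [line for line in log_lines if pattern.search(line)]


# Sample input: the first line is a synthetic placeholder, the second is
# quoted from the pengine output earlier in this thread.
sample = [
    "SYNTHETIC EXAMPLE LINE: something about compute-1_start_0 here",
    "pengine: info: native_color: Resource compute-1 cannot run anywhere",
]
matches = find_start_actions(sample, "compute-1")
for line in matches:
    print(line)
```

In practice the lines would come from the cluster node's log file, for example by passing an open file object for /var/log/pacemaker.log as log_lines. An empty result would be consistent with the cluster never scheduling the start.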
>>>> 
>>>>> I only have pacemaker-remote, resource-agents and pcs installed, so there is no
>>>>> /etc/default/pacemaker file on the remote node.
>>>>> SELinux is disabled, and I specifically opened the firewall on 2224, 3121 and
>>>>> 21064 tcp and 5405 udp.
>>>>> 
>>>>>> On 08 Mar 2016, at 08:51, Ken Gaillot <kgaillot at redhat.com> wrote:
>>>>>> 
>>>>>> On 03/07/2016 09:10 PM, Сергей Филатов wrote:
>>>>>>> Thanks for the answer. It turned out the problem was not IPv6.
>>>>>>> The remote node is listening on port 3121 and its name resolves fine.
>>>>>>> I have the authkey file at /etc/pacemaker on both the remote and cluster nodes.
>>>>>>> What else can I check? Is there a walkthrough for Ubuntu?
>>>>>> 
>>>>>> Nothing specific to ubuntu, but there's not much distro-specific to it.
>>>>>> 
>>>>>> If you "ssh -p 3121" to the remote node from a cluster node, what do you
>>>>>> get?
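As a rough stand-in for the ssh probe suggested above, a plain TCP connection attempt from a cluster node shows whether anything accepts connections on the port at all. This is a minimal sketch, assuming the port 3121 seen in this thread; the helper name can_connect is hypothetical, and the check says nothing about whether the TLS handshake or the authkey would succeed.

```python
import socket


def can_connect(host, port=3121, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection tries each address the name resolves to
        # (IPv4 and IPv6) until one succeeds.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # "compute-1" is the remote node name used in this thread.
    print(can_connect("compute-1"))
```

A False result points at the network, firewall or listening address; a True result only means the port is reachable.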
>>>>>> 
>>>>>> pacemaker_remote will use the usual log settings for pacemaker (probably
>>>>>> /var/log/pacemaker.log, probably configured in /etc/default/pacemaker on
>>>>>> ubuntu). You should see "New remote connection" in the remote node's log
>>>>>> when the cluster tries to connect, and "LRMD client connection
>>>>>> established" if it's successful.
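The two log messages mentioned above give a simple three-way classification of the remote node's log. The sketch below assumes the messages appear verbatim as quoted; the function name remote_connection_status and the returned labels are hypothetical, chosen only for this illustration.

```python
def remote_connection_status(log_lines):
    """Classify a remote node's log by the two messages quoted above."""
    lines = list(log_lines)
    attempted = any("New remote connection" in line for line in lines)
    established = any(
        "LRMD client connection established" in line for line in lines
    )
    if established:
        return "established"
    if attempted:
        return "attempted"
    return "no attempt"


# This line is taken from the remote node's log quoted in this thread.
print(remote_connection_status(
    ["lrmd: notice: bind_and_listen: Listening on address ::"]
))
```

A result of "no attempt" suggests the cluster never tried to reach the remote node, so the cause is more likely on the cluster side than on the remote node.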
>>>>>> 
>>>>>> As always, check for firewall and SELinux issues.
>>>>>> 
>>>>>>> 
>>>>>>>> On 07 Mar 2016, at 09:40, Ken Gaillot <kgaillot at redhat.com> wrote:
>>>>>>>> 
>>>>>>>> On 03/06/2016 07:43 PM, Сергей Филатов wrote:
>>>>>>>>> Hi,
>>>>>>>>> I’m trying to set up a pacemaker_remote resource on Ubuntu 14.04.
>>>>>>>>> I followed the "remote node walkthrough" guide
>>>>>>>>> (http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Remote/#idm140473081667280).
>>>>>>>>> After creating the ocf:pacemaker:remote resource on the cluster node, the remote
>>>>>>>>> node doesn’t show up as online.
>>>>>>>>> I guess I need to configure the remote agent to listen on IPv4; where can I
>>>>>>>>> configure that?
>>>>>>>>> Or are there any other steps to set up a remote node besides the ones
>>>>>>>>> mentioned in the guide?
>>>>>>>>> tcp6       0      0 :::3121                 :::*                    LISTEN      21620/pacemaker_rem off (0.00/0/0)
>>>>>>>>> 
>>>>>>>>> pacemaker and pacemaker_remote are both version 1.12
>>>>>>>> 
>>>>>>>> 
>>>>>>>> pacemaker_remote will try to bind to IPv6 addresses first, and only if
>>>>>>>> that fails, will it bind to IPv4. There is no way to configure this
>>>>>>>> behavior currently, though it obviously would be nice to have.
>>>>>>>> 
>>>>>>>> The only workarounds I can think of are to make IPv6 connections work
>>>>>>>> between the cluster and the remote node, or disable IPv6 on the remote
>>>>>>>> node. Using IPv6, there could be an issue if your name resolution
>>>>>>>> returns both IPv4 and IPv6 addresses for the remote host; you could
>>>>>>>> potentially work around that by adding an IPv6-only name for it, and
>>>>>>>> using that as the server option to the remote resource.
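To see whether name resolution returns both IPv4 and IPv6 addresses for the remote host, as discussed above, the resolver output can be grouped by address family. This is a minimal sketch using Python's standard socket module; the helper names group_by_family and addresses_by_family are hypothetical, and the port default of 3121 is taken from this thread.

```python
import socket


def group_by_family(infos):
    """Group getaddrinfo()-style tuples into IPv4 and IPv6 address lists."""
    result = {"ipv4": [], "ipv6": []}
    for family, _type, _proto, _canonname, sockaddr in infos:
        if family == socket.AF_INET:
            key = "ipv4"
        elif family == socket.AF_INET6:
            key = "ipv6"
        else:
            continue  # ignore any other address family
        address = sockaddr[0]
        if address not in result[key]:
            result[key].append(address)
    return result


def addresses_by_family(host, port=3121):
    """Resolve host and report which address families it resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return group_by_family(infos)


if __name__ == "__main__":
    # "compute-1" is the remote node name used in this thread.
    print(addresses_by_family("compute-1"))
```

If both lists are non-empty, the cluster node may pick an address family that the remote node is not reachable on, which is the situation the IPv6-only name is meant to avoid.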
>>> 
>> 
> 



