[ClusterLabs] systemd RA start/stop delays

Thu Aug 18 01:17:19 UTC 2016

Hi

I am having a problem with a simple Active/Passive cluster which consists
in the next configuration

Cluster Name: kamcluster
Corosync Nodes:
 kam1vs3 kam2vs3
Pacemaker Nodes:
 kam1vs3 kam2vs3

Resources:
 Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.0.1.206 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
              monitor interval=10s (ClusterIP-monitor-interval-10s)
 Resource: ClusterIP2 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.0.1.207 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ClusterIP2-start-interval-0s)
              stop interval=0s timeout=20s (ClusterIP2-stop-interval-0s)
              monitor interval=10s (ClusterIP2-monitor-interval-10s)
 Resource: rtpproxycluster (class=systemd type=rtpproxy)
  Operations: monitor interval=10s (rtpproxycluster-monitor-interval-10s)
              stop interval=0s on-fail=block
(rtpproxycluster-stop-interval-0s)
 Resource: kamailioetcfs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd1 directory=/etc/kamailio fstype=ext4
  Operations: start interval=0s timeout=60 (kamailioetcfs-start-interval-0s)
              monitor interval=10s on-fail=fence
(kamailioetcfs-monitor-interval-1
                                           0s)
              stop interval=0s on-fail=fence
(kamailioetcfs-stop-interval-0s)
 Clone: fence_kam2_xvm-clone
  Meta Attrs: interleave=true clone-max=2 clone-node-max=1
  Resource: fence_kam2_xvm (class=stonith type=fence_xvm)
   Attributes: port=tegamjg_kam2 pcmk_host_list=kam2vs3
   Operations: monitor interval=60s (fence_kam2_xvm-monitor-interval-60s)
 Master: kamailioetcclone
  Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
notify=t
                   rue on-fail=fence
  Resource: kamailioetc (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=kamailioetc
   Operations: start interval=0s timeout=240 (kamailioetc-start-interval-0s)
               promote interval=0s on-fail=fence
(kamailioetc-promote-interval-0s)
               demote interval=0s on-fail=fence
(kamailioetc-demote-interval-0s)
               stop interval=0s on-fail=fence (kamailioetc-stop-interval-0s)
               monitor interval=10s (kamailioetc-monitor-interval-10s)
 Clone: fence_kam1_xvm-clone
  Meta Attrs: interleave=true clone-max=2 clone-node-max=1
  Resource: fence_kam1_xvm (class=stonith type=fence_xvm)
   Attributes: port=tegamjg_kam1 pcmk_host_list=kam1vs3
   Operations: monitor interval=60s (fence_kam1_xvm-monitor-interval-60s)
 Resource: kamailiocluster (class=ocf provider=heartbeat type=kamailio)
  Attributes: listen_address=10.0.1.206 conffile=/etc/kamailio/kamailio.cfg
pidfil
                 e=/var/run/kamailio.pid monitoring_ip=10.0.1.206
monitoring_ip2=10.0.1.207 port=50
                                           60 proto=udp
kamctlrc=/etc/kamailio/kamctlrc shmem=128 pkg=8
  Meta Attrs: target-role=Stopped
  Operations: start interval=0s timeout=60
(kamailiocluster-start-interval-0s)
              stop interval=0s timeout=30 (kamailiocluster-stop-interval-0s)
              monitor interval=5s (kamailiocluster-monitor-interval-5s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
  start fence_kam1_xvm-clone then start fence_kam2_xvm-clone
(kind:Mandatory) (id:

 order-fence_kam1_xvm-clone-fence_kam2_xvm-clone-mandatory)
  start fence_kam2_xvm-clone then promote kamailioetcclone (kind:Mandatory)
(id:or
                 der-fence_kam2_xvm-clone-kamailioetcclone-mandatory)
  promote kamailioetcclone then start kamailioetcfs (kind:Optional)
(id:order-kama
                         ilioetcclone-kamailioetcfs-Optional)
  Resource Sets:
    set kamailioetcfs sequential=true (id:pcs_rsc_set_kamailioetcfs) set
ClusterIP
                    ClusterIP2 sequential=false
(id:pcs_rsc_set_ClusterIP_ClusterIP2) set rtpproxyclu
                                                               ster
kamailiocluster sequential=true
(id:pcs_rsc_set_rtpproxycluster_kamailioclust
                                                       er)
(id:pcs_rsc_order_set_kamailioetcfs_set_ClusterIP_ClusterIP2_set_rtpproxyclust

         er_kamailiocluster)
Colocation Constraints:
  rtpproxycluster with ClusterIP2 (score:INFINITY)
(id:colocation-rtpproxycluster-
                                         ClusterIP2-INFINITY)
  ClusterIP2 with ClusterIP (score:INFINITY)
(id:colocation-ClusterIP2-ClusterIP-I
                                               NFINITY)
  ClusterIP with kamailioetcfs (score:INFINITY)
(id:colocation-ClusterIP-kamailioe
                                             tcfs-INFINITY)
  kamailioetcfs with kamailioetcclone (score:INFINITY)
(with-rsc-role:Master) (id:

 colocation-kamailioetcfs-kamailioetcclone-INFINITY)
  fence_kam2_xvm-clone with fence_kam1_xvm-clone (score:INFINITY)
(id:colocation-f

 ence_kam2_xvm-clone-fence_kam1_xvm-clone-INFINITY)
  kamailioetcclone with fence_kam2_xvm-clone (score:INFINITY)
(id:colocation-kamai
                               lioetcclone-fence_kam2_xvm-clone-INFINITY)
  kamailiocluster with rtpproxycluster (score:INFINITY)
(id:colocation-kamailioclu
                                     ster-rtpproxycluster-INFINITY)

Resources Defaults:
 migration-threshold: 2
 failure-timeout: 10m
 resource-stickiness: 200
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: kamcluster
 dc-version: 1.1.13-10.el7_2.2-44eb2dd
 have-watchdog: false
 last-lrm-refresh: 1471479940
 no-quorum-policy: ignore
 start-failure-is-fatal: false
 stonith-action: reboot
 stonith-enabled: true

Now my problem is when the rtpproxy systemd resource starts/stops, i am
experiencing 2 seconds delay between the call from the start/stop signal is
sent to systemd and the confirmation and interpretation of the action to
the crmd. I am pretty sure that the initiation of this service doesnt take
even half of that time to start/stop. i did the same thing with a dummy
service  as a systemd resource and it als takes 2 seconds to start/stop the
resource. Also, this is not happening in any of the ocf resources. I am
putting what i see in the logs about this behaviour

This is after doing "pcs resource restart rtpproxycluster"

Aug 17 20:59:18 [13187] kam1       crmd:   notice: te_rsc_command:
 Initiating action 14: stop rtpproxycluster_stop_0 on kam1vs3 (local)
Aug 17 20:59:18 [13184] kam1       lrmd:     info: cancel_recurring_action:
    Cancelling systemd operation rtpproxycluster_status_10000
Aug 17 20:59:18 [13187] kam1       crmd:     info: do_lrm_rsc_op:
Performing key=14:30:0:8a202722-ece2-4617-b26e-8d4aa5f3522b
op=rtpproxycluster_stop_0
Aug 17 20:59:18 [13184] kam1       lrmd:     info: log_execute: executing -
rsc:rtpproxycluster action:stop call_id:106
Aug 17 20:59:18 [13187] kam1       crmd:     info: process_lrm_event:
Operation rtpproxycluster_monitor_10000: Cancelled (node=kam1vs3, call=104,
confirmed=true)
Aug 17 20:59:18 [13184] kam1       lrmd:     info: systemd_exec_result:
Call to stop passed: /org/freedesktop/systemd1/job/8302
Aug 17 20:59:20 [13187] kam1       crmd:   notice: process_lrm_event:
Operation rtpproxycluster_stop_0: ok (node=kam1vs3, call=106, rc=0,
cib-update=134, confirmed=true)
Aug 17 20:59:20 [13182] kam1        cib:     info: cib_perform_op:      +
 /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='rtpproxycluster']/lrm_rsc_op[@id='rtpproxycluster_last_0']:
 @operation_key=rtpproxycluster_stop_0, @operation=stop,
@transition-key=14:30:0:8a202722-ece2-4617-b26e-8d4aa5f3522b,
@transition-magic=0:0;14:30:0:8a202722-ece2-4617-b26e-8d4aa5f3522b,
@call-id=106, @last-run=1471481958, @last-rc-change=1471481958,
@exec-time=2116

Aug 17 20:59:20 [13186] kam1    pengine:     info: RecurringOp:  Start
recurring monitor (10s) for rtpproxycluster on kam1vs3
Aug 17 20:59:20 [13186] kam1    pengine:   notice: LogActions:  Start
rtpproxycluster (kam1vs3)
Aug 17 20:59:20 [13187] kam1       crmd:   notice: te_rsc_command:
 Initiating action 13: start rtpproxycluster_start_0 on kam1vs3 (local)
Aug 17 20:59:20 [13187] kam1       crmd:     info: do_lrm_rsc_op:
Performing key=13:31:0:8a202722-ece2-4617-b26e-8d4aa5f3522b
op=rtpproxycluster_start_0
Aug 17 20:59:20 [13184] kam1       lrmd:     info: log_execute: executing -
rsc:rtpproxycluster action:start call_id:107
Aug 17 20:59:21 [13184] kam1       lrmd:     info: systemd_exec_result:
Call to start passed: /org/freedesktop/systemd1/job/8303
Aug 17 20:59:23 [13187] kam1       crmd:   notice: process_lrm_event:
Operation rtpproxycluster_start_0: ok (node=kam1vs3, call=107, rc=0,
cib-update=136, confirmed=true)
Aug 17 20:59:23 [13182] kam1        cib:     info: cib_perform_op:      +
 /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='rtpproxycluster']/lrm_rsc_op[@id='rtpproxycluster_last_0']:
 @operation_key=rtpproxycluster_start_0, @operation=start,
@transition-key=13:31:0:8a202722-ece2-4617-b26e-8d4aa5f3522b,
@transition-magic=0:0;13:31:0:8a202722-ece2-4617-b26e-8d4aa5f3522b,
@call-id=107, @last-run=1471481960, @last-rc-change=1471481960,
@exec-time=2068

Why is this happening?, many of my resources depends on each other, so the
delay it takes when failover its needed is quite a lot (just 4 seconds of
delay stopping/starting just for that systemd resource.

Is this normal?, is there anyway to solve this?, Do i really need to make
an OCF resource for that service (that would take some time that i dont
really have at this moment)

Regards

Alejandro
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20160817/7a1089a8/attachment-0003.html>