[Pacemaker] advisory ordering question

Gianluca Cecchi gianluca.cecchi at gmail.com
Thu May 20 12:09:01 EDT 2010


Hello,
The manual for 1.0 (and 1.1) says this about Advisory Ordering:

On the other-hand, when score="0" is specified for a constraint, the
constraint is considered optional and only has an effect when both resources
are stopping and or starting. Any change in state by the first resource will
have no effect on the then resource.
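
For comparison, here is a sketch of the two kinds of order constraint in crm shell syntax, reusing the resource names from my configuration. The constraint IDs are hypothetical, and the two lines are alternatives, not something to load together. The comments paraphrase the manual's description of mandatory versus advisory ordering.

```
# Mandatory ordering (score inf): nfs-group is started before apache_clone,
# and if nfs-group is stopped or restarted while apache_clone is running,
# apache_clone is stopped (and restarted) as well.
order nfs_then_apache_mandatory inf: nfs-group apache_clone

# Advisory ordering (score 0): only honored when both resources are
# starting or stopping in the same transition; a state change of
# nfs-group on its own has no effect on apache_clone.
order nfs_then_apache_advisory 0: nfs-group apache_clone
```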

(The manual also links to
http://www.clusterlabs.org/mediawiki/images/d/d6/Ordering_Explained.pdf for a
deeper treatment of constraints, but the link seems to be broken right now...)

Does this also hold for an order constraint defined between a group and a
clone, rather than between two plain resources?
I ask because I have this config:

order apache_after_nfsd 0: nfs-group apache_clone

where

group nfs-group lv_drbd0 ClusterIP NfsFS nfssrv \
meta target-role="Started"

group apache_group nfsclient apache \
meta target-role="Started"

clone apache_clone apache_group \
meta target-role="Started"

When both nodes are up but corosync is stopped on both, and I then start
corosync on one node, the logs show that:
- inside nfs-group, lv_drbd0 (the LINBIT DRBD resource) has just been promoted,
but the following members (nfssrv in particular) have not started yet
- the nfsclient part of apache_clone tries to start, but fails because
nfssrv is not running yet

I get the same problem if I change it to:

order apache_after_nfsd 0: nfssrv apache_clone

So could the problem be caused by the fact that the second part is a clone
rather than a plain resource? Or is it a bug?
I can send the whole config if needed.

Setting a value other than 0 for the interval parameter of the start op of
nfsclient doesn't make sense, correct? What would it do?
Start the resource every x seconds?
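
For reference, a sketch of how the nfsclient primitive might be declared with one-shot start/stop operations and a recurring monitor. It assumes nfsclient is an ocf:heartbeat:Filesystem resource (the log further down shows Filesystem mounting viplbtest.:/nfsdata/web on /usr/local/data); the fstype and all timeout and interval values are assumptions, not taken from the actual config. In the CIB, a non-zero interval is what makes an operation recurring, which is why it is normally set on monitor rather than on start or stop.

```
# Hypothetical declaration: start and stop are one-shot operations (interval 0);
# only monitor has a non-zero interval, so only it is repeated (every 20s here).
primitive nfsclient ocf:heartbeat:Filesystem \
	params device="viplbtest.:/nfsdata/web" directory="/usr/local/data" fstype="nfs" \
	op start interval="0" timeout="60s" \
	op stop interval="0" timeout="60s" \
	op monitor interval="20s" timeout="40s"
```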

At the end of the process I have:
[root@webtest1 ]# crm_mon -fr1
============
Last updated: Thu May 20 17:58:38 2010
Stack: openais
Current DC: webtest1. - partition WITHOUT quorum
Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ webtest1. ]
OFFLINE: [ webtest2. ]

Full list of resources:

 Master/Slave Set: NfsData
     Masters: [ webtest1. ]
     Stopped: [ nfsdrbd:1 ]
 Resource Group: nfs-group
     lv_nfsdata_drbd    (ocf::heartbeat:LVM):   Started webtest1.
     NfsFS      (ocf::heartbeat:Filesystem):    Started webtest1.
     VIPlbtest  (ocf::heartbeat:IPaddr2):       Started webtest1.
     nfssrv     (ocf::heartbeat:nfsserver):     Started webtest1.
 Clone Set: cl-pinggw
     Started: [ webtest1. ]
     Stopped: [ pinggw:1 ]
 Clone Set: apache_clone
     Stopped: [ apache_group:0 apache_group:1 ]

Migration summary:
* Node webtest1.:  pingd=200
   nfsclient:0: migration-threshold=1000000 fail-count=1000000

Failed actions:
    nfsclient:0_start_0 (node=webtest1., call=15, rc=1, status=complete): unknown error


Example logs for the second case:


May 20 17:33:55 webtest1 pengine: [14080]: info: determine_online_status: Node webtest1. is online
May 20 17:33:55 webtest1 pengine: [14080]: notice: clone_print:  Master/Slave Set: NfsData
May 20 17:33:55 webtest1 pengine: [14080]: notice: short_print:  Stopped: [ nfsdrbd:0 nfsdrbd:1 ]
May 20 17:33:55 webtest1 pengine: [14080]: notice: group_print:  Resource Group: nfs-group
May 20 17:33:55 webtest1 pengine: [14080]: notice: native_print:  lv_nfsdata_drbd  (ocf::heartbeat:LVM):   Stopped
May 20 17:33:55 webtest1 pengine: [14080]: notice: native_print:      NfsFS    (ocf::heartbeat:Filesystem):    Stopped
May 20 17:33:55 webtest1 pengine: [14080]: notice: native_print:  VIPlbtest        (ocf::heartbeat:IPaddr2):       Stopped
May 20 17:33:55 webtest1 pengine: [14080]: notice: native_print:      nfssrv   (ocf::heartbeat:nfsserver):     Stopped
...
May 20 17:33:55 webtest1 pengine: [14080]: notice: clone_print:  Clone Set: apache_clone
May 20 17:33:55 webtest1 pengine: [14080]: notice: short_print:  Stopped: [ apache_group:0 apache_group:1 ]
...
May 20 17:33:55 webtest1 pengine: [14080]: notice: LogActions: Start nfsdrbd:0 (webtest1.)
...
May 20 17:33:55 webtest1 pengine: [14080]: notice: LogActions: Start nfsclient:0       (webtest1.)
May 20 17:33:55 webtest1 pengine: [14080]: notice: LogActions: Start apache:0      (webtest1.)
...
May 20 17:33:57 webtest1 kernel: block drbd0: Starting worker thread (from cqueue/0 [68])
May 20 17:33:57 webtest1 kernel: block drbd0: disk( Diskless -> Attaching )
May 20 17:33:57 webtest1 kernel: block drbd0: Found 4 transactions (7 active extents) in activity log.
May 20 17:33:57 webtest1 kernel: block drbd0: Method to ensure write ordering: barrier
May 20 17:33:57 webtest1 kernel: block drbd0: max_segment_size ( = BIO size ) = 32768
May 20 17:33:57 webtest1 kernel: block drbd0: drbd_bm_resize called with capacity == 8388280
May 20 17:33:57 webtest1 kernel: block drbd0: resync bitmap: bits=1048535 words=32768
May 20 17:33:57 webtest1 kernel: block drbd0: size = 4096 MB (4194140 KB)
May 20 17:33:57 webtest1 kernel: block drbd0: recounting of set bits took additional 0 jiffies
May 20 17:33:57 webtest1 kernel: block drbd0: 144 KB (36 bits) marked out-of-sync by on disk bit-map.
May 20 17:33:57 webtest1 kernel: block drbd0: disk( Attaching -> UpToDate ) pdsk( DUnknown -> Outdated )
May 20 17:33:57 webtest1 kernel: block drbd0: conn( StandAlone -> Unconnected )
May 20 17:33:57 webtest1 kernel: block drbd0: Starting receiver thread (from drbd0_worker [14378])
May 20 17:33:57 webtest1 kernel: block drbd0: receiver (re)started
May 20 17:33:57 webtest1 kernel: block drbd0: conn( Unconnected -> WFConnection )
May 20 17:33:57 webtest1 lrmd: [14078]: info: RA output: (nfsdrbd:0:start:stdout)
May 20 17:33:57 webtest1 attrd: [14079]: info: attrd_trigger_update: Sending flush op to all hosts for: master-nfsdrbd:0 (10000)
May 20 17:33:57 webtest1 attrd: [14079]: info: attrd_perform_update: Sent update 11: master-nfsdrbd:0=10000
May 20 17:33:57 webtest1 crmd: [14081]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=0, tag=transient_attributes, id=webtest1., magic=NA, cib=0.407.11) : Transient attribute: update
May 20 17:33:57 webtest1 lrmd: [14078]: info: RA output: (nfsdrbd:0:start:stdout)
May 20 17:33:57 webtest1 crmd: [14081]: info: process_lrm_event: LRM operation nfsdrbd:0_start_0 (call=10, rc=0, cib-update=37, confirmed=true) ok
May 20 17:33:57 webtest1 crmd: [14081]: info: match_graph_event: Action nfsdrbd:0_start_0 (12) confirmed on webtest1. (rc=0)
May 20 17:33:57 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 15 fired and confirmed
May 20 17:33:57 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 18 fired and confirmed
May 20 17:33:57 webtest1 crmd: [14081]: info: te_rsc_command: Initiating action 90: notify nfsdrbd:0_post_notify_start_0 on webtest1. (local)
May 20 17:33:57 webtest1 crmd: [14081]: info: do_lrm_rsc_op: Performing key=90:1:0:bf5161a2-5240-4aaf-bc7d-5f54044f5bb6 op=nfsdrbd:0_notify_0 )
May 20 17:33:57 webtest1 lrmd: [14078]: info: rsc:nfsdrbd:0:12: notify
May 20 17:33:57 webtest1 lrmd: [14078]: info: RA output: (nfsdrbd:0:notify:stdout)
...
May 20 17:34:01 webtest1 pengine: [14080]: info: master_color: Promoting nfsdrbd:0 (Slave webtest1.)
May 20 17:34:01 webtest1 pengine: [14080]: info: master_color: NfsData: Promoted 1 instances of a possible 1 to master
...
May 20 17:34:01 webtest1 crmd: [14081]: info: te_rsc_command: Initiating action 85: notify nfsdrbd:0_pre_notify_promote_0 on webtest1. (local)
May 20 17:34:01 webtest1 crmd: [14081]: info: do_lrm_rsc_op: Performing key=85:2:0:bf5161a2-5240-4aaf-bc7d-5f54044f5bb6 op=nfsdrbd:0_notify_0 )
May 20 17:34:01 webtest1 lrmd: [14078]: info: rsc:nfsdrbd:0:14: notify
May 20 17:34:01 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 47 fired and confirmed
May 20 17:34:01 webtest1 crmd: [14081]: info: te_rsc_command: Initiating action 43: start nfsclient:0_start_0 on webtest1. (local)
May 20 17:34:01 webtest1 crmd: [14081]: info: do_lrm_rsc_op: Performing key=43:2:0:bf5161a2-5240-4aaf-bc7d-5f54044f5bb6 op=nfsclient:0_start_0 )
May 20 17:34:01 webtest1 lrmd: [14078]: info: rsc:nfsclient:0:15: start
May 20 17:34:01 webtest1 crmd: [14081]: info: process_lrm_event: LRM operation nfsdrbd:0_notify_0 (call=14, rc=0, cib-update=41, confirmed=true) ok
May 20 17:34:01 webtest1 crmd: [14081]: info: match_graph_event: Action nfsdrbd:0_pre_notify_promote_0 (85) confirmed on webtest1. (rc=0)
May 20 17:34:01 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 23 fired and confirmed
...
May 20 17:34:01 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
May 20 17:34:01 webtest1 crmd: [14081]: info: te_rsc_command: Initiating action 7: promote nfsdrbd:0_promote_0 on webtest1. (local)
May 20 17:34:01 webtest1 crmd: [14081]: info: do_lrm_rsc_op: Performing key=7:2:0:bf5161a2-5240-4aaf-bc7d-5f54044f5bb6 op=nfsdrbd:0_promote_0 )
May 20 17:34:01 webtest1 lrmd: [14078]: info: rsc:nfsdrbd:0:16: promote
May 20 17:34:02 webtest1 kernel: block drbd0: role( Secondary -> Primary )
May 20 17:34:02 webtest1 lrmd: [14078]: info: RA output: (nfsdrbd:0:promote:stdout)
May 20 17:34:02 webtest1 crmd: [14081]: info: process_lrm_event: LRM operation nfsdrbd:0_promote_0 (call=16, rc=0, cib-update=42, confirmed=true) ok
May 20 17:34:02 webtest1 crmd: [14081]: info: match_graph_event: Action nfsdrbd:0_promote_0 (7) confirmed on webtest1. (rc=0)
May 20 17:34:02 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 21 fired and confirmed
May 20 17:34:02 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 24 fired and confirmed
May 20 17:34:02 webtest1 crmd: [14081]: info: te_rsc_command: Initiating action 86: notify nfsdrbd:0_post_notify_promote_0 on webtest1. (local)
May 20 17:34:02 webtest1 crmd: [14081]: info: do_lrm_rsc_op: Performing key=86:2:0:bf5161a2-5240-4aaf-bc7d-5f54044f5bb6 op=nfsdrbd:0_notify_0 )
May 20 17:34:02 webtest1 lrmd: [14078]: info: rsc:nfsdrbd:0:17: notify
May 20 17:34:02 webtest1 lrmd: [14078]: info: RA output: (nfsdrbd:0:notify:stdout)
May 20 17:34:02 webtest1 crmd: [14081]: info: process_lrm_event: LRM operation nfsdrbd:0_notify_0 (call=17, rc=0, cib-update=43, confirmed=true) ok
May 20 17:34:02 webtest1 crmd: [14081]: info: match_graph_event: Action nfsdrbd:0_post_notify_promote_0 (86) confirmed on webtest1. (rc=0)
May 20 17:34:02 webtest1 crmd: [14081]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
May 20 17:34:02 webtest1 Filesystem[14438]: INFO: Running start for viplbtest.:/nfsdata/web on /usr/local/data
May 20 17:34:06 webtest1 crmd: [14081]: info: process_lrm_event: LRM operation pinggw:0_monitor_10000 (call=13, rc=0, cib-update=44, confirmed=false) ok
May 20 17:34:06 webtest1 crmd: [14081]: info: match_graph_event: Action pinggw:0_monitor_10000 (38) confirmed on webtest1. (rc=0)
May 20 17:34:11 webtest1 attrd: [14079]: info: attrd_trigger_update: Sending flush op to all hosts for: pingd (200)
May 20 17:34:11 webtest1 attrd: [14079]: info: attrd_perform_update: Sent update 14: pingd=200
May 20 17:34:11 webtest1 crmd: [14081]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=0, tag=transient_attributes, id=webtest1., magic=NA, cib=0.407.19) : Transient attribute: update
May 20 17:34:11 webtest1 crmd: [14081]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
May 20 17:34:11 webtest1 crmd: [14081]: info: update_abort_priority: Abort action done superceeded by restart
May 20 17:34:14 webtest1 lrmd: [14078]: info: RA output: (nfsclient:0:start:stderr) mount: mount to NFS server 'viplbtest.' failed: System Error: No route to host.