[Pacemaker] clone ip definition and location stops my resources...

Mon May 10 13:34:40 UTC 2010

Hello,
I am using Pacemaker 1.0.8 on RHEL 5 and have trouble understanding how a
ping clone is used to monitor the gateway, even after reading the docs.

As soon as I run:
crm configure location nfs-group-with-pinggw nfs-group \
        rule -inf: not_defined pinggw or pinggw lte 0

the resources stop and do not restart.

Then, as soon as I run
crm configure delete nfs-group-with-pinggw

the resources of the group start again...

The configuration (part of it, actually) that I am trying to apply is this:
group nfs-group ClusterIP lv_drbd0 NfsFS nfssrv \
        meta target-role="Started"
ms NfsData nfsdrbd \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
primitive pinggw ocf:pacemaker:ping \
        params host_list="192.168.101.1" multiplier="100" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100"
clone cl-pinggw pinggw \
        meta globally-unique="false"
location nfs-group-with-pinggw nfs-group \
        rule $id="nfs-group-with-pinggw-rule" -inf: not_defined pinggw or pinggw lte 0
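
For illustration only, here is a minimal Python sketch of how the rule above would be scored on each node. It is not Pacemaker code, and the function names are hypothetical. It assumes the ping agent publishes its result under the node attribute pingd (the name attrd reports in the log further down, with value 100) rather than under the primitive's id pinggw, and that ha2 carries the same value as ha1.

```python
# Hypothetical model of the location rule, not Pacemaker's implementation.
NEG_INF = float("-inf")


def rule_matches(node_attrs, attr):
    """Evaluate 'not_defined <attr> or <attr> lte 0' for one node."""
    if attr not in node_attrs:           # not_defined <attr>
        return True
    return int(node_attrs[attr]) <= 0    # <attr> lte 0


def location_score(node_attrs, attr):
    """Score the rule contributes on this node: -inf on a match, else 0."""
    return NEG_INF if rule_matches(node_attrs, attr) else 0


# Assumption: both nodes hold pingd=100, as attrd logs on ha1.
nodes = {"ha1": {"pingd": "100"}, "ha2": {"pingd": "100"}}

for attr in ("pinggw", "pingd"):
    scores = {node: location_score(attrs, attr) for node, attrs in nodes.items()}
    print(attr, scores)
```

Under these assumptions the rule matches on every node when it tests pinggw, which would be consistent with the "cannot run anywhere" warnings. Testing pingd instead, or setting the agent's name parameter to pinggw, would be the variants to try.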

Should the location constraint refer to the ping resource or to its clone?
Could the problem be caused by the NFS client that I have also defined on
the other node, as follows?

primitive nfsclient ocf:heartbeat:Filesystem \
params device="nfsha:/nfsdata/web" directory="/nfsdata/web" fstype="nfs" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="60"
colocation nfsclient_not_on_nfs-group -inf: nfs-group nfsclient
order nfsclient_after_nfs-group inf: nfs-group nfsclient
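
The order constraint above uses the score inf, which makes it mandatory. The following rough Python sketch, with hypothetical names and not Pacemaker's actual algorithm, shows the consequence: a resource ordered after another may start only if the first one can start, so a blocked nfs-group also keeps nfsclient down, which matches the "Stop resource nfsclient" line in the log.

```python
# Hypothetical model of mandatory ordering, not Pacemaker's implementation.
def startable(resource, placeable, mandatory_orders):
    """A resource may start only if it can be placed on some node and every
    resource that must start before it may start as well."""
    if not placeable.get(resource, False):
        return False
    return all(
        startable(first, placeable, mandatory_orders)
        for first, then in mandatory_orders
        if then == resource
    )


# (first, then) pairs; taken from 'order ... inf: nfs-group nfsclient'.
orders = [("nfs-group", "nfsclient")]

# Assumption: nfs-group has no eligible node while the location rule is active.
with_rule = {"nfs-group": False, "nfsclient": True}
without_rule = {"nfs-group": True, "nfsclient": True}

print(startable("nfsclient", with_rule, orders))
print(startable("nfsclient", without_rule, orders))
```

This only models the ordering dependency. The colocation constraint affects where the resources may run, not whether nfsclient waits for nfs-group.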

Thanks in advance,
Gianluca

From the messages log of the server that was running nfs-group at that moment:
May 10 15:18:27 ha1 cibadmin: [29478]: info: Invoked: cibadmin -Ql
May 10 15:18:27 ha1 cibadmin: [29479]: info: Invoked: cibadmin -Ql
May 10 15:18:28 ha1 crm_shadow: [29536]: info: Invoked: crm_shadow -c
__crmshell.29455
May 10 15:18:28 ha1 cibadmin: [29537]: info: Invoked: cibadmin -p -U
May 10 15:18:28 ha1 crm_shadow: [29539]: info: Invoked: crm_shadow -C
__crmshell.29455 --force
May 10 15:18:28 ha1 cib: [8470]: info: cib_replace_notify: Replaced:
0.267.14 -> 0.269.1 from <null>
May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: - <cib
epoch="267" num_updates="14" admin_epoch="0" />
May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: + <cib
epoch="269" num_updates="1" admin_epoch="0" >
May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
<configuration >
May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
<constraints >
May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
<rsc_location id="nfs-group-with-pinggw" rsc="nfs-group"
__crm_diff_marker__="added:top" >
May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
<rule boolean-op="or" id="nfs-group-with-pinggw-rule" score="-INFINITY" >
May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
  <expression attribute="pinggw" id="nfs-group-with-pinggw-expression"
operation="not_defined" />
May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
  <expression attribute="pinggw" id="nfs-group-with-pinggw-expression-0"
operation="lte" value="0" />
May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
</rule>
May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
</rsc_location>
May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
</constraints>
May 10 15:18:28 ha1 crmd: [8474]: info: abort_transition_graph:
need_abort:59 - Triggered transition abort (complete=1) : Non-status change
May 10 15:18:28 ha1 attrd: [8472]: info: do_cib_replaced: Sending full
refresh
May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
</configuration>
May 10 15:18:28 ha1 crmd: [8474]: info: need_abort: Aborting on change to
epoch
May 10 15:18:28 ha1 attrd: [8472]: info: attrd_trigger_update: Sending flush
op to all hosts for: master-nfsdrbd:0 (10000)
May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: + </cib>
May 10 15:18:28 ha1 crmd: [8474]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
May 10 15:18:28 ha1 cib: [8470]: info: cib_process_request: Operation
complete: op cib_replace for section 'all' (origin=local/crm_shadow/2,
version=0.269.1): ok (rc=0)
May 10 15:18:28 ha1 crmd: [8474]: info: do_state_transition: All 2 cluster
nodes are eligible to run resources.
May 10 15:18:28 ha1 cib: [8470]: info: cib_process_request: Operation
complete: op cib_modify for section nodes (origin=local/crmd/203,
version=0.269.1): ok (rc=0)
May 10 15:18:28 ha1 crmd: [8474]: info: do_pe_invoke: Query 205: Requesting
the current CIB: S_POLICY_ENGINE
May 10 15:18:28 ha1 attrd: [8472]: info: attrd_trigger_update: Sending flush
op to all hosts for: probe_complete (true)
May 10 15:18:28 ha1 cib: [29541]: info: write_cib_contents: Archived
previous version as /var/lib/heartbeat/crm/cib-47.raw
May 10 15:18:28 ha1 crmd: [8474]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_ELECTION [ input=I_ELECTION
cause=C_FSA_INTERNAL origin=do_cib_replaced ]
May 10 15:18:28 ha1 attrd: [8472]: info: attrd_trigger_update: Sending flush
op to all hosts for: terminate (<null>)
May 10 15:18:28 ha1 cib: [29541]: info: write_cib_contents: Wrote version
0.269.0 of the CIB to disk (digest: 8f92c20ff8f96cde0fa0c75cd3207caa)
May 10 15:18:28 ha1 crmd: [8474]: info: update_dc: Unset DC ha1
May 10 15:18:28 ha1 attrd: [8472]: info: attrd_trigger_update: Sending flush
op to all hosts for: master-nfsdrbd:1 (<null>)
May 10 15:18:28 ha1 cib: [29541]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.FPnpLz (digest:
/var/lib/heartbeat/crm/cib.EsRWbp)
May 10 15:18:28 ha1 crmd: [8474]: info: do_state_transition: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
May 10 15:18:28 ha1 attrd: [8472]: info: attrd_trigger_update: Sending flush
op to all hosts for: shutdown (<null>)
May 10 15:18:28 ha1 crmd: [8474]: info: do_dc_takeover: Taking over DC
status for this partition
May 10 15:18:28 ha1 attrd: [8472]: info: attrd_trigger_update: Sending flush
op to all hosts for: pingd (100)
May 10 15:18:28 ha1 cib: [8470]: info: cib_process_readwrite: We are now in
R/O mode
May 10 15:18:28 ha1 cib: [8470]: info: cib_process_request: Operation
complete: op cib_slave_all for section 'all' (origin=local/crmd/206,
version=0.269.1): ok (rc=0)
May 10 15:18:28 ha1 cib: [8470]: info: cib_process_readwrite: We are now in
R/W mode
May 10 15:18:28 ha1 cib: [8470]: info: cib_process_request: Operation
complete: op cib_master for section 'all' (origin=local/crmd/207,
version=0.269.1): ok (rc=0)
May 10 15:18:28 ha1 cib: [8470]: info: cib_process_request: Operation
complete: op cib_modify for section cib (origin=local/crmd/208,
version=0.269.1): ok (rc=0)
May 10 15:18:28 ha1 cib: [8470]: info: cib_process_request: Operation
complete: op cib_modify for section crm_config (origin=local/crmd/210,
version=0.269.1): ok (rc=0)
May 10 15:18:28 ha1 cib: [8470]: info: cib_process_request: Operation
complete: op cib_modify for section crm_config (origin=local/crmd/212,
version=0.269.1): ok (rc=0)
May 10 15:18:28 ha1 crmd: [8474]: info: do_dc_join_offer_all: join-6:
Waiting on 2 outstanding join acks
May 10 15:18:28 ha1 crmd: [8474]: info: ais_dispatch: Membership 180: quorum
retained
May 10 15:18:28 ha1 crmd: [8474]: info: crm_ais_dispatch: Setting expected
votes to 2
May 10 15:18:28 ha1 cib: [8470]: info: cib_process_request: Operation
complete: op cib_modify for section crm_config (origin=local/crmd/215,
version=0.269.1): ok (rc=0)
May 10 15:18:28 ha1 crmd: [8474]: info: config_query_callback: Checking for
expired actions every 900000ms
May 10 15:18:28 ha1 crmd: [8474]: info: config_query_callback: Sending
expected-votes=2 to corosync
May 10 15:18:28 ha1 crmd: [8474]: info: update_dc: Set DC to ha1 (3.0.1)
May 10 15:18:28 ha1 crmd: [8474]: info: ais_dispatch: Membership 180: quorum
retained
May 10 15:18:28 ha1 crm_shadow: [29542]: info: Invoked: crm_shadow -D
__crmshell.29455 --force
May 10 15:18:28 ha1 crmd: [8474]: info: crm_ais_dispatch: Setting expected
votes to 2
May 10 15:18:28 ha1 cib: [8470]: info: cib_process_request: Operation
complete: op cib_modify for section crm_config (origin=local/crmd/218,
version=0.269.1): ok (rc=0)
May 10 15:18:28 ha1 crmd: [8474]: info: do_state_transition: State
transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
cause=C_FSA_INTERNAL origin=check_join_state ]
May 10 15:18:28 ha1 crmd: [8474]: info: do_state_transition: All 2 cluster
nodes responded to the join offer.
May 10 15:18:28 ha1 crmd: [8474]: info: do_dc_join_finalize: join-6: Syncing
the CIB from ha1 to the rest of the cluster
May 10 15:18:28 ha1 cib: [8470]: info: cib_process_request: Operation
complete: op cib_sync for section 'all' (origin=local/crmd/219,
version=0.269.1): ok (rc=0)
May 10 15:18:28 ha1 cib: [8470]: info: cib_process_request: Operation
complete: op cib_modify for section nodes (origin=local/crmd/220,
version=0.269.1): ok (rc=0)
May 10 15:18:29 ha1 crmd: [8474]: info: do_dc_join_ack: join-6: Updating
node state to member for ha2
May 10 15:18:29 ha1 cib: [8470]: info: cib_process_request: Operation
complete: op cib_modify for section nodes (origin=local/crmd/221,
version=0.269.1): ok (rc=0)
May 10 15:18:29 ha1 crmd: [8474]: info: do_dc_join_ack: join-6: Updating
node state to member for ha1
May 10 15:18:29 ha1 cib: [8470]: info: cib_process_request: Operation
complete: op cib_delete for section //node_state[@uname='ha2']/lrm
(origin=local/crmd/222, version=0.269.2): ok (rc=0)
May 10 15:18:29 ha1 crmd: [8474]: info: erase_xpath_callback: Deletion of
"//node_state[@uname='ha2']/lrm": ok (rc=0)
May 10 15:18:29 ha1 cib: [8470]: info: cib_process_request: Operation
complete: op cib_delete for section //node_state[@uname='ha1']/lrm
(origin=local/crmd/224, version=0.269.4): ok (rc=0)
May 10 15:18:29 ha1 crmd: [8474]: info: do_state_transition: State
transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED
cause=C_FSA_INTERNAL origin=check_join_state ]
May 10 15:18:29 ha1 crmd: [8474]: info: do_state_transition: All 2 cluster
nodes are eligible to run resources.
May 10 15:18:29 ha1 cib: [8470]: info: cib_process_request: Operation
complete: op cib_modify for section nodes (origin=local/crmd/226,
version=0.269.5): ok (rc=0)
May 10 15:18:29 ha1 crmd: [8474]: info: do_dc_join_final: Ensuring DC,
quorum and node attributes are up-to-date
May 10 15:18:29 ha1 crmd: [8474]: info: crm_update_quorum: Updating quorum
status to true (call=228)
May 10 15:18:29 ha1 attrd: [8472]: info: attrd_local_callback: Sending full
refresh (origin=crmd)
May 10 15:18:29 ha1 cib: [8470]: info: cib_process_request: Operation
complete: op cib_modify for section cib (origin=local/crmd/228,
version=0.269.5): ok (rc=0)
May 10 15:18:29 ha1 crmd: [8474]: info: abort_transition_graph:
do_te_invoke:191 - Triggered transition abort (complete=1) : Peer Cancelled
May 10 15:18:29 ha1 attrd: [8472]: info: attrd_trigger_update: Sending flush
op to all hosts for: master-nfsdrbd:0 (10000)
May 10 15:18:29 ha1 crmd: [8474]: info: do_pe_invoke: Query 229: Requesting
the current CIB: S_POLICY_ENGINE
May 10 15:18:29 ha1 attrd: [8472]: info: attrd_trigger_update: Sending flush
op to all hosts for: probe_complete (true)
May 10 15:18:29 ha1 crmd: [8474]: info: erase_xpath_callback: Deletion of
"//node_state[@uname='ha1']/lrm": ok (rc=0)
May 10 15:18:29 ha1 attrd: [8472]: info: attrd_trigger_update: Sending flush
op to all hosts for: terminate (<null>)
May 10 15:18:29 ha1 crmd: [8474]: info: te_update_diff: Detected LRM refresh
- 8 resources updated: Skipping all resource events
May 10 15:18:29 ha1 attrd: [8472]: info: attrd_trigger_update: Sending flush
op to all hosts for: master-nfsdrbd:1 (<null>)
May 10 15:18:29 ha1 crmd: [8474]: info: abort_transition_graph:
te_update_diff:227 - Triggered transition abort (complete=1, tag=diff,
id=(null), magic=NA, cib=0.269.5) : LRM Refresh
May 10 15:18:29 ha1 attrd: [8472]: info: attrd_trigger_update: Sending flush
op to all hosts for: shutdown (<null>)
May 10 15:18:29 ha1 crmd: [8474]: info: do_pe_invoke_callback: Invoking the
PE: query=229, ref=pe_calc-dc-1273497509-143, seq=180, quorate=1
May 10 15:18:29 ha1 pengine: [8473]: notice: unpack_config: On loss of CCM
Quorum: Ignore
May 10 15:18:29 ha1 attrd: [8472]: info: attrd_trigger_update: Sending flush
op to all hosts for: pingd (100)
May 10 15:18:29 ha1 crmd: [8474]: info: do_pe_invoke: Query 230: Requesting
the current CIB: S_POLICY_ENGINE
May 10 15:18:29 ha1 pengine: [8473]: info: unpack_config: Node scores: 'red'
= -INFINITY, 'yellow' = 0, 'green' = 0
May 10 15:18:29 ha1 crmd: [8474]: info: do_pe_invoke_callback: Invoking the
PE: query=230, ref=pe_calc-dc-1273497509-144, seq=180, quorate=1
May 10 15:18:29 ha1 pengine: [8473]: info: determine_online_status: Node ha1
is online
May 10 15:18:29 ha1 pengine: [8473]: notice: unpack_rsc_op: Operation
nfsdrbd:0_monitor_0 found resource nfsdrbd:0 active in master mode on ha1
May 10 15:18:29 ha1 pengine: [8473]: info: determine_online_status: Node ha2
is online
May 10 15:18:29 ha1 pengine: [8473]: notice: native_print: SitoWeb
 (ocf::heartbeat:apache):        Started ha1
May 10 15:18:29 ha1 pengine: [8473]: notice: clone_print:  Master/Slave Set:
NfsData
May 10 15:18:29 ha1 pengine: [8473]: notice: short_print:      Masters: [
ha1 ]
May 10 15:18:29 ha1 pengine: [8473]: notice: short_print:      Slaves: [ ha2
]
May 10 15:18:29 ha1 pengine: [8473]: notice: group_print:  Resource Group:
nfs-group
May 10 15:18:29 ha1 pengine: [8473]: notice: native_print:      ClusterIP
    (ocf::heartbeat:IPaddr2):       Started ha1
May 10 15:18:29 ha1 pengine: [8473]: notice: native_print:      lv_drbd0
   (ocf::heartbeat:LVM):   Started ha1
May 10 15:18:29 ha1 pengine: [8473]: notice: native_print:      NfsFS
(ocf::heartbeat:Filesystem):    Started ha1
May 10 15:18:29 ha1 pengine: [8473]: notice: native_print:      nfssrv
 (ocf::heartbeat:nfsserver):     Started ha1
May 10 15:18:29 ha1 cibadmin: [29543]: info: Invoked: cibadmin -Ql
May 10 15:18:29 ha1 pengine: [8473]: notice: native_print: nfsclient
 (ocf::heartbeat:Filesystem):    Started ha2
May 10 15:18:29 ha1 pengine: [8473]: notice: clone_print:  Clone Set:
cl-pinggw
May 10 15:18:29 ha1 pengine: [8473]: notice: short_print:      Started: [
ha1 ha2 ]
May 10 15:18:29 ha1 pengine: [8473]: info: native_merge_weights: NfsData:
Rolling back scores from ClusterIP
May 10 15:18:29 ha1 pengine: [8473]: info: native_merge_weights: NfsData:
Rolling back scores from ClusterIP
May 10 15:18:29 ha1 pengine: [8473]: info: master_color: Promoting nfsdrbd:0
(Master ha1)
May 10 15:18:29 ha1 pengine: [8473]: info: master_color: NfsData: Promoted 1
instances of a possible 1 to master
May 10 15:18:29 ha1 pengine: [8473]: info: native_merge_weights: nfsclient:
Rolling back scores from ClusterIP
May 10 15:18:29 ha1 pengine: [8473]: info: native_merge_weights: nfsclient:
Rolling back scores from lv_drbd0
May 10 15:18:29 ha1 pengine: [8473]: info: native_merge_weights: nfsclient:
Rolling back scores from NfsFS
May 10 15:18:29 ha1 pengine: [8473]: info: native_merge_weights: nfsclient:
Rolling back scores from ClusterIP
May 10 15:18:29 ha1 pengine: [8473]: info: native_merge_weights: ClusterIP:
Rolling back scores from lv_drbd0
May 10 15:18:29 ha1 pengine: [8473]: info: native_merge_weights: ClusterIP:
Rolling back scores from SitoWeb
May 10 15:18:29 ha1 pengine: [8473]: WARN: native_color: Resource ClusterIP
cannot run anywhere
May 10 15:18:29 ha1 pengine: [8473]: info: native_merge_weights: lv_drbd0:
Rolling back scores from NfsFS
May 10 15:18:29 ha1 pengine: [8473]: WARN: native_color: Resource lv_drbd0
cannot run anywhere
May 10 15:18:29 ha1 pengine: [8473]: info: native_merge_weights: NfsFS:
Rolling back scores from nfssrv
May 10 15:18:29 ha1 pengine: [8473]: WARN: native_color: Resource NfsFS
cannot run anywhere
May 10 15:18:29 ha1 pengine: [8473]: WARN: native_color: Resource nfssrv
cannot run anywhere
May 10 15:18:29 ha1 pengine: [8473]: WARN: native_color: Resource SitoWeb
cannot run anywhere
May 10 15:18:29 ha1 pengine: [8473]: info: master_color: Promoting nfsdrbd:0
(Master ha1)
May 10 15:18:29 ha1 pengine: [8473]: info: master_color: NfsData: Promoted 1
instances of a possible 1 to master
May 10 15:18:29 ha1 pengine: [8473]: notice: LogActions: Stop resource
SitoWeb  (ha1)
May 10 15:18:29 ha1 pengine: [8473]: notice: LogActions: Leave resource
nfsdrbd:0       (Master ha1)
May 10 15:18:29 ha1 pengine: [8473]: notice: LogActions: Leave resource
nfsdrbd:1       (Slave ha2)
May 10 15:18:29 ha1 pengine: [8473]: notice: LogActions: Stop resource
ClusterIP        (ha1)
May 10 15:18:29 ha1 pengine: [8473]: notice: LogActions: Stop resource
lv_drbd0 (ha1)
May 10 15:18:29 ha1 pengine: [8473]: notice: LogActions: Stop resource NfsFS
   (ha1)
May 10 15:18:29 ha1 pengine: [8473]: notice: LogActions: Stop resource
nfssrv   (ha1)
May 10 15:18:29 ha1 pengine: [8473]: notice: LogActions: Stop resource
nfsclient        (Started ha2)
May 10 15:18:29 ha1 pengine: [8473]: notice: LogActions: Leave resource
pinggw:0        (Started ha1)
May 10 15:18:29 ha1 pengine: [8473]: notice: LogActions: Leave resource
pinggw:1        (Started ha2)