[Pacemaker] start st-ibmrsa2 failed, because its hostlist is empty / Stonith Owner

Roberto Giordani roberto.giordani at trs.it
Tue Apr 6 17:01:31 EDT 2010


Hi all,
I'm working on a four-node Pacemaker cluster (IBM x3650 servers) with an RSA
adapter in each node; the nodes are connected through a Fibre Channel switch
to an IBM storage array with dual controllers.
The plan is to run the four nodes as Xen Dom0s, with some Xen VMs as cluster
resources, on openSUSE 11.2 64-bit.
This is the output of crm configure show:
node1:~ # crm configure show
node node2
node node1
node node3
node node4
primitive dlm ocf:pacemaker:controld \
     op monitor interval="120s"
primitive o2cb ocf:ocfs2:o2cb \
     op monitor interval="120s"
primitive st-ibmrsa1 stonith:external/ibmrsa-telnet \
     params ip_address="192.168.1.12" username="hacluster" password="Cluster" nodename="node1" \
     meta target-role="started"
primitive st-ibmrsa2 stonith:external/ibmrsa-telnet \
     params ip_address="192.168.1.13" username="hacluster" password="Cluster" nodename="node2" \
     meta target-role="started"
primitive st-ibmrsa3 stonith:external/ibmrsa-telnet \
     params ip_address="192.168.1.14" username="hacluster" password="Cluster" nodename="node3" \
     meta target-role="started"
primitive st-ibmrsa4 stonith:external/ibmrsa-telnet \
     params ip_address="192.168.1.15" username="hacluster" password="Cluster" nodename="node4" \
     meta target-role="started"
clone dlm-clone dlm \
     meta interleave="true"
clone o2cb-clone o2cb \
     meta interleave="true" target-role="started"
location l-st-nodo1 st-ibmrsa1 -inf: node1
location l-st-nodo2 st-ibmrsa2 -inf: node2
location l-st-nodo3 st-ibmrsa3 -inf: node3
location l-st-nodo4 st-ibmrsa4 -inf: node4
colocation o2cb-with-dlm inf: o2cb-clone dlm-clone
order start-o2cb-after-dlm inf: dlm-clone o2cb-clone
property $id="cib-bootstrap-options" \
     dc-version="1.0.2-ec6b0bbee1f3aa72c4c2559997e675db6ab39160" \
     expected-quorum-votes="4" last-lrm-refresh="1270572049"
rsc_defaults $id="rsc-options" resource-stickiness="100"
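The four `-inf` location constraints appear intended to keep each RSA device off the node it fences. A minimal Python sketch that cross-checks each STONITH primitive's `nodename` against its `-inf` constraint; it assumes the configuration has been rejoined so each primitive sits on one line and trimmed to the fields the check reads, and `check_self_fencing_bans` is a hypothetical helper, not part of crm shell.

```python
import re

# Trimmed excerpt of the configuration above: each primitive rejoined onto one
# line and reduced to the fields this check reads (an assumption for brevity).
CONFIG = """\
primitive st-ibmrsa1 stonith:external/ibmrsa-telnet params ip_address="192.168.1.12" nodename="node1"
primitive st-ibmrsa2 stonith:external/ibmrsa-telnet params ip_address="192.168.1.13" nodename="node2"
primitive st-ibmrsa3 stonith:external/ibmrsa-telnet params ip_address="192.168.1.14" nodename="node3"
primitive st-ibmrsa4 stonith:external/ibmrsa-telnet params ip_address="192.168.1.15" nodename="node4"
location l-st-nodo1 st-ibmrsa1 -inf: node1
location l-st-nodo2 st-ibmrsa2 -inf: node2
location l-st-nodo3 st-ibmrsa3 -inf: node3
location l-st-nodo4 st-ibmrsa4 -inf: node4
"""

# STONITH primitive name and the node its nodename parameter points at.
PRIMITIVE_RE = re.compile(r'^primitive\s+(\S+)\s+stonith:\S+.*?\bnodename="([^"]+)"', re.M)
# Resource name and node for each "-inf" (never run here) location constraint.
BAN_RE = re.compile(r'^location\s+\S+\s+(\S+)\s+-inf:\s+(\S+)', re.M)


def check_self_fencing_bans(config):
    """Hypothetical helper: map each STONITH primitive to True if it is
    banned (-inf) from the same node it is configured to fence."""
    targets = dict(PRIMITIVE_RE.findall(config))
    bans = {}
    for rsc, node in BAN_RE.findall(config):
        bans.setdefault(rsc, set()).add(node)
    return {rsc: target in bans.get(rsc, set()) for rsc, target in targets.items()}


print(check_self_fencing_bans(CONFIG))
```

For this configuration every entry comes out `True`, so the constraints are consistent with the `nodename` parameters.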


I've already built a simple two-node cluster on the same hardware, but with
DRBD storage; now I'm going to build a production cluster with real shared
storage.
My questions are about the ibmrsa STONITH resources:
1) I've added four external/ibmrsa-telnet resources, enabled STONITH, and set
a location constraint for each one, but only three are running; the last one,
"st-ibmrsa2", doesn't start.
These are the log messages after running the "cleanup resource" command from
the Pacemaker GUI. The error seems to be an "empty hostlist", but that is wrong!

Apr  6 22:43:21 node1 mgmtd: [2992]: info: Delete fail-count for st-ibmrsa2 from node2
Apr  6 22:43:21 node1 crmd: [2991]: info: do_lrm_invoke: Forcing a local LRM refresh
Apr  6 22:43:21 node1 openais[2887]: [crm  ] ERROR: route_ais_message: Child 8603 spawned to record non-fatal assertion failure line 1297: dest > 0 && dest < SIZEOF(pcmk_children)
Apr  6 22:43:21 node1 openais[2887]: [crm  ] ERROR: route_ais_message: Invalid destination: 0
Apr  6 22:43:21 node1 openais[2887]: [MAIN ] Msg[358] (dest=local:unknown, from=node2:crmd.4844, remote=true, size=853): <create_request_adv origin="send_direct_ack" t="crmd" version="3.0.1" subt="request" refer
Apr  6 22:43:21 node1 cib: [8604]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-19.raw
Apr  6 22:43:21 node1 cib: [8604]: info: write_cib_contents: Wrote version 0.343.0 of the CIB to disk (digest: 63e0b94a027daf19a1122391cd8653b0)
Apr  6 22:43:21 node1 cib: [8604]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.N2IctU (digest: /var/lib/heartbeat/crm/cib.0FFVrB)
Apr  6 22:43:23 node1 cib: [2987]: info: cib_process_xpath: Processing cib_query op for //cib/configuration/crm_config//nvpair[@name='last-lrm-refresh'] (/cib/configuration/crm_config/cluster_property_set/nvpair[3])
Apr  6 22:43:23 node1 cib: [8605]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-20.raw
Apr  6 22:43:23 node1 crmd: [2991]: info: do_lrm_invoke: Removing resource st-ibmrsa2 from the LRM
Apr  6 22:43:23 node1 cib: [2987]: info: cib_process_xpath: Processing cib_query op for //cib/status//node_state[@id='node1']//nvpair[@name='fail-count-st-ibmrsa2'] (/cib/status/node_state[1]/transient_attributes/instance_attributes/nvpair[5])
Apr  6 22:43:23 node1 crmd: [2991]: info: send_direct_ack: ACK'ing resource op st-ibmrsa2_delete_0 from mgmtd-2992: lrm_invoke-lrmd-1270586603-24
Apr  6 22:43:23 node1 cib: [2987]: info: cib_process_xpath: Processing cib_query op for //cib/configuration/crm_config//nvpair[@name='last-lrm-refresh'] (/cib/configuration/crm_config/cluster_property_set/nvpair[3])
Apr  6 22:43:23 node1 crmd: [2991]: info: do_lrm_invoke: Forcing a local LRM refresh
Apr  6 22:43:23 node1 cib: [8605]: info: write_cib_contents: Wrote version 0.344.0 of the CIB to disk (digest: 898ef83ec60f0b080c67dac0b96f4247)
Apr  6 22:43:23 node1 cib: [8605]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.4yG9ST (digest: /var/lib/heartbeat/crm/cib.Dr1NRG)
Apr  6 22:43:23 node1 mgmtd: [2992]: info: Delete fail-count for st-ibmrsa2 from node1
Apr  6 22:43:23 node1 cib: [8606]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-21.raw
Apr  6 22:43:23 node1 cib: [8606]: info: write_cib_contents: Wrote version 0.345.0 of the CIB to disk (digest: 0db39b0d5be55ecf9ab68fd95c0ef307)
Apr  6 22:43:23 node1 cib: [8606]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.uZ6Dz0 (digest: /var/lib/heartbeat/crm/cib.ZlmGIN)
Apr  6 22:43:25 node1 cib: [2987]: info: cib_process_xpath: Processing cib_query op for //cib/configuration/crm_config//nvpair[@name='last-lrm-refresh'] (/cib/configuration/crm_config/cluster_property_set/nvpair[3])
Apr  6 22:43:25 node1 cib: [2987]: info: cib_process_xpath: Processing cib_query op for //cib/status//node_state[@id='node3']//nvpair[@name='fail-count-st-ibmrsa2'] (/cib/status/node_state[3]/transient_attributes/instance_attributes/nvpair[6])
Apr  6 22:43:25 node1 crmd: [2991]: info: do_lrm_invoke: Forcing a local LRM refresh
Apr  6 22:43:25 node1 openais[2887]: [crm  ] ERROR: route_ais_message: Child 8607 spawned to record non-fatal assertion failure line 1297: dest > 0 && dest < SIZEOF(pcmk_children)
Apr  6 22:43:25 node1 openais[2887]: [crm  ] ERROR: route_ais_message: Invalid destination: 0
Apr  6 22:43:25 node1 openais[2887]: [MAIN ] Msg[104] (dest=local:unknown, from=node3:crmd.5002, remote=true, size=852): <create_request_adv origin="send_direct_ack" t="crmd" version="3.0.1" subt="request" refer
Apr  6 22:43:25 node1 mgmtd: [2992]: info: Delete fail-count for st-ibmrsa2 from node3
Apr  6 22:43:25 node1 cib: [8608]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-22.raw
Apr  6 22:43:25 node1 cib: [8608]: info: write_cib_contents: Wrote version 0.346.0 of the CIB to disk (digest: e4061e0e405cf035c566f53a79935212)
Apr  6 22:43:25 node1 cib: [8608]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.88fw77 (digest: /var/lib/heartbeat/crm/cib.SsgJA1)
Apr  6 22:43:25 node1 lrmd: [2988]: notice: lrmd_rsc_new(): No lrm_rprovider field in message
Apr  6 22:43:25 node1 crmd: [2991]: info: do_lrm_rsc_op: Performing key=13:130:7:113d7b66-f090-46d5-bb11-a1782de6fa92 op=st-ibmrsa2_monitor_0 )
Apr  6 22:43:25 node1 lrmd: [2988]: info: rsc:st-ibmrsa2: monitor
Apr  6 22:43:25 node1 crmd: [2991]: info: process_lrm_event: LRM operation st-ibmrsa2_monitor_0 (call=42, rc=7, cib-update=172, confirmed=true) complete not running
Apr  6 22:43:26 node1 cib: [2987]: info: cib_process_xpath: Processing cib_query op for //cib/status//node_state[@id='node1']//nvpair[@name='probe_complete'] (/cib/status/node_state[1]/transient_attributes/instance_attributes/nvpair[1])
Apr  6 22:43:26 node1 crmd: [2991]: info: do_lrm_rsc_op: Performing key=43:130:0:113d7b66-f090-46d5-bb11-a1782de6fa92 op=st-ibmrsa2_start_0 )
Apr  6 22:43:26 node1 lrmd: [2988]: info: rsc:st-ibmrsa2: start
Apr  6 22:43:26 node1 lrmd: [8611]: info: Try to start STONITH resource <rsc_id=st-ibmrsa2> : Device=external/ibmrsa-telnet
Apr  6 22:43:27 node1 cib: [2987]: info: cib_process_xpath: Processing cib_query op for //cib/configuration/crm_config//nvpair[@name='last-lrm-refresh'] (/cib/configuration/crm_config/cluster_property_set/nvpair[3])
Apr  6 22:43:27 node1 cib: [2987]: info: cib_process_xpath: Processing cib_query op for //cib/status//node_state[@id='node4']//nvpair[@name='fail-count-st-ibmrsa2'] (/cib/status/node_state[4]/transient_attributes/instance_attributes/nvpair[5])
Apr  6 22:43:27 node1 cib: [8625]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-23.raw
Apr  6 22:43:27 node1 crmd: [2991]: WARN: msg_to_op(1224): failed to get the value of field lrm_opstatus from a ha_msg
Apr  6 22:43:27 node1 crmd: [2991]: info: msg_to_op: Message follows:
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG: Dumping message with 16 fields
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[0] : [lrm_t=op]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[1] : [lrm_rid=st-ibmrsa2]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[2] : [lrm_op=start]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[3] : [lrm_timeout=20000]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[4] : [lrm_interval=0]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[5] : [lrm_delay=0]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[6] : [lrm_copyparams=1]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[7] : [lrm_t_run=0]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[8] : [lrm_t_rcchange=0]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[9] : [lrm_exec_time=0]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[10] : [lrm_queue_time=0]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[11] : [lrm_targetrc=-1]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[12] : [lrm_app=crmd]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[13] : [lrm_userdata=43:130:0:113d7b66-f090-46d5-bb11-a1782de6fa92]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[14] : [(2)lrm_param=0x6525f0(148 182)]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG: Dumping message with 6 fields
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[0] : [crm_feature_set=3.0.1]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[1] : [username=hacluster]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[2] : [nodename=node2]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[3] : [CRM_meta_timeout=20000]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[4] : [ip_address=192.168.1.13]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[5] : [password=Cluster]
Apr  6 22:43:27 node1 crmd: [2991]: info: MSG[15] : [lrm_callid=43]
Apr  6 22:43:27 node1 cib: [8625]: info: write_cib_contents: Wrote version 0.347.0 of the CIB to disk (digest: a781b13afe33af0dee9384b257ba4955)
Apr  6 22:43:27 node1 crmd: [2991]: info: do_lrm_invoke: Forcing a local LRM refresh
Apr  6 22:43:27 node1 openais[2887]: [crm  ] ERROR: route_ais_message: Child 8626 spawned to record non-fatal assertion failure line 1297: dest > 0 && dest < SIZEOF(pcmk_children)
Apr  6 22:43:27 node1 openais[2887]: [crm  ] ERROR: route_ais_message: Invalid destination: 0
Apr  6 22:43:27 node1 openais[2887]: [MAIN ] Msg[135] (dest=local:unknown, from=node4:crmd.4857, remote=true, size=852): <create_request_adv origin="send_direct_ack" t="crmd" version="3.0.1" subt="request" refer
Apr  6 22:43:27 node1 cib: [8625]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.PeQ17d (digest: /var/lib/heartbeat/crm/cib.rYQfMd)
Apr  6 22:43:27 node1 mgmtd: [2992]: info: Delete fail-count for st-ibmrsa2 from node4
Apr  6 22:43:27 node1 cib: [8627]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-24.raw
Apr  6 22:43:28 node1 cib: [8627]: info: write_cib_contents: Wrote version 0.348.0 of the CIB to disk (digest: edfb5a292bbe7f6b9a7f2f7e8951401c)
Apr  6 22:43:28 node1 cib: [8627]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.D2Mnak (digest: /var/lib/heartbeat/crm/cib.X6bWXj)
Apr  6 22:43:29 node1 stonithd: [8613]: info: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/ibmrsa-telnet status' returned 256
Apr  6 22:43:29 node1 stonithd: [2986]: *WARN: start st-ibmrsa2 failed, because its hostlist is empty*
Apr  6 22:43:29 node1 crmd: [2991]: info: process_lrm_event: LRM operation st-ibmrsa2_start_0 (call=43, rc=1, cib-update=176, confirmed=true) complete unknown error
Apr  6 22:43:29 node1 crmd: [2991]: info: do_lrm_rsc_op: Performing key=3:131:0:113d7b66-f090-46d5-bb11-a1782de6fa92 op=st-ibmrsa2_stop_0 )
Apr  6 22:43:29 node1 lrmd: [2988]: info: rsc:st-ibmrsa2: stop
Apr  6 22:43:29 node1 lrmd: [8628]: info: Try to stop STONITH resource <rsc_id=st-ibmrsa2> : Device=external/ibmrsa-telnet
Apr  6 22:43:29 node1 stonithd: [2986]: notice: try to stop a resource st-ibmrsa2 who is not in started resource queue.
Apr  6 22:43:29 node1 crmd: [2991]: info: process_lrm_event: LRM operation st-ibmrsa2_stop_0 (call=44, rc=0, cib-update=177, confirmed=true) complete ok
Apr  6 22:43:29 node1 cib: [2987]: info: cib_process_xpath: Processing cib_query op for //cib/configuration/crm_config//nvpair[@name='last-lrm-refresh'] (/cib/configuration/crm_config/cluster_property_set/nvpair[3])
Apr  6 22:43:29 node1 haclient: on_event:evt:cib_changed
Apr  6 22:43:29 node1 haclient: on_event:evt:cib_changed
Apr  6 22:43:29 node1 haclient: on_event:evt:cib_changed
Apr  6 22:43:29 node1 haclient: on_event:evt:cib_changed
Apr  6 22:43:29 node1 cib: [8630]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-25.raw
Apr  6 22:43:29 node1 haclient: on_event:evt:cib_changed
Apr  6 22:43:29 node1 haclient: on_event:evt:cib_changed
Apr  6 22:43:29 node1 haclient: on_event:evt:cib_changed
Apr  6 22:43:29 node1 haclient: on_event:evt:cib_changed
Apr  6 22:43:30 node1 cib: [8630]: info: write_cib_contents: Wrote version 0.349.0 of the CIB to disk (digest: 6adc75d1d6ea221f66c3de30e73561ff)
Apr  6 22:43:30 node1 cib: [8630]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.o8SqKq (digest: /var/lib/heartbeat/crm/cib.shG0Iw)
Apr  6 22:43:30 node1 haclient: on_event: from message queue: evt:cib_changed
Apr  6 22:43:30 node1 haclient: on_event: from message queue: evt:cib_changed
Apr  6 22:43:30 node1 haclient: on_event: from message queue: evt:cib_changed
Apr  6 22:43:30 node1 haclient: on_event: from message queue: evt:cib_changed
Apr  6 22:43:30 node1 haclient: on_event: from message queue: evt:cib_changed
Apr  6 22:43:30 node1 haclient: on_event: from message queue: evt:cib_changed
Apr  6 22:43:30 node1 haclient: on_event: from message queue: evt:cib_changed
Apr  6 22:43:30 node1 haclient: on_event: from message queue: evt:cib_changed
Apr  6 22:43:30 node1 haclient: on_event: from message queue: evt:cib_changed
Apr  6 22:43:30 node1 haclient: on_event: from message queue: evt:cib_changed
Apr  6 22:43:30 node1 haclient: on_event: from message queue: evt:cib_changed
Apr  6 22:43:30 node1 haclient: on_event: from message queue: evt:cib_changed
Apr  6 22:43:30 node1 haclient: on_event: from message queue: evt:cib_changed
Apr  6 22:43:30 node1 haclient: on_event: from message queue: evt:cib_changed
Apr  6 22:43:30 node1 haclient: on_event: from message queue: evt:cib_changed
Apr  6 22:43:30 node1 haclient: on_event: from message queue: evt:cib_changed
Apr  6 22:43:30 node1 haclient: on_event: from message queue: evt:cib_changed
Apr  6 22:43:30 node1 mgmtd: [2992]: info: CIB query: cib
Apr  6 22:43:32 node1 haclient: on_event:evt:cib_changed
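The `external_run_cmd` line just before the warning shows the plugin's `status` action returned 256. Assuming that number is a raw wait status as handed back by `pclose()`/`system()` (an assumption; the log does not say), the plugin's actual exit code sits in the high byte. A small Python sketch to decode it; `plugin_exit_code` is a hypothetical helper and relies on the Unix-only `os.WIFEXITED`/`os.WEXITSTATUS`.

```python
import os
import re

# The stonithd line logged just before the "hostlist is empty" warning.
LOG_LINE = ("Apr  6 22:43:29 node1 stonithd: [8613]: info: external_run_cmd: Calling "
            "'/usr/lib64/stonith/plugins/external/ibmrsa-telnet status' returned 256")


def plugin_exit_code(line):
    """Hypothetical helper: decode 'returned N' as a raw wait status.

    Assumes N came from pclose()/system(), where the exit code lives in
    the high byte. Unix-only because of os.WIFEXITED/os.WEXITSTATUS.
    """
    match = re.search(r"returned (\d+)$", line)
    if match is None:
        return None
    status = int(match.group(1))
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None  # terminated by a signal rather than a normal exit


print(plugin_exit_code(LOG_LINE))
```

Under that assumption 256 decodes to exit code 1, so the plugin's `status` action reported a failure for st-ibmrsa2 just before stonithd logged the empty-hostlist warning.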

2) Should I clone the resources? Why?
3) Once the three STONITH resources are running, their owners don't respect the
location constraints I specified when I created them. Why?

This is the crm_mon output:
============
Last updated: Tue Apr  6 22:50:22 2010
Current DC: node2 (node2)
Version: 1.0.2-ec6b0bbee1f3aa72c4c2559997e675db6ab39160
4 Nodes configured.
6 Resources configured.
============

Node: node2 (node2): online
Node: node1 (node1): online
Node: node3 (node3): online
Node: node4 (node4): online

Clone Set: dlm-clone
     dlm:0       (ocf::pacemaker:controld):      Started node3
     dlm:1       (ocf::pacemaker:controld):      Started node1
     dlm:2       (ocf::pacemaker:controld):      Started node2
     dlm:3       (ocf::pacemaker:controld):      Started node4
Clone Set: o2cb-clone
     o2cb:0      (ocf::ocfs2:o2cb):      Started node2
     o2cb:1      (ocf::ocfs2:o2cb):      Started node4
     o2cb:2      (ocf::ocfs2:o2cb):      Started node1
     o2cb:3      (ocf::ocfs2:o2cb):      Started node3
st-ibmrsa1      (stonith:external/ibmrsa-telnet):       Started node3
st-ibmrsa3      (stonith:external/ibmrsa-telnet):       Started node2
st-ibmrsa4      (stonith:external/ibmrsa-telnet):       Started node1

Failed actions:
     st-ibmrsa2_start_0 (node=node1, call=43, rc=1, status=complete): 
unknown error
     st-ibmrsa2_start_0 (node=node3, call=49, rc=1, status=complete): 
unknown error
     st-ibmrsa2_start_0 (node=node4, call=50, rc=1, status=complete): 
unknown error
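For question 3, note that a `-inf` location constraint only forbids a resource from the named node; it does not pin the resource to any particular node. A minimal Python sketch that checks whether any running STONITH resource sits on the node it is banned from; the crm_mon lines are copied from above, and `ban_violations` and `BANNED` are hypothetical names for illustration.

```python
import re

# The three "Started" STONITH lines from the crm_mon output above.
CRM_MON = """\
st-ibmrsa1      (stonith:external/ibmrsa-telnet):       Started node3
st-ibmrsa3      (stonith:external/ibmrsa-telnet):       Started node2
st-ibmrsa4      (stonith:external/ibmrsa-telnet):       Started node1
"""

# Node each device is kept off by the "-inf" location constraints.
BANNED = {"st-ibmrsa1": "node1", "st-ibmrsa2": "node2",
          "st-ibmrsa3": "node3", "st-ibmrsa4": "node4"}

# Resource name and current node for each started STONITH resource.
STARTED_RE = re.compile(r"^(\S+)\s+\(stonith:[^)]*\):\s+Started\s+(\S+)", re.M)


def ban_violations(crm_mon, banned):
    """Hypothetical helper: list (resource, node) pairs where a STONITH
    resource is running on the node it is banned from."""
    return [(rsc, node) for rsc, node in STARTED_RE.findall(crm_mon)
            if banned.get(rsc) == node]


print(ban_violations(CRM_MON, BANNED))
```

For the crm_mon output above it returns an empty list: each running device is on a node other than the one it is banned from.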

Any ideas on how to resolve this?
Regards,
Roberto.
