[Pacemaker] start st-ibmrsa2 failed, because its hostlist is empty / Stonith Owner
Roberto Giordani
roberto.giordani at trs.it
Tue Apr 6 21:01:31 UTC 2010
Hi all,
I'm working on a 4-node Pacemaker cluster (IBM x3650 servers) with an RSA
card in each node; the nodes are connected through a Fibre Channel switch
to an IBM storage array with dual controllers.
The target setup is 4 Dom0 nodes running some Xen VMs as resources, in an
openSUSE 11.2 64-bit environment.
This is the output of crm configure show:
node1:~ # crm configure show
node node2
node node1
node node3
node node4
primitive dlm ocf:pacemaker:controld \
op monitor interval="120s"
primitive o2cb ocf:ocfs2:o2cb \
op monitor interval="120s"
primitive st-ibmrsa1 stonith:external/ibmrsa-telnet \
params ip_address="192.168.1.12" username="hacluster" password="Cluster" \
nodename="node1" \
meta target-role="started"
primitive st-ibmrsa2 stonith:external/ibmrsa-telnet \
params ip_address="192.168.1.13" username="hacluster" password="Cluster" \
nodename="node2" \
meta target-role="started"
primitive st-ibmrsa3 stonith:external/ibmrsa-telnet \
params ip_address="192.168.1.14" username="hacluster" password="Cluster" \
nodename="node3" \
meta target-role="started"
primitive st-ibmrsa4 stonith:external/ibmrsa-telnet \
params ip_address="192.168.1.15" username="hacluster" password="Cluster" \
nodename="node4" \
meta target-role="started"
clone dlm-clone dlm meta interleave="true"
clone o2cb-clone o2cb meta interleave="true" target-role="started"
location l-st-nodo1 st-ibmrsa1 -inf: node1
location l-st-nodo2 st-ibmrsa2 -inf: node2
location l-st-nodo3 st-ibmrsa3 -inf: node3
location l-st-nodo4 st-ibmrsa4 -inf: node4
colocation o2cb-with-dlm inf: o2cb-clone dlm-clone
order start-o2cb-after-dlm inf: dlm-clone o2cb-clone
property $id="cib-bootstrap-options" \
dc-version="1.0.2-ec6b0bbee1f3aa72c4c2559997e675db6ab39160" \
expected-quorum-votes="4" last-lrm-refresh="1270572049"
rsc_defaults $id="rsc-options" resource-stickiness="100"
I've already installed a simple two-node cluster on the same hardware, but
with DRBD storage; now I'm going to build a production cluster with a real
storage array.
My questions are about the ibmrsa STONITH resources:
*1)* I've added 4 external/ibmrsa-telnet resources, enabled STONITH, and
set a location constraint for each one, but only 3 are running: the last
one, "st-ibmrsa2", doesn't start.
These are the log messages after a "cleanup resource" command from the
Pacemaker GUI. The error claims the hostlist is empty, but that is wrong!
Apr 6 22:43:21 node1 mgmtd: [2992]: info: Delete fail-count for
st-ibmrsa2 from node2
Apr 6 22:43:21 node1 crmd: [2991]: info: do_lrm_invoke: Forcing a local
LRM refresh
Apr 6 22:43:21 node1 openais[2887]: [crm ] ERROR: route_ais_message:
Child 8603 spawned to record non-fatal assertion failure line 1297: dest > 0 && dest < SIZEOF(pcmk_children)
Apr 6 22:43:21 node1 openais[2887]: [crm ] ERROR: route_ais_message:
Invalid destination: 0
Apr 6 22:43:21 node1 openais[2887]: [MAIN ] Msg[358]
(dest=local:unknown, from=node2:crmd.4844, remote=true, size=853):
<create_request_adv origin="send_direct_ack" t="crmd" version="3.0.1"
subt="request" refer
Apr 6 22:43:21 node1 cib: [8604]: info: write_cib_contents: Archived
previous version as /var/lib/heartbeat/crm/cib-19.raw
Apr 6 22:43:21 node1 cib: [8604]: info: write_cib_contents: Wrote
version 0.343.0 of the CIB to disk (digest: 63e0b94a027daf19a1122391cd8653b0)
Apr 6 22:43:21 node1 cib: [8604]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.N2IctU (digest:
/var/lib/heartbeat/crm/cib.0FFVrB)
Apr 6 22:43:23 node1 cib: [2987]: info: cib_process_xpath: Processing
cib_query op for
//cib/configuration/crm_config//nvpair[@name='last-lrm-refresh']
(/cib/configuration/crm_config/cluster_property_set/nvpair[3])
Apr 6 22:43:23 node1 cib: [8605]: info: write_cib_contents: Archived
previous version as /var/lib/heartbeat/crm/cib-20.raw
Apr 6 22:43:23 node1 crmd: [2991]: info: do_lrm_invoke: Removing
resource st-ibmrsa2 from the LRM
Apr 6 22:43:23 node1 cib: [2987]: info: cib_process_xpath: Processing
cib_query op for
//cib/status//node_state[@id='node1']//nvpair[@name='fail-count-st-ibmrsa2']
(/cib/status/node_state[1]/transient_attributes/instance_attributes/nvpair[5])
Apr 6 22:43:23 node1 crmd: [2991]: info: send_direct_ack: ACK'ing
resource op st-ibmrsa2_delete_0 from mgmtd-2992:
lrm_invoke-lrmd-1270586603-24
Apr 6 22:43:23 node1 cib: [2987]: info: cib_process_xpath: Processing
cib_query op for
//cib/configuration/crm_config//nvpair[@name='last-lrm-refresh']
(/cib/configuration/crm_config/cluster_property_set/nvpair[3])
Apr 6 22:43:23 node1 crmd: [2991]: info: do_lrm_invoke: Forcing a local
LRM refresh
Apr 6 22:43:23 node1 cib: [8605]: info: write_cib_contents: Wrote
version 0.344.0 of the CIB to disk (digest: 898ef83ec60f0b080c67dac0b96f4247)
Apr 6 22:43:23 node1 cib: [8605]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.4yG9ST (digest:
/var/lib/heartbeat/crm/cib.Dr1NRG)
Apr 6 22:43:23 node1 mgmtd: [2992]: info: Delete fail-count for
st-ibmrsa2 from node1
Apr 6 22:43:23 node1 cib: [8606]: info: write_cib_contents: Archived
previous version as /var/lib/heartbeat/crm/cib-21.raw
Apr 6 22:43:23 node1 cib: [8606]: info: write_cib_contents: Wrote
version 0.345.0 of the CIB to disk (digest:
0db39b0d5be55ecf9ab68fd95c0ef307)
Apr 6 22:43:23 node1 cib: [8606]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.uZ6Dz0 (digest:
/var/lib/heartbeat/crm/cib.ZlmGIN)
Apr 6 22:43:25 node1 cib: [2987]: info: cib_process_xpath: Processing
cib_query op for
//cib/configuration/crm_config//nvpair[@name='last-lrm-refresh']
(/cib/configuration/crm_config/cluster_property_set/nvpair[3])
Apr 6 22:43:25 node1 cib: [2987]: info: cib_process_xpath: Processing
cib_query op for
//cib/status//node_state[@id='node3']//nvpair[@name='fail-count-st-ibmrsa2']
(/cib/status/node_state[3]/transient_attributes/instance_attributes/nvpair[6])
Apr 6 22:43:25 node1 crmd: [2991]: info: do_lrm_invoke: Forcing a local
LRM refresh
Apr 6 22:43:25 node1 openais[2887]: [crm ] ERROR: route_ais_message:
Child 8607 spawned to record non-fatal assertion failure line 1297: dest > 0 && dest < SIZEOF(pcmk_children)
Apr 6 22:43:25 node1 openais[2887]: [crm ] ERROR: route_ais_message:
Invalid destination: 0
Apr 6 22:43:25 node1 openais[2887]: [MAIN ] Msg[104]
(dest=local:unknown, from=node3:crmd.5002, remote=true, size=852):
<create_request_adv origin="send_direct_ack" t="crmd" version="3.0.1"
subt="request" refer
Apr 6 22:43:25 node1 mgmtd: [2992]: info: Delete fail-count for
st-ibmrsa2 from node3
Apr 6 22:43:25 node1 cib: [8608]: info: write_cib_contents: Archived
previous version as /var/lib/heartbeat/crm/cib-22.raw
Apr 6 22:43:25 node1 cib: [8608]: info: write_cib_contents: Wrote
version 0.346.0 of the CIB to disk (digest:
e4061e0e405cf035c566f53a79935212)
Apr 6 22:43:25 node1 cib: [8608]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.88fw77 (digest:
/var/lib/heartbeat/crm/cib.SsgJA1)
Apr 6 22:43:25 node1 lrmd: [2988]: notice: lrmd_rsc_new(): No
lrm_rprovider field in message
Apr 6 22:43:25 node1 crmd: [2991]: info: do_lrm_rsc_op: Performing
key=13:130:7:113d7b66-f090-46d5-bb11-a1782de6fa92 op=st-ibmrsa2_monitor_0 )
Apr 6 22:43:25 node1 lrmd: [2988]: info: rsc:st-ibmrsa2: monitor
Apr 6 22:43:25 node1 crmd: [2991]: info: process_lrm_event: LRM
operation st-ibmrsa2_monitor_0 (call=42, rc=7, cib-update=172,
confirmed=true) complete not running
Apr 6 22:43:26 node1 cib: [2987]: info: cib_process_xpath: Processing
cib_query op for
//cib/status//node_state[@id='node1']//nvpair[@name='probe_complete']
(/cib/status/node_state[1]/transient_attributes/instance_attributes/nvpair[1])
Apr 6 22:43:26 node1 crmd: [2991]: info: do_lrm_rsc_op: Performing
key=43:130:0:113d7b66-f090-46d5-bb11-a1782de6fa92 op=st-ibmrsa2_start_0 )
Apr 6 22:43:26 node1 lrmd: [2988]: info: rsc:st-ibmrsa2: start
Apr 6 22:43:26 node1 lrmd: [8611]: info: Try to start STONITH resource
<rsc_id=st-ibmrsa2> : Device=external/ibmrsa-telnet
Apr 6 22:43:27 node1 cib: [2987]: info: cib_process_xpath: Processing
cib_query op for
//cib/configuration/crm_config//nvpair[@name='last-lrm-refresh']
(/cib/configuration/crm_config/cluster_property_set/nvpair[3])
Apr 6 22:43:27 node1 cib: [2987]: info: cib_process_xpath: Processing
cib_query op for
//cib/status//node_state[@id='node4']//nvpair[@name='fail-count-st-ibmrsa2']
(/cib/status/node_state[4]/transient_attributes/instance_attributes/nvpair[5])
Apr 6 22:43:27 node1 cib: [8625]: info: write_cib_contents: Archived
previous version as /var/lib/heartbeat/crm/cib-23.raw
Apr 6 22:43:27 node1 crmd: [2991]: WARN: msg_to_op(1224): failed to get
the value of field lrm_opstatus from a ha_msg
Apr 6 22:43:27 node1 crmd: [2991]: info: msg_to_op: Message follows:
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG: Dumping message with 16
fields
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[0] : [lrm_t=op]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[1] : [lrm_rid=st-ibmrsa2]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[2] : [lrm_op=start]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[3] : [lrm_timeout=20000]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[4] : [lrm_interval=0]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[5] : [lrm_delay=0]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[6] : [lrm_copyparams=1]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[7] : [lrm_t_run=0]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[8] : [lrm_t_rcchange=0]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[9] : [lrm_exec_time=0]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[10] : [lrm_queue_time=0]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[11] : [lrm_targetrc=-1]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[12] : [lrm_app=crmd]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[13] :
[lrm_userdata=43:130:0:113d7b66-f090-46d5-bb11-a1782de6fa92]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[14] :
[(2)lrm_param=0x6525f0(148 182)]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG: Dumping message with 6 fields
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[0] : [crm_feature_set=3.0.1]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[1] : [username=hacluster]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[2] : [nodename=node2]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[3] : [CRM_meta_timeout=20000]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[4] : [ip_address=192.168.1.13]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[5] : [password=Cluster]
Apr 6 22:43:27 node1 crmd: [2991]: info: MSG[15] : [lrm_callid=43]
Apr 6 22:43:27 node1 cib: [8625]: info: write_cib_contents: Wrote
version 0.347.0 of the CIB to disk (digest:
a781b13afe33af0dee9384b257ba4955)
Apr 6 22:43:27 node1 crmd: [2991]: info: do_lrm_invoke: Forcing a local
LRM refresh
Apr 6 22:43:27 node1 openais[2887]: [crm ] ERROR: route_ais_message:
Child 8626 spawned to record non-fatal assertion failure line 1297: dest > 0 && dest < SIZEOF(pcmk_children)
Apr 6 22:43:27 node1 openais[2887]: [crm ] ERROR: route_ais_message:
Invalid destination: 0
Apr 6 22:43:27 node1 openais[2887]: [MAIN ] Msg[135]
(dest=local:unknown, from=node4:crmd.4857, remote=true, size=852):
<create_request_adv origin="send_direct_ack" t="crmd" version="3.0.1"
subt="request" refer
Apr 6 22:43:27 node1 cib: [8625]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.PeQ17d (digest:
/var/lib/heartbeat/crm/cib.rYQfMd)
Apr 6 22:43:27 node1 mgmtd: [2992]: info: Delete fail-count for
st-ibmrsa2 from node4
Apr 6 22:43:27 node1 cib: [8627]: info: write_cib_contents: Archived
previous version as /var/lib/heartbeat/crm/cib-24.raw
Apr 6 22:43:28 node1 cib: [8627]: info: write_cib_contents: Wrote
version 0.348.0 of the CIB to disk (digest:
edfb5a292bbe7f6b9a7f2f7e8951401c)
Apr 6 22:43:28 node1 cib: [8627]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.D2Mnak (digest:
/var/lib/heartbeat/crm/cib.X6bWXj)
Apr 6 22:43:29 node1 stonithd: [8613]: info: external_run_cmd: Calling
'/usr/lib64/stonith/plugins/external/ibmrsa-telnet status' returned 256
Apr 6 22:43:29 node1 stonithd: [2986]: *WARN: start st-ibmrsa2 failed,
because its hostlist is empty*
Apr 6 22:43:29 node1 crmd: [2991]: info: process_lrm_event: LRM
operation st-ibmrsa2_start_0 (call=43, rc=1, cib-update=176,
confirmed=true) complete unknown error
Apr 6 22:43:29 node1 crmd: [2991]: info: do_lrm_rsc_op: Performing
key=3:131:0:113d7b66-f090-46d5-bb11-a1782de6fa92 op=st-ibmrsa2_stop_0 )
Apr 6 22:43:29 node1 lrmd: [2988]: info: rsc:st-ibmrsa2: stop
Apr 6 22:43:29 node1 lrmd: [8628]: info: Try to stop STONITH resource
<rsc_id=st-ibmrsa2> : Device=external/ibmrsa-telnet
Apr 6 22:43:29 node1 stonithd: [2986]: notice: try to stop a resource
st-ibmrsa2 who is not in started resource queue.
Apr 6 22:43:29 node1 crmd: [2991]: info: process_lrm_event: LRM
operation st-ibmrsa2_stop_0 (call=44, rc=0, cib-update=177,
confirmed=true) complete ok
Apr 6 22:43:29 node1 cib: [2987]: info: cib_process_xpath: Processing
cib_query op for
//cib/configuration/crm_config//nvpair[@name='last-lrm-refresh']
(/cib/configuration/crm_config/cluster_property_set/nvpair[3])
Apr 6 22:43:29 node1 haclient: on_event:evt:cib_changed
Apr 6 22:43:29 node1 haclient: on_event:evt:cib_changed
Apr 6 22:43:29 node1 haclient: on_event:evt:cib_changed
Apr 6 22:43:29 node1 haclient: on_event:evt:cib_changed
Apr 6 22:43:29 node1 cib: [8630]: info: write_cib_contents: Archived
previous version as /var/lib/heartbeat/crm/cib-25.raw
Apr 6 22:43:29 node1 haclient: on_event:evt:cib_changed
Apr 6 22:43:29 node1 haclient: on_event:evt:cib_changed
Apr 6 22:43:29 node1 haclient: on_event:evt:cib_changed
Apr 6 22:43:29 node1 haclient: on_event:evt:cib_changed
Apr 6 22:43:30 node1 cib: [8630]: info: write_cib_contents: Wrote
version 0.349.0 of the CIB to disk (digest:
6adc75d1d6ea221f66c3de30e73561ff)
Apr 6 22:43:30 node1 cib: [8630]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.o8SqKq (digest:
/var/lib/heartbeat/crm/cib.shG0Iw)
Apr 6 22:43:30 node1 haclient: on_event: from message queue:
evt:cib_changed
Apr 6 22:43:30 node1 haclient: on_event: from message queue:
evt:cib_changed
Apr 6 22:43:30 node1 haclient: on_event: from message queue:
evt:cib_changed
Apr 6 22:43:30 node1 haclient: on_event: from message queue:
evt:cib_changed
Apr 6 22:43:30 node1 haclient: on_event: from message queue:
evt:cib_changed
Apr 6 22:43:30 node1 haclient: on_event: from message queue:
evt:cib_changed
Apr 6 22:43:30 node1 haclient: on_event: from message queue:
evt:cib_changed
Apr 6 22:43:30 node1 haclient: on_event: from message queue:
evt:cib_changed
Apr 6 22:43:30 node1 haclient: on_event: from message queue:
evt:cib_changed
Apr 6 22:43:30 node1 haclient: on_event: from message queue:
evt:cib_changed
Apr 6 22:43:30 node1 haclient: on_event: from message queue:
evt:cib_changed
Apr 6 22:43:30 node1 haclient: on_event: from message queue:
evt:cib_changed
Apr 6 22:43:30 node1 haclient: on_event: from message queue:
evt:cib_changed
Apr 6 22:43:30 node1 haclient: on_event: from message queue:
evt:cib_changed
Apr 6 22:43:30 node1 haclient: on_event: from message queue:
evt:cib_changed
Apr 6 22:43:30 node1 haclient: on_event: from message queue:
evt:cib_changed
Apr 6 22:43:30 node1 haclient: on_event: from message queue:
evt:cib_changed
Apr 6 22:43:30 node1 mgmtd: [2992]: info: CIB query: cib
Apr 6 22:43:32 node1 haclient: on_event:evt:cib_changed
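A minimal sketch for reproducing the failing call by hand, outside stonithd. It assumes that external STONITH plugins such as external/ibmrsa-telnet read their parameters from environment variables named after the primitive's params (ip_address, username, password, nodename), and that "status" and "gethosts" are the plugin actions behind the start failure; the helper names run_rsa_plugin and check_rsa are hypothetical. The "returned 256" in the log above looks like a raw wait status, i.e. exit code 1 from "status".

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker): run the external STONITH
# plugin by hand with the same parameters as the st-ibmrsa2 primitive.
# Assumption: the plugin reads its parameters from environment variables
# named after the primitive's params.
run_rsa_plugin() {
    plugin=$1   # path to the plugin script
    action=$2   # plugin action, e.g. "status" or "gethosts"
    ip_address=192.168.1.13 username=hacluster password=Cluster nodename=node2 \
        "$plugin" "$action"
}

# Hypothetical helper: print the exit code and stdout of "status" and
# "gethosts". Defaults to the plugin path seen in the stonithd log.
check_rsa() {
    plugin=${1:-/usr/lib64/stonith/plugins/external/ibmrsa-telnet}
    for action in status gethosts; do
        out=$(run_rsa_plugin "$plugin" "$action")
        rc=$?   # exit status of the command substitution above
        echo "$action rc=$rc output=[$out]"
    done
}
```

Sourced on node1 and run as "check_rsa", a non-zero rc for status or an empty output from gethosts would suggest the problem lies between the plugin and the RSA card at 192.168.1.13 rather than in the CIB; stderr from the plugin is left on the terminal for that reason.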
*2)* Should I clone the STONITH resources? Why?
*3)* Once the 3 STONITH resources are running, why doesn't their placement
respect the location constraints I specified when I created them?
This is the crm_mon output:
============
Last updated: Tue Apr 6 22:50:22 2010
Current DC: node2 (node2)
Version: 1.0.2-ec6b0bbee1f3aa72c4c2559997e675db6ab39160
4 Nodes configured.
6 Resources configured.
============
Node: node2 (node2): online
Node: node1 (node1): online
Node: node3 (node3): online
Node: node4 (node4): online
Clone Set: dlm-clone
dlm:0 (ocf::pacemaker:controld): Started node3
dlm:1 (ocf::pacemaker:controld): Started node1
dlm:2 (ocf::pacemaker:controld): Started node2
dlm:3 (ocf::pacemaker:controld): Started node4
Clone Set: o2cb-clone
o2cb:0 (ocf::ocfs2:o2cb): Started node2
o2cb:1 (ocf::ocfs2:o2cb): Started node4
o2cb:2 (ocf::ocfs2:o2cb): Started node1
o2cb:3 (ocf::ocfs2:o2cb): Started node3
st-ibmrsa1 (stonith:external/ibmrsa-telnet): Started node3
st-ibmrsa3 (stonith:external/ibmrsa-telnet): Started node2
st-ibmrsa4 (stonith:external/ibmrsa-telnet): Started node1
Failed actions:
st-ibmrsa2_start_0 (node=node1, call=43, rc=1, status=complete):
unknown error
st-ibmrsa2_start_0 (node=node3, call=49, rc=1, status=complete):
unknown error
st-ibmrsa2_start_0 (node=node4, call=50, rc=1, status=complete):
unknown error
Any idea how to resolve this?
Regards,
Roberto.