[Pacemaker] crm resource restart fails to restart the service
Dan Frincu
dfrincu at streamwide.ro
Wed Nov 17 15:00:09 UTC 2010
Hi,
Vadym Chepkov wrote:
> On Wed, Nov 17, 2010 at 8:01 AM, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
>
>
>>> RA doesn't support restart action? Most luckily you get OCF_ERR_UNIMPLEMENTED in the log
>>>
>> It's actually a resource stop followed by start. It says so in
>> the help too. Perhaps the start precludes the stop action. The
>> logs should give a hint. We need a sleep in between.
>>
>>
>
> In this case this command is not working at all, because I tried in
> the past for many resources and it never worked, so I just assumed it
> has to be implemented by RA.
>
> To test it right now I issued a command
> # crm resource restart xen_vbuild
>
> where xen_vbuild is a Xen VM and no results whatsoever.
>
> Here is output of the log
>
> Nov 17 13:04:13 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:05:14 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:06:14 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:07:15 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:07:44 xen-11 cib: [4294]: info: log_data_element: cib:diff:
> + <primitive id="xen_vbuild" >
> Nov 17 13:07:44 xen-11 cib: [4294]: info: log_data_element: cib:diff:
> + <meta_attributes id="xen_vbuild-meta_attributes"
> __crm_diff_marker__="added:top" >
> Nov 17 13:07:44 xen-11 cib: [4294]: info: log_data_element: cib:diff:
> + <nvpair id="xen_vbuild-meta_attributes-target-role"
> name="target-role" value="Stopped" />
> Nov 17 13:07:44 xen-11 cib: [4294]: info: log_data_element: cib:diff:
> - <primitive id="xen_vbuild" >
> Nov 17 13:07:44 xen-11 cib: [4294]: info: log_data_element: cib:diff:
> - <meta_attributes id="xen_vbuild-meta_attributes" >
> Nov 17 13:07:44 xen-11 cib: [4294]: info: log_data_element: cib:diff:
> - <nvpair value="Stopped"
> id="xen_vbuild-meta_attributes-target-role" />
> Nov 17 13:07:44 xen-11 pengine: [22958]: debug: unpack_lrm_rsc_state:
> xen_vbuild: Overwriting calculated next role Unknown with requested
> next role Stopped
> Nov 17 13:07:44 xen-11 cib: [4294]: info: log_data_element: cib:diff:
> + <primitive id="xen_vbuild" >
> Nov 17 13:07:44 xen-11 cib: [4294]: info: log_data_element: cib:diff:
> + <meta_attributes id="xen_vbuild-meta_attributes" >
> Nov 17 13:07:44 xen-11 cib: [4294]: info: log_data_element: cib:diff:
> + <nvpair value="Started"
> id="xen_vbuild-meta_attributes-target-role" />
> Nov 17 13:07:45 xen-11 pengine: [22958]: debug: unpack_lrm_rsc_state:
> xen_vbuild: Overwriting calculated next role Unknown with requested
> next role Stopped
> Nov 17 13:07:45 xen-11 pengine: [22958]: notice: native_print:
> xen_vbuild (ocf::heartbeat:Xen): Started xen-11
> Nov 17 13:07:45 xen-11 pengine: [22958]: debug: native_assign_node:
> All nodes for resource xen_vbuild are unavailable, unclean or shutting
> down (xen-11: 1, -1000000)
> Nov 17 13:07:45 xen-11 pengine: [22958]: debug: native_assign_node:
> Could not allocate a node for xen_vbuild
> Nov 17 13:07:45 xen-11 pengine: [22958]: info: native_color: Resource
> xen_vbuild cannot run anywhere
> Nov 17 13:07:45 xen-11 pengine: [22958]: notice: LogActions: Stop
> resource xen_vbuild (xen-11)
> Nov 17 13:07:46 xen-11 pengine: [22958]: notice: native_print:
> xen_vbuild (ocf::heartbeat:Xen): Started xen-11
> Nov 17 13:07:46 xen-11 pengine: [22958]: debug: native_assign_node:
> Assigning xen-11 to xen_vbuild
> Nov 17 13:07:46 xen-11 pengine: [22958]: notice: LogActions: Leave
> resource xen_vbuild (Started xen-11)
> Nov 17 13:08:15 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:09:15 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:10:16 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:11:16 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:12:17 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:13:17 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:14:17 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:15:18 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:16:18 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:17:19 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:17:47 xen-11 pengine: [22958]: notice: native_print:
> xen_vbuild (ocf::heartbeat:Xen): Started xen-11
> Nov 17 13:17:47 xen-11 pengine: [22958]: debug: native_assign_node:
> Assigning xen-11 to xen_vbuild
> Nov 17 13:17:47 xen-11 pengine: [22958]: notice: LogActions: Leave
> resource xen_vbuild (Started xen-11)
> Nov 17 13:18:19 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:19:19 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:20:20 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
>
> but VM never stopped:
>
>
> [root at xen-11 ~]# xm list|grep vbuild
> vbuild 3 511 2 -b---- 352.4
>
>
> still ID 3 as it was before
>
In my case the custom OCF RA, works, after some tweaks, now I'm stuck
with the mysql RA, I think this is the issue:
/usr/lib/ocf/resource.d/heartbeat# ./mysql stop
./mysql: line 523: (/1000)-5: syntax error: operand expected (error
token is "/1000)-5")
First I thought it was because I set the monitor, start and stop
timeouts to other values than the default, but even after setting the
defaults, same thing.
primitive mysqld ocf:heartbeat:mysql \
params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf"
enable_creation="0" datadir="/mysql/database" user="root"
test_user="monitor" test_passwd="monitor" test_table="cluster.monitor" \
op monitor interval="10s" timeout="30s" \
op start interval="0s" timeout="120s" \
op stop interval="0s" timeout="120s" \
meta target-role="Started"
Same thing goes for httpd RA:
root at cluster1:/usr/lib/ocf/resource.d/heartbeat# pgrep httpd
11985
11997
11998
11999
12000
12001
12002
12003
12004
root at cluster1:/usr/lib/ocf/resource.d/heartbeat# crm resource restart httpd
root at cluster1:/usr/lib/ocf/resource.d/heartbeat# pgrep httpd
11985
11997
11998
11999
12000
12001
12002
12003
12004
root at cluster1:/usr/lib/ocf/resource.d/heartbeat# crm
crm(live)# resource
crm(live)resource# restart httpd
crm(live)resource# exit
bye
root at cluster1:/usr/lib/ocf/resource.d/heartbeat# pgrep httpd
11985
11997
11998
11999
12000
12001
12002
12003
12004
And the logs
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- <cib admin_epoch="0" epoch="57" num_updates="4" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- <configuration >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- <resources >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- <group id="all" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- <primitive id="httpd" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- <meta_attributes id="httpd-meta_attributes" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- <nvpair value="Started"
id="httpd-meta_attributes-target-role" />
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- </meta_attributes>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- </primitive>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- </group>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- </resources>
Nov 17 07:51:11 cluster1 crmd: [25688]: info: abort_transition_graph:
need_abort:59 - Triggered transition abort (complete=1) : Non-status change
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- </configuration>
Nov 17 07:51:11 cluster1 crmd: [25688]: info: need_abort: Aborting on
change to admin_epoch
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- </cib>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ <cib admin_epoch="0" epoch="58" num_updates="1" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ <configuration >
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ <resources >
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_state_transition: All 2
cluster nodes are eligible to run resources.
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ <group id="all" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ <primitive id="httpd" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ <meta_attributes id="httpd-meta_attributes" >
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_pe_invoke: Query 17431:
Requesting the current CIB: S_POLICY_ENGINE
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ <nvpair value="Stopped"
id="httpd-meta_attributes-target-role" />
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ </meta_attributes>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ </primitive>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ </group>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ </resources>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ </configuration>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ </cib>
Nov 17 07:51:11 cluster1 cib: [25684]: info: cib_process_request:
Operation complete: op cib_replace for section resources
(origin=local/cibadmin/2, version=0.58.1): ok (rc=0)
Nov 17 07:51:11 cluster1 cib: [5026]: info: write_cib_contents: Archived
previous version as /var/lib/heartbeat/crm/cib-49.raw
Nov 17 07:51:11 cluster1 cib: [5026]: info: write_cib_contents: Wrote
version 0.58.0 of the CIB to disk (digest: 3027e9f4990e368ce7536b46a7f8b46b)
Nov 17 07:51:11 cluster1 cib: [5026]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.cNSkQf (digest:
/var/lib/heartbeat/crm/cib.kv1k0m)
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_pe_invoke_callback:
Invoking the PE: query=17431, ref=pe_calc-dc-1290005471-17354, seq=116,
quorate=1
Nov 17 07:51:11 cluster1 pengine: [25687]: notice: unpack_config: On
loss of CCM Quorum: Ignore
Nov 17 07:51:11 cluster1 pengine: [25687]: info: unpack_config: Node
scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Nov 17 07:51:11 cluster1 pengine: [25687]: info:
determine_online_status: Node cluster1 is online
Nov 17 07:51:11 cluster1 pengine: [25687]: info:
determine_online_status: Node cluster2 is online
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- <cib admin_epoch="0" epoch="58" num_updates="1" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- <configuration >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- <resources >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- <group id="all" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- <primitive id="httpd" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- <meta_attributes id="httpd-meta_attributes" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- <nvpair value="Stopped"
id="httpd-meta_attributes-target-role" />
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- </meta_attributes>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- </primitive>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- </group>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- </resources>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- </configuration>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
- </cib>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ <cib admin_epoch="0" epoch="59" num_updates="1" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ <configuration >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ <resources >
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ <group id="all" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ <primitive id="httpd" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ <meta_attributes id="httpd-meta_attributes" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ <nvpair value="Started"
id="httpd-meta_attributes-target-role" />
Nov 17 07:51:11 cluster1 crmd: [25688]: info: unpack_graph: Unpacked
transition 17187: 6 actions in 6 synapses
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ </meta_attributes>
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_te_invoke: Processing
graph 17187 (ref=pe_calc-dc-1290005471-17354) derived from
/var/lib/pengine/pe-input-27464.bz2
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ </primitive>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ </group>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ </resources>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ </configuration>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff:
+ </cib>
Nov 17 07:51:11 cluster1 cib: [25684]: info: cib_process_request:
Operation complete: op cib_replace for section resources
(origin=local/cibadmin/2, version=0.59.1): ok (rc=0)
Nov 17 07:51:11 cluster1 crmd: [25688]: info: abort_transition_graph:
need_abort:59 - Triggered transition abort (complete=0) : Non-status change
Nov 17 07:51:11 cluster1 crmd: [25688]: info: update_abort_priority:
Abort priority upgraded from 0 to 1000000
Nov 17 07:51:11 cluster1 crmd: [25688]: info: update_abort_priority:
Abort action done superceeded by restart
Nov 17 07:51:11 cluster1 crmd: [25688]: info: need_abort: Aborting on
change to admin_epoch
Nov 17 07:51:11 cluster1 crmd: [25688]: info: run_graph:
====================================================
Nov 17 07:51:11 cluster1 crmd: [25688]: notice: run_graph: Transition
17187 (Complete=0, Pending=0, Fired=0, Skipped=6, Incomplete=0,
Source=/var/lib/pengine/pe-input-27464.bz2): Stopped
Nov 17 07:51:11 cluster1 crmd: [25688]: info: te_graph_trigger:
Transition 17187 is now complete
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=notify_crmd ]
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_state_transition: All 2
cluster nodes are eligible to run resources.
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_pe_invoke: Query 17432:
Requesting the current CIB: S_POLICY_ENGINE
Nov 17 07:51:11 cluster1 pengine: [25687]: info: process_pe_message:
Transition 17187: PEngine Input stored in:
/var/lib/pengine/pe-input-27464.bz2
Nov 17 07:51:11 cluster1 cib: [5028]: info: write_cib_contents: Archived
previous version as /var/lib/heartbeat/crm/cib-50.raw
Nov 17 07:51:11 cluster1 cib: [5028]: info: write_cib_contents: Wrote
version 0.59.0 of the CIB to disk (digest: 0363c0ff8bed95a2195288798a9ece7b)
Nov 17 07:51:11 cluster1 cib: [5028]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.pqsr2f (digest:
/var/lib/heartbeat/crm/cib.N4eyon)
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_pe_invoke_callback:
Invoking the PE: query=17432, ref=pe_calc-dc-1290005471-17355, seq=116,
quorate=1
Nov 17 07:51:11 cluster1 pengine: [25687]: notice: unpack_config: On
loss of CCM Quorum: Ignore
Nov 17 07:51:11 cluster1 pengine: [25687]: info: unpack_config: Node
scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Nov 17 07:51:11 cluster1 pengine: [25687]: info:
determine_online_status: Node cluster1 is online
Nov 17 07:51:11 cluster1 pengine: [25687]: info:
determine_online_status: Node cluster2 is online
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Nov 17 07:51:11 cluster1 crmd: [25688]: info: unpack_graph: Unpacked
transition 17188: 0 actions in 0 synapses
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_te_invoke: Processing
graph 17188 (ref=pe_calc-dc-1290005471-17355) derived from
/var/lib/pengine/pe-input-27465.bz2
Nov 17 07:51:11 cluster1 crmd: [25688]: info: run_graph:
====================================================
Nov 17 07:51:11 cluster1 crmd: [25688]: notice: run_graph: Transition
17188 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-27465.bz2): Complete
Nov 17 07:51:11 cluster1 crmd: [25688]: info: te_graph_trigger:
Transition 17188 is now complete
Nov 17 07:51:11 cluster1 crmd: [25688]: info: notify_crmd: Transition
17188 status: done - <null>
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_state_transition:
Starting PEngine Recheck Timer
Nov 17 07:51:11 cluster1 pengine: [25687]: info: process_pe_message:
Transition 17188: PEngine Input stored in:
/var/lib/pengine/pe-input-27465.bz2
Regards,
Dan
> Vadym
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://oss.clusterlabs.org/pipermail/pacemaker/attachments/20101117/392ddff5/attachment-0001.htm>
More information about the Pacemaker
mailing list