[Pacemaker] crm resource cleanup ignored
Dejan Muhamedagic
dejanmm at fastmail.fm
Fri Jul 2 13:32:02 UTC 2010
Hi,
On Fri, Jul 02, 2010 at 02:56:04PM +0200, Bernd Schubert wrote:
> Hello all,
>
> after the update to 1.0.9 on our test cluster, new weird stonith issues
> have come up.
>
> 1) It fails to start stonith resources on *some* nodes
> =======================================================
>
> Jul 02 14:43:23 phys-oss3 pengine: [18077]: WARN: unpack_rsc_op: Processing failed op st-riloe-phys-oss1_start_0 on phys-oss3: unknown error (1)
>
> Failed actions:
> st-riloe-phys-oss1_start_0 (node=phys-oss3, call=25, rc=1, status=complete): unknown error
> st-riloe-phys-oss2_start_0 (node=phys-oss0, call=25, rc=1, status=complete): unknown error
>
>
> On other nodes it starts properly:
>
> Node phys-oss0 (d8b9b1c6-fdf4-40f1-be3d-9158237ad4cb): online
> st-riloe-phys-oss1 (stonith:external/riloe) Started
>
>
> 2) When I try to clean it, it does not work:
> ============================================
>
> root at rhel5-nfs@phys-oss3:~# date
> Fri Jul 2 14:50:15 CEST 2010
>
>
> root at rhel5-nfs@phys-oss3:~# crm resource cleanup st-riloe-phys-oss1 phys-oss3
> Cleaning up st-riloe-phys-oss1 on phys-oss3
>
> crm_mon:
>
> Failed actions:
> st-riloe-phys-oss1_start_0 (node=phys-oss3, call=25, rc=1, status=complete): unknown error
> st-riloe-phys-oss2_start_0 (node=phys-oss0, call=25, rc=1, status=complete): unknown error
>
>
> root at rhel5-nfs@phys-oss3:~# tail /var/log/ha-log
> Jul 02 14:48:40 phys-oss3 crmd: [18056]: info: ais_status_callback: status: phys-oss2 is now lost (was member)
Why did the node disappear? Any coredumps around?
> Jul 02 14:48:40 phys-oss3 crmd: [18056]: info: crm_update_peer: Node phys-oss2: id=4 state=lost (new) addr=(null) votes=-1 born=5 seen=6 proc=00000000000000000000000000000200
> Jul 02 14:48:40 phys-oss3 crmd: [18056]: info: erase_node_from_join: Removed node phys-oss1 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
> Jul 02 14:48:40 phys-oss3 crmd: [18056]: info: erase_node_from_join: Removed node phys-oss2 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
> Jul 02 14:48:40 phys-oss3 crmd: [18056]: info: populate_cib_nodes_ha: Requesting the list of configured nodes
> Jul 02 14:48:40 phys-oss3 cib: [18052]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/133, version=0.735.1): ok (rc=0)
> Jul 02 14:50:23 phys-oss3 crmd: [18056]: notice: do_lrm_invoke: Not creating resource for a delete event: (null)
> Jul 02 14:50:23 phys-oss3 crmd: [18056]: info: send_direct_ack: ACK'ing resource op st-riloe-phys-oss1_delete_60000 from 0:0:crm-resource-21728: lrm_invoke-lrmd-1278075023-300
> Jul 02 14:51:14 phys-oss3 crmd: [18056]: notice: do_lrm_invoke: Not creating resource for a delete event: (null)
> Jul 02 14:51:14 phys-oss3 crmd: [18056]: info: send_direct_ack: ACK'ing resource op st-riloe-phys-oss1_delete_60000 from 0:0:crm-resource-21797: lrm_invoke-lrmd-1278075074-302
>
>
>
> Any ideas?
There should be more log entries, including some that show the
actual error behind the failed start. If you can't find them,
please open a bugzilla and attach an hb_report.
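
To narrow down the search for the actual error, something along
these lines might help. It is only a rough sketch: the function name
rsc_errors is made up for this example, the log path is the one from
your tail command, and the WARN/ERROR/CRIT tags are an assumption
based on the WARN line you quoted.

```shell
# Sketch: print warning/error log lines that mention a given resource.
# rsc_errors is a hypothetical helper name, not part of Pacemaker.
rsc_errors() {
    logfile=$1
    rsc=$2
    # Keep lines that mention the resource (fixed-string match),
    # then keep only those carrying a severity tag of interest.
    grep -F -- "$rsc" "$logfile" | grep -E 'WARN|ERROR|CRIT'
}

# Example call (path and resource name taken from the report above):
# rsc_errors /var/log/ha-log st-riloe-phys-oss1
```

Run it on the node where the start failed; the info/notice lines
from the cleanup itself are filtered out.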
Thanks,
Dejan
> Thanks,
> Bernd