[Pacemaker] resource starts but then fails right away
Andrew Beekhof
andrew at beekhof.net
Mon May 13 01:13:25 CEST 2013
On 10/05/2013, at 9:23 PM, Brian J. Murrell <brian at interlinx.bc.ca> wrote:
> On 13-05-09 09:53 PM, Andrew Beekhof wrote:
>>
>> May 7 02:36:16 node1 crmd[16836]: info: delete_resource: Removing resource testfs-resource1 for 18002_crm_resource (internal) on node1
>> May 7 02:36:16 node1 lrmd: [16833]: info: flush_op: process for operation monitor[8] on ocf::Target::testfs-resource1 for client 16836 still running, flush delayed
>> May 7 02:36:16 node1 crmd[16836]: info: lrm_remove_deleted_op: Removing op testfs-resource1_monitor_0:8 for deleted resource testfs-resource1
>>
>> So apparently a badly timed cleanup was run.
>
> :-( I didn't know there could be such timing problems. I might have to
> change my process a bit then perhaps.
>
>> Did you do that or was it the crm shell?
>
> That was "me" doing a "crm resource cleanup" (soon to become
> "crm_resource -r ... --cleanup"). The process is typically:
>
> - create resource
> - start resource
> - wait for resource to start
>
> where "start resource" is:
> - "clean it to start with a known clean resource"
> (crm resource cleanup)
> - "start resource"
> (crm_resource -r ... -p target-role -m -v Started)
>
> and "wait for resource" is a loop of "crm resource status ..." (soon to
> be "crm_resource -r ... --locate")
>
> So the create, clean, start operations happen in quite quick succession
> (i.e. scripted). Is that pathological? Is a clean between create and
> start known to be problematic?
It's certainly known to be unnecessary.
In some older versions it is also problematic.
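
A rough sketch of how the combined "clean and start" function could skip the clean for a resource that was only just created. This is Python purely for illustration; the function name and the just_created flag are hypothetical, and only the crm_resource invocations are taken from your description above:

```python
import subprocess
from typing import List


def start_commands(resource: str, just_created: bool) -> List[List[str]]:
    """Build the crm_resource calls for the 'start resource' step.

    'just_created' is a hypothetical flag the caller sets right after
    creating the resource; it is not something Pacemaker provides.
    """
    commands: List[List[str]] = []
    if not just_created:
        # Only clean a resource that has been around long enough to
        # have some history; a brand new one has nothing to clean.
        commands.append(["crm_resource", "-r", resource, "--cleanup"])
    # Same start command as in the description above.
    commands.append(
        ["crm_resource", "-r", resource,
         "-p", "target-role", "-m", "-v", "Started"]
    )
    return commands


def start_resource(resource: str, just_created: bool) -> None:
    """Run the commands in order, stopping at the first failure."""
    for cmd in start_commands(resource, just_created):
        subprocess.run(cmd, check=True)
```

Called with just_created=True straight after the create, this issues only the target-role change; at other points in the life-cycle the cleanup still runs first.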
>
> FWIW, the reason for clean before the start, even after just creating
> the resource is that "clean" and "start" are lumped together into a
> function that is called after create, but can also be called at other
> times during the life-cycle, so it could be needed to clean a resource
> before trying to start it. I was hoping the cleaning of a just created
> resource was going to be effectively a NOOP.
It's never a no-op, and at that particular point the cluster is trying to discover the status of the resource.
Running a cleanup in the middle of that interferes with the discovery.
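
So rather than cleaning, the script can leave the cluster alone and just poll until the resource shows up. A minimal sketch of such a wait loop, again in Python for illustration only. The function names, the timeout values and the injectable probe are hypothetical, and matching on the text "is running on" is an assumption about what "crm_resource --locate" prints, so check it against your version:

```python
import subprocess
import time
from typing import Callable


def locate(resource: str) -> str:
    """Return whatever 'crm_resource -r <resource> --locate' prints."""
    result = subprocess.run(
        ["crm_resource", "-r", resource, "--locate"],
        capture_output=True,
        text=True,
    )
    # Combine both streams, since the "not running" message may go to stderr.
    return result.stdout + result.stderr


def wait_for_start(
    resource: str,
    timeout: float = 60.0,
    interval: float = 1.0,
    probe: Callable[[str], str] = locate,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the resource is reported running or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        # Assumed output format: "resource <id> is running on: <node>"
        if "is running on" in probe(resource):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

The probe and sleep parameters are only there so the loop can be exercised without a cluster; in the real script the defaults would be used.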
>
> I guess for completeness, I should add here that creating the resource
> is a "cibadmin -o resource -C ..." operation.
>
>> If the machine is heavily loaded, or just very busy with file I/O, that can still take quite a long time.
>
> Yeah, not very loaded at all, especially at this point. This is all
> happening before anything really gets started on the machine... this is
> the process of getting the resources up and running and the machine is
> dedicated to running the tasks associated with these resources.
>
> Cheers,
> b.
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org