[Pacemaker] crm_simulate a resource failure

Wed Oct 24 17:43:18 EDT 2012

On Thu, Oct 25, 2012 at 1:37 AM, Cal Heldenbrand <cal at fbsdata.com> wrote:
> Thanks Andrew!  My first few attempts at playing around with the failure
> states are working as expected.
>
> A few follow-ups below:
>
>
>> --op-fail isn't the command you want though.
>> From the man page:
>>
>>        -i, --op-inject=value
>>               $rsc_$task_$interval@$node=$rc - Inject the specified
>>               task before running the simulation
>>
>>        -F, --op-fail=value
>>               $rsc_$task_$interval@$node=$rc - Fail the specified task
>>               while running the simulation
>>
>> Note the difference between the two descriptions: before vs. while.
>> --op-inject is the one you want.  It is mostly useful for pretending a
>> recurring monitor failed and seeing what the cluster would do about
>> it.
>>
>> --op-fail on the other hand, is used for pretending that part of the
>> recovery process failed.
>
>
> Your follow-up description here is great, and makes more sense.  I was
> reading "Fail the specified task" literally, as "here's my task, fail it and
> show me the results."  I'd suggest adding a little paragraph to the man page
> to elaborate on these points too.

Ok, I'll add that today.

> Also, can you tell me what all of the return
> codes are?  Do I have to use integers, or do strings like "error" work?

Just integers, I'm afraid.
The full list for OCF agents is here:
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/s-ocf-return-codes.html
LSB return codes are slightly different.
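
Since only integers are accepted, a small lookup table can stand in for
names like "error". The sketch below is not part of Pacemaker: op_spec is a
hypothetical helper, "webserver" and "node1" are placeholder names, and the
interval is assumed to be in milliseconds. The numeric values follow the OCF
return code list linked above.

```python
# OCF return codes as listed in Pacemaker Explained (name -> integer).
OCF_RC = {
    "OCF_SUCCESS": 0,
    "OCF_ERR_GENERIC": 1,
    "OCF_ERR_ARGS": 2,
    "OCF_ERR_UNIMPLEMENTED": 3,
    "OCF_ERR_PERM": 4,
    "OCF_ERR_INSTALLED": 5,
    "OCF_ERR_CONFIGURED": 6,
    "OCF_NOT_RUNNING": 7,
    "OCF_RUNNING_MASTER": 8,
    "OCF_FAILED_MASTER": 9,
}


def op_spec(rsc, task, interval, node, rc_name):
    """Hypothetical helper: build a $rsc_$task_$interval@$node=$rc string."""
    rc = OCF_RC[rc_name]  # raises KeyError for an unknown name
    return f"{rsc}_{task}_{interval}@{node}={rc}"


if __name__ == "__main__":
    # Placeholder resource and node names, for illustration only.
    print(op_spec("webserver", "monitor", 10000, "node1", "OCF_NOT_RUNNING"))
```

The resulting string is what would be handed to --op-inject or --op-fail.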

> While we're on the subject of documentation / usability, I would also
> suggest to split out these two features into more parameters.  (What would
> happen if I named my resource with an underscore?)  Maybe something like:
>
> --op-pre-resource=[primitive name]
> --op-pre-task=[monitor|start|stop]
> --op-pre-interval=[integer]
> --op-pre-node=[hostname]
> --op-pre-rc=[error|timeout|other stuff]
>
> Then have similar --op-post-* parameters.  Or whatever verbs make the most
> sense in the spirit of Pacemaker vocabulary.  (pre/post, before/after,
> inject/fail, input/output, etc)

The reason for not doing that is that we wanted to be able to inject
multiple pre/post failures at a time and see the result.
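
As a rough illustration, here is how several specs could be combined into one
run, assuming the options may simply be repeated. build_args is a
hypothetical helper and the resource and node names are placeholders; it also
assumes crm_simulate's -L (use the live CIB as input) and -S (simulate the
transition) options alongside the two options quoted from the man page.

```python
import shlex


def build_args(inject=(), fail=()):
    """Hypothetical helper: assemble a crm_simulate argument list."""
    # -L: read the live CIB, -S: simulate the transition (assumed usage).
    args = ["crm_simulate", "-L", "-S"]
    for spec in inject:
        # Failures that happen before the simulation starts.
        args += ["--op-inject", spec]
    for spec in fail:
        # Failures that happen during the simulated recovery.
        args += ["--op-fail", spec]
    return args


if __name__ == "__main__":
    args = build_args(
        inject=["webserver_monitor_10000@node1=7", "db_monitor_10000@node2=1"],
        fail=["webserver_stop_0@node1=1"],
    )
    # Print a shell-ready command line (shlex.join needs Python 3.8+).
    print(shlex.join(args))
```

With one spec per option, each failure carries its own resource, task,
interval, node and return code, which split-out parameters could not express
for more than one failure at a time.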

> And, examples are always awesome in man
> pages too.
>
> Of course, this is all great future version stuff, but that doesn't help all
> of the RedHat 6 people that will be using pacemaker 1.1 packages for the
> next ~10 years until RedHat 7 comes out.

Don't worry, the man page updates we just talked about will be in the
6.4 packages :)

> So I suppose documenting the old
> code in the online docs is a Good Thing.  :-)
>
> Thanks again!
>
> --Cal