[Pacemaker] crm_simulate a resource failure

Thu Oct 25 20:05:15 EDT 2012

On Fri, Oct 26, 2012 at 2:31 AM, Cal Heldenbrand <cal at fbsdata.com> wrote:
> Andrew,
>
> The updated description looks nice, but could you please remove my
> fbsdata.com domain name from the man page?

Sure.  I assumed it was an internal network name that wasn't reachable
from outside.

> Also, the "memcached" OCF script
> was my own creation, and might not be a good example.

It's pretty irrelevant what the name of the resource or type of the
agent is (I had assumed it was an LSB script).
It just needs to be something that doesn't say ${resource} :-)
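
For illustration only, here is a minimal Python sketch of how a value of the form ${resource}_${task}_${interval}@${node}=${rc} fits together. The helper names (make_op_spec, split_op_spec) and the "ClusterIP"/"node1" names are hypothetical, and the right-to-left split is just one way to tolerate underscores in a resource name; it is not a description of how crm_simulate itself parses the value. The two return-code constants are the OCF integers used in the examples in this thread.

```python
# OCF return codes used in this thread; crm_simulate takes the integers.
OCF_ERR_GENERIC = 1   # generic failure, e.g. a failed stop
OCF_NOT_RUNNING = 7   # resource found not running, e.g. by a monitor


def make_op_spec(resource, task, interval_ms, node, rc):
    """Build a ${resource}_${task}_${interval}@${node}=${rc} string."""
    return f"{resource}_{task}_{interval_ms}@{node}={rc}"


def split_op_spec(spec):
    """Hypothetical parser: split from the right so that underscores
    in the resource name do not confuse the task/interval fields."""
    left, rc = spec.rsplit("=", 1)
    key, node = left.rsplit("@", 1)
    resource, task, interval = key.rsplit("_", 2)
    return resource, task, int(interval), node, int(rc)


if __name__ == "__main__":
    # "ClusterIP" and "node1" are placeholder names, not from a real cluster.
    print(make_op_spec("ClusterIP", "monitor", 20000, "node1", OCF_NOT_RUNNING))
    print(split_op_spec("my_ip_stop_0@node1=1"))
```

The resulting string is what would be passed to --op-inject or --op-fail.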

> Maybe one of the
> other commonly used example resources like an IP address or mysql or
> something?
>
> If you guys would like to include my memcached OCF script in the pacemaker
> distribution just let me know, I'll clean it up for public use and email it.

resource-agents would be the better project; I'm sure they'd love to have it.

> Thanks again!
>
> --Cal
>
>
>
> On Wed, Oct 24, 2012 at 7:32 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>
>> How is this?
>>
>> ...
>>
>>        -i, --op-inject=value
>>               Generate a failure for the cluster to react to in the simulation
>>
>>               Value is of the form ${resource}_${task}_${interval}@${node}=${rc}.
>>               Eg. memcached_monitor_20000@m1.fbsdata.com=7
>>
>>        -F, --op-fail=value
>>               If the specified task occurs during the simulation, have it fail
>>               with return code ${rc}
>>
>>               Value is of the form ${resource}_${task}_${interval}@${node}=${rc}.
>>               Eg. memcached_stop_0@m1.fbsdata.com=1
>>
>>               The transition will normally stop at the failed action, save the
>>               result with --save-output and re-run crm_simulate with --xml-file
>>
>> ...
>>
>> EXAMPLES
>>        Pretend the recurring memcached monitor failed on node m1.fbsdata.com
>>        and, during recovery, that the memcached stop action did too
>>
>>               # crm_simulate -LS \
>>                   --op-inject memcached:0_monitor_20000@m1.fbsdata.com=7 \
>>                   --op-fail memcached:0_stop_0@m1.fbsdata.com=1 \
>>                   --save-output /tmp/memcached-test.xml
>>
>>        Now see what the reaction to the stop failure would be
>>
>>               # crm_simulate -S --xml-file /tmp/memcached-test.xml
>>
>>
>>
>> On Thu, Oct 25, 2012 at 8:43 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>> > On Thu, Oct 25, 2012 at 1:37 AM, Cal Heldenbrand <cal at fbsdata.com> wrote:
>> >> Thanks Andrew!  My first few attempts at playing around with the
>> >> failure states are working as expected.
>> >>
>> >> A few follow-ups below:
>> >>
>> >>
>> >>> --op-fail isn't the command you want though.
>> >>> From the man page:
>> >>>
>> >>>        -i, --op-inject=value
>> >>>               $rsc_$task_$interval@$node=$rc - Inject the specified task
>> >>>               before running the simulation
>> >>>
>> >>>        -F, --op-fail=value
>> >>>               $rsc_$task_$interval@$node=$rc - Fail the specified task
>> >>>               while running the simulation
>> >>>
>> >>> Note the difference between the two descriptions: before vs. while.
>> >>> --op-inject is the one you want.  It is mostly useful for pretending a
>> >>> recurring monitor failed and seeing what the cluster would do about
>> >>> it.
>> >>>
>> >>> --op-fail on the other hand, is used for pretending that part of the
>> >>> recovery process failed.
>> >>
>> >>
>> >> Your follow-up description here is great, and makes more sense.  I was
>> >> reading "Fail the specified task" literally, as "here's my task, fail it
>> >> and show me the results".  I'd suggest adding a short paragraph to the
>> >> man page to elaborate on these points too.
>> >
>> > Ok, I'll add that today.
>> >
>> >> Also, can you tell me what all of the return
>> >> codes are?  Do I have to use integers, or do strings like "error" work?
>> >
>> > Just integers I'm afraid.
>> > The full list for OCF agents is here:
>> >
>> > http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/s-ocf-return-codes.html
>> > LSB return codes are slightly different.
>> >
>> >> While we're on the subject of documentation / usability, I would also
>> >> suggest splitting these two features out into more parameters.  (What
>> >> would happen if I named my resource with an underscore?)  Maybe
>> >> something like:
>> >>
>> >> --op-pre-resource=[primitive name]
>> >> --op-pre-task=[monitor|start|stop]
>> >> --op-pre-interval=[integer]
>> >> --op-pre-node=[hostname]
>> >> --op-pre-rc=[error|timeout|other stuff]
>> >>
>> >> Then have similar --op-post-* parameters.  Or whatever verbs make the
>> >> most sense in the spirit of Pacemaker vocabulary.  (pre/post,
>> >> before/after, inject/fail, input/output, etc)
>> >
>> > The reason for not doing that, is that we wanted to be able to inject
>> > multiple pre/post failures at a time and see the result.
>> >
>> >> And, examples are always awesome in man
>> >> pages too.
>> >>
>> >> Of course, this is all great future version stuff, but that doesn't
>> >> help all of the RedHat 6 people that will be using pacemaker 1.1
>> >> packages for the next ~10 years until RedHat 7 comes out.
>> >
>> > Don't worry, the man page updates we just talked about will be in the
>> > 6.4 packages :)
>> >
>> >> So I suppose documenting the old
>> >> code in the online docs is a Good Thing.  :-)
>> >>
>> >> Thanks again!
>> >>
>> >> --Cal
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >>
>> >> Project Home: http://www.clusterlabs.org
>> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> >> Bugs: http://bugs.clusterlabs.org
>> >>