[Pacemaker] crm_simulate a resource failure

Thu Oct 25 00:32:33 UTC 2012

How is this?

...

       -i, --op-inject=value
              Generate a failure for the cluster to react to in the simulation.

              Value is of the form ${resource}_${task}_${interval}@${node}=${rc}.
              E.g. memcached_monitor_20000@m1.fbsdata.com=7

       -F, --op-fail=value
              If the specified task occurs during the simulation, have it fail
              with return code ${rc}.

              Value is of the form ${resource}_${task}_${interval}@${node}=${rc}.
              E.g. memcached_stop_0@m1.fbsdata.com=1

              The transition will normally stop at the failed action. To see
              how the cluster reacts to it, save the result with --save-output
              and re-run crm_simulate with --xml-file.
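A rough bash sketch of how a value in this form can be taken apart. parse_op_spec is a hypothetical helper, not part of Pacemaker, and this is not how crm_simulate itself parses the option. It splits from the right, so an underscore inside the resource name survives; it assumes the task name has no underscore, which would not hold for actions such as migrate_to.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Pacemaker): split a value of the form
#   ${resource}_${task}_${interval}@${node}=${rc}
# into its fields. Works from the right so underscores in the resource
# name are kept; assumes the task name itself has no underscore.
parse_op_spec() {
    local spec=$1 key node_rc
    local rc node interval task resource

    # Minimal shape check: at least two '_', then '@', then '='.
    if [[ $spec != *_*_*@*=* ]]; then
        echo "parse_op_spec: malformed spec '$spec'" >&2
        return 1
    fi

    rc=${spec##*=}          # after the last '='
    node_rc=${spec%=*}      # before the last '='
    node=${node_rc##*@}     # after the last '@'
    key=${node_rc%@*}       # ${resource}_${task}_${interval}

    interval=${key##*_}     # after the last '_'
    key=${key%_*}           # strip _${interval}
    task=${key##*_}         # after the next '_' from the right
    resource=${key%_*}      # everything before it

    printf 'resource=%s task=%s interval=%s node=%s rc=%s\n' \
        "$resource" "$task" "$interval" "$node" "$rc"
}
```

For a hypothetical resource named my_db, parse_op_spec my_db_monitor_20000@m1.fbsdata.com=7 prints resource=my_db task=monitor interval=20000 node=m1.fbsdata.com rc=7.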

...

EXAMPLES
       Pretend the recurring memcached monitor failed on node m1.fbsdata.com
       and, during recovery, that the memcached stop action did too:

              # crm_simulate -LS --op-inject memcached:0_monitor_20000@m1.fbsdata.com=7 \
                  --op-fail memcached:0_stop_0@m1.fbsdata.com=1 \
                  --save-output /tmp/memcached-test.xml

       Now see how the cluster would react to the stop failure:

              # crm_simulate -S --xml-file /tmp/memcached-test.xml
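The two runs above could be wrapped in a small bash function, sketched here. op_spec and simulate_stop_failure are hypothetical names; the function uses only the options that appear in this message, and the return codes 7 (monitor) and 1 (stop) are the ones from the example.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the two crm_simulate runs in the EXAMPLES
# section; it uses only options shown in this message.

# Build "${resource}_${task}_${interval}@${node}=${rc}".
op_spec() {
    printf '%s_%s_%s@%s=%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Usage: simulate_stop_failure RESOURCE NODE MONITOR_INTERVAL OUTPUT_XML
simulate_stop_failure() {
    local rsc=$1 node=$2 interval=$3 out=$4

    # Run 1: start from the live CIB (-L) and simulate (-S) with a
    # recurring monitor failure (rc 7) injected, make the recovery's stop
    # action fail (rc 1), and save the resulting state.
    crm_simulate -LS \
        --op-inject "$(op_spec "$rsc" monitor "$interval" "$node" 7)" \
        --op-fail "$(op_spec "$rsc" stop 0 "$node" 1)" \
        --save-output "$out" || return

    # Run 2: replay the saved state to see the reaction to the stop failure.
    crm_simulate -S --xml-file "$out"
}
```

Calling simulate_stop_failure memcached:0 m1.fbsdata.com 20000 /tmp/memcached-test.xml should issue the same two commands as the example; the second run is skipped if the first one fails.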

On Thu, Oct 25, 2012 at 8:43 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
> On Thu, Oct 25, 2012 at 1:37 AM, Cal Heldenbrand <cal at fbsdata.com> wrote:
>> Thanks Andrew!  My first few attempts at playing around with the failure
>> states are working as expected.
>>
>> A few follow-ups below:
>>
>>
>>> --op-fail isn't the command you want though.
>>> From the man page:
>>>
>>>        -i, --op-inject=value
>>>               $rsc_$task_$interval@$node=$rc - Inject the specified
>>> task before running the simulation
>>>
>>>        -F, --op-fail=value
>>>               $rsc_$task_$interval@$node=$rc - Fail the specified task
>>> while running the simulation
>>>
>>> Note the difference between the two descriptions: before vs. while.
>>> --op-inject is the one you want.  It is mostly useful for pretending a
>>> recurring monitor failed and seeing what the cluster would do about
>>> it.
>>>
>>> --op-fail on the other hand, is used for pretending that part of the
>>> recovery process failed.
>>
>>
>> Your follow-up description here is great, and makes more sense.  I was
>> reading "Fail the specified task" literally, as "here's my task, fail it and
>> show me the results".  I'd suggest adding a short paragraph to the man page
>> to elaborate on these points too.
>
> Ok, I'll add that today.
>
>> Also, can you tell me what all of the return
>> codes are?  Do I have to use integers, or do strings like "error" work?
>
> Just integers I'm afraid.
> The full list for OCF agents is here:
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/s-ocf-return-codes.html
> LSB return codes are slightly different.
>
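Since only integers are accepted, one workaround is a small lookup that maps OCF symbolic names to their numeric values before building an --op-inject or --op-fail value. ocf_rc is a hypothetical helper; the numbers are the standard OCF return codes 0 to 9 listed on the page linked above, and LSB codes would need a different table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: crm_simulate only takes integer return codes, so
# translate OCF symbolic names into their numeric values.
ocf_rc() {
    case $1 in
        OCF_SUCCESS)           echo 0 ;;
        OCF_ERR_GENERIC)       echo 1 ;;
        OCF_ERR_ARGS)          echo 2 ;;
        OCF_ERR_UNIMPLEMENTED) echo 3 ;;
        OCF_ERR_PERM)          echo 4 ;;
        OCF_ERR_INSTALLED)     echo 5 ;;
        OCF_ERR_CONFIGURED)    echo 6 ;;
        OCF_NOT_RUNNING)       echo 7 ;;
        OCF_RUNNING_MASTER)    echo 8 ;;
        OCF_FAILED_MASTER)     echo 9 ;;
        *)
            echo "ocf_rc: unknown OCF code name '$1'" >&2
            return 1
            ;;
    esac
}
```

For example, --op-inject "memcached:0_monitor_20000@m1.fbsdata.com=$(ocf_rc OCF_NOT_RUNNING)" expands to the =7 form used in the man page example.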
>> While we're on the subject of documentation / usability, I would also
>> suggest splitting these two features into separate parameters.  (What would
>> happen if I named my resource with an underscore?)  Maybe something like:
>>
>> --op-pre-resource=[primitive name]
>> --op-pre-task=[monitor|start|stop]
>> --op-pre-interval=[integer]
>> --op-pre-node=[hostname]
>> --op-pre-rc=[error|timeout|other stuff]
>>
>> Then have similar --op-post-* parameters.  Or whatever verbs make the most
>> sense in the spirit of Pacemaker vocabulary.  (pre/post, before/after,
>> inject/fail, input/output, etc)
>
> The reason for not doing that is that we wanted to be able to inject
> multiple pre/post failures at a time and see the result.
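Assuming, as this reply implies, that --op-inject can be given more than once, several failures could be injected in one run by repeating the option. simulate_injections is a hypothetical bash helper that builds the repeated options from a list of specs while keeping each spec a single argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper: inject several failures in one simulation by
# repeating --op-inject once per spec (assumes the option is repeatable).
simulate_injections() {
    if (( $# == 0 )); then
        echo "usage: simulate_injections SPEC..." >&2
        return 2
    fi

    local -a args=()
    local spec
    for spec in "$@"; do
        args+=(--op-inject "$spec")   # each spec stays one argument
    done

    crm_simulate -LS "${args[@]}"
}
```

For instance, simulate_injections memcached:0_monitor_20000@m1.fbsdata.com=7 memcached:1_monitor_20000@m2.fbsdata.com=7 would fail both monitors at once, where m2.fbsdata.com is a hypothetical second node.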
>
>> And, examples are always awesome in man
>> pages too.
>>
>> Of course, this is all great future version stuff, but that doesn't help all
>> of the Red Hat 6 people that will be using Pacemaker 1.1 packages for the
>> next ~10 years until Red Hat 7 comes out.
>
> Don't worry, the man page updates we just talked about will be in the
> 6.4 packages :)
>
>> So I suppose documenting the old
>> code in the online docs is a Good Thing.  :-)
>>
>> Thanks again!
>>
>> --Cal
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>