[Pacemaker] crm_simulate a resource failure

Tue Oct 23 18:24:09 EDT 2012

On Wed, Oct 24, 2012 at 5:01 AM, Cal Heldenbrand <cal at fbsdata.com> wrote:
> Thanks Jake, that gives a little better description of the parameters,
> but I still just can't seem to get anything to trigger with the various
> syntaxes I'm trying.  See below, I'm using single quotes so the $ symbol
> isn't parsed by bash.  I've tried using my clone name, different return
> values, different task names, without the $ symbols... nothing seems to
> trigger anything in the Transition stuff.  And I don't get any error
> messages at all.
>
> Any other ideas for me?

Definitely don't include the $ symbols.
$rsc for example was intended to mean "put the name of your resource here".
Maybe I need to include an example too.

--op-fail isn't the command you want though.
From the man page:

       -i, --op-inject=value
              $rsc_$task_$interval@$node=$rc - Inject the specified task
              before running the simulation

       -F, --op-fail=value
              $rsc_$task_$interval@$node=$rc - Fail the specified task
              while running the simulation

Note the difference between the two descriptions: before vs. while.
--op-inject is the one you want.  It is mostly useful for pretending a
recurring monitor failed and seeing what the cluster would do about
it.
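
To make the value format from the man page concrete, here is a minimal
shell sketch that assembles a spec from its five parts. The helper name
op_spec is made up for illustration and is not part of Pacemaker; it only
builds and prints the string, it does not call crm_simulate.

```shell
#!/bin/sh
# op_spec is a hypothetical helper, not part of Pacemaker.
# It joins the five fields into rsc_task_interval@node=rc.
op_spec() {
    # $1=resource  $2=task  $3=interval  $4=node  $5=return code
    printf '%s_%s_%s@%s=%s\n' "$1" "$2" "$3" "$4" "$5"
}

# A failed recurring monitor of memcached:0 on m1, written with
# literal values and no $ signs.
op_spec memcached:0 monitor 1 m1.fbsdata.com 7
```

The printed string is what would be passed as the value of --op-inject.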

--op-fail on the other hand, is used for pretending that part of the
recovery process failed.

So if you ran:

crm_simulate -LS --op-inject memcached:0_monitor_1@m1.fbsdata.com=7 \
    --op-fail memcached:0_stop_0@m1.fbsdata.com=1 \
    --save-output /tmp/memcached-test.xml

You would see what Pacemaker would do if a monitor failure of memcached
occurred. The simulation would stop at the point where the memcached stop
action ran (because we told it that the stop should fail), so anything
that needed memcached to stop first would not yet be stopped.
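
The =7 and =1 in that command are numeric return codes. In the OCF
resource agent convention, 7 means "not running" and 1 means a generic
error. As far as I know the value has to be given as a number rather than
a name such as not_running. A small lookup sketch, where ocf_rc and the
lowercase names are my own shorthand and not anything crm_simulate
understands:

```shell
#!/bin/sh
# ocf_rc is a hypothetical helper that maps a shorthand name to the
# numeric OCF return code. The names are illustrative only.
ocf_rc() {
    case "$1" in
        success)     echo 0 ;;   # OCF_SUCCESS
        error)       echo 1 ;;   # OCF_ERR_GENERIC
        not_running) echo 7 ;;   # OCF_NOT_RUNNING
        *)           echo "unknown return code name: $1" >&2; return 1 ;;
    esac
}

ocf_rc not_running
ocf_rc error
```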

You can then see how Pacemaker would react to the second failure by running:

crm_simulate  --xml-file /tmp/memcached-test.xml -S
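
Put together, the two steps could be wrapped in a small script. This is
only a sketch: the run wrapper and the RUN variable are made up for
illustration, and by default the script just prints the commands. Set
RUN=1 on a cluster node where crm_simulate is installed to execute them.

```shell
#!/bin/sh
# run is a hypothetical wrapper: it echoes the command and only
# executes it when RUN=1 is set in the environment.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

out=/tmp/memcached-test.xml

# Step 1: inject the monitor failure, make the stop fail, and save
# the resulting cluster state.
run crm_simulate -LS \
    --op-inject memcached:0_monitor_1@m1.fbsdata.com=7 \
    --op-fail memcached:0_stop_0@m1.fbsdata.com=1 \
    --save-output "$out"

# Step 2: start from the saved state to see how the cluster reacts
# to the failed stop.
run crm_simulate --xml-file "$out" -S
```

For the scores the original question asked about, the man page also lists
-s, --show-scores. I have left it out above to keep the example identical
to the commands already shown.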

Perhaps the man page should include an example like this?

>
> Thanks!
>
> ---------------------------------------------------------------------------------
> [root@m3 /]# crm_simulate -LS
> --op-fail='$memcached:0_$monitor_$1@$m1.fbsdata.com=$not_running'
>
> Current cluster status:
> Online: [ m1.fbsdata.com m2.fbsdata.com m3.fbsdata.com ]
>
>  Clone Set: memcached_clone [memcached]
>      Started: [ m1.fbsdata.com m2.fbsdata.com m3.fbsdata.com ]
>  cluster-ip-m1  (ocf::heartbeat:IPaddr2):       Started m1.fbsdata.com
>  cluster-ip-m2  (ocf::heartbeat:IPaddr2):       Started m2.fbsdata.com
>  cluster-ip-m3  (ocf::heartbeat:IPaddr2):       Started m3.fbsdata.com
>
> Transition Summary:
>
> Executing cluster transition:
>
> Revised cluster status:
> Online: [ m1.fbsdata.com m2.fbsdata.com m3.fbsdata.com ]
>
>  Clone Set: memcached_clone [memcached]
>      Started: [ m1.fbsdata.com m2.fbsdata.com m3.fbsdata.com ]
>  cluster-ip-m1  (ocf::heartbeat:IPaddr2):       Started m1.fbsdata.com
>  cluster-ip-m2  (ocf::heartbeat:IPaddr2):       Started m2.fbsdata.com
>  cluster-ip-m3  (ocf::heartbeat:IPaddr2):       Started m3.fbsdata.com
> ---------------------------------------------------------------------------------
>
> On Tue, Oct 23, 2012 at 12:27 PM, Jake Smith <jsmith at argotec.com> wrote:
>>
>>
>> ----- Original Message -----
>>
>> > From: "Cal Heldenbrand" <cal at fbsdata.com>
>> > To: pacemaker at oss.clusterlabs.org
>> > Sent: Tuesday, October 23, 2012 11:50:11 AM
>> > Subject: [Pacemaker] crm_simulate a resource failure
>>
>> > Hi everyone,
>>
>> > I'm not able to find documentation or examples on this. If I have a
>> > cloned primitive set across a cluster, how can I simulate a failure
>> > of a resource on an individual node? I mainly want to see the scores
>> > on why a particular action is taken so I can adjust my configs.
>>
>> > I think the --op-fail parameter is what I need, but I just don't get
>> > the syntax of the value in the man page.
>>
>> I usually use the crm shell so I'm not positive but I think these are the
>> parts you need...
>>
>> $rsc_$task_$interval@$node=$rc
>>
>> $rsc = resource to test; in your case I believe you want to specify the
>> primitive instance of the clone, e.g. p_resource:0
>> $task = monitor or migrate or stop or whatever operation you want to take
>> $interval = the interval of a monitor task
>> $node = the node
>> $rc = the exit code you want to fail with, e.g. error, not_running
>>
>> So (I think) something like:
>> --op-fail=$p_of_clone_resource:0_$monitor_$10@$node1=$not_running
>>
>> You *should* be able to experiment till you get it just right since it's
>> only a simulation. :-)
>>
>> HTH
>>
>> Jake
>>
>> > Thank you!
>>
>> > --Cal
>>
>> > _______________________________________________
>> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> > Project Home: http://www.clusterlabs.org
>> > Getting started:
>> > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org