[Pacemaker] Question about the resource to fence a node
Andrew Beekhof
andrew at beekhof.net
Fri Nov 15 00:36:18 UTC 2013
On 14 Nov 2013, at 5:53 pm, Kazunori INOUE <kazunori.inoue3 at gmail.com> wrote:
> Hi, Andrew
>
> 2013/11/13 Kazunori INOUE <kazunori.inoue3 at gmail.com>:
>> 2013/11/13 Andrew Beekhof <andrew at beekhof.net>:
>>>
>>> On 16 Oct 2013, at 8:51 am, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>
>>>>
>>>> On 15/10/2013, at 8:24 PM, Kazunori INOUE <kazunori.inoue3 at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm using pacemaker-1.1 (the latest devel).
>>>>> I started the resources (f1 and f2) that fence vm3; both are running on vm1.
>>>>>
>>>>> $ crm_mon -1
>>>>> Last updated: Tue Oct 15 15:16:37 2013
>>>>> Last change: Tue Oct 15 15:16:21 2013 via crmd on vm1
>>>>> Stack: corosync
>>>>> Current DC: vm1 (3232261517) - partition with quorum
>>>>> Version: 1.1.11-0.284.6a5e863.git.el6-6a5e863
>>>>> 3 Nodes configured
>>>>> 3 Resources configured
>>>>>
>>>>> Online: [ vm1 vm2 vm3 ]
>>>>>
>>>>> pDummy (ocf::pacemaker:Dummy): Started vm3
>>>>> Resource Group: gStonith3
>>>>> f1 (stonith:external/libvirt): Started vm1
>>>>> f2 (stonith:external/ssh): Started vm1
>>>>>
>>>>>
>>>>> "reset" of f1 which hasn't been started on vm2 was performed when vm3 is fenced.
>>>>>
>>>>> $ ssh vm3 'rm -f /var/run/Dummy-pDummy.state'
>>>>> $ for i in vm1 vm2; do ssh $i 'hostname; egrep " reset | off " /var/log/ha-log'; done
>>>>> vm1
>>>>> Oct 15 15:17:35 vm1 stonith-ng[14870]: warning: log_operation: f2:15076 [ Performing: stonith -t external/ssh -T reset vm3 ]
>>>>> Oct 15 15:18:06 vm1 stonith-ng[14870]: warning: log_operation: f2:15464 [ Performing: stonith -t external/ssh -T reset vm3 ]
>>>>> vm2
>>>>> Oct 15 15:17:16 vm2 stonith-ng[9160]: warning: log_operation: f1:9273 [ Performing: stonith -t external/libvirt -T reset vm3 ]
>>>>> Oct 15 15:17:46 vm2 stonith-ng[9160]: warning: log_operation: f1:9588 [ Performing: stonith -t external/libvirt -T reset vm3 ]
>>>>>
>>>>> Is this the expected behavior?
>>>>
>>>> Yes, although the host on which the device is started usually gets priority.
>>>> I will try to find some time to look through the report to see why this didn't happen.
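The priority order in question is the flag sequence in stonith_choose_peer(): a non-target peer that has verified the device is preferred, then any non-target peer, and only as a last resort the target itself. Below is a minimal standalone C sketch of that chain; the peer_t struct and the find_best_peer() stub are invented for illustration, and only the flag names come from fencing/remote.c:

#include <stdio.h>
#include <stddef.h>

/* Hypothetical simplification; only the flag names mirror fencing/remote.c. */
enum find_flags {
    FIND_PEER_SKIP_TARGET   = 0x01, /* never pick the node being fenced */
    FIND_PEER_VERIFIED_ONLY = 0x02, /* only peers that verified the device */
    FIND_PEER_TARGET_ONLY   = 0x04, /* last resort: target fences itself */
};

typedef struct {
    const char *host;
    int is_target;   /* is this the node we are trying to fence? */
    int verified;    /* has this peer verified (started) the device? */
} peer_t;

/* Stub: return the first peer satisfying the flags, or NULL. */
static peer_t *find_best_peer(peer_t *peers, size_t n, int flags) {
    for (size_t i = 0; i < n; i++) {
        if ((flags & FIND_PEER_SKIP_TARGET) && peers[i].is_target)
            continue;
        if ((flags & FIND_PEER_VERIFIED_ONLY) && !peers[i].verified)
            continue;
        if ((flags & FIND_PEER_TARGET_ONLY) && !peers[i].is_target)
            continue;
        return &peers[i];
    }
    return NULL;
}

/* The priority chain: verified non-target peer first, then any
 * non-target peer, then the target itself. */
static peer_t *choose_peer_sketch(peer_t *peers, size_t n) {
    peer_t *peer;
    if ((peer = find_best_peer(peers, n,
                               FIND_PEER_SKIP_TARGET | FIND_PEER_VERIFIED_ONLY)))
        return peer;
    if ((peer = find_best_peer(peers, n, FIND_PEER_SKIP_TARGET)))
        return peer;
    return find_best_peer(peers, n, FIND_PEER_TARGET_ONLY);
}

int main(void) {
    /* vm1 runs (has verified) the device, vm2 has not, vm3 is the target. */
    peer_t peers[] = {
        { "vm2", 0, 0 },
        { "vm1", 0, 1 },
        { "vm3", 1, 0 },
    };
    peer_t *p = choose_peer_sketch(peers, 3);
    printf("fence via: %s\n", p ? p->host : "(nobody)"); /* prints vm1 */
    return 0;
}

In the reported scenario, vm1 (where f1 is started) should win the first pass, so vm2 executing the reset suggests the fallback pass fired before vm1's query reply was in.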
>>>
>>> Reading through this again, it sounds like it should be fixed by your earlier pull request:
>>>
>>> https://github.com/beekhof/pacemaker/commit/6b4bfd6
>>>
>>> Yes?
>>
>> No.
>
> How about this change?
Thanks for this. I tweaked it a bit further and pushed:
https://github.com/beekhof/pacemaker/commit/4cbbeb0
>
> diff --git a/fencing/remote.c b/fencing/remote.c
> index 6c11ba9..68b31c5 100644
> --- a/fencing/remote.c
> +++ b/fencing/remote.c
> @@ -778,6 +778,7 @@ stonith_choose_peer(remote_fencing_op_t * op)
>  {
>      st_query_result_t *peer = NULL;
>      const char *device = NULL;
> +    uint32_t active = fencing_active_peers();
>
>      do {
>          if (op->devices) {
> @@ -790,7 +791,8 @@ stonith_choose_peer(remote_fencing_op_t * op)
>
>          if ((peer = find_best_peer(device, op, FIND_PEER_SKIP_TARGET | FIND_PEER_VERIFIED_ONLY))) {
>              return peer;
> -        } else if ((peer = find_best_peer(device, op, FIND_PEER_SKIP_TARGET))) {
> +        } else if ((op->query_timer == 0 || op->replies >= op->replies_expected || op->replies >= active)
> +                   && (peer = find_best_peer(device, op, FIND_PEER_SKIP_TARGET))) {
>              return peer;
>          } else if ((peer = find_best_peer(device, op, FIND_PEER_TARGET_ONLY))) {
>              return peer;
> @@ -801,8 +803,13 @@ stonith_choose_peer(remote_fencing_op_t * op)
>             && stonith_topology_next(op) == pcmk_ok);
>
>      if (op->devices) {
> -        crm_notice("Couldn't find anyone to fence %s with %s", op->target,
> -                   (char *)op->devices->data);
> +        if (op->query_timer == 0 || op->replies >= op->replies_expected || op->replies >= active) {
> +            crm_notice("Couldn't find anyone to fence %s with %s", op->target,
> +                       (char *)op->devices->data);
> +        } else {
> +            crm_debug("Couldn't find verified device to fence %s with %s", op->target,
> +                      (char *)op->devices->data);
> +        }
>      } else {
>          crm_debug("Couldn't find anyone to fence %s", op->target);
>      }
>
>
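The guard added above makes the fallback to an unverified peer wait until further query replies can no longer change the answer: the query timer has already fired (op->query_timer == 0), every expected reply has arrived, or a reply has come from every active fencing peer (the value returned by fencing_active_peers()). A small self-contained illustration of just that predicate; the op_t struct here is invented and carries only the three fields the guard reads, unlike the real remote_fencing_op_t:

#include <stdio.h>
#include <stdint.h>

/* Invented mini-struct for illustration; the real remote_fencing_op_t
 * in fencing/remote.c has many more fields. */
typedef struct {
    int query_timer;           /* 0 once the query timeout has fired */
    uint32_t replies;          /* query replies received so far */
    uint32_t replies_expected; /* replies we asked for */
} op_t;

/* The guard from the patch: only fall back to an unverified peer
 * when no further query replies can change the outcome. */
static int query_phase_done(const op_t *op, uint32_t active_peers) {
    return op->query_timer == 0
           || op->replies >= op->replies_expected
           || op->replies >= active_peers;
}

int main(void) {
    op_t waiting = { .query_timer = 1, .replies = 1, .replies_expected = 2 };
    op_t done    = { .query_timer = 1, .replies = 2, .replies_expected = 2 };

    /* Two active peers, vm1 and vm2 in the thread's scenario. */
    printf("waiting: %d\n", query_phase_done(&waiting, 2)); /* 0: keep waiting */
    printf("done:    %d\n", query_phase_done(&done, 2));    /* 1: may fall back */
    return 0;
}

With this in place, vm2 keeps waiting until vm1 (the verified host) has had a chance to answer, instead of grabbing the operation first.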
>>>> I'm kind of swamped at the moment though.
>>>>
>>>>>
>>>>> Best Regards,
>>>>> Kazunori INOUE
>>>>> <stopped_resource_performed_reset.tar.bz2>