[Pacemaker] Question about the resource to fence a node

Kazunori INOUE kazunori.inoue3 at gmail.com
Thu Nov 14 01:53:45 EST 2013


Hi, Andrew

2013/11/13 Kazunori INOUE <kazunori.inoue3 at gmail.com>:
> 2013/11/13 Andrew Beekhof <andrew at beekhof.net>:
>>
>> On 16 Oct 2013, at 8:51 am, Andrew Beekhof <andrew at beekhof.net> wrote:
>>
>>>
>>> On 15/10/2013, at 8:24 PM, Kazunori INOUE <kazunori.inoue3 at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm using pacemaker-1.1 (the latest devel).
>>>> I started the resources (f1 and f2) that fence vm3 on vm1.
>>>>
>>>> $ crm_mon -1
>>>> Last updated: Tue Oct 15 15:16:37 2013
>>>> Last change: Tue Oct 15 15:16:21 2013 via crmd on vm1
>>>> Stack: corosync
>>>> Current DC: vm1 (3232261517) - partition with quorum
>>>> Version: 1.1.11-0.284.6a5e863.git.el6-6a5e863
>>>> 3 Nodes configured
>>>> 3 Resources configured
>>>>
>>>> Online: [ vm1 vm2 vm3 ]
>>>>
>>>> pDummy (ocf::pacemaker:Dummy): Started vm3
>>>> Resource Group: gStonith3
>>>>    f1 (stonith:external/libvirt):     Started vm1
>>>>    f2 (stonith:external/ssh): Started vm1
>>>>
>>>>
>>>> When vm3 was fenced, the "reset" of f1 was performed on vm2, where f1 had not been started.
>>>>
>>>> $ ssh vm3 'rm -f /var/run/Dummy-pDummy.state'
>>>> $ for i in vm1 vm2; do ssh $i 'hostname; egrep " reset | off " /var/log/ha-log'; done
>>>> vm1
>>>> Oct 15 15:17:35 vm1 stonith-ng[14870]:  warning: log_operation: f2:15076 [ Performing: stonith -t external/ssh -T reset vm3 ]
>>>> Oct 15 15:18:06 vm1 stonith-ng[14870]:  warning: log_operation: f2:15464 [ Performing: stonith -t external/ssh -T reset vm3 ]
>>>> vm2
>>>> Oct 15 15:17:16 vm2 stonith-ng[9160]:  warning: log_operation: f1:9273 [ Performing: stonith -t external/libvirt -T reset vm3 ]
>>>> Oct 15 15:17:46 vm2 stonith-ng[9160]:  warning: log_operation: f1:9588 [ Performing: stonith -t external/libvirt -T reset vm3 ]
>>>>
>>>> Is this the intended behavior?
>>>
>>> Yes, although the host on which the device is started usually gets priority.
>>> I will try to find some time to look through the report to see why this didn't happen.
>>
>> Reading through this again, it sounds like it should be fixed by your earlier pull request:
>>
>>    https://github.com/beekhof/pacemaker/commit/6b4bfd6
>>
>> Yes?
>
> No.

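How about the following change? It makes stonith_choose_peer() fall back
to a peer whose device is not verified only when no query timer is
pending (op->query_timer == 0) or when all expected (or all active)
peers have already replied.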

diff --git a/fencing/remote.c b/fencing/remote.c
index 6c11ba9..68b31c5 100644
--- a/fencing/remote.c
+++ b/fencing/remote.c
@@ -778,6 +778,7 @@ stonith_choose_peer(remote_fencing_op_t * op)
 {
     st_query_result_t *peer = NULL;
     const char *device = NULL;
+    uint32_t active = fencing_active_peers();

     do {
         if (op->devices) {
@@ -790,7 +791,8 @@ stonith_choose_peer(remote_fencing_op_t * op)

         if ((peer = find_best_peer(device, op, FIND_PEER_SKIP_TARGET | FIND_PEER_VERIFIED_ONLY))) {
             return peer;
-        } else if ((peer = find_best_peer(device, op, FIND_PEER_SKIP_TARGET))) {
+        } else if ((op->query_timer == 0 || op->replies >= op->replies_expected || op->replies >= active)
+                   && (peer = find_best_peer(device, op, FIND_PEER_SKIP_TARGET))) {
             return peer;
         } else if ((peer = find_best_peer(device, op, FIND_PEER_TARGET_ONLY))) {
             return peer;
@@ -801,8 +803,13 @@ stonith_choose_peer(remote_fencing_op_t * op)
              && stonith_topology_next(op) == pcmk_ok);

     if (op->devices) {
-        crm_notice("Couldn't find anyone to fence %s with %s", op->target,
-                   (char *)op->devices->data);
+        if (op->query_timer == 0 || op->replies >= op->replies_expected || op->replies >= active) {
+            crm_notice("Couldn't find anyone to fence %s with %s", op->target,
+                       (char *)op->devices->data);
+        } else {
+            crm_debug("Couldn't find verified device to fence %s with %s", op->target,
+                       (char *)op->devices->data);
+        }
     } else {
         crm_debug("Couldn't find anyone to fence %s", op->target);
     }
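
The condition that the patch adds in both hunks can be read as a single
predicate. Below is a minimal standalone sketch of it, not code from
Pacemaker: `struct query_state` and `may_use_unverified_peer()` are
hypothetical names introduced only for illustration, and only the three
compared fields and the `active` count come from the diff above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the remote_fencing_op_t fields the patch reads. */
struct query_state {
    unsigned int query_timer;   /* 0 means no query timer is pending */
    uint32_t replies;           /* query replies received so far */
    uint32_t replies_expected;  /* replies the operation expects */
};

/*
 * Hypothetical helper: returns true once waiting longer for a peer with a
 * verified device is pointless, i.e. no query timer is pending, or every
 * expected peer (or every active peer) has already replied.
 */
static bool
may_use_unverified_peer(const struct query_state *op, uint32_t active)
{
    return op->query_timer == 0
           || op->replies >= op->replies_expected
           || op->replies >= active;
}
```

With three active peers, a pending timer and one reply out of three, the
predicate is false, so only a verified peer would be chosen at that
point; once the third reply arrives it becomes true.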


>>> I'm kind of swamped at the moment though.
>>>
>>>>
>>>> Best Regards,
>>>> Kazunori INOUE
>>>> <stopped_resource_performed_reset.tar.bz2>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org



