[ClusterLabs] Pacemaker on-fail standby recovery does not start DRBD slave resource

Sam Gardner SGardner at trustwave.com
Wed Mar 30 17:20:37 UTC 2016


One other note: manually standby-ing and unstandby-ing a node gives the
behavior I want (e.g., after the node is unstandby-ed, the DRBDSlave
resource starts and works).
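
For reference, the manual cycle described above can be sketched as follows.
This is a minimal sketch, assuming the crm_standby tool shipped with
Pacemaker is on the PATH; the helper names and the dry-run default are
illustrative only and are not part of the cluster configuration.

```python
import shlex
import subprocess


def standby_commands(node):
    """Return the crm_standby invocations for one standby/unstandby cycle."""
    return [
        ["crm_standby", "-v", "on", "-N", node],  # put the node into standby
        ["crm_standby", "-D", "-N", node],        # remove the standby attribute
    ]


def cycle_standby(node, run=False):
    """Print the commands (dry run) or execute them when run=True.

    A real run would normally wait for resources to stop on the node
    between the two commands; that wait is left out of this sketch.
    """
    commands = standby_commands(node)
    for cmd in commands:
        if run:
            subprocess.run(cmd, check=True)
        else:
            print(" ".join(shlex.quote(part) for part in cmd))
    return commands


if __name__ == "__main__":
    # Dry run only: prints the two commands for the node named in this thread.
    cycle_standby("ha-d1.tw.com")
```

The dry-run default is deliberate, so the commands can be reviewed before
anything is changed on a live cluster.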
--
Sam Gardner
Trustwave | SMART SECURITY ON DEMAND


On 3/30/16, 11:46 AM, "Ken Gaillot" <kgaillot at redhat.com> wrote:

>On 03/30/2016 11:20 AM, Sam Gardner wrote:
>> I have configured some network resources to automatically standby their
>>node if the system detects a failure on them. However, the DRBD slave
>>that I have configured does not automatically restart after the node is
>>"unstandby-ed" after the failure-timeout expires.
>> Is there any way to make the "stopped" DRBDSlave resource automatically
>>start again once the node is recovered?
>>
>> See the progression of events below:
>>
>> Running cluster:
>> Wed Mar 30 16:04:20 UTC 2016
>> Cluster name:
>> Last updated: Wed Mar 30 16:04:20 2016
>> Last change: Wed Mar 30 16:03:24 2016
>> Stack: classic openais (with plugin)
>> Current DC: ha-d1.tw.com - partition with quorum
>> Version: 1.1.12-561c4cf
>> 2 Nodes configured, 2 expected votes
>> 7 Resources configured
>>
>>
>> Online: [ ha-d1.tw.com ha-d2.tw.com ]
>>
>> Full list of resources:
>>
>>  Resource Group: network
>>      inif       (ocf::custom:ip.sh):       Started ha-d1.tw.com
>>      outif      (ocf::custom:ip.sh):       Started ha-d1.tw.com
>>      dmz1       (ocf::custom:ip.sh):       Started ha-d1.tw.com
>>  Master/Slave Set: DRBDMaster [DRBDSlave]
>>      Masters: [ ha-d1.tw.com ]
>>      Slaves: [ ha-d2.tw.com ]
>>  Resource Group: filesystem
>>      DRBDFS     (ocf::heartbeat:Filesystem):    Started ha-d1.tw.com
>>  Resource Group: application
>>      service_failover   (ocf::custom:service_failover):    Started ha-d1.tw.com
>>
>>
>> version: 8.4.5 (api:1/proto:86-101)
>> srcversion: 315FB2BBD4B521D13C20BF4
>>
>>  1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
>>     ns:4 nr:0 dw:4 dr:757 al:1 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>> [153766.565352] block drbd1: send bitmap stats [Bytes(packets)]: plain
>>0(0), RLE 21(1), total 21; compression: 100.0%
>> [153766.568303] block drbd1: receive bitmap stats [Bytes(packets)]:
>>plain 0(0), RLE 21(1), total 21; compression: 100.0%
>> [153766.568316] block drbd1: helper command: /sbin/drbdadm
>>before-resync-source minor-1
>> [153766.568356] block drbd1: helper command: /sbin/drbdadm
>>before-resync-source minor-1 exit code 255 (0xfffffffe)
>> [153766.568363] block drbd1: conn( WFBitMapS -> SyncSource ) pdsk(
>>Consistent -> Inconsistent )
>> [153766.568374] block drbd1: Began resync as SyncSource (will sync 4 KB
>>[1 bits set]).
>> [153766.568444] block drbd1: updated sync UUID
>>B0DA745C79C56591:36E0631B6F022952:36DF631B6F022952:133127197CF097C6
>> [153766.577695] block drbd1: Resync done (total 1 sec; paused 0 sec; 4
>>K/sec)
>> [153766.577700] block drbd1: updated UUIDs
>>B0DA745C79C56591:0000000000000000:36E0631B6F022952:36DF631B6F022952
>> [153766.577705] block drbd1: conn( SyncSource -> Connected ) pdsk(
>>Inconsistent -> UpToDate )
>>
>> Failure detected:
>> Wed Mar 30 16:08:22 UTC 2016
>> Cluster name:
>> Last updated: Wed Mar 30 16:08:22 2016
>> Last change: Wed Mar 30 16:03:24 2016
>> Stack: classic openais (with plugin)
>> Current DC: ha-d1.tw.com - partition with quorum
>> Version: 1.1.12-561c4cf
>> 2 Nodes configured, 2 expected votes
>> 7 Resources configured
>>
>>
>> Node ha-d1.tw.com: standby (on-fail)
>> Online: [ ha-d2.tw.com ]
>>
>> Full list of resources:
>>
>>  Resource Group: network
>>      inif       (ocf::custom:ip.sh):       Started ha-d1.tw.com
>>      outif      (ocf::custom:ip.sh):       Started ha-d1.tw.com
>>      dmz1       (ocf::custom:ip.sh):       FAILED ha-d1.tw.com
>>  Master/Slave Set: DRBDMaster [DRBDSlave]
>>      Masters: [ ha-d1.tw.com ]
>>      Slaves: [ ha-d2.tw.com ]
>>  Resource Group: filesystem
>>      DRBDFS     (ocf::heartbeat:Filesystem):    Started ha-d1.tw.com
>>  Resource Group: application
>>      service_failover   (ocf::custom:service_failover):    Started ha-d1.tw.com
>>
>> Failed actions:
>>     dmz1_monitor_7000 on ha-d1.tw.com 'not running' (7): call=156,
>>status=complete, last-rc-change='Wed Mar 30 16:08:19 2016', queued=0ms,
>>exec=0ms
>>
>>
>>
>> version: 8.4.5 (api:1/proto:86-101)
>> srcversion: 315FB2BBD4B521D13C20BF4
>>
>>  1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
>>     ns:4 nr:0 dw:4 dr:765 al:1 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>> [153766.568356] block drbd1: helper command: /sbin/drbdadm
>>before-resync-source minor-1 exit code 255 (0xfffffffe)
>> [153766.568363] block drbd1: conn( WFBitMapS -> SyncSource ) pdsk(
>>Consistent -> Inconsistent )
>> [153766.568374] block drbd1: Began resync as SyncSource (will sync 4 KB
>>[1 bits set]).
>> [153766.568444] block drbd1: updated sync UUID
>>B0DA745C79C56591:36E0631B6F022952:36DF631B6F022952:133127197CF097C6
>> [153766.577695] block drbd1: Resync done (total 1 sec; paused 0 sec; 4
>>K/sec)
>> [153766.577700] block drbd1: updated UUIDs
>>B0DA745C79C56591:0000000000000000:36E0631B6F022952:36DF631B6F022952
>> [153766.577705] block drbd1: conn( SyncSource -> Connected ) pdsk(
>>Inconsistent -> UpToDate )
>> [154057.455270] e1000: eth2 NIC Link is Down
>> [154057.455451] e1000 0000:02:02.0 eth2: Reset adapter
>>
>> Failover complete:
>> Wed Mar 30 16:09:02 UTC 2016
>> Cluster name:
>> Last updated: Wed Mar 30 16:09:02 2016
>> Last change: Wed Mar 30 16:03:24 2016
>> Stack: classic openais (with plugin)
>> Current DC: ha-d1.tw.com - partition with quorum
>> Version: 1.1.12-561c4cf
>> 2 Nodes configured, 2 expected votes
>> 7 Resources configured
>>
>>
>> Node ha-d1.tw.com: standby (on-fail)
>> Online: [ ha-d2.tw.com ]
>>
>> Full list of resources:
>>
>>  Resource Group: network
>>      inif       (ocf::custom:ip.sh):       Started ha-d2.tw.com
>>      outif      (ocf::custom:ip.sh):       Started ha-d2.tw.com
>>      dmz1       (ocf::custom:ip.sh):       Started ha-d2.tw.com
>>  Master/Slave Set: DRBDMaster [DRBDSlave]
>>      Masters: [ ha-d2.tw.com ]
>>      Stopped: [ ha-d1.tw.com ]
>>  Resource Group: filesystem
>>      DRBDFS     (ocf::heartbeat:Filesystem):    Started ha-d2.tw.com
>>  Resource Group: application
>>      service_failover   (ocf::custom:service_failover):    Started ha-d2.tw.com
>>
>> Failed actions:
>>     dmz1_monitor_7000 on ha-d1.tw.com 'not running' (7): call=156,
>>status=complete, last-rc-change='Wed Mar 30 16:08:19 2016', queued=0ms,
>>exec=0ms
>>
>>
>>
>> version: 8.4.5 (api:1/proto:86-101)
>> srcversion: 315FB2BBD4B521D13C20BF4
>> [154094.894524] drbd wwwdata: conn( Disconnecting -> StandAlone )
>> [154094.894525] drbd wwwdata: receiver terminated
>> [154094.894527] drbd wwwdata: Terminating drbd_r_wwwdata
>> [154094.894559] block drbd1: disk( UpToDate -> Failed )
>> [154094.894569] block drbd1: bitmap WRITE of 0 pages took 0 jiffies
>> [154094.894571] block drbd1: 4 KB (1 bits) marked out-of-sync by on
>>disk bit-map.
>> [154094.894574] block drbd1: disk( Failed -> Diskless )
>> [154094.894647] block drbd1: drbd_bm_resize called with capacity == 0
>> [154094.894652] drbd wwwdata: Terminating drbd_w_wwwdata
>>
>> Standby node recovered, with DRBDSlave stopped (I want DRBDSlave
>>started here):
>> Wed Mar 30 16:13:01 UTC 2016
>> Cluster name:
>> Last updated: Wed Mar 30 16:13:01 2016
>> Last change: Wed Mar 30 16:03:24 2016
>> Stack: classic openais (with plugin)
>> Current DC: ha-d1.tw.com - partition with quorum
>> Version: 1.1.12-561c4cf
>> 2 Nodes configured, 2 expected votes
>> 7 Resources configured
>>
>>
>> Online: [ ha-d1.tw.com ha-d2.tw.com ]
>>
>> Full list of resources:
>>
>>  Resource Group: network
>>      inif       (ocf::custom:ip.sh):       Started ha-d2.tw.com
>>      outif      (ocf::custom:ip.sh):       Started ha-d2.tw.com
>>      dmz1       (ocf::custom:ip.sh):       Started ha-d2.tw.com
>>  Master/Slave Set: DRBDMaster [DRBDSlave]
>>      Masters: [ ha-d2.tw.com ]
>>      Stopped: [ ha-d1.tw.com ]
>>  Resource Group: filesystem
>>      DRBDFS     (ocf::heartbeat:Filesystem):    Started ha-d2.tw.com
>>  Resource Group: application
>>      service_failover   (ocf::custom:service_failover):    Started ha-d2.tw.com
>>
>>
>> version: 8.4.5 (api:1/proto:86-101)
>> srcversion: 315FB2BBD4B521D13C20BF4
>> [154094.894574] block drbd1: disk( Failed -> Diskless )
>> [154094.894647] block drbd1: drbd_bm_resize called with capacity == 0
>> [154094.894652] drbd wwwdata: Terminating drbd_w_wwwdata
>>
>> --
>> Sam Gardner
>> Trustwave | SMART SECURITY ON DEMAND
>
>This might be a bug. A crm_report covering a few minutes around when the
>failure expires might help.
>
>Does the slave start after the next cluster-recheck-interval?

