[ClusterLabs] Antw: Growing a cluster from 1 node without fencing
Edwin Török
edvin.torok at citrix.com
Mon Aug 14 15:12:25 CEST 2017
On 14/08/17 13:46, Klaus Wenninger wrote:
> What does your /etc/sysconfig/sbd look like?
> With just that pcs command you get a default config with
> watchdog-only support.
It currently looks like this:
SBD_DELAY_START=no
SBD_OPTS="-n cluster1"
SBD_PACEMAKER=yes
SBD_STARTMODE=always
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5
> Without the cluster property stonith-watchdog-timeout set to a
> value matching the watchdog timeout configured in
> /etc/sysconfig/sbd (default = 5s; twice that is a good choice),
> a node will never assume the unseen partner has been fenced.
> Anyway, watchdog-only sbd is of very limited use in 2-node
> scenarios. It limits the availability of the cluster to that of
> the node that would win the tie_breaker game. But it might still
> be useful in certain scenarios, of course (like load sharing ...).
Good point.
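The rule quoted above can be sketched in a few lines of Python. This is only an illustration, not part of sbd or pcs: the helper names parse_sysconfig and suggested_stonith_watchdog_timeout are made up for this example, the factor of 2 is the "twice is a good choice" suggestion, and the fallback of 5 is the default watchdog timeout mentioned in the quote.

```python
# Sketch: derive a stonith-watchdog-timeout from the sbd sysconfig shown above.
# Helper names are hypothetical; the factor of 2 follows the quoted advice.

SBD_CONFIG = """\
SBD_DELAY_START=no
SBD_OPTS="-n cluster1"
SBD_PACEMAKER=yes
SBD_STARTMODE=always
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5
"""


def parse_sysconfig(text):
    """Parse KEY=value lines, skipping blanks and comments, stripping double quotes."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def suggested_stonith_watchdog_timeout(settings, factor=2, default_watchdog=5):
    """Return factor * SBD_WATCHDOG_TIMEOUT in seconds (5 s assumed if unset)."""
    watchdog = int(settings.get("SBD_WATCHDOG_TIMEOUT", default_watchdog))
    return factor * watchdog


settings = parse_sysconfig(SBD_CONFIG)
timeout = suggested_stonith_watchdog_timeout(settings)
# Print the property assignment one might then apply with pcs.
print(f"pcs property set stonith-watchdog-timeout={timeout}s")
```

With SBD_WATCHDOG_TIMEOUT=5 as in the config above, this suggests a value of 10s.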
> On 08/14/2017 12:20 PM, Ulrich Windl wrote:
>> Hi!
>>
>> Have you tried studying the logs? Usually you get useful information from
>> there (to share!).
Here is the journalctl and pacemaker.log output:
Aug 14 08:57:26 cluster1 crmd[2221]: notice: Result of start operation for dlm on cluster1: 0 (ok)
Aug 14 08:57:26 cluster1 sbd[2202]: pcmk: info: set_servant_health: Node state: online
Aug 14 08:57:26 cluster1 sbd[2202]: pcmk: info: notify_parent: Notifying parent: healthy
Aug 14 08:57:26 cluster1 sbd[2199]: notice: inquisitor_child: Servant pcmk is healthy (age: 0)
Aug 14 08:57:26 cluster1 sbd[2199]: notice: inquisitor_child: Active cluster detected
Aug 14 08:57:26 cluster1 crmd[2221]: notice: Initiating monitor operation dlm:0_monitor_30000 locally on cluster1
Aug 14 08:57:26 cluster1 crmd[2221]: notice: Transition 0 (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-44.bz2): Complete
Aug 14 08:57:26 cluster1 crmd[2221]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Aug 14 08:57:27 cluster1 sbd[2203]: cluster: info: notify_parent: Notifying parent: healthy
Aug 14 08:57:27 cluster1 sbd[2202]: pcmk: info: notify_parent: Notifying parent: healthy
Aug 14 08:57:28 cluster1 sbd[2203]: cluster: info: notify_parent: Notifying parent: healthy
Aug 14 08:57:28 cluster1 sbd[2202]: pcmk: info: notify_parent: Notifying parent: healthy
Aug 14 08:57:28 cluster1 sbd[2202]: pcmk: info: notify_parent: Notifying parent: healthy
Aug 14 08:57:29 cluster1 sbd[2203]: cluster: info: notify_parent: Notifying parent: healthy
Aug 14 08:57:29 cluster1 sbd[2202]: pcmk: info: notify_parent: Notifying parent: healthy
Aug 14 08:57:30 cluster1 corosync[2208]: [CFG ] Config reload requested by node 1
Aug 14 08:57:30 cluster1 corosync[2208]: [TOTEM ] adding new UDPU member {10.71.77.147}
Aug 14 08:57:30 cluster1 corosync[2208]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Aug 14 08:57:30 cluster1 corosync[2208]: [QUORUM] Members[1]: 1
Aug 14 08:57:30 cluster1 crmd[2221]: warning: Quorum lost
Aug 14 08:57:30 cluster1 pacemakerd[2215]: warning: Quorum lost
^^^^^^^^^ The "Quorum lost" warnings above look unexpected
Aug 14 08:57:30 cluster1 sbd[2202]: pcmk: info: set_servant_health: Quorum lost: Ignore
Aug 14 08:57:30 cluster1 sbd[2202]: pcmk: info: notify_parent: Not notifying parent: state transient (2)
Aug 14 08:57:30 cluster1 sbd[2203]: cluster: info: notify_parent: Notifying parent: healthy
Aug 14 08:57:30 cluster1 sbd[2202]: pcmk: info: notify_parent: Not notifying parent: state transient (2)
Aug 14 08:57:31 cluster1 sbd[2203]: cluster: info: notify_parent: Notifying parent: healthy
Aug 14 08:57:31 cluster1 sbd[2202]: pcmk: info: notify_parent: Not notifying parent: state transient (2)
Aug 14 08:57:32 cluster1 sbd[2202]: pcmk: info: notify_parent: Not notifying parent: state transient (2)
Aug 14 08:57:32 cluster1 sbd[2203]: cluster: info: notify_parent: Notifying parent: healthy
Aug 14 08:57:32 cluster1 sbd[2202]: pcmk: info: notify_parent: Not notifying parent: state transient (2)
Aug 14 08:57:33 cluster1 sbd[2203]: cluster: info: notify_parent: Notifying parent: healthy
Aug 14 08:57:33 cluster1 sbd[2199]: warning: inquisitor_child: Servant pcmk is outdated (age: 4)
Aug 14 08:57:33 cluster1 sbd[2202]: pcmk: info: notify_parent: Not notifying parent: state transient (2)
Aug 14 08:57:34 cluster1 sbd[2203]: cluster: info: notify_parent: Notifying parent: healthy
Aug 14 08:57:34 cluster1 sbd[2202]: pcmk: info: notify_parent: Not notifying parent: state transient (2)
Aug 14 08:57:35 cluster1 sbd[2203]: cluster: info: notify_parent: Notifying parent: healthy
Aug 14 08:57:35 cluster1 sbd[2202]: pcmk: info: notify_parent: Not notifying parent: state transient (2)
Aug 14 08:57:36 cluster1 sbd[2203]: cluster: info: notify_parent: Notifying parent: healthy
Aug 14 08:57:36 cluster1 sbd[2199]: warning: inquisitor_child: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
Aug 14 08:57:36 cluster1 sbd[2202]: pcmk: info: notify_parent: Not notifying parent: state transient (2)
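For reference, the "Quorum lost" messages right after "adding new UDPU member" are at least consistent with plain majority-quorum arithmetic. The sketch below assumes one vote per node and no special votequorum options (no two_node, no auto_tie_breaker), which may not match the actual corosync.conf here; the function names are made up for illustration.

```python
# Sketch of majority quorum, assuming one vote per node and no
# two_node / auto_tie_breaker options in corosync.conf.
# Function names are hypothetical.


def quorum_threshold(expected_votes):
    """Votes needed for quorum: a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(votes_present, expected_votes):
    """True if the visible members hold at least the quorum threshold."""
    return votes_present >= quorum_threshold(expected_votes)


# One node is visible throughout; the expected votes grow from 1 to 2
# when the second node is added to the configuration.
for expected in (1, 2):
    print(
        f"expected_votes={expected} "
        f"threshold={quorum_threshold(expected)} "
        f"quorate_with_1_member={is_quorate(1, expected)}"
    )
```

Under these assumptions a single-node cluster is quorate on its own, but as soon as the second node is added to the membership list the threshold becomes 2, and the lone running node (Members[1]: 1) is no longer quorate until the new node actually joins.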
Thanks,
--Edwin