[Pacemaker] Recovery after lost quorum
Andrew Beekhof
andrew at beekhof.net
Wed Jun 5 00:15:54 UTC 2013
On 05/06/2013, at 9:22 AM, Denis Witt <denis.witt at concepts-and-training.de> wrote:
>
> On 05.06.2013 at 00:52, Andrew Beekhof <andrew at beekhof.net> wrote:
>
>>> been restored, the resources aren't restarted. Running crm_resource -P
>>> brings everything up, but of course it would be nice if this happened
>>> automatically. Is there any way to achieve this?
>>
>> It should happen automatically.
>> Logs?
>
> Hi Andrew,
>
> thanks for your reply.
>
> Here are the logs:
>
[snip]
> Jun 5 01:11:06 test4 pengine: [18625]: WARN: cluster_status: We do not have quorum - fencing and resource management disabled
> Jun 5 01:11:06 test4 pengine: [18625]: notice: LogActions: Start pingtest:0#011(test4 - blocked)
> Jun 5 01:11:06 test4 pengine: [18625]: notice: LogActions: Start drbd:0#011(test4 - blocked)
Here's your reason. We didn't get quorum until:
> Jun 5 01:11:11 test4 crmd: [18626]: notice: ais_dispatch_message: Membership 128: quorum acquired
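To make the ordering explicit: the policy engine ran at 01:11:06, five seconds before quorum was acquired at 01:11:11, so the start actions were computed as blocked. Below is a minimal sketch (plain Python, not part of Pacemaker) that checks this from syslog lines in the format quoted above. The function names and the year argument are assumptions for illustration; syslog stamps carry no year.

```python
import re
from datetime import datetime

MONTHS = {m: i for i, m in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Syslog prefix: "Jun 5 01:11:06 host daemon: ..." (the stamp has no year).
STAMP = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})\s+(.*)$")

LOG = """\
Jun 5 01:11:06 test4 pengine: [18625]: WARN: cluster_status: We do not have quorum - fencing and resource management disabled
Jun 5 01:11:06 test4 pengine: [18625]: notice: LogActions: Start pingtest:0#011(test4 - blocked)
Jun 5 01:11:06 test4 pengine: [18625]: notice: LogActions: Start drbd:0#011(test4 - blocked)
Jun 5 01:11:11 test4 crmd: [18626]: notice: ais_dispatch_message: Membership 128: quorum acquired
"""

def parse(line, year):
    """Return (timestamp, rest_of_line), or None if the line has no syslog stamp."""
    m = STAMP.match(line.strip())
    if not m or m.group(1) not in MONTHS:
        return None
    mon, day, hh, mm, ss, rest = m.groups()
    return datetime(year, MONTHS[mon], int(day), int(hh), int(mm), int(ss)), rest

def blocked_before_quorum(log_text, year=2013):
    """Find the first 'quorum acquired' time and the blocked actions logged before it."""
    entries = [p for p in (parse(l, year) for l in log_text.splitlines()) if p]
    quorum_at = next((ts for ts, rest in entries if "quorum acquired" in rest), None)
    blocked = [(ts, rest) for ts, rest in entries
               if "LogActions" in rest and "blocked" in rest
               and (quorum_at is None or ts < quorum_at)]
    return quorum_at, blocked

quorum_at, blocked = blocked_before_quorum(LOG)
if quorum_at is not None:
    for ts, rest in blocked:
        gap = int((quorum_at - ts).total_seconds())
        print(f"{ts:%H:%M:%S} blocked {gap}s before quorum: {rest}")
```

If the script reports blocked actions ahead of the quorum line, the interesting part of the log is whatever happens after quorum is acquired.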
[snip]
>
> Please note that at the moment only two of the three nodes are online, but quorum is established,
Actually not.
> as expected. Both nodes are running corosync and pacemaker, but the second node doesn't have any of the configured resources (so I got "not installed" errors there; usually pacemaker is disabled on this node). The resources also aren't started if pacemaker is disabled on this node (only corosync running).
>
> analysis.txt from hb_report states:
>
> Log patterns:
> Jun 5 01:14:11 test4 crmd: [18626]: ERROR: crm_timer_popped: Integration Timer (I_INTEGRATED) just popped in state S_INTEGRATION! (180000ms)
>
> My config:
>
> node backup3 \
>     attributes standby="off"
> node test3
> node test4
> primitive apache lsb:apache2 \
>     op monitor interval="10" timeout="20" \
>     meta target-role="Started"
> primitive drbd ocf:linbit:drbd \
>     params drbd_resource="www_r0" \
>     op monitor interval="10"
> primitive fs_drbd ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/var/www" fstype="ext4" \
>     op monitor interval="5" \
>     meta target-role="Started"
> primitive pingtest ocf:pacemaker:ping \
>     params multiplier="1000" host_list="192.168.100.19" \
>     op monitor interval="5"
> primitive sip ocf:heartbeat:IPaddr2 \
>     params ip="192.168.100.30" nic="eth0" \
>     op monitor interval="10" timeout="20" \
>     meta target-role="Started"
> group grp_all sip fs_drbd apache
> ms ms_drbd drbd \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> clone clone_pingtest pingtest
> location loc_all_on_best_ping grp_all \
>     rule $id="loc_all_on_best_ping-rule" -inf: not_defined pingd or pingd lt 1000
> colocation coloc_all_on_drbd inf: grp_all ms_drbd:Master
> order order_all_after_drbd inf: ms_drbd:promote grp_all:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="3" \
>     no-quorum-policy="stop" \
>     stonith-enabled="false" \
>     last-lrm-refresh="1370360692" \
>     default-resource-stickiness="100" \
>     maintenance-mode="false"
>
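For reference, the quorum arithmetic behind expected-quorum-votes="3" can be sketched as follows. This assumes the usual strict-majority rule (more than half of the expected votes) and one vote per node; the helper name is hypothetical and is not a Pacemaker API.

```python
def has_quorum(online_votes: int, expected_votes: int) -> bool:
    """Strict majority: more than half of the expected votes must be present."""
    return 2 * online_votes > expected_votes

# With expected-quorum-votes="3":
for online in (1, 2, 3):
    print(online, "of 3 online ->", "quorate" if has_quorum(online, 3) else "no quorum")
```

Under that assumption, two of three nodes are enough, but only once both have actually joined the membership. With no-quorum-policy="stop", anything the policy engine computes before that point stays blocked.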
> Best regards,
> Denis Witt
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org