[Pacemaker] pcmk_shutdown: Still waiting for crmd

Wed Dec 7 10:06:14 UTC 2011

On 12/07/2011 10:27 AM, Erik Schwalbe wrote:
> Hi,
> 
> I built a test cluster with 2 nodes.
> Ubuntu 10.04.3 LTS with *ppa:ubuntu-ha-maintainers/ppa*
> 
> corosync 1.4.2
> pacemaker 1.1.6
> 
> primitive clvm ocf:lvm2:clvmd \
>         params daemon_timeout="30" \
>         operations $id="clvm-operations" \
>         op start interval="0" timeout="90" \
>         op stop interval="0" timeout="100" \
>         op monitor interval="0" timeout="20" start-delay="0" \
>         meta target-role="started"
> primitive data ocf:heartbeat:LVM \
>         params volgrpname="data" \
>         operations $id="data-operations" \
>         op start interval="0" timeout="30" \
>         op stop interval="0" timeout="30" \
>         op monitor interval="10" timeout="120" start-delay="0" \
>         op methods interval="0" timeout="5" \
>         meta target-role="started"
> primitive dlm ocf:pacemaker:controld \
>         operations $id="dlm-operations" \
>         op start interval="0" timeout="90" \
>         op stop interval="0" timeout="100" \
>         op monitor interval="10" timeout="20" start-delay="0" \
>         meta target-role="started"
> primitive fs ocf:heartbeat:Filesystem \
>         params device="/dev/data/test" directory="/data/test" fstype="ocfs2" \
>         operations $id="fs-operations" \
>         op start interval="0" timeout="60" \
>         op stop interval="0" timeout="60" \
>         op monitor interval="120" timeout="40" start-delay="0" \
>         op notify interval="0" timeout="60" \
>         meta target-role="started"
> primitive o2cb ocf:pacemaker:o2cb \
>         operations $id="o2cb-operations" \
>         op start interval="0" timeout="90" \
>         op stop interval="0" timeout="100" \
>         op monitor interval="0" timeout="20" start-delay="0" \
>         meta target-role="started"
> primitive res_DRBD ocf:linbit:drbd \
>         params drbd_resource="r0" \
>         operations $id="res_DRBD-operations" \
>         op start interval="0" timeout="240" \
>         op promote interval="0" timeout="90" \
>         op demote interval="0" timeout="90" \
>         op stop interval="0" timeout="100" \
>         op monitor interval="30" timeout="20" start-delay="1min" \
>         op notify interval="0" timeout="90" \
>         meta target-role="started"
> group dlm-clvm dlm clvm
> ms ms_DRBD res_DRBD \
>         meta master-max="2" clone-max="2" notify="true" interleave="true"
> clone clone_data data \
>         meta clone-max="2" ordered="true" interleave="true"
> clone dlm-clvm-clone dlm-clvm \
>         meta interleave="true" ordered="true"
> clone fs-clone fs \
>         meta clone-max="2" ordered="true" interleave="true"
> clone o2cb-clone o2cb \
>         meta clone-max="2" interleave="true"
> colocation col_data_clvm-dlm-clone inf: clone_data dlm-clvm-clone
> colocation col_fs_o2cb inf: fs-clone o2cb-clone
> colocation col_ms_DRBD_dlm-clvm-clone inf: dlm-clvm-clone ms_DRBD:Master
> colocation col_o2cb_dlm-clvm inf: o2cb-clone dlm-clvm-clone
> order ord_data_after_clvm-dlm-clone inf: dlm-clvm-clone clone_data
> order ord_ms_DRBD_dlm-clvm-clone inf: ms_DRBD:promote dlm-clvm-clone:start
> order ord_o2cb_after_dlm-clvm 0: dlm-clvm-clone o2cb-clone
> order ord_o2cb_fs inf: o2cb-clone fs-clone
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         last-lrm-refresh="1323246238" \
>         default-resource-stickiness="1000"
> 
> The problem occurs when I restart corosync or reboot a cluster node. All
> resources are stopped except for the DRBD resource. Then the system hangs
> for a long time.

Is there a timeout on stopping/demoting DRBD, and do you see kernel
messages from DRBD about being unable to demote because the device is
in use? Or is there never an attempt to demote it?

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> corosync.log:
> 
> ubuntu0 crmd: [926]: info: do_state_transition: (Re)Issuing shutdown
> request now that we are the DC
> ubuntu0 crmd: [926]: info: do_state_transition: Starting PEngine Recheck
> Timer
> ubuntu0 crmd: [926]: info: do_shutdown_req: Sending shutdown request to
> DC: ubuntu0
> ubuntu0 crmd: [926]: info: handle_shutdown_request: Creating shutdown
> request for ubuntu0 (state=S_IDLE)
> corosync [pcmk  ] notice: pcmk_shutdown: Still waiting for crmd
> (pid=926, seq=6) to terminate...
> corosync [pcmk  ] notice: pcmk_shutdown: Still waiting for crmd
> (pid=926, seq=6) to terminate...
> corosync [pcmk  ] notice: pcmk_shutdown: Still waiting for crmd
> (pid=926, seq=6) to terminate...
> corosync [pcmk  ] notice: pcmk_shutdown: Still waiting for crmd
> (pid=926, seq=6) to terminate...
> corosync [pcmk  ] notice: pcmk_shutdown: Still waiting for crmd
> (pid=926, seq=6) to terminate...
> 
> I tested the same config on Debian 6.0.3. There the reboot works: the
> DRBD resource is first demoted to secondary and then goes down.
> 
> Is this a known problem?
> 
> Thank you for your help.
> 
> Regards,
> Erik 
