[Pacemaker] pacemaker-remote tls handshaking
David Vossel
dvossel at redhat.com
Fri May 24 18:06:35 UTC 2013
----- Original Message -----
> From: "David Vossel" <dvossel at redhat.com>
> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> Sent: Thursday, May 23, 2013 11:21:33 PM
> Subject: Re: [Pacemaker] pacemaker-remote tls handshaking
>
> ----- Original Message -----
> > From: "Lindsay Todd" <rltodd.ml1 at gmail.com>
> > To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> > Sent: Thursday, May 23, 2013 4:35:02 PM
> > Subject: Re: [Pacemaker] pacemaker-remote tls handshaking
> >
> > Working on this problem further...
> >
> > On Tue, May 21, 2013 at 5:14 PM, David Vossel <dvossel at redhat.com> wrote:
> > > I'd suggest this. Try running the pacemaker_remote regression test and
> > > see what happens. This will start up an instance of pacemaker_remote
> > > locally and issue client commands to it to test both the TLS connection
> > > and the ability to start/stop/monitor services.
> > >
> > > /usr/share/pacemaker/tests/lrmd/regression.py -R
> >
> > But sadly SL 6.4 doesn't have the systemctl commands this is trying to
>
> oops
>
> > use. (Also, I am building RPMs and installing those, and the lrmd
> > regression tests aren't included in pacemaker-cts.
>
> another oops
>
> > No problem, I ran
> > directly from the build directory.) It doesn't seem to make much
> > progress. The stdout is:
> >
> > sh: systemctl: command not found
> > sh: /lib/systemd/system/lrmd_dummy_daemon.service: No such file or
> > directory
> > sh: systemctl: command not found
> > Starting ...
> >
> > And the lrmd-regression.log has:
> > Set r/w permissions for uid=496, gid=494 on /tmp/lrmd-regression.log
> > May 23 15:14:39 [3610] swbuildsl6 pacemaker_remoted: info: qb_ipcs_us_publish: server name: lrmd
> > May 23 15:14:39 [3610] swbuildsl6 pacemaker_remoted: notice: lrmd_init_remote_tls_server: Starting a tls listener on port 3121.
> > May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info: qb_ipcs_us_publish: server name: cib_ro
> > May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info: qb_ipcs_us_publish: server name: cib_rw
> > May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info: qb_ipcs_us_publish: server name: cib_shm
> > May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info: qb_ipcs_us_publish: server name: attrd
> > May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info: qb_ipcs_us_publish: server name: stonith-ng
> > May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info: qb_ipcs_us_publish: server name: crmd
> > May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info: main: Starting
> >
> >
> > > By default, the connection should retry for 60 seconds after the VM
> > > resource starts. As you've noticed, this can be extended to account
> > > for VMs that take longer to boot.
> >
> > But maybe this should start after the monitor method for the VM first
> > indicates success? Or does it already?
>
> The policy engine has no way of expressing this right now. It would be
> difficult to make this happen. Likely your idea of additional start scripts
> to verify when the VM's network is actually available would be a better
> choice.
>
> >
> > >> There have been a few segfaults of crmd during my testing of this, so
> > >> perhaps there is a memory smash somewhere. (A couple times the failure
> > >> was at remote_lrmd_ra.c:186,
> > >
> > > Please provide a gdb backtrace. We need to get this resolved ASAP,
> > > before the release of v1.1.10 is complete. I believe there is a new rc
> > > in the works already.
> >
> > So I've attached results from a few core dumps. All were triggered
> > using "crm resource cleanup swbuildsl6" where swbuildsl6 is the host
> > name of the VM (that I can still telnet to port 3121).
>
> thanks :)
>
> > >> > I doubt this will make a difference, but here's the key I use during
> > >> > testing, lrmd:ce9db0bc3cec583d3b3bf38b0ac9ff91
> >
> > It makes no difference. I had wondered if the shorter key would matter.
> >
> > Also, I've attached some patches I made to 1.1.10rc3 to try to resolve
> > this problem. So far no success. Some of these add logging; the
> > others fix what looks to me like fishy code with cases that aren't
> > completely handled. With the additional logging, I see these results
> > being logged:
> >
> > May 23 17:06:51 swbuildsl6 pacemaker_remoted[2326]: notice: lrmd_remote_listen: LRMD client connection established. 0x995250 id: df04d8ee-7fcb-4025-8c8f-8a1555a4d097
> > May 23 17:06:53 cvmh02 crmd[18982]: warning: lrmd_tcp_connect_cb: Client tls handshake failed for server swbuildsl6:3121. Disconnecting
> > May 23 17:06:52 swbuildsl6 pacemaker_remoted[2326]: error: lrmd_remote_client_msg: Remote lrmd tls handshake failed: -9
> > May 23 17:06:52 swbuildsl6 pacemaker_remoted[2326]: notice: lrmd_remote_client_destroy: LRMD client disconnecting remote client - name: <unknown> id: df04d8ee-7fcb-4025-8c8f-8a1555a4d097
> >
> > Puzzling -- nothing being logged from
> > crm_initiate_client_tls_handshake -- is there something I need to add
> > to somehow activate the crm_err and crm_info calls?
>
> Well, you've definitely gotten my attention. I tried this on my RHEL 6 box
> and, sure enough, I'm seeing the exact same thing you're seeing. No worries.
> I'll track this down. I'm sure it has to do with the gnutls version being
> used.
I figured it out. I believe it's a gnutls bug: the older gnutls library version doesn't like the way I'm setting the PSK credentials, which makes the handshake fail. I have a workaround I'm implementing now. I'll have a patch by Tuesday.
-- Vossel
>
> In the meantime, if you want to test this feature, it does work in Fedora
> 18. Thanks for all your work on testing this. Your feedback came just in
> time. We are about to release 1.1.10 :)
>
> -- Vossel
>
> > /rlt
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> >
>