[Pacemaker] pacemaker-remote tls handshaking

Andrew Beekhof andrew at beekhof.net
Thu May 23 22:31:10 UTC 2013


On 24/05/2013, at 7:35 AM, Lindsay Todd <rltodd.ml1 at gmail.com> wrote:

> Working on this problem further...
> 
> On Tue, May 21, 2013 at 5:14 PM, David Vossel <dvossel at redhat.com> wrote:
>> I'd suggest this.  Try running the pacemaker_remote regression test and see what happens.  This will start up
>> an instance of pacemaker_remote locally and issue client commands to it to test both the TLS connection and
>> the ability to start/stop/monitor services.
>> 
>> /usr/share/pacemaker/tests/lrmd/regression.py  -R
> 
> But sadly SL 6.4 doesn't have the systemctl commands this is trying to
> use.

Baaad David :-)
These can safely be replaced with the more common "service $component $action".
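
A rough sketch of that substitution, assuming nothing about how regression.py actually builds its commands (the helper name service_command and its have_systemctl parameter are made up for illustration): pick systemctl when it is on the PATH and fall back to "service" otherwise.

```python
import shutil


def service_command(component, action, have_systemctl=None):
    """Build the argv list for a service action.

    Hypothetical helper, not part of regression.py. Uses systemctl when it
    is available and falls back to the SysV-style "service" wrapper.
    """
    if have_systemctl is None:
        # Auto-detect: shutil.which returns None when the tool is missing.
        have_systemctl = shutil.which("systemctl") is not None
    if have_systemctl:
        # systemctl takes the action first, then the unit name.
        return ["systemctl", action, component]
    # "service" takes the component first, then the action.
    return ["service", component, action]
```

The returned list can be handed to subprocess.call() as-is; note that the argument order differs between the two tools.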

>  (Also, I am building RPMs and installing those; the lrmd
> regression tests aren't included in pacemaker-cts.  No problem, I ran
> them directly from the build directory.)  It doesn't seem to make much
> progress.  The stdout is:
> 
>    sh: systemctl: command not found
>    sh: /lib/systemd/system/lrmd_dummy_daemon.service: No such file or directory
>    sh: systemctl: command not found
>    Starting ...
> 
> And the lrmd-regression.log has:
>    Set r/w permissions for uid=496, gid=494 on /tmp/lrmd-regression.log
>    May 23 15:14:39 [3610] swbuildsl6 pacemaker_remoted:     info: qb_ipcs_us_publish:      server name: lrmd
>    May 23 15:14:39 [3610] swbuildsl6 pacemaker_remoted:   notice: lrmd_init_remote_tls_server:     Starting a tls listener on port 3121.
>    May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted:     info: qb_ipcs_us_publish:      server name: cib_ro
>    May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted:     info: qb_ipcs_us_publish:      server name: cib_rw
>    May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted:     info: qb_ipcs_us_publish:      server name: cib_shm
>    May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted:     info: qb_ipcs_us_publish:      server name: attrd
>    May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted:     info: qb_ipcs_us_publish:      server name: stonith-ng
>    May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted:     info: qb_ipcs_us_publish:      server name: crmd
>    May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted:     info: main:    Starting
> 
> 
>> By default, the connection should retry for 60 seconds after the vm resource starts.  Like you've noticed, this
>> can be extended to account for vms that take longer to boot.
> 
> But maybe this should start after the monitor method for the VM first
> indicates success?  Or does it already?
> 
>>> There have been a few segfaults of crmd during my testing of this, so perhaps
>>> there is a memory smash somewhere. (A couple times the failure was at
>>> remote_lrmd_ra.c:186,
>> 
>> Please provide gdb backtrace.  We need to get this resolved asap before the release of v.1.1.10 is complete.
>> I believe there is a new rc in the works already.
> 
> So I've attached results from a few core dumps.  All were triggered
> using "crm resource cleanup swbuildsl6" where swbuildsl6 is the host
> name of the VM  (that I can still telnet to port 3121).
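
A successful telnet only shows that the TCP port accepts connections; it says nothing about the TLS handshake that follows. A minimal sketch of the same reachability check in Python, where the function name port_is_open is hypothetical:

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established.

    This checks TCP reachability only; no TLS handshake is attempted.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False
```

For the setup described here that would be port_is_open("swbuildsl6", 3121).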
> 
>>>> I doubt this will make a difference, but here's the key I use during
>>>> testing,
>>>> lrmd:ce9db0bc3cec583d3b3bf38b0ac9ff91
> 
> It makes no difference.  I had wondered if the shorter key would matter.
> 
> Also, I've attached some patches I made to 1.1.10rc3 to try to resolve
> this problem.  So far no success.  Some of these add logging; the
> others fix what looks to me like fishy code with cases that aren't
> completely handled.  With the additional logging, I see these results
> being logged:
> 
>    May 23 17:06:51 swbuildsl6 pacemaker_remoted[2326]:   notice: lrmd_remote_listen: LRMD client connection established. 0x995250 id: df04d8ee-7fcb-4025-8c8f-8a1555a4d097
>    May 23 17:06:53 cvmh02 crmd[18982]:  warning: lrmd_tcp_connect_cb: Client tls handshake failed for server swbuildsl6:3121. Disconnecting
>    May 23 17:06:52 swbuildsl6 pacemaker_remoted[2326]:    error: lrmd_remote_client_msg: Remote lrmd tls handshake failed: -9
>    May 23 17:06:52 swbuildsl6 pacemaker_remoted[2326]:   notice: lrmd_remote_client_destroy: LRMD client disconnecting remote client - name: <unknown> id: df04d8ee-7fcb-4025-8c8f-8a1555a4d097
> 
> Puzzling -- nothing being logged from
> crm_initiate_client_tls_handshake -- is there something I need to add
> to somehow activate the crm_err and crm_info calls?
> 
> /rlt
> <corelog.txt><pacemaker-ccni.patch>




