[Pacemaker] pacemaker-remote tls handshaking
Andrew Beekhof
andrew at beekhof.net
Thu May 23 22:31:10 UTC 2013
On 24/05/2013, at 7:35 AM, Lindsay Todd <rltodd.ml1 at gmail.com> wrote:
> Working on this problem further...
>
> On Tue, May 21, 2013 at 5:14 PM, David Vossel <dvossel at redhat.com> wrote:
>> I'd suggest this. Try running the pacemaker_remote regression test and see what happens. This will start up
>> an instance of pacemaker_remote locally and issue client commands to it to test both the TLS connection and
>> the ability to start/stop/monitor services.
>>
>> /usr/share/pacemaker/tests/lrmd/regression.py -R
>
> But sadly SL 6.4 doesn't have the systemctl commands this is trying to
> use.
Baaad David :-)
The systemctl calls can safely be replaced with the more common "service $component $action" form.
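As a rough sketch of that substitution (this is not the code in regression.py; the helper names service_command and run_service are made up for illustration, and it assumes Python 3.3+ for shutil.which), the test could pick whichever tool is present on the box:

```python
import shutil
import subprocess


def service_command(component, action, has_systemctl=None):
    """Return the argv list that runs `action` on `component`.

    Hypothetical helper: prefers systemctl when it is on PATH and
    otherwise falls back to "service $component $action".
    """
    if has_systemctl is None:
        # Probe PATH only when the caller did not decide for us.
        has_systemctl = shutil.which("systemctl") is not None
    if has_systemctl:
        return ["systemctl", action, component]
    return ["service", component, action]


def run_service(component, action):
    """Run the chosen command and return its exit status."""
    return subprocess.call(service_command(component, action))
```

On a box without systemd, such as SL 6.4, service_command("pacemaker_remote", "start") would come out as the plain "service" form. Note the argument order differs between the two tools.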
> (Also, I am building RPMs and installing those; the lrmd
> regression tests aren't included in pacemaker-cts. No problem, I ran
> them directly from the build directory.) It doesn't seem to make much
> progress. The stdout is:
>
> sh: systemctl: command not found
> sh: /lib/systemd/system/lrmd_dummy_daemon.service: No such file or directory
> sh: systemctl: command not found
> Starting ...
>
> And the lrmd-regression.log has:
> Set r/w permissions for uid=496, gid=494 on /tmp/lrmd-regression.log
> May 23 15:14:39 [3610] swbuildsl6 pacemaker_remoted: info:
> qb_ipcs_us_publish: server name: lrmd
> May 23 15:14:39 [3610] swbuildsl6 pacemaker_remoted: notice:
> lrmd_init_remote_tls_server: Starting a tls listener on port 3121.
> May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info:
> qb_ipcs_us_publish: server name: cib_ro
> May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info:
> qb_ipcs_us_publish: server name: cib_rw
> May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info:
> qb_ipcs_us_publish: server name: cib_shm
> May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info:
> qb_ipcs_us_publish: server name: attrd
> May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info:
> qb_ipcs_us_publish: server name: stonith-ng
> May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info:
> qb_ipcs_us_publish: server name: crmd
> May 23 15:14:40 [3610] swbuildsl6 pacemaker_remoted: info:
> main: Starting
>
>
>> By default, the connection should retry for 60 seconds after the VM resource starts. As you've noticed, this
>> can be extended to account for VMs that take longer to boot.
>
> But maybe this should start after the monitor method for the VM first
> indicates success? Or does it already?
>
>>> There have been a few segfaults of crmd during my testing of this, so perhaps
>>> there is a memory smash somewhere. (A couple times the failure was at
>>> remote_lrmd_ra.c:186,
>>
>> Please provide a gdb backtrace. We need to get this resolved ASAP, before the release of v1.1.10 is complete.
>> I believe there is a new rc in the works already.
>
> So I've attached results from a few core dumps. All were triggered
> using "crm resource cleanup swbuildsl6" where swbuildsl6 is the host
> name of the VM (that I can still telnet to port 3121).
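For reference, the telnet check and the 60-second retry window mentioned earlier can be pictured as a plain TCP connect loop. This is only a sketch under assumptions: wait_for_port is a made-up name, the defaults simply mirror the 60 seconds and port 3121 from this thread, and it is not how crmd implements its retry. It also proves only that the port accepts TCP connections; it says nothing about whether the TLS handshake will then succeed.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Retry a TCP connect until it succeeds or `timeout` seconds pass.

    Hypothetical helper. Returns True once a connection is accepted,
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is itself bounded by `interval` seconds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Something like wait_for_port("swbuildsl6", 3121) would return True in the situation described here, where the listener is up but the handshake still fails.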
>
>>>> I doubt this will make a difference, but here's the key I use during
>>>> testing,
>>>> lrmd:ce9db0bc3cec583d3b3bf38b0ac9ff91
>
> It makes no difference. I had wondered if the shorter key would matter.
>
> Also, I've attached some patches I made to 1.1.10rc3 to try to resolve
> this problem. So far no success. Some of these add logging; the
> others fix what looks to me like fishy code, with cases that aren't
> completely handled. With the additional logging, I see these results
> being logged:
>
> May 23 17:06:51 swbuildsl6 pacemaker_remoted[2326]: notice:
> lrmd_remote_listen: LRMD client connection established. 0x995250 id:
> df04d8ee-7fcb-4025-8c8f-8a1555a4d097
> May 23 17:06:53 cvmh02 crmd[18982]: warning: lrmd_tcp_connect_cb:
> Client tls handshake failed for server swbuildsl6:3121. Disconnecting
> May 23 17:06:52 swbuildsl6 pacemaker_remoted[2326]: error:
> lrmd_remote_client_msg: Remote lrmd tls handshake failed: -9
> May 23 17:06:52 swbuildsl6 pacemaker_remoted[2326]: notice:
> lrmd_remote_client_destroy: LRMD client disconnecting remote client -
> name: <unknown> id: df04d8ee-7fcb-4025-8c8f-8a1555a4d097
>
> Puzzling -- nothing being logged from
> crm_initiate_client_tls_handshake -- is there something I need to add
> to somehow activate the crm_err and crm_info calls?
>
> /rlt
> <corelog.txt><pacemaker-ccni.patch>