Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLS without cstruct #497

Merged
merged 22 commits into from
Aug 20, 2024
Merged

TLS without cstruct #497

merged 22 commits into from
Aug 20, 2024

Conversation

dinosaure
Copy link
Contributor

#493 rebased to include #495 & #496 plus an update of tls-async. The only remaining package about the no-cstruct move is tls-eio.

@art-w
Copy link
Contributor

art-w commented Jul 19, 2024

In case it helps, this patch 05b203e should provide the missing tls-eio :)

@hannesm
Copy link
Member

hannesm commented Jul 21, 2024

this is great. I guess we need to rebase and wait for the x509 release. (plus then adapt lower bounds)

@art-w
Copy link
Contributor

art-w commented Jul 23, 2024

Nice rebase! The eio patch and a fix for the latest X509 breaking change (renaming of Authenticator functions) can be found at https://github.com/art-w/ocaml-tls/commits/no-cstruct/

@hannesm
Copy link
Member

hannesm commented Jul 24, 2024

@dinosaure would you mind to pick @art-w commits here?

let ( let* ) = Result.bind

let guard p e = if p then Ok () else Error e

let split_str ?(start = 0) str off =
String.sub str start off,
String.sub str (start + off) (String.length str - off - start)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's likely I introduced this function, but from our learnings in MirageVPN, I guess we should re-visit the allocation and ownership stuff in this PR. The Strong.sub is super-expensive, and we should try to allocate the least possible.

This doesn't need to be in this PR, but before a release. We should as well benchmark the amount of handshakes and throughput before and after this change (with varying block sizes in send()). Not sure whether we have a suitable executable for this right now. Best to compare to OpenSSL on the same machine.

buf
let buf = Bytes.create 8 in
Bytes.set_int64_be buf 0 seq ;
Bytes.unsafe_to_string buf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allocation-wise likely a bad idea

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? We should use Bytes.to_string?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, the entire function and allocation discipline should be different. better to allocate a bytes once, and then filling it & passing offsets along. this requires, as mentioned, a pretty deep rewriting -- which brings the suggestion to first work on benchmarks (for the latest release - e2e, maybe similar to what @reynir worked on for MirageVPN -- but would be great if cstruct allocation could be part of the benchmark), to be able to compare with the changes in here and future changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to allocate a bytes once

Not sure to understand again, the content of this buffer actually depends on the given argument (seq). We probably can pre-allocate a set of sequence_buf and use them then but not sure this is what you want. If I think in this way, it's related to the multicore usage than can be done with ocaml-tls. In such context, "allocate a bytes once" (globally? per-domain?) can be risky.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, sorry for the misunderstanding. my intention is: we get some data to be sent out. now, instead of allocating a header for crypto, a header for tls, etc. - let's allocate once a buffer where the encrypted payload together with the protocol header has space. thus we won't need any other allocations, neither a concat or append. this is all local to a send_data operation, not global/per domain.

nothing for this pr, though.

in
Uncommon.Cs.xor nonce s
Uncommon.xor nonce s
Copy link
Member

@hannesm hannesm Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, likely better without allocations, and xor_into

@@ -424,9 +427,9 @@ let answer_server_finished state (session : session_data) client_verify fin log
let computed =
Handshake_crypto.finished (state_version state) session.ciphersuite session.common_session_data.master_secret "server finished" log
in
let* () = guard (Cstruct.equal computed fin) (`Fatal `BadFinished) in
let* () = guard (String.equal computed fin) (`Fatal `BadFinished) in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need eqaf here?

lib/handshake_client.ml Outdated Show resolved Hide resolved
let piece = Cstruct.sub sh.server_random 24 8 in
let* () = guard (not (Cstruct.equal Packet.downgrade12 piece)) (`Fatal `Downgrade12) in
guard (not (Cstruct.equal Packet.downgrade11 piece)) (`Fatal `Downgrade11)
let piece = String.sub sh.server_random 24 8 in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allocates, maybe we have a function in eqaf or define our own to have off and len params for eq?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a try about implementations of String.sub_equal (which does not guarantee the constant time as eqaf). My benchmark is available here: https://gist.github.com/dinosaure/7ef54e9431217435e3b81df890192b72. From my results (on AMD), the "naive" implementation (the Oracle one) is fast enough for the worst case. For readability, I will consider this implementation for ocaml-tls (into the lib/utils.ml file) than the other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, and according to my benchmark, it seems that an equal_sub which uses String.sub or an equal_sub which tries to iterate into given strings have the same performance. My conclusion is: even if we do a String.sub here, try to avoid this String.sub and try to directly read bytes and compare them is not faster than the String.sub...

lib/handshake_client.ml Outdated Show resolved Hide resolved
| Some idx, Some (psk, _epoch) ->
let* () = guard (idx = 0) (`Fatal `InvalidServerHello) in
Ok (psk.secret, true)
in
let early_secret = Handshake_crypto13.(derive (empty cipher) psk) in
let hs_secret = Handshake_crypto13.derive early_secret shared in
let log = log <+> raw in
let log = log ^ raw in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I remember, we should be able to use a hash state since we never need the entire log data. this is for sure a separate PR.

lib/handshake_common.ml Outdated Show resolved Hide resolved
let s_mac, rt1 = Core.split_str rt0 mac in
let c_key, rt2 = Core.split_str rt1 key in
let s_key, rt3 = Core.split_str rt2 key in
let c_iv , s_iv = Core.split_str rt3 iv
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is 10 string.sub. we can easily have fewer by using the offset and string.sub directly

lib/handshake_crypto13.ml Outdated Show resolved Hide resolved
@hannesm
Copy link
Member

hannesm commented Jul 24, 2024

I think this is already pretty big, and we should work towards merging it. From my point of view, what is important:

  • a benchmark comparing tls_version, ciphersuite, block size of write() calls of latest release and this PR (as a separate PR, to be merged before this PR
  • have as well a script for that matrix using OpenSSL

Then we can compare and merge this PR and prepare performance improvements in subsequent changesets (e.g. allocate less). We as well have these decrypt/encrypt_into functions in mirage-crypto that we should use.

If someone would be eager to develop the benchmark, that'd speed up our release schedule:)

tls.opam Outdated Show resolved Hide resolved
tls.opam Outdated Show resolved Hide resolved
tls-eio.opam Outdated Show resolved Hide resolved
tls-eio.opam Outdated Show resolved Hide resolved
tls-lwt.opam Outdated Show resolved Hide resolved
tls.opam Outdated Show resolved Hide resolved
@hannesm
Copy link
Member

hannesm commented Jul 24, 2024

I don't understand the tls-eio mdx failures... I see two options: remove it or someone who's into that stuff proposing a fix /cc @talex5 (who developed these bits)

@talex5
Copy link
Contributor

talex5 commented Jul 25, 2024

I don't understand the tls-eio mdx failures

It doesn't seem to have much to do with tls-eio or mdx; just trying to load tls in utop no longer works:

$ utop
utop # #require "tls";;
Error: Reference to undefined compilation unit `Digestif'
Hint: This means that the interface of a module is loaded, but its implementation is not.
      Did you mean to load a compiled implementation of the module 
      using #load or by passing it as an argument to the toplevel?

@hannesm
Copy link
Member

hannesm commented Jul 27, 2024

I don't understand the tls-eio mdx failures

It doesn't seem to have much to do with tls-eio or mdx; just trying to load tls in utop no longer works:

$ utop
utop # #require "tls";;
Error: Reference to undefined compilation unit `Digestif'
Hint: This means that the interface of a module is loaded, but its implementation is not.
      Did you mean to load a compiled implementation of the module 
      using #load or by passing it as an argument to the toplevel?

thanks for your investigation. @dinosaure is it known and is there a workaround for digestif not being loadable into ocaml toplevel (due to virtual library?)?

@dinosaure
Copy link
Contributor Author

thanks for your investigation. @dinosaure is it known and is there a workaround for digestif not being loadable into ocaml toplevel (due to virtual library?)?

You need to load an implementation of digestif. digestif.c or digestif.ocaml. But in my opinion, maybe we should reconsider mdx in the tests. I've already seen mdx related errors in the release process of some libraries and it's a burden that shouldn't fall on the ocaml-tls maintainers.

@hannesm
Copy link
Member

hannesm commented Aug 20, 2024

force-pushed on top of main (#500 was merged), and fixed speed.ml

@hannesm hannesm merged commit f89d107 into mirleft:main Aug 20, 2024
1 check failed
hannesm added a commit to hannesm/opam-repository that referenced this pull request Aug 21, 2024
CHANGES:

* API breaking change: remove usage of Cstruct.t inside of TLS, use bytes
  and string instead (mirleft/ocaml-tls#497 by @art-w, @hannesm, @dinosaure, @reynir)
  Performance is up to 3x improved (bandwidth), 2x improvement for handshake/s
  on an Intel Core(TM) i7-5600U CPU @ 2.60GHz
* FEATURE: add tls-miou-unix package, which adds miou support for TLS
  (mirleft/ocaml-tls#494 mirleft/ocaml-tls#503 @dinosaure)
* FEATURE: tls-lwt and tls-async: allow TLS over an existing connection
  `Tls_lwt.client_of_channels : Tls.Config.client -> ?host:[`host] Domain_name.t -> Lwt_io.input_channel * Lwt_io.output_channel -> t Lwt.t`
  and
  `Tls_lwt.server_of_channels : Tls.Config.server -> Lwt_io.input_channel * Lwt_io.output_channel -> t Lwt.t`
  (mirleft/ocaml-tls#499 @art-w @MisterDA)
* API breaking changes: revise errors - reduce the polymorphic variant
  in size, align it with RFC specified errors, be in parts more precise
  about errors, in other parts skip data (mirleft/ocaml-tls#505, @hannesm - fixes mirleft/ocaml-tls#491)
  NB: if you relied on a specific error constructor, please open an issue
* Remove unused constructors from Packet.{alert_type, compression_methods,
  client_certificate_type, extension_type} (mirleft/ocaml-tls#505, @hannesm)
  NB: if you relied on specific constructors, please open an issue
* API breaking change: Tls.Config.{server,client} now return a result
  type instead of raising an exception (mirleft/ocaml-tls#502, @hannesm, fixes mirleft/ocaml-tls#411)
* FEATURE: add bench/speed.exe, a benchmark for bandwidth (for different
  ciphersuites) and handshakes (different key exchanges and private keys)
  (mirleft/ocaml-tls#500 @hannesm @dinosaure @reynir)
* BUGFIX: tests/feedback.exe update with TLS 1.3 semantics, run as test
  (mirleft/ocaml-tls#501, @hannesm - reported by @dinosaure)
hannesm added a commit to hannesm/opam-repository that referenced this pull request Aug 22, 2024
CHANGES:

* API breaking change: remove usage of Cstruct.t inside of TLS, use bytes
  and string instead (mirleft/ocaml-tls#497 by @art-w, @hannesm, @dinosaure, @reynir)
  Performance is up to 3x improved (bandwidth), 2x improvement for handshake/s
  on an Intel Core(TM) i7-5600U CPU @ 2.60GHz
* FEATURE: add tls-miou-unix package, which adds miou support for TLS
  (mirleft/ocaml-tls#494 mirleft/ocaml-tls#503 @dinosaure)
* FEATURE: tls-lwt and tls-async: allow TLS over an existing connection
  `Tls_lwt.client_of_channels : Tls.Config.client -> ?host:[`host] Domain_name.t -> Lwt_io.input_channel * Lwt_io.output_channel -> t Lwt.t`
  and
  `Tls_lwt.server_of_channels : Tls.Config.server -> Lwt_io.input_channel * Lwt_io.output_channel -> t Lwt.t`
  (mirleft/ocaml-tls#499 @art-w @MisterDA)
* API breaking changes: revise errors - reduce the polymorphic variant
  in size, align it with RFC specified errors, be in parts more precise
  about errors, in other parts skip data (mirleft/ocaml-tls#505, @hannesm - fixes mirleft/ocaml-tls#491)
  NB: if you relied on a specific error constructor, please open an issue
* Remove unused constructors from Packet.{alert_type, compression_methods,
  client_certificate_type, extension_type} (mirleft/ocaml-tls#505, @hannesm)
  NB: if you relied on specific constructors, please open an issue
* API breaking change: Tls.Config.{server,client} now return a result
  type instead of raising an exception (mirleft/ocaml-tls#502, @hannesm, fixes mirleft/ocaml-tls#411)
* FEATURE: add bench/speed.exe, a benchmark for bandwidth (for different
  ciphersuites) and handshakes (different key exchanges and private keys)
  (mirleft/ocaml-tls#500 @hannesm @dinosaure @reynir)
* BUGFIX: tests/feedback.exe update with TLS 1.3 semantics, run as test
  (mirleft/ocaml-tls#501, @hannesm - reported by @dinosaure)
avsm pushed a commit to avsm/opam-repository that referenced this pull request Sep 5, 2024
CHANGES:

* API breaking change: remove usage of Cstruct.t inside of TLS, use bytes
  and string instead (mirleft/ocaml-tls#497 by @art-w, @hannesm, @dinosaure, @reynir)
  Performance is up to 3x improved (bandwidth), 2x improvement for handshake/s
  on an Intel Core(TM) i7-5600U CPU @ 2.60GHz
* FEATURE: add tls-miou-unix package, which adds miou support for TLS
  (mirleft/ocaml-tls#494 mirleft/ocaml-tls#503 @dinosaure)
* FEATURE: tls-lwt and tls-async: allow TLS over an existing connection
  `Tls_lwt.client_of_channels : Tls.Config.client -> ?host:[`host] Domain_name.t -> Lwt_io.input_channel * Lwt_io.output_channel -> t Lwt.t`
  and
  `Tls_lwt.server_of_channels : Tls.Config.server -> Lwt_io.input_channel * Lwt_io.output_channel -> t Lwt.t`
  (mirleft/ocaml-tls#499 @art-w @MisterDA)
* API breaking changes: revise errors - reduce the polymorphic variant
  in size, align it with RFC specified errors, be in parts more precise
  about errors, in other parts skip data (mirleft/ocaml-tls#505, @hannesm - fixes mirleft/ocaml-tls#491)
  NB: if you relied on a specific error constructor, please open an issue
* Remove unused constructors from Packet.{alert_type, compression_methods,
  client_certificate_type, extension_type} (mirleft/ocaml-tls#505, @hannesm)
  NB: if you relied on specific constructors, please open an issue
* API breaking change: Tls.Config.{server,client} now return a result
  type instead of raising an exception (mirleft/ocaml-tls#502, @hannesm, fixes mirleft/ocaml-tls#411)
* FEATURE: add bench/speed.exe, a benchmark for bandwidth (for different
  ciphersuites) and handshakes (different key exchanges and private keys)
  (mirleft/ocaml-tls#500 @hannesm @dinosaure @reynir)
* BUGFIX: tests/feedback.exe update with TLS 1.3 semantics, run as test
  (mirleft/ocaml-tls#501, @hannesm - reported by @dinosaure)
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants