Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch DirectoryControl to use AsnWriter, AsnDecoder #101512

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

edwardneal
Copy link
Contributor

Relates to #97540.

This PR replaces all references to BerConverter in LDAP directory control generation/parsing to use AsnWriter and AsnDecoder. SortRequestControl didn't use BerConverter directly - it called the OpenLDAP and WLDAP ldap_create_sort_control APIs instead. This class was the only thing in S.DS.P which referenced the ldap_create_sort_control and SortKeyInterop struct, so I deleted them both.

SortRequestControl

The change to SortRequestControl's generation mechanism might also resolve #34679, since there shouldn't be any mechanism for the heap corruption to occur.

Most of the SortRequestControl's new ASN.1 encoding is pretty uncontroversial, but there was a bit of discussion in PR #65548 around the encoding of the sort key's attribute name, and this was marshalled (as part of SortKeyInterop) with different encodings between Windows and Linux. In the RFC, this is defined (indirectly) as an LdapString; this is described as ISO10646 characters, encoded as a UTF-8 string and represented as an OCTET STRING. I'm fairly sure that UTF8Encoding.GetBytes fulfils this, and running the associated test case against a real AD domain controller passes.

Test changes

There are also test changes, but these are largely to change the special-casing of expected byte values between OpenLDAP and WLDAP - .NET now generates these values in a consistent format (the OpenLDAP format) regardless of platform. The .NET Framework tests continued to use the version of S.DS.P from the GAC in my environment, so I've special-cased by the framework version rather than by the platform.

Misc. optimizations

There were a handful of byte-by-byte array copies, which I've switched over to using span-based copies in hopes that they'll benefit slightly from vectorisation. TransformControls and GetValue have a related change: where they used to reference properties returning byte arrays (which took defensive copies) they now reference the property values directly. These should both reduce GC traffic slightly.

Replaced this with the managed AsnDecoder, removing PInvoke from a potential hot path.
Also removed the manual API calls to ldap_create_sort_control - this is now built in managed code.
This then has knock-on effects to eliminate the SortKeyInterop classes.
Most of the Control tests were hardcoded to the output of BerConverter, which uses four-byte lengths in all cases.
This behaviour is now different: the same output is returned across all platforms for .NET, and remains unchanged for .NET Framework.
This should also close issue 34679.
Reduce number of copies required in TransformControls, and enable these copies to take advantage of newer intrinsics where available.
Windows domain controllers may return a distinguished name starting with OU=, rather than ou=.
@PaulusParssinen
Copy link
Contributor

Out of curiosity, any benchmarks for perf. numbers before/after switching to AsnReader/AsnWriter?

@edwardneal
Copy link
Contributor Author

I've not got benchmarks right now, but will write some in the next few days. In advance of these, I expect there'll be a modest reduction in managed and unmanaged memory usage, and that execution time will reduce (while remaining within the margin of error for the network request itself.)

Preallocating space for AsnWriter buffers to reduce memory usage.
Correctly handling attribute names in SortControls.
@edwardneal
Copy link
Contributor Author

edwardneal commented Apr 27, 2024

Benchmarks are below. To summarize:

  • As expected, the performance improvements are working in the margins. I fully expect most of the execution time variations to be lost in the noise of network traffic.
  • 35% median reduction in memory usage. One notable improvement on this in DirectoryControl.TransformControls, which is on the hot path for processing LDAP responses and reduces memory usage by about 75%.
  • Although the percentage reductions in memory usage are good, the absolute reductions are pretty small - the median reduction was of 144 bytes.
  • 88.5% median reduction in execution time, although the absolute reductions are often small - AsqRequest is reduced from 1.869 microseconds to 185.1 nanoseconds.
  • DirectoryControl.TransformControls is another notable exception to this, reducing from 6.596us to 1.588us.
  • Most of the original code's memory allocations stuck around for Gen1 GCs. This GC pressure no longer exists.
  • I've got no data on unmanaged memory usage. This is particularly relevant for SortRequest, which moved from 400 bytes to 416 bytes managed memory usage. I'm assuming that this lack of data is the reason for the increase in memory usage - it's not actually increasing, it's just now trackable in the managed counters.
  • Code size is 15 bytes in most places. The disassembly puts this at the size of the benchmark itself - just enough to return DirectoryControl.GetValue. I think this is just noise from the JIT inlining.
Performance header
BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3296/23H2/2023Update/SunValley3)
Intel Core i7-8565U CPU 1.80GHz (Whiskey Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK 8.0.200
  [Host]     : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX2
AsqRequestControl.GetValue: -90% execution time, -35% Gen0 memory allocation
Method Mean Error StdDev Code Size Gen0 Gen1 Allocated
Original 1.869 μs 0.0373 μs 0.0485 μs 127 B 0.1030 0.0992 432 B
PR 185.1 ns 3.25 ns 4.12 ns 15 B 0.0668 280 B
CrossDomainMoveControl.GetValue: -54% execution time, -62% Gen0 memory allocation
Method Mean Error StdDev Code Size Gen0 Gen1 Allocated
Original 89.81 ns 1.297 ns 1.332 ns 1,039 B 0.0516 216 B
PR 41.24 ns 0.469 ns 0.438 ns 2,577 B 0.0191 80 B
DirSyncRequestControl.GetValue: -89% execution time, -35% Gen0 memory allocation
Method Mean Error StdDev Code Size Gen0 Gen1 Allocated
Original 1.587 μs 0.0453 μs 0.1329 μs 191 B 0.1221 0.1183 520 B
PR 162.4 ns 2.20 ns 2.06 ns 15 B 0.0782 328 B
ExtendedDNControl.GetValue: -89% execution time, -30% Gen0 memory allocation
Method Mean Error StdDev Code Size Gen0 Gen1 Allocated
Original 1.059 μs 0.0213 μs 0.0522 μs 147 B 0.0877 0.0858 368 B
PR 115.9 ns 1.72 ns 1.44 ns 15 B 0.0610 256 B
PageResultRequestControl.GetValue: -90% execution time, -40% Gen0 memory allocation
Method Mean Error StdDev Code Size Gen0 Gen1 Allocated
Original 1.412 μs 0.0283 μs 0.0432 μs 160 B 0.1030 0.1011 432 B
PR 134.7 ns 1.49 ns 1.32 ns 15 B 0.0610 256 B
QuotaControl.GetValue: -92% execution time, -28% Gen0 memory allocation
Method Mean Error StdDev Code Size Gen0 Gen1 Allocated
Original 1.607 μs 0.0261 μs 0.0232 μs 127 B 0.0916 0.0877 392 B
PR 120.4 ns 1.65 ns 1.29 ns 15 B 0.0668 280 B
SearchOptionsControl.GetValue: -90% execution time, -30% Gen0 memory allocation
Method Mean Error StdDev Code Size Gen0 Gen1 Allocated
Original 1.251 μs 0.0223 μs 0.0197 μs 147 B 0.0877 0.0858 368 B
PR 114.7 ns 1.56 ns 1.39 ns 15 B 0.0610 256 B
SecurityDescriptorFlagControl.GetValue: -88% execution time, -30% Gen0 memory allocation
Method Mean Error StdDev Code Size Gen0 Gen1 Allocated
Original 1.021 μs 0.0188 μs 0.0460 μs 147 B 0.0877 0.0858 368 B
PR 115.8 ns 1.47 ns 1.37 ns 15 B 0.0610 256 B
SortRequestControl.GetValue: -79% execution time, +4% Gen0 memory allocation
Method Mean Error StdDev Code Size Gen0 Gen1 Allocated
Original 1.893 μs 0.0220 μs 0.0195 μs 15 B 0.0954 400 B
PR 390.5 ns 5.37 ns 5.03 ns 15 B 0.0992 416 B
DirectoryControl.TransformControls: -75% execution time, -76% Gen0 memory allocation
Method Mean Error StdDev Code Size Gen0 Gen1 Allocated
Original 6.596 μs 0.1310 μs 0.3164 μs 3,603 B 0.8392 0.0153 3.43 KB
PR 1.588 μs 0.0208 μs 0.0255 μs 12,854 B 0.1945 816 B
VerifyNameControl.GetValue: -87% execution time, -46% Gen0 memory allocation
Method Mean Error StdDev Code Size Gen0 Gen1 Allocated
Original 1.575 μs 0.0480 μs 0.1240 μs 1,526 B 0.1488 0.1469 624 B
PR 195.1 ns 1.43 ns 1.27 ns 15 B 0.0801 336 B
VlvRequestControl.GetValue: -84% execution time, -69% Gen0 memory allocation
Method Mean Error StdDev Code Size Gen0 Gen1 Allocated
Original 1.459 μs 0.0276 μs 0.0650 μs 15 B 0.2289 0.0038 968 B
PR 227.7 ns 3.21 ns 3.00 ns 15 B 0.0706 296 B

I've made a performance adjustment by specifying the initial size of AsnWriter, since this is trivial to calculate (or always static.) AsnWriter grows in 1KB increments, which is much larger than the size of a normal directory control and causes memory usage to balloon.

One inefficiency which I couldn't eliminate is that when writing strings as ASN.1 octet strings, I want to manually select the encoding to use and encode directly into the AsnWriter buffer. This isn't possible, (probably to keep AsnWriter specification-compliant) so I have to reserve/allocate a byte array, encode into that and write that out as an octet string. An example of this behaviour is in VerifyNameControl.GetValue.

Edit: the updated build has completed and the test failures are unrelated, so I'm now happy that the benchmarks are valid @PaulusParssinen

@edwardneal
Copy link
Contributor Author

I've run the usual LDAP tests against a Windows domain controller, and all have passed as expected (including SortRequestControl, which was the only DirectoryControl left untested after the AsnWriter pooling changes.) Sorry about the wait!

I'm happy with the changes made to pool AsnWriters. With the inlining hint on DirectoryControl.GetWriter, I think this should JIT back down to a constant field access for cases such as ExtendedDNControl which have a constant size.

@ericstj
Copy link
Member

ericstj commented Jul 22, 2024

@buyaa-n @bartonjs can you have another look here?

{
// The control is a PageControl.
object[] result = BerConverter.Decode("{iO}", value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While looking to see if there are any interesting edge cases popping up from this, I found a case where we're increasing the range of supported values, in a surprising way.

The DER sequence 30 05 02 01 03 04 00 is a SEQUENCE(INTEGER(3), OCTET_STRING('')). Under BER (and CER) it can (or, for CER, must) also be encoded with an indefinite length: 30 80 02 01 03 04 00 00 00 (change the length of the sequence from 0x05 to 0x80 and add two 0x00 values on the end)

I was incredibly surprised to learn that BerConverter.Decode doesn't seem to support indefinite-length encoding.

If it's important to continue NOT supporting indefinite-length encoding, that will require special handling.

The main thing I was trying to test for is "does the existing support trailing data", since the new code seems to. And it looks like yes (as measured from the perspective of BerDecoder).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's caught me by surprise too, thanks for pointing it out. BerConverter.Decode's lack of support for indefinite-length encoding is consistent between Windows and Linux - they may have interpreted section 5.1 of RFC4511, which states that LDAP shall be encoded for exchange using BER and only use the definite form of length encoding, in order to ease the encoding/decoding overhead. Since their ber_scanf interpretation knows it'll never need to deal with indefinite length encoding, it's just not supported.

I don't think it precludes us from using an ASN.1 parser which supports it. Anyone who sends this is breaching the protocol specification. Section 4.1.1 notes:

If the server receives an LDAPMessage from the client in which the
LDAPMessage SEQUENCE tag cannot be recognized, the messageID cannot
be parsed, the tag of the protocolOp is not recognized as a request,
or the encoding structures or lengths of data fields are found to be
incorrect, then the server SHOULD return the Notice of Disconnection
described in Section 4.4.1, with the resultCode set to protocolError,
and MUST immediately terminate the LDAP session as described in
Section 5.3.

In other cases where the client or server cannot parse an LDAP PDU,
it SHOULD abruptly terminate the LDAP session (Section 5.3) where
further communication (including providing notice) would be
pernicious. Otherwise, server implementations MUST return an
appropriate response to the request, with the resultCode set to
protocolError.

As I read it: if the client receives an LDAP PDU containing an indefinite-length BER-encoded value from the server but can still parse this PDU without any errors, the LDAP specification doesn't explicitly define the expected client-side action.

I've read the intent of the RFC's restriction as being solely to reduce encoding/decoding overhead, so unless you'd prefer to flag the server's breach of RFC4511 with an exception, I'm comfortable with DirectoryControl.TransformControls silently parsing any indefinite-length BER value properly and proceeding as it currently does.

cookie = AsnDecoder.ReadOctetString(asnSpan, AsnEncodingRules.BER, out bytesConsumed);
asnSpan = asnSpan.Slice(bytesConsumed);
}
ThrowUnless(asnSpan.IsEmpty);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a fan of this check (it matches what the RFC says), but it's not something that BerConverter.Decode does, so it's introducing an exception where there wasn't one previously.

That's not forbidden, but there should be a test for it (maybe there already is) and it needs to be part of a breaking change doc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There doesn't seem to be any self-contained test coverage of the controls decoded in this method; the closest it could be said to come is the processing which takes place when somebody tests against a real LDAP server. I'll add this coverage, then broaden it to explicitly test trailing data in SEQUENCEs.

@bartonjs bartonjs added the breaking-change Issue or PR that represents a breaking API or functional change over a prerelease. label Jul 22, 2024
@dotnet-policy-service dotnet-policy-service bot added the needs-breaking-change-doc-created Breaking changes need an issue opened with https://github.com/dotnet/docs/issues/new?template=dotnet label Jul 22, 2024
* Ensured that all references to DirectoryControl names in comments are accurate.
* Specified the name of the ASN.1 structure which is being parsed in the comments.
* Added explanatory comment stating that BER INTEGER values are being interpreted as 32-bit signed integers.
}
asnSpan = asnSpan.Slice(bytesConsumed);

attribute = s_utf8Encoding.GetString(attributeNameBuffer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is using the throw-on-invalid form of the UTF-8 decoder there should be a catch for DecoderFallbackException (probably turning it into a BerConversionException)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is also a behavioral change which is virtuous but probably belongs in a breaking change doc.

Comment on lines 1006 to 1008
{
UTF8Encoding encoder = new UTF8Encoding();
_target = encoder.GetBytes(target);
_target = s_utf8Encoding.GetBytes(target);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes invalid inputs (such as "\uDD74\uD800") throw an EncoderFallbackException, which it used to not do.

I'm always a fan of rejecting nonsense inputs; but it's an observable change so it needs to have a test, and documented as a breaking change.

Copy link
Member

@bartonjs bartonjs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@buyaa-n At this point I have completed my review of the technical impacts (that I noticed) of changing from BerConverter to AsnDecoder; with one outstanding issue regarding the encoding of SortResult/SortResponseControl's attribute value (https://github.com/dotnet/runtime/pull/101512/files?w=1#r1690564822).

I strongly oppose the current PR's efforts to do manual bookkeeping while trying to maintain multiple AsnWriter instances tiered by the size of their internal buffers; but it's really your call, not mine.

The test changes where patterns like 84 00 00 00 12 turn into 12 are all correct, those are both valid encodings (under BER) for the length 18 (0x12).

I am not personally confident that there's sufficient test coverage for this library to call this a "low risk" change. I haven't done a thorough investigation of the coverage, only looked at what is in this diff.

The DirectoryControl.cs file really ought to be broken up into one-class-per-file... but definitely not concurrent with changes.

I am neither granting a checkmark nor mandating changes required. I defer those decisions to the area owners.

* Refactored LDAP string writing logic into a separate WriteLdapString method in AsnWriterExtensions.
* Replaced separate AsnWriter instances with a single one (reduced memory usage when compared to multiple instances being active, even with varying buffer sizes.)
* Resolved backwards-compatibility issue when attempting to process an attribute name which contains invalid UTF-8 characters.
* Correcting expected ASN tag when parsing an attribute name in a sort result.
@edwardneal
Copy link
Contributor Author

I've resolved the outstanding issue with SortResponseControl, and addressed bartonjs' feedback by removing the tiered pooled AsnWriters.

I am not personally confident that there's sufficient test coverage for this library to call this a "low risk" change. I haven't done a thorough investigation of the coverage, only looked at what is in this diff.

This comes at roughly the same point as I write the tests for the five DirectoryControls being modified, and their behavioural changes. Instead of including all of these tests in this PR, I'll extract the majority of those tests (including cases where BerConverter's parsing is lax/nonconformant to standard) into their own PR as a baseline against the existing method. I think that having this baseline should handle the direct risk, and we can come back to the current PR after the tests are verified passing on main.

@buyaa-n buyaa-n added this to the 10.0.0 milestone Aug 14, 2024
@buyaa-n buyaa-n added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Aug 14, 2024
@buyaa-n
Copy link
Member

buyaa-n commented Aug 21, 2024

This comes at roughly the same point as I write the tests for the five DirectoryControls being modified, and their behavioural changes. Instead of including all of these tests in this PR, I'll extract the majority of those tests (including cases where BerConverter's parsing is lax/nonconformant to standard) into their own PR as a baseline against the existing method. I think that having this baseline should handle the direct risk, and we can come back to the current PR after the tests are verified passing on main.

@edwardneal this is what we wanted to have before merging/validating this PR. Glad to hear that you are planning to do it, now the main branch opened for 10.0 contribution we can unblock this PR when the tests for behavioral changes and edge case scenarios added

@buyaa-n buyaa-n added the needs-author-action An issue or pull request that requires more info or actions from the author. label Aug 21, 2024
@edwardneal
Copy link
Contributor Author

edwardneal commented Aug 25, 2024

Thanks @buyaa-n. I'd put the tests on the back burner, but with 10.0 now in progress I'll pick this back up and should hopefully have a set of tests to review (edit: #107201) within the next few days.

@dotnet-policy-service dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Aug 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.DirectoryServices breaking-change Issue or PR that represents a breaking API or functional change over a prerelease. community-contribution Indicates that the PR has been added by a community member needs-breaking-change-doc-created Breaking changes need an issue opened with https://github.com/dotnet/docs/issues/new?template=dotnet NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[mono] Test failed on windows: System.DirectoryServices.Protocols.Tests.SortRequestControlTests
5 participants