Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix: System.Xml.Serialization: Compiler.GetTempAssemblyName is not deterministic under .NET Core #46499

Merged
10 commits merged into from
Jul 18, 2021
Merged

Bugfix: System.Xml.Serialization: Compiler.GetTempAssemblyName is not deterministic under .NET Core #46499

10 commits merged into from
Jul 18, 2021

Conversation

TalAloni
Copy link
Contributor

@TalAloni TalAloni commented Jan 1, 2021

The implementation of String.GetHashCode() was persistent by default in .NET Framework. allowing to predictably name an XmlSerializers.{HashCode}.dll containing the pre-generated serializers.

In .NET Core / .NET 5, String.GetHashCode() is no longer persistent, so the value of ns.GetHashCode() will change during each run, preventing it from picking up the XmlSerializers dll correctly.

… persistent under .NET Core

The implementation of String.GetHashCode() was persistent by default in .NET Framework. allowing to predictably name an XmlSerializers.{HashCode}.dll containing the pre-generated serializers.
In .NET Core / .NET 5, String.GetHashCode() is no longer persistent, so the value of ns.GetHashCode() will change during each run, preventing it from picking up the XmlSerializers dll correctly.
@ghost
Copy link

ghost commented Jan 1, 2021

Tagging subscribers to this area: @buyaa-n, @krwq
See info in area-owners.md if you want to be subscribed.

Issue Details

The implementation of String.GetHashCode() was persistent by default in .NET Framework. allowing to predictably name an XmlSerializers.{HashCode}.dll containing the pre-generated serializers.

In .NET Core / .NET 5, String.GetHashCode() is no longer persistent, so the value of ns.GetHashCode() will change during each run, preventing it from picking up the XmlSerializers dll correctly.

Author: TalAloni
Assignees: -
Labels:

area-System.Xml

Milestone: -

@krwq
Copy link
Member

krwq commented Jan 4, 2021

@TalAloni would truncated SHA256 work as a workaround for you? It can produce deterministic results.

@TalAloni
Copy link
Contributor Author

TalAloni commented Jan 4, 2021

Hi Krzysztof,
Note that many infrastructure developers (such as myself) use .NET Standard 2.0 assemblies that are consumed by .NET Framework 4.7.2 / 4.8 and .NET Core and .NET 5.
Using the same hash function as the .NET Framework has the additional benefit that the DLL will be picked-up by both the .NET Framework runtime and the .NET Core runtime, which I think simplifies matters.
(consider customers that are migrating to .NET core, this will make sure there is no regression due to the change in hash function).

using truncated SHA256 is a good solution but for backward compatibility sake I think there are benefits to using the same hash used by the .NET Framework.

@krwq
Copy link
Member

krwq commented Jan 4, 2021

cc: @GrabYourPitchforks

@TalAloni TalAloni changed the title Bugfix: System.Xml.Serialization: Compiler.GetTempAssemblyName is not persistent under .NET Core Bugfix: System.Xml.Serialization: Compiler.GetTempAssemblyName is not deterministic under .NET Core Jan 5, 2021
@StephenMolloy
Copy link
Member

I talked this over with the team and wanted to recommend a different approach. Most of this is going to be pretty obvious, but just bear with me while I think out loud for a minute.

It appears that this hash code is being appended only when working with a default namespace. Which makes sense - keeping assemblies separate for different namespaces. The hash code is meant to be a predictable and persistent way to keep namespaces separated without having to append an entire namespace string in the assembly name which could be ugly at best and could run into more impactful issues at worst.

However, the original implementation fell into the trap of using String.GetHashCode() which was never meant to be persistent. In fact, it is not consistent across .Net Core and .Net 4.8. It isn't even consistent across 32 and 64-bit implementations of 4.8. String.GetHashCode() kind of worked by accident here, but was not really ever appropriate for the function it is performing here.

Another thing to note is that generated assemblies do not really share between 4.8 and .Net Core due to netstandard.dll. So maintaining some resemblance with either 32-bit or 64-bit .Net 4.8 isn't much of a concern here.

So there is a bit of a "green field" opportunity to get this right this time. Criteria to note: We want something that produces a repeatable, persistent hash. The hash should be of reasonable length to avoid collision, but it is not required to be exactly 32-bits. Also, since this is not a hot code path where every bit of performance counts, we don't need to worry about being super fast. Finally, we're using this hash for naming purposes, not cryptographic purposes.

GetHashCode() meets all those criteria except for the most important - it's obviously not persistent. But rather than re-inventing an old wheel just because it feels familiar (although, this new wheel will not actually bring compatibility with the old cart)... lets use a tool that is proven and easily accessible. Use a truncated SHA512 hash. You could truncate to 32-bits as done previously. Or maybe use Span/Guid to truncate and get 128-bits with nice formatting?

Perhaps @mconnew has more thoughts?

Regardless, I believe you'll also want to change SGen to match this change as well. https://github.com/dotnet/runtime/blob/master/src/libraries/Microsoft.XmlSerializer.Generator/src/Sgen.cs#L523

@TalAloni
Copy link
Contributor Author

TalAloni commented Feb 19, 2021

Thanks for the feedback,

  1. "Another thing to note is that generated assemblies do not really share between 4.8 and .Net Core[..]. So maintaining some resemblance with either 32-bit or 64-bit .Net 4.8 isn't much of a concern here."
    I see it differently, I have a test project that targets both .NET Core and .NET Framework 64-bit - the same generated assembly is then used by both, this could only work using the approach I suggested (maintaining BC with .NET Framework 64-bit) - otherwise I'm forced to generate two assemblies - each with different name.

  2. If you and the team feel strongly about using truncated SHA512 then I will change the PR, but I feel that backward compatibility will be more useful in this case - it will "just work" when migrating from .NET Framework 64-bit which I believe is desirable.

  3. I intend to change SGen as well after this change is settled (in a separate PR).

@mconnew
Copy link
Member

mconnew commented Feb 19, 2021

@TalAloni, there are some scenarios where sharing a generated assembly between .NET Core and XmlSerializer can work, but it's very fragile and there are many scenarios which fail and there's simply no solution to the problem. You are better off pre-generating serializers for .NET Framework using .NET Framework tooling, and creating serialzers for Core using Core tooling. Having a different hash code used for .NET Framework and Core would actually help in this situation as you can have both dll's next to your application and each platform would ignore the pre-generated serializer from the other one.

@TalAloni
Copy link
Contributor Author

TalAloni commented Feb 20, 2021

Thank you.
Yes, I can see the advantages of starting over and doing this right this time, I have updated the PR to use truncated SHA512.
I opted to use UInt32 to avoid negative values.
(Note that BitConverter is architecture dependent so I couldn't use this class to convert the byte array to integer)

@TalAloni TalAloni closed this Feb 22, 2021
@TalAloni TalAloni reopened this Feb 22, 2021
Base automatically changed from master to main March 1, 2021 09:07
@TalAloni TalAloni requested a review from danmoseley June 29, 2021 19:12
@TalAloni TalAloni closed this Jun 29, 2021
@TalAloni TalAloni reopened this Jun 29, 2021
@StephenMolloy
Copy link
Member

lgtm
@HongGit or @mconnew, do either of you have any more comments on this one?

TalAloni and others added 2 commits July 9, 2021 19:17
…Compiler.cs

Co-authored-by: Stephen Toub <stoub@microsoft.com>
…Compiler.cs

Co-authored-by: Stephen Toub <stoub@microsoft.com>
@ghost
Copy link

ghost commented Jul 18, 2021

Hello @stephentoub!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit e8be7f2 into dotnet:main Jul 18, 2021
@TalAloni TalAloni deleted the xml_compiler_gettempassemblyname_bugfix branch August 7, 2021 15:42
StephenMolloy pushed a commit that referenced this pull request Aug 17, 2021
* Update GetTempAssemblyName according to #46499

In #46499 we corrected Compilation.GetTempAssemblyName in order for it to be not deterministic under .NET Core.
In this PR we update the generated filename to match the new logic.

* Update Microsoft.XmlSerializer.Generator.csproj

* Update SGen.cs

Avoid using external dependency

* Update Microsoft.XmlSerializer.Generator.csproj

Avoid using external dependency

* Update Sgen.cs

Fixed compilation
@ghost ghost locked as resolved and limited conversation to collaborators Sep 7, 2021
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants