
Distributed Transactions (MSDTC) code blocking on disposal of Transaction Scope #76010

Closed
nathangreaves opened this issue Sep 22, 2022 · 12 comments · Fixed by #76310

Comments

@nathangreaves

Description

When disposing of a TransactionScope object in .NET 7 code - where the transaction has been promoted to a distributed transaction and work has taken place in another process - the code blocks indefinitely.

Reproduction Steps

I encountered this while converting one of our .NET Framework 4.8 ASP.NET Web API applications to .NET 7, but I have written a simple console app that demonstrates the blocking behaviour: https://github.com/nathangreaves/NET7RC1-MSDTC-TestApp.
It contains two projects with virtually the same code: one targets .NET 7 RC1, the other .NET Framework 4.8.

The code starts a new TransactionScope, inserts a value into a database table, and writes the transmitter propagation token to a file on disk. A second instance of the application then reads this file, creates a new TransactionScope (using the Transaction obtained from the propagation token), inserts a value into a database table, and deletes the file once its transaction completes. The first instance then completes the outer transaction, and the whole process is repeated a second time.
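
In outline, the two instances do roughly the following (a simplified sketch rather than the exact code in the repo above - the token path, connection string and table name are placeholders, it assumes Microsoft.Data.SqlClient, and the second iteration is omitted):

using System;
using System.IO;
using System.Threading;
using System.Transactions;
using Microsoft.Data.SqlClient;

const string TokenPath = @"C:\temp\dtc-token.bin";   // placeholder path
const string ConnString = "<connection string>";     // placeholder connection string

if (!File.Exists(TokenPath))
{
    // First instance: outer transaction.
    using (var scope = new TransactionScope())
    {
        InsertRow("outer");

        // Promote the transaction and hand it to the second process via a file.
        byte[] token = TransactionInterop.GetTransmitterPropagationToken(Transaction.Current!);
        File.WriteAllBytes(TokenPath, token);

        // Wait until the second instance has finished and deleted the file.
        while (File.Exists(TokenPath)) Thread.Sleep(100);

        scope.Complete();
    } // <-- on .NET 7 RC1, Dispose blocks here indefinitely
}
else
{
    // Second instance: enlist in the same distributed transaction.
    byte[] token = File.ReadAllBytes(TokenPath);
    Transaction tx = TransactionInterop.GetTransactionFromTransmitterPropagationToken(token);

    using (var scope = new TransactionScope(tx))
    {
        InsertRow("inner");
        scope.Complete();
    } // <-- on .NET 7 RC1, Dispose throws SynchronizationLockException here

    File.Delete(TokenPath);
}

void InsertRow(string value)
{
    using var conn = new SqlConnection(ConnString);
    conn.Open();
    using var cmd = new SqlCommand("INSERT INTO TestTable (Value) VALUES (@v)", conn);
    cmd.Parameters.AddWithValue("@v", value);
    cmd.ExecuteNonQuery();
}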

The code in .NET Framework 4.8 works correctly, committing both transactions and continuing execution.
However, the .NET 7 code blocks when disposing the outer transaction. The inner transaction also throws the following exception, which may be relevant:

System.Threading.SynchronizationLockException: Object synchronization method was called from an unsynchronized block of code.
   at System.Threading.Monitor.Exit(Object obj)
   at System.Transactions.DependentTransaction.Complete()
   at System.Transactions.TransactionScope.InternalDispose()
   at System.Transactions.TransactionScope.Dispose()
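
(For what it's worth, that exception is what Monitor.Exit throws when the calling thread doesn't own the lock it is releasing, e.g.:)

using System.Threading;

object gate = new object();
// Exiting a monitor this thread never entered throws SynchronizationLockException,
// which may hint that the dispose path releases a lock it hasn't (re)acquired.
Monitor.Exit(gate);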

However, if you run the applications in the following order:

  • Outer transaction using the .NET 7 project
  • Inner transaction using the .NET Framework 4.8 project

then the inner transaction completes successfully (without the above exception), but the outer transaction (.NET 7) still blocks.

I attached the debugger to the .NET 7 project whilst it was blocked:
[Screenshot: Debugger - Waiting on lock]
This screenshot shows the code has blocked waiting for a lock in transactionScope.Dispose();

The following screenshots show the console output for each combination of .NET versions across the two instances of the application:

Outer Transaction .NET 7, Inner Transaction .NET 7 - outer transaction blocks on dispose, inner transaction throws SynchronizationLockException exception:
[Screenshot: Net7 Outer - Net7 Inner]

Outer Transaction .NET 7, Inner Transaction .NET 4.8 - outer transaction blocks on dispose, inner transaction successful:
[Screenshot: Net7 Outer - Net48 Inner]

Outer Transaction .NET 4.8, Inner Transaction .NET 7 - outer transactions complete, inner transactions throw SynchronizationLockException on dispose (but still complete):
[Screenshot: Net48 Outer - Net7 Inner]

Outer Transaction .NET 4.8, Inner Transaction .NET 4.8 - outer and inner transactions complete and execution continues
[Screenshot: Net48 Outer - Net48 Inner]

Expected behavior

Execution does not block on transactionScope.Dispose() when completing a distributed transaction.
Execution does not throw exception System.Threading.SynchronizationLockException when completing a distributed transaction.

Actual behavior

When disposing of a TransactionScope object in .NET 7 code - where the transaction has been promoted to a distributed transaction and work has taken place in another process - the code blocks indefinitely.

Regression?

This code works correctly in .NET Framework 4.8

Known Workarounds

No response

Configuration

.NET 7 RC1
Windows Server 2019 Standard - 1809 (17763.2565)
x64

Other information

No response

@ghost added the untriaged label Sep 22, 2022
@roji self-assigned this Sep 22, 2022
@roji added the bug label and removed the untriaged label Sep 22, 2022
@roji
Member

roji commented Sep 26, 2022

@nathangreaves thanks for testing out the rc, and thanks for the detailed report! I'll investigate this in the next few days.

roji added a commit to roji/runtime that referenced this issue Sep 28, 2022
* Retake lock when using a dependent transaction from a
  TransactionScope (dotnet#76010).
* Reset TransactionTransmitter and Receiver before reusing them
  (dotnet#76010).
* Increase MSDTC startup timeout from 2.5 to 30 seconds (dotnet#75822)

Fixes dotnet#76010
Fixes dotnet#75822
@ghost added the in-pr label Sep 28, 2022
@ghost removed the in-pr label Sep 30, 2022
roji added a commit that referenced this issue Sep 30, 2022

github-actions bot pushed a commit that referenced this issue Sep 30, 2022
@roji
Member

roji commented Sep 30, 2022

@nathangreaves thanks again for reporting this - a fix has been merged and is being backported for the 7.0 release.

It would be very useful if you could retry with a daily build once #76425 is merged, to make sure that everything works well for you (let me know if you have technical difficulties getting the daily build etc.). If you run into any further bugs, I'll prioritize fixing them for the 7.0 release.

@roji
Member

roji commented Sep 30, 2022

FYI note #76376, which will also get merged very soon and will require you to explicitly opt in in order to start a distributed transaction.
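
In practice that opt-in will be a one-time switch set at application startup; a rough sketch of what it should look like once that PR is in (the exact API is an assumption here and may differ from the final shape):

using System.Transactions;

// Assumed opt-in from #76376: without this, implicit promotion to a
// distributed (MSDTC) transaction is rejected rather than happening silently.
TransactionManager.ImplicitDistributedTransactions = true;

using var scope = new TransactionScope();
// ... work that escalates to MSDTC ...
scope.Complete();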

carlossanlop pushed a commit that referenced this issue Sep 30, 2022
* Distributed transaction fixes
* Fix member naming

Co-authored-by: Shay Rojansky <roji@roji.org>
@nathangreaves
Author

Hi @roji
Unfortunately I'm still experiencing these issues after downloading the daily builds from here: https://github.com/dotnet/installer#installers-and-binaries
I tried both the Release/7.0.1xx and Release/7.0.1xx-rc2 (at time of writing that's 7.0.100-rtm.22504.1 and 7.0.100-rc.2.22477.20 respectively) for x64.
In both cases I made sure to uninstall all other versions of .NET 7 (apart from the version that comes as part of VS 17.4 Preview as I can't remove that), and in both cases dotnet --info showed that the correct version was installed.
I also added the <add key="dotnet7" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet7/nuget/v3/index.json" /> package source.
I then ran the test app I created and referenced in the original bug report, and I still experienced the locking issue when the outer transaction disposed. Screenshots 1 and 3 above are still valid for showing the issue.
Would you like me to raise a new bug or will you re-open this one?

@roji
Member

roji commented Oct 5, 2022

@nathangreaves thanks for trying this.

Screenshots 1 and 3 above are still valid for showing the issue.

Just to be on the safe side, are you saying you're no longer seeing the SynchronizationLockException, but only seeing blocking?

I suspect this may be because of the following notes on this page:

Note: Since RC2 is a go-live release, its build and codeflow are being done internally. The latest public build is available below, but it does not include updates to the runtime since around mid-September.

But let me follow up internally on this and see how to make sure you have the latest build.

@roji
Member

roji commented Oct 5, 2022

I looked into this, and the latest rtm daily build (7.0.0-rtm.22478.9) corresponds to commit 3fed4a3, which does not yet contain the commit; the next daily build to appear should contain it. I'll check and update this issue once a build is available which contains the fix.

@nathangreaves
Author

Just to be on the safe side, are you saying you're no longer seeing the SynchronizationLockException, but only seeing blocking?

I'm pretty sure that was the case but I no longer have .NET 7 installed on my machine so can't confirm right now.

I looked into this, and the latest rtm daily build (7.0.0-rtm.22478.9) corresponds to commit 3fed4a3, which does not yet contain the commit; the next daily build to appear should contain it. I'll check and update this issue once a build is available which contains the fix.

That probably explains it then, thanks for checking! I'll hopefully have some time next week to have another go with a .NET 7 build, if one is available by then.

@roji
Member

roji commented Oct 7, 2022

@nathangreaves I've tried again today, and could indeed reproduce the hang with a newer daily build, which I confirmed contained the fix (the SynchronizationLockException was indeed gone thanks to the fix).

To make a long story short, I've tracked the source of the problem to what looks like a deadlock in SqlClient - see dotnet/SqlClient#1800. As far as I can tell, the deadlock should be possible on .NET Framework as well, although I can't see it happening there. Another interesting fact: when I reference a locally-built System.Transactions DLL directly from your project, the hang doesn't repro either, even though the official runtime built from the same commit reproduces it very reliably (which is why I didn't see the hang when originally working on your repro). This is most likely down to the timing-sensitive nature of the deadlock.

@roji
Member

roji commented Oct 9, 2022

FYI see dotnet/SqlClient#1800 (comment) for some more investigation.

Specifically, it's possible to work around this bug by ensuring the transaction is promoted before the first SqlClient escalation. In other words, inserting this line before doing any SqlClient operations makes it go away:

_ = TransactionInterop.GetTransmitterPropagationToken(transaction);
// If using an ambient transaction via TransactionScope:
_ = TransactionInterop.GetTransmitterPropagationToken(Transaction.Current!);
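
In context, the workaround looks roughly like this (a minimal sketch; the connection string, table and SQL are placeholders, and it assumes Microsoft.Data.SqlClient):

using System.Transactions;
using Microsoft.Data.SqlClient;

using (var scope = new TransactionScope())
{
    // Force promotion to a distributed (MSDTC) transaction up front, before
    // SqlClient enlists, so the escalation-time deadlock described above is avoided.
    _ = TransactionInterop.GetTransmitterPropagationToken(Transaction.Current!);

    using (var conn = new SqlConnection("<connection string>"))
    {
        conn.Open();
        using var cmd = new SqlCommand("INSERT INTO TestTable (Value) VALUES ('x')", conn);
        cmd.ExecuteNonQuery();
    }

    scope.Complete();
}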

@nathangreaves
Author

@roji thanks for this! Do you think it's likely a fix will make it into the final build for .NET 7?

@roji
Member

roji commented Oct 11, 2022

@nathangreaves the issue isn't on the .NET side, it's on the SqlClient side. I'll be in touch with them and do my best to make sure a fix is merged as soon as possible. Though note that you can at least work around this as above, in the meantime.

@roji closed this as not planned Oct 11, 2022
@ghost locked as resolved and limited conversation to collaborators Nov 10, 2022
@roji
Member

roji commented Jan 20, 2023

@nathangreaves and others, I've tested this with the new Microsoft.Data.SqlClient 5.1.0 that came out today, and I can confirm that the deadlock is gone. Please give this a try and report how it goes!
