This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Strong detection for List/Dictionary concurrent use in Debug #17076

Closed
wants to merge 3 commits into from

benaadams
Member

Bit ugly, but should do the trick?

Resolves: https://github.com/dotnet/coreclr/issues/17070

@benaadams
Member Author

@dotnet-bot test Windows_NT x64 Checked corefx_baseline
@dotnet-bot test Ubuntu x64 Checked corefx_baseline

@@ -1041,6 +1191,9 @@ ICollection IDictionary.Values
{
ThrowHelper.ThrowWrongKeyTypeArgumentException(key, typeof(TKey));
}
#if DEBUG
Member

❓ Why bother with #if DEBUG here if the target method is marked [Conditional("DEBUG")]?

Member

If you do that, it might be worth naming it something like DebugOnlyConcurrentAccessCheck so it's obvious it's debug-only.

Member Author

Wasn't sure the JIT would elide all the int version = _version; reads (or that C# would delete them without producing an error about the variable being unused).

Member Author

Will change and check diffs, as it reduces the number of #ifs considerably.
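
For reference, a minimal sketch of the trade-off discussed in this thread, using a hypothetical `VersionedBag` type (not the actual `Dictionary` code). `[Conditional("DEBUG")]` removes only the call sites, together with the evaluation of their arguments, when `DEBUG` is not defined. The method itself is still compiled, and the separate `int version = _version;` statement stays ordinary code that the compiler and JIT may or may not eliminate.

```csharp
using System;
using System.Diagnostics;

// Hypothetical type for illustration only; not the CoreCLR implementation.
public sealed class VersionedBag
{
    private int _version;
    private int _count;

    public int Count
    {
        get
        {
            // Ordinary statement: present in the source for every configuration.
            int version = _version;
            int count = _count;

            // Only this call site is dropped when DEBUG is not defined.
            DebugOnlyConcurrentAccessCheck(version);
            return count;
        }
    }

    public void Add()
    {
        _count++;
        _version++;
    }

    // Conditional methods must return void. The body is still emitted into the
    // assembly in Release; only the calls to it are omitted.
    [Conditional("DEBUG")]
    private void DebugOnlyConcurrentAccessCheck(int expectedVersion)
    {
        if (_version != expectedVersion)
        {
            throw new InvalidOperationException("Concurrent operations are not supported.");
        }
    }
}
```

Wrapping both the read and the call in `#if DEBUG` guarantees that neither reaches a Release build, at the cost of more preprocessor blocks.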

@@ -214,14 +215,26 @@ public ValueCollection Values
{
get
{
#if DEBUG
int version = _version;
Member

If you're using the existing _version field, doesn't it have to be volatile or incremented with Interlocked?

@stephentoub

Member Author

In DEBUG it's incremented in ConcurrentAccessWriteCheck with

Interlocked.CompareExchange(ref _version, version + 1, version) != version

Member Author

Changed to

if (Interlocked.Exchange(ref _version, version + 1) != version)
{
    ThrowHelper.ThrowInvalidOperationException_ConcurrentOperationsNotSupported();
}

As I don't care whether a mismatch stops the version from being updated; just that it throws if the version wasn't the expected one.
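
A minimal sketch of the write check described above, on a hypothetical `WriteCheckedCounter` type rather than the real `Dictionary`. `Interlocked.CompareExchange` advances the version only if it still holds the observed value, while `Interlocked.Exchange` always stores the new value and returns the previous one. Either way, the value that comes back is what gets compared against the version read at the start of the operation.

```csharp
using System;
using System.Threading;

// Hypothetical type for illustration only; not the CoreCLR implementation.
public sealed class WriteCheckedCounter
{
    private int _version;
    private int _value;

    public int Value => _value;

    public void Increment()
    {
        int version = _version;   // version observed when the write starts
        _value++;
        PublishWrite(version);    // throws if another writer got in between
    }

    // Public only so the failure path can be demonstrated directly.
    public void PublishWrite(int observedVersion)
    {
        // Exchange stores unconditionally and returns the previous value.
        if (Interlocked.Exchange(ref _version, observedVersion + 1) != observedVersion)
        {
            throw new InvalidOperationException("Concurrent operations are not supported.");
        }
    }
}
```

Calling `PublishWrite` with a stale version simulates a second writer having completed in the meantime, and throws.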

@benaadams benaadams force-pushed the dict-debig-concurrency branch from 1914ca9 to c9646e4 Compare March 21, 2018 00:16
@benaadams
Member Author

@dotnet-bot test Windows_NT x64 Checked corefx_baseline
@dotnet-bot test Ubuntu x64 Checked corefx_baseline

@jkotas
Member

jkotas commented Mar 21, 2018

The barriers from these interlocked operations can hide some types of the races as well.

Do we have any examples of bugs in CoreFX that were caused by invalid multi-threaded Dictionary use and that were in the product for a long time because they were hard to hit and find? It would be nice to validate the effectiveness of this on some real-world examples.

@benaadams
Member Author

It's generating the asm for the [Conditional("DEBUG")] method in Release, though not calling it; it's also not eliding the _version use :sad: so back to #if DEBUG

Total bytes of diff: 1604 (0.05% of base)
    diff is a regression.

Total byte diff includes 61 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    2 unique methods,       61 unique bytes

Top file regressions by size (bytes):
        1604 : System.Private.CoreLib.dasm (0.05% of base)

1 total files with size differences (0 improved, 1 regressed), 0 unchanged.

Top method regessions by size (bytes):
          88 : System.Private.CoreLib.dasm - Dictionary`2:.ctor(ref,ref):this (58 methods)
          87 : System.Private.CoreLib.dasm - Dictionary`2:CopyTo(ref,int):this (29 methods)
          87 : System.Private.CoreLib.dasm - Dictionary`2:System.Collections.IDictionary.get_Item(ref):ref:this (29 methods)
          75 : System.Private.CoreLib.dasm - Dictionary`2:TrimExcess(int):this (25 methods)
          58 : System.Private.CoreLib.dasm - Dictionary`2:Resize(int,bool):this (29 methods)

Top method improvements by size (bytes):
        -203 : System.Private.CoreLib.dasm - Dictionary`2:Clear():this (29 methods)

164 total methods with size differences (1 improved, 163 regressed), 17109 unchanged.

@benaadams
Member Author

Total bytes of diff: -66 (0.00% of base)
    diff is an improvement.

Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes

Top file improvements by size (bytes):
         -66 : System.Private.CoreLib.dasm (0.00% of base)

1 total files with size differences (1 improved, 0 regressed), 0 unchanged.

Top method regessions by size (bytes):
          75 : System.Private.CoreLib.dasm - Dictionary`2:TrimExcess(int):this (25 methods)
          58 : System.Private.CoreLib.dasm - Dictionary`2:Resize(int,bool):this (29 methods)
          32 : System.Private.CoreLib.dasm - PseudoCustomAttribute:GetCustomAttributes(ref,ref,byref):ref (9 methods)
          32 : System.Private.CoreLib.dasm - PseudoCustomAttribute:IsDefined(ref,ref):bool (9 methods)
           8 : System.Private.CoreLib.dasm - Attribute:CopyToArrayList(ref,ref,ref)
           8 : System.Private.CoreLib.dasm - MemberInfoCache`1:PopulateInterfaces(struct):ref:this
           8 : System.Private.CoreLib.dasm - MemberInfoCache`1:PopulateEvents(struct,ref,ref,byref):this
           8 : System.Private.CoreLib.dasm - EventSource:DebugCheckEvent(byref,ref,ref,ref,ref,int)
           8 : System.Private.CoreLib.dasm - ManifestBuilder:AddEventParameter(ref,ref):this
           8 : System.Private.CoreLib.dasm - SerializationInfo:AddValueInternal(ref,ref,ref):this
           8 : System.Private.CoreLib.dasm - EventRegistrationTokenTable`1:AddEventHandlerNoLock(ref):struct:this

Top method improvements by size (bytes):
        -319 : System.Private.CoreLib.dasm - Dictionary`2:Clear():this (29 methods)

12 total methods with size differences (1 improved, 11 regressed), 17261 unchanged.

Regression in Dictionary is from incrementing _version in TrimExcess and Resize

Other regressions are from ContainsKey(TKey key) deciding to inline now that it gets the bool to a local and returns it; rather than returning the bool directly - which seems a bit weird?

@danmoseley
Member

@jkotas this was motivated by https://github.com/dotnet/corefx/issues/28198. I assume there have been other cases periodically?

@benaadams benaadams force-pushed the dict-debig-concurrency branch from b6f9740 to 3363111 Compare March 21, 2018 05:52
@benaadams
Member Author

Is there a way to run coreclr (checked or debug) against corefx outerloop?

@benaadams
Member Author

Triggered some tests that were checking that the version didn't change:

System.Collections.Tests.Dictionary_Generic_Tests_string_string.EnsureCapacity_Generic_RequestingLargerCapacity_DoesNotInvalidateEnumeration(count: 1) [FAIL]
   System.InvalidOperationException : Collection was modified; enumeration operation may not execute.
   Stack Trace:
         at System.Collections.Generic.Dictionary`2.KeyCollection.Enumerator.MoveNext()
      D:\j\workspace\x64_checked_w---d7295605\_\fx\src\System.Collections\tests\Generic\Dictionary\Dictionary.Generic.Tests.netcoreapp.cs(104,0): at System.Collections.Tests.Dictionary_Generic_Tests`2.EnsureCapacity_Generic_RequestingLargerCapacity_DoesNotInvalidateEnumeration(Int32 count)
System.Collections.Tests.Dictionary_Generic_Tests_string_string.EnsureCapacity_Generic_RequestingLargerCapacity_DoesNotInvalidateEnumeration(count: 75) [FAIL]
   System.InvalidOperationException : Collection was modified; enumeration operation may not execute.
   Stack Trace:
         at System.Collections.Generic.Dictionary`2.KeyCollection.Enumerator.MoveNext()
      D:\j\workspace\x64_checked_w---d7295605\_\fx\src\System.Collections\tests\Generic\Dictionary\Dictionary.Generic.Tests.netcoreapp.cs(104,0): at System.Collections.Tests.Dictionary_Generic_Tests`2.EnsureCapacity_Generic_RequestingLargerCapacity_DoesNotInvalidateEnumeration(Int32 count)
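
A minimal sketch of why bumping the version during a resize trips these tests, assuming a hypothetical `TinyList` type (not the real collections). The enumerator snapshots the version and throws once it changes, so a capacity change that increments the version invalidates an in-flight enumeration even though no element was added or removed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration only; not the CoreCLR implementation.
public sealed class TinyList
{
    private int[] _items = new int[4];
    private int _count;
    private int _version;

    // Toggles the behavior under discussion: counting a resize as a modification.
    public bool BumpVersionOnResize { get; set; }

    public void Add(int item)
    {
        if (_count == _items.Length) Resize(_items.Length * 2);
        _items[_count++] = item;
        _version++;
    }

    public void EnsureCapacity(int capacity)
    {
        if (capacity > _items.Length) Resize(capacity);
    }

    private void Resize(int newSize)
    {
        Array.Resize(ref _items, newSize);
        if (BumpVersionOnResize) _version++;
    }

    public IEnumerable<int> Enumerate()
    {
        // Iterator body: the snapshot is taken on the first MoveNext call.
        int version = _version;
        for (int i = 0; i < _count; i++)
        {
            if (version != _version)
            {
                throw new InvalidOperationException(
                    "Collection was modified; enumeration operation may not execute.");
            }
            yield return _items[i];
        }
    }
}
```

With `BumpVersionOnResize` left at `false`, calling `EnsureCapacity` in the middle of an enumeration is harmless. With it set to `true`, the next `MoveNext` throws.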

@benaadams
Member Author

@dotnet-bot test Windows_NT x64 Checked corefx_baseline
@dotnet-bot test Ubuntu x64 Checked corefx_baseline

@jkotas
Member

jkotas commented Mar 21, 2018

> @jkotas this was motivated by dotnet/corefx#28198

https://github.com/dotnet/corefx/issues/28198#issuecomment-374676611 says that we do not have any tests for the buggy code. If that is the case, this would not have helped us find the issue any faster.

@benaadams benaadams force-pushed the dict-debig-concurrency branch 2 times, most recently from 3b28f71 to 22fb6c0 Compare March 21, 2018 13:04
@benaadams benaadams changed the title from "Strong detection for Dictionary concurrent use in Debug" to "Strong detection for List/Dictionary concurrent use in Debug" Mar 21, 2018
@benaadams benaadams force-pushed the dict-debig-concurrency branch from 22fb6c0 to 956fdd2 Compare March 21, 2018 13:34
@benaadams
Member Author

@dotnet-bot test Windows_NT x64 Checked corefx_baseline
@dotnet-bot test Ubuntu x64 Checked corefx_baseline

@benaadams
Member Author

benaadams commented Mar 21, 2018

Linux: the CoreFx failures are unrelated to this change

Test Result (4 failures / ±0)
System.IO.Tests.FileInfo_GetSetTimes.TimesIncludeMillisecondPart_Linux
System.IO.Tests.File_GetSetTimes.TimesIncludeMillisecondPart_Linux
System.IO.Tests.DirectoryInfo_GetSetTimes.TimesIncludeMillisecondPart_Linux
System.IO.Tests.Directory_GetSetTimes.TimesIncludeMillisecondPart_Linux
## TimesIncludeMillisecondPart got a file time of 2018-03-21T14:57:32.0000000Z on ext2
## TimesIncludeMillisecondPart got a file time of 2018-03-21T14:57:33.0000000Z on ext2
## TimesIncludeMillisecondPart got a file time of 2018-03-21T14:57:34.0000000Z on ext2
## TimesIncludeMillisecondPart got a file time of 2018-03-21T14:57:36.0000000Z on ext2
## TimesIncludeMillisecondPart got a file time of 2018-03-21T14:57:37.0000000Z on ext2
 System.IO.Tests.FileInfo_GetSetTimes.TimesIncludeMillisecondPart_Linux [FAIL]
  Assert.All() Failure: 6 out of 6 items in the collection did not pass.
  [5]: Xunit.Sdk.NotEqualException: Assert.NotEqual() Failure
   Expected: Not 0
   Actual:   0
    at Xunit.Assert.NotEqual[T](T expected, T actual, IEqualityComparer`1 comparer)
    at System.IO.Tests.BaseGetSetTimes`1.<>c__DisplayClass9_0.<TimesIncludeMillisecondPart_Linux>b__0(TimeFunction function) in /mnt/j/workspace/dotnet_coreclr/master/jitstress/x64_checked_ubuntu_corefx_baseline_prtest/_/fx/src/System.IO.FileSystem/tests/Base/BaseGetSetTimes.cs:line 108
    at Xunit.Assert.All[T](IEnumerable`1 collection, Action`1 action)
  [4]: Xunit.Sdk.NotEqualException: Assert.NotEqual() Failure
   Expected: Not 0
   Actual:   0
    at Xunit.Assert.NotEqual[T](T expected, T actual, IEqualityComparer`1 comparer)
    at System.IO.Tests.BaseGetSetTimes`1.<>c__DisplayClass9_0.<TimesIncludeMillisecondPart_Linux>b__0(TimeFunction function) in /mnt/j/workspace/dotnet_coreclr/master/jitstress/x64_checked_ubuntu_corefx_baseline_prtest/_/fx/src/System.IO.FileSystem/tests/Base/BaseGetSetTimes.cs:line 108
    at Xunit.Assert.All[T](IEnumerable`1 collection, Action`1 action)

@danmoseley
Member

@stephentoub any thoughts on whether real-world examples exist?

@benaadams
Member Author

Windows_NT x64 Checked Build and Test (Jit - CoreFx) failure

System.ArgumentException: ComposablePartDefinition of type 'System.ComponentModel.Composition.AttributedModel.ConcreteCPD' cannot be used in this context.
 Only part definitions produced by the ReflectionModelServices.CreatePartDefinition are supported.
Parameter name: partDefinition
 at System.ComponentModel.Composition.AttributedModelServices.CreatePart(ComposablePartDefinition partDefinition, Object attributedPart) in D:\j\workspace\x64_checked_w---d7295605\_\fx\src\System.ComponentModel.Composition\src\System\ComponentModel\Composition\AttributedModelServices.cs:line 49
 at System.ComponentModel.Composition.AttributedModel.AttributedModelServicesTests.<>c.<CreatePart_From_InvalidPartDefiniton_ShouldThrowArgumentException>b__2_0() in D:\j\workspace\x64_checked_w---d7295605\_\fx\src\System.ComponentModel.Composition\tests\System\ComponentModel\Composition\AttributedModelServicesTests.cs:line 65

@benaadams
Member Author

@danmosemsft it didn't catch anything 😢

Is this change worth it?

@danmoseley
Member

I figured stress runs would be more likely to catch something. As to whether it's worth it, I'm open-minded; I'm curious whether @stephentoub has seen real cases.

Do you understand this gain - is it worth keeping?

Top method improvements by size (bytes):
        -319 : System.Private.CoreLib.dasm - Dictionary`2:Clear():this (29 methods)

@benaadams
Member Author

> Do you understand this gain - is it worth keeping?

Added PR for the gains :) #17096

@stephentoub
Member

> I'm curious whether @stephentoub has seen real cases.

I've seen real cases, e.g. the SQL one I fixed earlier in the week, but as Jan restated, this wouldn't have caught that, as we didn't have any tests that exercised those code paths, never mind ones that would have exercised them in parallel.

I just did a search for "static Dictionary", "static readonly Dictionary", "readonly static Dictionary" in corefx (thankfully there's only one of the latter, as that's not the desired ordering), and did a cursory review of each's usage. A few things I noticed:

In short, across the large codebase that is corefx, it's possible this might catch something, but I wouldn't expect significant wins.
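
A minimal sketch of the kind of pattern such a search turns up, using a hypothetical `NameCache` (not code from corefx). A `static Dictionary` that is lazily populated from multiple threads without synchronization is the unsupported usage in question. `ConcurrentDictionary.GetOrAdd` is one common way to make the same cache safe.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical cache for illustration only.
public static class NameCache
{
    // Risky when called from several threads: Dictionary<TKey, TValue> does not
    // support concurrent writers, or readers running alongside a writer.
    private static readonly Dictionary<int, string> s_unsafe = new Dictionary<int, string>();

    // Safe alternative: the concurrent collection handles its own synchronization.
    private static readonly ConcurrentDictionary<int, string> s_safe =
        new ConcurrentDictionary<int, string>();

    public static string GetUnsafe(int id)
    {
        if (!s_unsafe.TryGetValue(id, out string name))
        {
            name = "item" + id;
            s_unsafe[id] = name;   // unsynchronized write to shared state
        }
        return name;
    }

    public static string GetSafe(int id)
    {
        // The factory may run more than once under contention, but only one
        // value is ever stored for a given key.
        return s_safe.GetOrAdd(id, key => "item" + key);
    }
}
```

Both methods behave identically on a single thread, which is part of why such bugs survive testing that never exercises the code in parallel.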

@jkotas
Member

jkotas commented Jun 25, 2018

I have opened https://github.com/dotnet/corefx/issues/30651 and https://github.com/dotnet/corefx/issues/30650 for the issues that Stephen identified by code inspection.

@jkotas
Member

jkotas commented Jun 25, 2018

Otherwise, I do not think that this is worth the extra DEBUG #ifdefs. I think that these ifdefs have about the same chance of catching problems when running a debug CoreCLR build as of introducing new ones.

@benaadams Thanks for the suggested change anyway.


6 participants