-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Done() call in BaseTestBaseline.Cleanup and added related fixes #4823
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Output on your most recent build:
2020-02-10T23:01:35.8655896Z X Microsoft.ML.Tests.Transformers.TextFeaturizerTests.LdaWorkoutEstimatorCore [155ms]
2020-02-10T23:01:35.8656992Z Error Message:
2020-02-10T23:01:35.8706041Z System.AggregateException : One or more errors occurred. (Different values in column 1 of row 0) (Odd same failed)
2020-02-10T23:01:35.8706491Z ---- System.InvalidOperationException : Different values in column 1 of row 0
2020-02-10T23:01:35.8706738Z ---- System.InvalidOperationException : Odd same failed
2020-02-10T23:01:35.8707131Z Stack Trace:
2020-02-10T23:01:35.8707356Z at Microsoft.ML.RunTests.BaseTestBaseline.Done() in D:\a\1\s\test\Microsoft.ML.TestFramework\BaseTestBaseline.cs:line 161
2020-02-10T23:01:35.8707606Z at Microsoft.ML.RunTests.BaseTestBaseline.Cleanup() in D:\a\1\s\test\Microsoft.ML.TestFramework\BaseTestBaseline.cs:line 133
2020-02-10T23:01:35.8707856Z at Microsoft.ML.TestFramework.BaseTestClass.System.IDisposable.Dispose() in D:\a\1\s\test\Microsoft.ML.TestFramework\BaseTestClass.cs:line 76
2020-02-10T23:01:35.8708508Z at ReflectionAbstractionExtensions.DisposeTestClass(ITest test, Object testClass, IMessageBus messageBus, ExecutionTimer timer, CancellationTokenSource cancellationTokenSource) in C:\Dev\xunit\xunit\src\xunit.execution\Extensions\ReflectionAbstractionExtensions.cs:line 79
2020-02-10T23:01:35.8709014Z ----- Inner Stack Trace #1 (System.InvalidOperationException) -----
2020-02-10T23:01:35.8709656Z at Microsoft.ML.RunTests.BaseTestBaseline.Fail(String fmt, Object[] args) in D:\a\1\s\test\Microsoft.ML.TestFramework\BaseTestBaseline.cs:line 184
2020-02-10T23:01:35.8709929Z ----- Inner Stack Trace #2 (System.InvalidOperationException) -----
2020-02-10T23:01:35.8710190Z at Microsoft.ML.RunTests.BaseTestBaseline.Fail(String fmt, Object[] args) in D:\a\1\s\test\Microsoft.ML.TestFramework\BaseTestBaseline.cs:line 184
2020-02-10T23:01:35.8710408Z Standard Output Messages:
2020-02-10T23:01:35.8711219Z *** Failure: Different values in column 1 of row 0
2020-02-10T23:01:35.8711704Z *** Failure: Odd same failed
2020-02-10T23:01:35.8711990Z Test LdaWorkoutEstimatorCore: aborted: failed
Here is the data in question:
machinelearning/test/Microsoft.ML.Tests/Transformers/TextFeaturizerTests.cs
Lines 717 to 733 in 9d2b198
public void LdaWorkoutEstimatorCore() | |
{ | |
var ml = new MLContext(1); | |
var builder = new ArrayDataViewBuilder(Env); | |
var data = new[] | |
{ | |
new[] { (float)1.0, (float)0.0, (float)0.0 }, | |
new[] { (float)0.0, (float)1.0, (float)0.0 }, | |
new[] { (float)0.0, (float)0.0, (float)1.0 }, | |
}; | |
builder.AddColumn("F1V", NumberDataViewType.Single, data); | |
var srcView = builder.GetDataView(); | |
var est = ml.Transforms.Text.LatentDirichletAllocation("F1V"); | |
TestEstimatorCore(est, srcView); | |
} |
Is there a change your changes triggered this tests' failure? I'm not entirely sure how they're related in this case.
Output on your most recent build:
Here is the data in question: machinelearning/test/Microsoft.ML.Tests/Transformers/TextFeaturizerTests.cs Lines 717 to 733 in 9d2b198
Is there a change your changes triggered this tests' failure? I'm not entirely sure how they're related in this case. In reply to: 356336770 [](ancestors = 356336770) |
@@ -2086,7 +2086,7 @@ public sealed class NoArgumentsInput : CalibrateInputBase | |||
public sealed class FixedPlattInput : CalibrateInputBase | |||
{ | |||
[Argument(ArgumentType.AtMostOnce, ShortName = "slope", HelpText = "The slope parameter of the calibration function 1 / (1 + exp(-slope * x + offset)", SortOrder = 1)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-slope [](start = 141, length = 6)
Can you change this to exp(slope * x + offset)
? (and in the Offset
argument as well). #Resolved
@@ -2086,7 +2086,7 @@ public sealed class NoArgumentsInput : CalibrateInputBase | |||
public sealed class FixedPlattInput : CalibrateInputBase | |||
{ | |||
[Argument(ArgumentType.AtMostOnce, ShortName = "slope", HelpText = "The slope parameter of the calibration function 1 / (1 + exp(-slope * x + offset)", SortOrder = 1)] | |||
public Double Slope = 1; | |||
public Double Slope = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slope [](start = 26, length = 5)
FixedPlattCalibratorTrainer.Arguments
also has a Slope
field that defaults to 1. #Resolved
In this new implementation, we can get rid of this Done() method and remove all usage. These assertions are done automatically in Cleanup method #Resolved Refers to: test/Microsoft.ML.TestFramework/BaseTestBaseline.cs:145 in f86faac. [](commit_id = f86faac, deletion_comment = False) |
@@ -948,7 +949,7 @@ public void TestMulticlassEnsembleCombiner() | |||
predGetters[i](ref preds[i]); | |||
} | |||
if (scores.All(s => !float.IsNaN(s))) | |||
CompareNumbersWithTolerance(score, scores.Sum() / predCount); | |||
CompareNumbersWithTolerance(score, scores.Sum() / predCount, digitsOfPrecision: 5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
digitsOfPrecision [](start = 89, length = 17)
why we use less precise here? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default is 7 digits. All tests are not equally precise.
In reply to: 377899328 [](ancestors = 377899328)
This one test: LdaWorkoutEstimatorCore is failing on all configurations and seems like real fail #Resolved |
@yaeldekel Thanks for the info. I have added it to the list of skipped tests for now. In reply to: 585197699 [](ancestors = 585197699,584859078) |
There is a problem with the current implementation. The assertion happens in Cleanup() but when Cleanup is called, the test context is lost. I am still debating whether to remove Done completely or find a different way to force Done to be called in the context of the original function. For now, I am leaving it as is. In reply to: 584851784 [](ancestors = 584851784) Refers to: test/Microsoft.ML.TestFramework/BaseTestBaseline.cs:145 in f86faac. [](commit_id = f86faac, deletion_comment = False) |
…st of skipped tests
…nInMemoryDataZeroBaseIndex
@@ -485,10 +485,14 @@ public void OneClassMatrixFactorizationInMemoryDataZeroBaseIndex() | |||
var testPrediction = model.Transform(testDataView); | |||
|
|||
var testResults = mlContext.Data.CreateEnumerable<OneClassMatrixElementZeroBasedForScore>(testPrediction, false).ToList(); | |||
|
|||
// REVIEW: We are seeing lower precision on non-Windows platforms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
REVIEW [](start = 15, length = 6)
maybe use below tags in comments so we can find this easily from task list:
TODO TEST_STABILITY:
@@ -714,6 +714,7 @@ public void LdaWorkout() | |||
} | |||
|
|||
[Fact] | |||
[Trait("Category", "SkipInCI")] | |||
public void LdaWorkoutEstimatorCore() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LdaWorkoutEstimatorCore [](start = 20, length = 23)
I haven't see this fail in full build and CI build in last 14 days, this should be low fail rate tests and I think it is okay to left this one enabled, if we see this fails often later we can go back to this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is similar to #4817 but standardizes the default value for Slope in PlattCalibrator to -1 instead of 1 since -1 seems to be used everywhere in the code. (If this PR is approved, I will be closing the other PR)