How to get around the exception caused because of #5506 fix? #5612

Closed
aforoughi1 opened this issue Feb 5, 2021 · 13 comments · Fixed by #5631
Labels
AutoML.NET Automating various steps of the machine learning process P2 Priority of the issue for triage purpose: Needs to be fixed at some point.

Comments

@aforoughi1

Using AutoML versions 0.17.2 and 0.17.4, I get a few exceptions during SdcaRegression (similar to #4363).
With 0.17.4, however, there is new behaviour: I also get an AggregateException (a change resulting from #5445).

Exception during AutoML iteration: System.InvalidOperationException: The weights/bias contain invalid values (NaN or Infinite). Potential causes: high learning rates, no normalization, high initial weights, etc.
at Microsoft.ML.Trainers.OnlineLinearTrainer2.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.Data.EstimatorChain1.Fit(IDataView input)
at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String groupId, String labelColumn, IMetricsAgent1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, IChannel logger)

System.AggregateException
HResult=0x80131500
Message=One or more errors occurred. (Operation was canceled.) (Operation was canceled.) (Operation was canceled.) (Operation was canceled.)
Source=System.Private.CoreLib
StackTrace:
at System.ThrowHelper.ThrowAggregateException(List1 exceptions)
at System.Threading.Tasks.Task.WaitAllCore(Task[] tasks, Int32 millisecondsTimeout, CancellationToken cancellationToken)
at System.Threading.Tasks.Parallel.Invoke(ParallelOptions parallelOptions, Action[] actions)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw(Exception source)
at System.Threading.Tasks.Parallel.ThrowSingleCancellationExceptionOrOtherException(ICollection exceptions, CancellationToken cancelToken, Exception otherException)
at System.Threading.Tasks.Parallel.Invoke(ParallelOptions parallelOptions, Action[] actions)
at Microsoft.ML.Trainers.FastTree.ThreadTaskManager.ThreadTask.RunTask()
at Microsoft.ML.Trainers.FastTree.LeastSquaresRegressionTreeLearner.FindBestSplitOfRoot(Double[] targets)
at Microsoft.ML.Trainers.FastTree.LeastSquaresRegressionTreeLearner.FitTargets(IChannel ch, Boolean[] activeFeatures, Double[] targets)
at Microsoft.ML.Trainers.FastTree.RandomForestLeastSquaresTreeLearner.FitTargets(IChannel ch, Boolean[] activeFeatures, Double[] weightedtargets, Double[] targets, Double[] weights)
at Microsoft.ML.Trainers.FastTree.RandomForestOptimizer.TrainingIteration(IChannel ch, Boolean[] activeFeatures)
at Microsoft.ML.Trainers.FastTree.FastTreeTrainerBase3.Train(IChannel ch)
at Microsoft.ML.Trainers.FastTree.FastTreeTrainerBase3.TrainCore(IChannel ch)
at Microsoft.ML.Trainers.FastTree.FastForestRegressionTrainer.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.AutoML.SmacSweeper.FitModel(IEnumerable1 previousRuns)
at Microsoft.ML.AutoML.SmacSweeper.ProposeSweeps(Int32 maxSweeps, IEnumerable1 previousRuns)
at Microsoft.ML.AutoML.PipelineSuggester.SampleHyperparameters(MLContext context, SuggestedTrainer trainer, IEnumerable1 history, Boolean isMaximizingMetric, IChannel logger)
at Microsoft.ML.AutoML.PipelineSuggester.GetNextInferredPipeline(MLContext context, IEnumerable1 history, DatasetColumnInfo[] columns, TaskKind task, Boolean isMaximizingMetric, CacheBeforeTrainer cacheBeforeTrainer, IChannel logger, IEnumerable1 trainerAllowList)
at Microsoft.ML.AutoML.Experiment2.Execute()
at Microsoft.ML.AutoML.ExperimentBase2.Execute(ColumnInformation columnInfo, DatasetColumnInfo[] columns, IEstimator1 preFeaturizer, IProgress1 progressHandler, IRunner1 runner)
at Microsoft.ML.AutoML.ExperimentBase2.ExecuteTrainValidate(IDataView trainData, ColumnInformation columnInfo, IDataView validationData, IEstimator1 preFeaturizer, IProgress1 progressHandler)
at Microsoft.ML.AutoML.ExperimentBase2.Execute(IDataView trainData, IDataView validationData, ColumnInformation columnInformation, IEstimator1 preFeaturizer, IProgress1 progressHandler)
at Microsoft.ML.AutoML.ExperimentBase2.Execute(IDataView trainData, IDataView validationData, String labelColumnName, IEstimator1 preFeaturizer, IProgress1 progressHandler)
at AutoMLApp.Experiment2Template.Train() in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\AutoMLApp\Class1.cs:line 446
at AutoMLApp.MlModelTemplate.BuildModel() in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\AutoMLApp\Class1.cs:line 363
at AutoMLApp.MlExperimentsFactory.Experiment2Tasks(Kind kind, OutputLabels op, BinaryClassificationMetric optimizingMetric, List1 trainers) in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\AutoMLApp\Class1.cs:line 177
at AutoMLApp.MlExperimentsFactory.<>c__DisplayClass30_1.b__1() in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\AutoMLApp\Class1.cs:line 135
at AutoMLApp.MlExperimentsFactory.StartNew(String ticker, ExperimentElementCollection expColl, PredictionTestDataElement testData) in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\AutoMLApp\Class1.cs:line 109
at AutoMLApp.Program.Main(String[] args) in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\AutoMLApp\Program.cs:line 31

This exception was originally thrown at this call stack:
[External Code]

Inner Exception 1:
OperationCanceledException: Operation was canceled.

@michaelgsharp michaelgsharp added AutoML.NET Automating various steps of the machine learning process P2 Priority of the issue for triage purpose: Needs to be fixed at some point. labels Feb 11, 2021
@michaelgsharp
Member

@JakeRadMSFT any ideas on this? I haven't changed anything in ML.NET itself that would cause any issues.

@aforoughi1
Author

TestFor5506Issue.zip
please find the attached sample to reproduce it.

@aforoughi1
Author

closed by mistake. please reopen.

@michaelgsharp
Member

Hi @aforoughi1, would you be able to provide a sample project/data to help reproduce this?

Thanks!

@aforoughi1
Author

aforoughi1 commented Feb 18, 2021 via email

@michaelgsharp
Member

Actually, I just saw you already uploaded your sample. Let me take a look at it. My bad for missing it, sorry about that.

@michaelgsharp
Member

Have you tested this code in prior versions of AutoML? Did it work before version 0.17.2?

@aforoughi1
Author

aforoughi1 commented Feb 19, 2021 via email

@aforoughi1
Author

The #5445 fix changes the behaviour: it seems to stop the experiment from running to the end. I have been using AutoML since the preview phases, and the last working version was 0.17.2 with ML.NET 1.5.2. It also runs to the end with 0.17.2 and 1.5.4. However, it terminates with 0.17.4.

@michaelgsharp
Member

So after looking into this, I think I have found the cause. Are you building from source, or taking a NuGet dependency on this? If you are building from source, there is a workaround. If not, I'll see if I can get this change in for the next release.

If you look here, you will see we are checking for an OperationCanceledException; if that is the issue, we just catch it and return the results. In this case, there is some parallel training happening, so instead of a single OperationCanceledException there are several of them. They get wrapped in an AggregateException, which is then not handled the way it should be. The fix will be to add another catch for the AggregateException: if all the inner exceptions are OperationCanceledException, we will make it behave the same way it does for a single OperationCanceledException.
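
For readers following along, here is a minimal sketch of the check described above. It is not the actual ML.NET change: the names CancellationHelper and IsCancellationOnly are hypothetical, and the snippet assumes only the base class library. It treats an exception as "cancellation only" when it is either a single OperationCanceledException or an AggregateException whose flattened inner exceptions are all OperationCanceledException.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of ML.NET.
public static class CancellationHelper
{
    // Returns true when the exception represents cancellation and nothing else.
    public static bool IsCancellationOnly(Exception ex)
    {
        if (ex is OperationCanceledException)
        {
            return true;
        }

        if (ex is AggregateException aggregate)
        {
            // Flatten() unwraps nested AggregateExceptions from parallel work.
            var inner = aggregate.Flatten().InnerExceptions;
            return inner.Count > 0 && inner.All(e => e is OperationCanceledException);
        }

        return false;
    }
}

public static class Program
{
    public static void Main()
    {
        var cancelledTwice = new AggregateException(
            new OperationCanceledException(),
            new OperationCanceledException());
        var mixed = new AggregateException(
            new OperationCanceledException(),
            new InvalidOperationException("real failure"));

        Console.WriteLine(CancellationHelper.IsCancellationOnly(cancelledTwice)); // True
        Console.WriteLine(CancellationHelper.IsCancellationOnly(mixed));          // False
    }
}
```

A caller could use this in an exception filter, for example `catch (Exception ex) when (CancellationHelper.IsCancellationOnly(ex))`, so that a mixed AggregateException containing a real failure still propagates.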

The next release is currently set for March 2nd, so I'll see if I can have this fix in by then. You are also free to make the changes and submit a PR if you would like.

@aforoughi1
Author

aforoughi1 commented Feb 22, 2021 via email

@michaelgsharp
Member

So I have spent more time today looking into this. It seems like that is printed out, but you still get the final results back from AutoML, right? That is, even though you see this error printed to the console, you are able to get a model back and use it. Is that correct?

@aforoughi1
Author

I get a null reference for the model.
Sample code:
var experiment = mlContext.Auto().CreateRegressionExperiment(settings);
ExperimentResult<RegressionMetrics> experimentResult = null;
try
{
    experimentResult = experiment.Execute(trainData: data, labelColumnName: "Target", progressHandler: new ProgressHandler());
}
catch (AggregateException exception)
{
    foreach (Exception ex in exception.InnerExceptions)
    {
        Console.WriteLine(ex.ToString());
    }
}
finally
{
    // experimentResult is still null here when Execute throws.
    ITransformer model = experimentResult.BestRun.Model;

    IDataView predictions = model.Transform(data);

    var metrics = mlContext.Regression.Evaluate(predictions, labelColumnName: "Target", scoreColumnName: "Score");
}
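
As a side note on the sample above, the sketch below shows one way to avoid dereferencing a result that was never assigned. It is only an illustration under stated assumptions: FakeResult, FakeRun and ResultGuard are hypothetical stand-ins so the snippet compiles without ML.NET, and they do not reproduce the real AutoML types. The idea is to move the model lookup out of the finally block and use null-conditional access.

```csharp
using System;

// Hypothetical stand-ins for the AutoML result types (not the real API).
public sealed class FakeRun { public string Model { get; set; } }
public sealed class FakeResult { public FakeRun BestRun { get; set; } }

public static class ResultGuard
{
    // Runs the delegate and returns the best model, or null when the run
    // threw an AggregateException or produced no best run.
    public static string TryGetBestModel(Func<FakeResult> execute)
    {
        FakeResult result = null;
        try
        {
            result = execute();
        }
        catch (AggregateException ex)
        {
            foreach (Exception inner in ex.Flatten().InnerExceptions)
            {
                Console.WriteLine(inner.Message);
            }
        }

        // Null-conditional access instead of touching the result in finally.
        return result?.BestRun?.Model;
    }
}

public static class Program
{
    public static void Main()
    {
        string model = ResultGuard.TryGetBestModel(
            () => throw new AggregateException(new OperationCanceledException()));

        Console.WriteLine(model == null ? "no model available" : model);
    }
}
```

This only keeps the caller from crashing. It does not recover a model when the experiment itself ends with an exception.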

@ghost ghost locked as resolved and limited conversation to collaborators Mar 17, 2022