forked from dotnet/machinelearning
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Conversion of ITrainer.Train returns predictor, accepts TrainContext (dotnet#522)

* Conversion of ITrainer.Train returns predictor, accepts TrainContext
* `ITrainer.Train` returns a predictor. There is no `CreatePredictor` method on the interface.
* `ITrainer.Train` always accepts a `TrainContext`. Dataset type is no longer a generic parameter. This context object replaces the functionality previously offered by the combination of `ITrainer`, `IValidatingTrainer`, `IIncrementalTrainer`, and `IIncrementalValidatingTrainer`, which is now captured in one `ITrainer.Train` method with differently configured contexts.
* All trainers updated to these two new idioms. Many trainers correspondingly improved to no longer be stateful objects. (The exceptions are those that are just too far gone to be done with less than herculean effort at refactoring them to no longer use instance fields for their computation. Most notably, LBFGS and FastTree based trainers.)
* Utility code meant to deal with the complexity of the aforementioned `IT/IVT/IIT/IIVT` idiom reduced considerably.
* Opportunistic improvements to `ITrainer` implementors where observed.
* TrainerInfo introduction, ITrainerEx destruction
* Remove `IMetaLinearTrainer`
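To illustrate the commit message's point that one `ITrainer.Train` method with differently configured contexts covers what the four older interfaces did, here is a minimal sketch. It does not use the real Microsoft.ML assemblies: `RoleMappedData`, `IPredictor`, `TrainContext`, and `ITrainer` below are simplified stand-ins, and `DescribingTrainer`, `DescribingPredictor`, and `FourShapesDemo` are hypothetical names invented for this example.

```csharp
using System;

// Simplified stand-ins so the sketch compiles on its own.
// These are NOT the real Microsoft.ML.Runtime types.
public sealed class RoleMappedData { }
public interface IPredictor { }

public sealed class TrainContext
{
    public RoleMappedData TrainingSet { get; }
    public RoleMappedData ValidationSet { get; }
    public IPredictor InitialPredictor { get; }

    public TrainContext(RoleMappedData trainingSet,
        RoleMappedData validationSet = null, IPredictor initialPredictor = null)
    {
        if (trainingSet == null)
            throw new ArgumentNullException(nameof(trainingSet));
        TrainingSet = trainingSet;
        ValidationSet = validationSet;
        InitialPredictor = initialPredictor;
    }
}

public interface ITrainer
{
    // Train returns the predictor directly; there is no CreatePredictor step.
    IPredictor Train(TrainContext context);
}

// Hypothetical predictor that only records which optional inputs were supplied.
public sealed class DescribingPredictor : IPredictor
{
    public string Description { get; }
    public DescribingPredictor(string description) { Description = description; }
}

// Hypothetical trainer used purely to show the four call shapes.
public sealed class DescribingTrainer : ITrainer
{
    public IPredictor Train(TrainContext context)
    {
        if (context == null)
            throw new ArgumentNullException(nameof(context));
        string description = "train"
            + (context.ValidationSet != null ? "+valid" : "")
            + (context.InitialPredictor != null ? "+init" : "");
        return new DescribingPredictor(description);
    }
}

public static class FourShapesDemo
{
    private static string Describe(IPredictor p) => ((DescribingPredictor)p).Description;

    public static string[] Run()
    {
        ITrainer trainer = new DescribingTrainer();
        var train = new RoleMappedData();
        var valid = new RoleMappedData();
        IPredictor init = new DescribingPredictor("previous");

        return new[]
        {
            // Plain training (the old ITrainer shape).
            Describe(trainer.Train(new TrainContext(train))),
            // Training with validation (the old IValidatingTrainer shape).
            Describe(trainer.Train(new TrainContext(train, valid))),
            // Incremental training (the old IIncrementalTrainer shape).
            Describe(trainer.Train(new TrainContext(train, initialPredictor: init))),
            // Incremental training with validation (the old IIncrementalValidatingTrainer shape).
            Describe(trainer.Train(new TrainContext(train, valid, init))),
        };
    }
}
```

The only thing that varies between the four calls is how the context is constructed; the trainer's method signature stays the same.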
Showing 61 changed files with 853 additions and 1,262 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
// Licensed to the .NET Foundation under one or more agreements. | ||
// The .NET Foundation licenses this file to you under the MIT license. | ||
// See the LICENSE file in the project root for more information. | ||
|
||
using Microsoft.ML.Runtime.Data; | ||
|
||
namespace Microsoft.ML.Runtime | ||
{ | ||
/// <summary> | ||
/// Holds information relevant to trainers. Instances of this class are meant to be constructed and passed | ||
/// into <see cref="ITrainer{TPredictor}.Train(TrainContext)"/> or <see cref="ITrainer.Train(TrainContext)"/>. | ||
/// This holds at least a training set, and optionally a validation set and an initial predictor. | ||
/// </summary> | ||
public sealed class TrainContext | ||
{ | ||
/// <summary> | ||
/// The training set. Cannot be <c>null</c>. | ||
/// </summary> | ||
public RoleMappedData TrainingSet { get; } | ||
|
||
/// <summary> | ||
/// The validation set. Can be <c>null</c>. Note that passing a non-<c>null</c> validation set into | ||
/// a trainer that does not support validation sets should not be considered an error condition. It | ||
/// should simply be ignored in that case. | ||
/// </summary> | ||
public RoleMappedData ValidationSet { get; } | ||
|
||
/// <summary> | ||
/// The initial predictor, for incremental training. Note that if an <see cref="ITrainer"/> implementor | ||
/// does not support incremental training, then it can ignore it similarly to how one would ignore | ||
/// <see cref="ValidationSet"/>. However, if the trainer does support incremental training and there | ||
/// is something wrong with a non-<c>null</c> value of this, then the trainer ought to throw an exception. | ||
/// </summary> | ||
public IPredictor InitialPredictor { get; } | ||
|
||
|
||
/// <summary> | ||
/// Constructor, given a training set and optional other arguments. | ||
/// </summary> | ||
/// <param name="trainingSet">Will set <see cref="TrainingSet"/> to this value. This must be specified.</param> | ||
/// <param name="validationSet">Will set <see cref="ValidationSet"/> to this value if specified.</param> | ||
/// <param name="initialPredictor">Will set <see cref="InitialPredictor"/> to this value if specified.</param> | ||
public TrainContext(RoleMappedData trainingSet, RoleMappedData validationSet = null, IPredictor initialPredictor = null) | ||
{ | ||
Contracts.CheckValue(trainingSet, nameof(trainingSet)); | ||
Contracts.CheckValueOrNull(validationSet); | ||
Contracts.CheckValueOrNull(initialPredictor); | ||
|
||
// REVIEW: Should there be code here to ensure that the role mappings between the two are compatible? | ||
// That is, all the role mappings are the same and the columns between them have identical types? | ||
|
||
TrainingSet = trainingSet; | ||
ValidationSet = validationSet; | ||
InitialPredictor = initialPredictor; | ||
} | ||
} | ||
} |
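The doc comments on `ValidationSet` and `InitialPredictor` describe two different behaviors: an unsupported validation set is silently ignored, while a problematic initial predictor should raise an exception in a trainer that supports incremental training. The sketch below shows one way a trainer might honor both rules. It is an illustration under stated assumptions, not code from this commit: `RoleMappedData` and `IPredictor` are simplified stand-ins (the stand-in data class just holds label values), and `MeanTrainer`, `MeanPredictor`, and `OtherPredictor` are hypothetical.

```csharp
using System;

// Stand-in for the real RoleMappedData: here it only holds label values.
public sealed class RoleMappedData
{
    public double[] Labels { get; }
    public RoleMappedData(params double[] labels) { Labels = labels; }
}

// Stand-in for the real IPredictor interface.
public interface IPredictor { }

// Mirrors the shape of the TrainContext class in the diff, with plain null checks.
public sealed class TrainContext
{
    public RoleMappedData TrainingSet { get; }
    public RoleMappedData ValidationSet { get; }
    public IPredictor InitialPredictor { get; }

    public TrainContext(RoleMappedData trainingSet,
        RoleMappedData validationSet = null, IPredictor initialPredictor = null)
    {
        if (trainingSet == null)
            throw new ArgumentNullException(nameof(trainingSet));
        TrainingSet = trainingSet;
        ValidationSet = validationSet;
        InitialPredictor = initialPredictor;
    }
}

// Hypothetical predictor: predicts the running mean of all labels seen so far.
public sealed class MeanPredictor : IPredictor
{
    public double Sum { get; }
    public long Count { get; }
    public double Mean => Count == 0 ? 0.0 : Sum / Count;
    public MeanPredictor(double sum, long count) { Sum = sum; Count = count; }
}

// Hypothetical predictor of an unrelated kind, used to trigger the error path.
public sealed class OtherPredictor : IPredictor { }

// Hypothetical trainer: supports incremental training, does not support validation.
public sealed class MeanTrainer
{
    public MeanPredictor Train(TrainContext context)
    {
        if (context == null)
            throw new ArgumentNullException(nameof(context));

        // context.ValidationSet is deliberately never read. This trainer has no
        // use for it, and receiving one is not treated as an error.

        double sum = 0.0;
        long count = 0;
        if (context.InitialPredictor != null)
        {
            // Incremental training is supported, so an unusable initial
            // predictor is an error rather than something to ignore.
            var initial = context.InitialPredictor as MeanPredictor;
            if (initial == null)
                throw new ArgumentException(
                    "Initial predictor must be a MeanPredictor.", nameof(context));
            sum = initial.Sum;
            count = initial.Count;
        }

        foreach (double label in context.TrainingSet.Labels)
        {
            sum += label;
            count++;
        }
        return new MeanPredictor(sum, count);
    }
}
```

Note that the trainer keeps all working state in locals rather than instance fields, in line with the commit message's goal of trainers that are no longer stateful objects.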
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
// Licensed to the .NET Foundation under one or more agreements. | ||
// The .NET Foundation licenses this file to you under the MIT license. | ||
// See the LICENSE file in the project root for more information. | ||
|
||
namespace Microsoft.ML.Runtime | ||
{ | ||
/// <summary> | ||
/// Instances of this class possess information about trainers, in terms of their requirements and capabilities. | ||
/// The intended usage is as the value for <see cref="ITrainer.Info"/>. | ||
/// </summary> | ||
public sealed class TrainerInfo | ||
{ | ||
// REVIEW: Ideally trainers should be able to communicate | ||
// something about the type of data they are capable of being trained | ||
// on, e.g., what ColumnKinds they want, how many of each, of what type, | ||
// etc. This interface seems like the most natural conduit for that sort | ||
// of extra information. | ||
|
||
/// <summary> | ||
/// Whether the trainer needs to see data in normalized form. Only non-parametric learners will tend to have a | ||
/// <c>false</c> value here. | ||
/// </summary> | ||
public bool NeedNormalization { get; } | ||
|
||
/// <summary> | ||
/// Whether the trainer needs calibration to produce probabilities. As a general rule only trainers that produce | ||
/// binary classifier predictors that also do not have a natural probabilistic interpretation should have a | ||
/// <c>true</c> value here. | ||
/// </summary> | ||
public bool NeedCalibration { get; } | ||
|
||
/// <summary> | ||
/// Whether this trainer could benefit from a cached view of the data. Trainers that have few passes over the | ||
/// data, or that need to build their own custom data structure over the data, will have a <c>false</c> here. | ||
/// </summary> | ||
public bool WantCaching { get; } | ||
|
||
/// <summary> | ||
/// Whether the trainer supports validation sets via <see cref="TrainContext.ValidationSet"/>. A <c>false</c> | ||
/// value indicates that the trainer does not support validation sets and will ignore one if it is | ||
/// supplied. | ||
/// </summary> | ||
public bool SupportsValidation { get; } | ||
|
||
/// <summary> | ||
/// Whether the trainer supports incremental training via <see cref="TrainContext.InitialPredictor"/>. A | ||
/// <c>false</c> value indicates that the trainer does not support incremental training and will ignore an | ||
/// initial predictor if one is supplied. | ||
/// </summary> | ||
public bool SupportsIncrementalTraining { get; } | ||
|
||
/// <summary> | ||
/// Initializes with the given parameters. The parameter defaults are the typical values for most | ||
/// classical trainers. | ||
/// </summary> | ||
/// <param name="normalization">The value for the property <see cref="NeedNormalization"/>.</param> | ||
/// <param name="calibration">The value for the property <see cref="NeedCalibration"/>.</param> | ||
/// <param name="caching">The value for the property <see cref="WantCaching"/>.</param> | ||
/// <param name="supportValid">The value for the property <see cref="SupportsValidation"/>.</param> | ||
/// <param name="supportIncrementalTrain">The value for the property <see cref="SupportsIncrementalTraining"/>.</param> | ||
public TrainerInfo(bool normalization = true, bool calibration = false, bool caching = true, | ||
bool supportValid = false, bool supportIncrementalTrain = false) | ||
{ | ||
NeedNormalization = normalization; | ||
NeedCalibration = calibration; | ||
WantCaching = caching; | ||
SupportsValidation = supportValid; | ||
SupportsIncrementalTraining = supportIncrementalTrain; | ||
} | ||
} | ||
} |
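As a rough illustration of how calling code might consult a `TrainerInfo` before training, the sketch below turns the five flags into an ordered list of preparation steps. The `TrainerInfo` class is copied in shape from the diff above so the example compiles on its own; `TrainerInfoDemo.PlanPreparation` and the step strings are hypothetical and only assume that a caller would normalize, cache, and calibrate based on these flags. The diff shown here does not include the actual calling code.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the TrainerInfo class in the diff, reproduced so the sketch
// is self-contained.
public sealed class TrainerInfo
{
    public bool NeedNormalization { get; }
    public bool NeedCalibration { get; }
    public bool WantCaching { get; }
    public bool SupportsValidation { get; }
    public bool SupportsIncrementalTraining { get; }

    public TrainerInfo(bool normalization = true, bool calibration = false, bool caching = true,
        bool supportValid = false, bool supportIncrementalTrain = false)
    {
        NeedNormalization = normalization;
        NeedCalibration = calibration;
        WantCaching = caching;
        SupportsValidation = supportValid;
        SupportsIncrementalTraining = supportIncrementalTrain;
    }
}

public static class TrainerInfoDemo
{
    // Hypothetical helper: lists the steps a caller might take for a trainer
    // with the given info, depending on which optional inputs are available.
    public static List<string> PlanPreparation(TrainerInfo info,
        bool haveValidationSet, bool haveInitialPredictor)
    {
        if (info == null)
            throw new ArgumentNullException(nameof(info));

        var steps = new List<string>();
        if (info.NeedNormalization)
            steps.Add("normalize features");
        if (info.WantCaching)
            steps.Add("cache training data");
        // Optional inputs are only worth preparing if the trainer will use them.
        if (haveValidationSet && info.SupportsValidation)
            steps.Add("pass validation set");
        if (haveInitialPredictor && info.SupportsIncrementalTraining)
            steps.Add("pass initial predictor");
        steps.Add("train");
        if (info.NeedCalibration)
            steps.Add("calibrate output");
        return steps;
    }
}
```

With the constructor defaults (`new TrainerInfo()`), the plan is to normalize, cache, and train, and any validation set or initial predictor on hand is skipped because the defaults report no support for either.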