ComponentCatalog design issues #208

Closed
jkotas opened this issue May 23, 2018 · 23 comments
Labels
API Issues pertaining the friendly API

Comments

@jkotas
Member

jkotas commented May 23, 2018

https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Core/ComponentModel/ComponentCatalog.cs

  • Enumerates all types in all loaded assemblies. This pattern is known to have poor performance characteristics that lead to long startup time.
  • Enumerates assemblies in application directory. This is not compatible with .NET Core app model. The app assemblies are not guaranteed to be in the application directory in .NET Core (e.g. they can be in one of the shared frameworks or in the assembly cache), or they may not exist at all (single file .exes planned for .NET Core, or .NET Native used for UWP apps).
  • A long list of hardcoded names to skip while enumerating:
    public static string[] FilePrefixesToAvoid = new string[] {

Does ML.NET really need its own dependency injection framework? Would it be worth looking at decoupling the dependency injection from the ML.Core, and ideally using one of the existing dependency injection frameworks instead of inventing yet another one?

@TomFinley
Contributor

TomFinley commented May 23, 2018

So what do you think, MEF, or other?

Over the years we've toyed with the idea of replacing the scheme with MEF but never quite got beyond the "yeah might be nice some day" vagueness. @veikkoeeva mentioned the same idea in #136 actually... It has basically the same idea: multiple implementations of types associated with loadnames, even has a so-called DirectoryCatalog for cases where we want to keep the current system, allows other more explicit schemes for cases where you don't.

@jkotas
Member Author

jkotas commented May 23, 2018

So what do you think, MEF, or other?

My top concern is to make it possible to use ML.Core without heavy-weight dependency injection frameworks.

I do not have an opinion on which dependency injection framework is the right one. I believe that people typically like to have the option to bring their own.

@TomFinley
Contributor

I see this is your first issue. Welcome @jkotas! Sorry, I thought I understood the request, but I see I don't. The use of some sort of dependency injection framework is a pretty central concept in how the entire system is able to work at all. Even in just the code that's in this repo so far, I see some hundreds of components advertising themselves as loadable components, and roughly one hundred calls to the component catalog in one place or another to instantiate the corresponding components; and we haven't even published the command line or GUI interfaces yet. I mean, let's just start with model loading. What's your plan?

@jkotas
Member Author

jkotas commented May 23, 2018

I do not have a good feel for the overall ML.Core state yet, so I do not have a good detailed answer to your question.

I think the problem is similar to ASP.NET Core: the lower layers (building blocks) of ASP.NET Core do not have a hard dependency on dependency injection frameworks. The dependency injection frameworks come into play only for the higher levels of ASP.NET Core, like MVC. If people want to use ASP.NET without paying the price for the dependency injection frameworks, they can still do that just fine.

@KrzysztofCwalina What do you think about the ML.Core being joined at the hip with a dependency injection framework (its own custom one currently)?

@TomFinley
Contributor

Ah OK. Well, again, welcome, and I hope you'll spend some time getting to know the structure of the code a bit more. Let any of us know if you have questions. While I'm sure many libraries (including among them ASP.NET) don't benefit from dependency injection, this library does.

@KrzysztofCwalina
Member

Yes, I in general worry that ML.NET has too many potentially independent technologies joined at the hip. I talked to @glebuk about it and we agreed to work on simplifying ML.Core.

@glebuk
Contributor

glebuk commented May 25, 2018

ASP.NET and ML.NET are a bit different: the entire section of ML.NET that deals with training and inference relies on having hundreds of independent components (trainers, transforms, loaders, loss functions and so forth). We want to have runtime component binding within the framework.

We should be able to drop a DLL and be able to use it, without having to recompile the common infra. The component catalog allows us to find and load needed components, it also allows us to do customization of a given ML deployment where only needed DLLs are placed.

Moreover, it is a component that allows a runtime binding for higher level APIs, such as GUIs, command line tools and language bindings.

Parts of the framework that do not deal with any components (IDV and related) do not need this, however any time you create a pipeline or try to use any components in any way you would need to load and find them. This is at a pretty low level of the overall architecture.
How should we achieve those goals if we don't have a DI/TLC Component Catalog?

@jkotas
Member Author

jkotas commented May 28, 2018

The abstract concept of component catalog is ok. It is https://en.wikipedia.org/wiki/Service_locator_pattern with all its advantages and disadvantages.

The part that is not good is the messy hardcoded algorithm that populates the component catalog. I think the best way to fix it would be to add explicit methods that register a set of components, which you would call at the start of the application. For example, the standard learners package can have a method like:

public static class StandardLearnersExtensions
{
    public static ComponentCatalog AddStandardLearners(this ComponentCatalog componentCatalog)
    {
        // Register the standard learners here, then return the catalog so calls can be chained.
        return componentCatalog;
    }
}

And one would register them by calling this method at the start of the application:

   componentCatalog.AddStandardLearners()

If you would like to keep the model where all services that happen to be present in the app are registered, it should be opt-in by calling an extension method like componentCatalog.AddComponentsByConvention() at the start of the application. And the implementation of AddComponentsByConvention should get the list of the assemblies that the application is composed from by calling proper APIs, not by enumerating files in a directory. The natural way to add .dlls to .NET Core app is by adding their package reference to the app .csproj file, not by manually dropping the .dll to the app directory.

Enumerating files in a directory and applying an ad-hoc filter to them is not compatible with .NET Core and other .NET app models. I would not include this option in ML.NET. If you really need to keep it around for existing users, I would move it into an obsolete compat library that folks do not get by default.
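
A minimal sketch of the registration shape described above, under stated assumptions: the `ComponentCatalog` class here is a stand-in that only records assembly names, and `Register`, `AddStandardLearners`, and `AddComponentsByConvention` are hypothetical names taken from this comment, not existing ML.NET APIs. Walking the entry assembly's references is just one possible reading of "calling proper APIs"; a real implementation would likely filter out framework assemblies.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Stand-in catalog (assumption): it only tracks which assemblies were registered.
public sealed class ComponentCatalog
{
    private readonly HashSet<string> _registered = new HashSet<string>();

    public IReadOnlyCollection<string> RegisteredAssemblies => _registered;

    // Hypothetical API: a real catalog would also scan the assembly for components.
    public ComponentCatalog Register(Assembly assembly)
    {
        if (assembly == null) throw new ArgumentNullException(nameof(assembly));
        _registered.Add(assembly.FullName);
        return this; // returning the catalog allows chaining
    }
}

public static class ComponentCatalogExtensions
{
    // Explicit registration: the package names the assemblies it owns.
    // typeof(ComponentCatalogExtensions) stands in for a type from the learners assembly.
    public static ComponentCatalog AddStandardLearners(this ComponentCatalog catalog)
    {
        return catalog.Register(typeof(ComponentCatalogExtensions).Assembly);
    }

    // Opt-in convention: use the entry assembly's references, not files in a directory.
    public static ComponentCatalog AddComponentsByConvention(this ComponentCatalog catalog)
    {
        Assembly entry = Assembly.GetEntryAssembly();
        if (entry == null) return catalog; // some hosts have no managed entry assembly

        catalog.Register(entry);
        foreach (AssemblyName name in entry.GetReferencedAssemblies())
            catalog.Register(Assembly.Load(name)); // resolved by name through the normal loader
        return catalog;
    }
}
```

With this shape an application opts in at startup with a call such as `new ComponentCatalog().AddStandardLearners()`, and nothing is registered unless the app asks for it.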

@jkotas
Member Author

jkotas commented May 28, 2018

If it helps, here is an example of how ASP.NET Core implements similar concept: https://github.com/aspnet/Mvc-Private/blob/34e4fbf92df2d181b441bbbde598b68d4f33d8b4/src/Microsoft.AspNetCore.Mvc.Formatters.Json/DependencyInjection/MvcJsonMvcCoreBuilderExtensions.cs#L15
These extension methods return their argument to make it easy to chain them with other similar methods. Typical ASP.NET apps have code like this on the initialization path:

services
    .AddMvcCore()
    .AddJsonFormatters()
    .AddSignalR()

@dsyme
Contributor

dsyme commented Jul 31, 2018

@jkotas @KrzysztofCwalina I've given some feedback on how the ComponentCatalog is causing pain for F# scripting here: #600 (comment). It is also the root cause of #401 which required the awkward workaround of forcing artificial loads of some referenced assemblies:

let _load = 
    [ typeof<Microsoft.ML.Runtime.Transforms.TextAnalytics>
      typeof<Microsoft.ML.Runtime.FastTree.FastTree> ]

It is important that ML.NET work cleanly in conjunction with REPL scripting contexts (F# or C# and future possible REPL-hosted scripting in workbooks etc.). The use of DI and service-locator patterns generally makes this more painful, at least in conjunction to "normal" .NET libraries.

@Ivanidzo4ka and @eerhardt have replied with current thinking at #600 (comment)

We should be able to drop a DLL and be able to use it, without having to recompile the common infra. The component catalog allows us to find and load needed components, it also allows us to do customization of a given ML deployment where only needed DLLs are placed.

Moreover, it is a component that allows a runtime binding for higher level APIs, such as GUIs, command line tools and language bindings.

Parts of the framework that do not deal with any components (IDV and related) do not need this, however any time you create a pipeline or try to use any components in any way you would need to load and find them. This is at a pretty low level of the overall architecture.

While these might have been reasonable problems to solve for TLC, my gut feeling is that these goals are not in scope for ML.NET, except perhaps for the specific "zip" scenario mentioned in #600 (comment). No .NET Core application allows "drop in DLL replacement" deployment models and if we did, we would allow that kind of deployment update for a much broader range of applications.

ML.NET is a machine learning framework for .NET, not a dynamic loader or dependency resolver and/or DI framework.

I'd have different feedback about IDV: that is solving a schematization problem which definitely needs to be solved for a machine learning framework and which, to my knowledge, is not solved elsewhere in the .NET core ecosystem. Ideally the solution would be extracted and used more widely.

@Ivanidzo4ka
Contributor

Nothing brings more joy than throwing few principals under wheels of running train. @Zruty0 @TomFinley
I'm sure they have something to say :)

@dsyme
Contributor

dsyme commented Jul 31, 2018

Nothing brings more joy than throwing few principals under wheels of running train. @Zruty0 @TomFinley I'm sure they have something to say :)

Well, you're still only at version 0.3. The train just left the first station :)

@Zruty0
Contributor

Zruty0 commented Jul 31, 2018

Enumerating files in a directory and applying an ad-hoc filter to them is not compatible with .NET Core and other .NET app models. I would not include this option in ML.NET. If you really need to keep it around for existing users, I would move it into an obsolete compat library that folks do not get by default.

I agree, this is much cleaner than what we have now.

The 'DI everywhere' is a remnant of our command line background, and, as such, it should be consigned to the separate assembly loaded as a separate package (like Microsoft.ML.CommandLine or something).

The DLL scanner, I believe, should not be in the public domain at all.

We still need to address the need to load a model from a serialized file (the comment to #600 mentioned above).

@eerhardt
Member

ComponentCatalog Design Issues

Problems (from Jan Kotas)

  1. Enumerates all types in all loaded assemblies. This pattern is known to have poor performance characteristics that lead to long startup time.
  2. Enumerates assemblies in application directory. This is not compatible with .NET Core app model. The app assemblies are not guaranteed to be in the application directory in .NET Core (e.g. they can be in one of the shared frameworks or in the assembly cache), or they may not exist at all (single file .exes planned for .NET Core, or .NET Native used for UWP apps).
  3. A long list of hardcoded names to skip while enumerating.
  4. Does ML.NET really need its own dependency injection framework? Would it be worth looking at decoupling the dependency injection from the ML.Core, and ideally using one of the existing dependency injection frameworks instead of inventing yet another one?

Discussion

Responses to the above problems:

  1. It doesn't enumerate all types in all loaded assemblies. Instead, it enumerates the assembly.GetCustomAttributes(typeof(LoadableClassAttributeBase)) of all loaded assemblies that reference Microsoft.ML. Still not ideal, but it is better than scanning all types of all assemblies.
  2. I fully agree this needs to change. We can't have this kind of policy this low in the library. If higher level tools (command-line, UI, etc) want to make policies about where/how to load assemblies, they can. But the core ML.NET library cannot have this policy.
  3. Same as above.
  4. This may be splitting hairs, but technically I don't view the ComponentCatalog as a dependency injection framework. It doesn't actually inject dependencies into a graph of objects. Instead, it is simply a catalog of components that can be instantiated. Granted, this is part of what DI frameworks provide (a catalog of components), but DI frameworks do much more. Ex. lifetime policies, injecting objects into graphs of objects.
    ML.NET has some specific requirements that make it hard to use the existing DI frameworks. One example is there is code that doesn't want just any old instance of a "foo" component. It wants a new "foo" component with these specific arguments passed to it. And then a few lines later, it wants another "foo" component with different arguments passed to it. And then later it wants a "bar" component with a completely different set of Types of arguments. I haven't seen this kind of support in existing DI frameworks in the BCL. Typically, existing DI frameworks expect the constructor parameters to be fulfilled by other components in the catalog. Not to be imperatively given by the caller who is requesting the component.
    I feel it is reasonable for ML.NET to have its own "Component Catalog" system for creating components.
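
To make the "caller supplies the arguments" requirement concrete, here is a small sketch of a catalog that maps a load name to a factory and lets each call pass its own arguments. All names (`SimpleCatalog`, `FooArgs`, `Foo`, `CatalogDemo`) are hypothetical and are not the real ComponentCatalog surface; the sketch only shows why a container that fills constructor parameters from other registered services is a poor fit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical catalog: load name -> factory that takes caller-supplied arguments.
public sealed class SimpleCatalog
{
    private readonly Dictionary<string, Delegate> _factories =
        new Dictionary<string, Delegate>(StringComparer.OrdinalIgnoreCase);

    public void Register<TArgs, TComponent>(string loadName, Func<TArgs, TComponent> factory)
    {
        _factories[loadName] = factory;
    }

    // Every call builds a fresh component from the arguments given by the caller.
    public TComponent Create<TArgs, TComponent>(string loadName, TArgs args)
    {
        if (!_factories.TryGetValue(loadName, out Delegate registered))
            throw new KeyNotFoundException("No component registered as '" + loadName + "'.");
        if (!(registered is Func<TArgs, TComponent> factory))
            throw new InvalidOperationException("'" + loadName + "' does not accept " + typeof(TArgs).Name + ".");
        return factory(args);
    }
}

// Illustrative component and its argument object.
public sealed class FooArgs { public int Trees; }

public sealed class Foo
{
    public int Trees { get; }
    public Foo(FooArgs args) { Trees = args.Trees; }
}

public static class CatalogDemo
{
    // Two "foo" components created a few lines apart with different arguments.
    public static int[] Run()
    {
        var catalog = new SimpleCatalog();
        catalog.Register<FooArgs, Foo>("foo", a => new Foo(a));
        Foo small = catalog.Create<FooArgs, Foo>("foo", new FooArgs { Trees = 10 });
        Foo large = catalog.Create<FooArgs, Foo>("foo", new FooArgs { Trees = 500 });
        return new[] { small.Trees, large.Trees };
    }
}
```

`CatalogDemo.Run()` returns the two tree counts, 10 and 500, from two distinct `Foo` instances registered under the same load name.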

Usages of the ComponentCatalog

The main usages of the ComponentCatalog today:

  1. In GUI types of scenarios, where the GUI presents a list of available components to the user.
  2. In the MAML command-line, where the command line uses a string "Name" to reference the component it intends on creating/using.
  3. In the "Entry Points" subsystem, which allows for non-.NET languages to interop with ML.NET (using a JSON document, which uses a string "Name" to reference the component).
  4. In the saving of "model" files, it writes a string "Name" to reference the component to load when the model is loaded.

Note: it is also being used in the PipelineInference code to do things like suggest sweepers and "recipes".

For the first three, you could imagine that the ComponentCatalog can sit on top of the "core" ML.NET library. However, usage #4 is a very core concept in ML.NET, thus the catalog cannot be separated from the core library.

One option to solve #4 is to instead write the fully-qualified .NET Type name into the model file. However, the drawback here is that ML.NET would then be locked into that .NET Type forever for that component. The namespace and class name of these types could not be changed without breaking existing models. Using a string "Name" allows ML.NET to change the Type in the future, as necessary. Additionally, solving #4 doesn't solve #1-3, so there still needs to be some sort of ComponentCatalog. We might as well embrace it as a core concept in ML.NET since it solves all usages.

Proposal

  1. The core ML.NET library should stop enumerating/loading all assemblies in the directory where the ML.NET assembly exists. This is a policy that should be decided at a higher layer than the core library. ML.NET will continue to scan loaded assemblies for components to register in the catalog.
    However, this can't just be done without fixing model files. If a model file contains a component that hasn't been registered in the ComponentCatalog, it will fail to load. We can solve this by writing the Assembly.FullName of the component into the model file. Then when loading the model, we attempt to load the assembly (using the full assembly name, not Assembly.LoadFrom(path)), if it hasn't already been registered in the ComponentCatalog.

  2. Alternatively, we could move to a completely explicit registration process by removing both loading assemblies in the current directory and scanning the loaded assemblies. Code external to the ComponentCatalog would be responsible for registering the components up front before attempting to do anything that required the ComponentCatalog (like loading a model). This could be done with a public API like ComponentCatalog.Register(Assembly), which would scan the assembly for components and register them.
    However, this would change the code users need to write, since they would now be responsible for ensuring the components are registered.
    We could have some sort of "automatic registration" process using something like static initializers/constructors. When a component Type gets initialized by .NET, it would register its assembly with the ComponentCatalog. This would not be without issues though, because there will still be types/assemblies that haven't been loaded yet when attempting to load a model. So we could borrow from proposal #1 above, and write the Assembly.FullName into the model, to ensure it is loaded before creating the component.

In the short-term my plan is to implement Proposal #1 above to get rid of the worst issues we have today. Proposal #2 can be built on top of Proposal #1 in the future, if we decide that scanning the loaded assemblies is no longer acceptable.

Thoughts? Feedback?

@jkotas
Member Author

jkotas commented Sep 14, 2018

loaded assemblies

The set of loaded assemblies returned by AppDomain.GetAssemblies() is not 100% deterministic. This API only returns assemblies that have been loaded so far.

The set can vary based on code optimizations, for example: more aggressive inlining will cause more assemblies to be loaded. I have seen cases where a harmless JIT inliner change broke an app because it caused a different set of assemblies to be loaded, and AppDomain.GetAssemblies() did not return what the app expected.

I think the explicit registration and scanning only what is explicitly registered should be the baseline default model.
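
The point about AppDomain.GetAssemblies() can be observed with a small probe. This is only a sketch: `LoadedAssemblyProbe` and its members are hypothetical helpers, and which assembly names show up depends on the runtime and on JIT decisions, which is exactly the non-determinism being described.

```csharp
using System;
using System.Linq;
using System.Runtime.CompilerServices;

public static class LoadedAssemblyProbe
{
    // Simple names of the assemblies loaded into the current AppDomain right now.
    public static string[] Snapshot() =>
        AppDomain.CurrentDomain.GetAssemblies()
            .Select(a => a.GetName().Name)
            .ToArray();

    // NoInlining keeps the XElement reference out of the caller's compiled body,
    // so the XML LINQ assembly is typically not loaded until this method first runs.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static string TouchXmlLinq() =>
        new System.Xml.Linq.XElement("probe").Name.LocalName;

    // Returns the assemblies that appeared only after the type was used.
    public static string[] NewlyLoaded()
    {
        string[] before = Snapshot();
        TouchXmlLinq();
        return Snapshot().Except(before).ToArray();
    }
}
```

A catalog that scans only what `Snapshot()` sees at startup would miss any component assembly that, like the XML LINQ assembly here, has not been touched yet.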

@eerhardt
Member

We can go with the explicit registration model, if that is acceptable to @TomFinley, @Zruty0, and @GalOshri. It will affect the public API and examples.

One option for the explicit registration model is to embrace the Environment class that ML.NET has, and move the ComponentCatalog from being a static class to being an instance member of Environment.

Then we can have extension methods (similar to ASP.NET above) off of Environment to build up the registrations. So for example, a customer who is using some of the "standard" transforms, but also using LightGBM (a learner that does not come standard), they would write the following:

var env = new ConsoleEnvironment()  // existing
    .AddStandardComponents()        // new necessary line
    .AddLightGBMComponents();       // new necessary line

// existing code that interacts with standard components and LightGBM - possibly loading a model file.

Underneath the covers, these extension methods would work like the following:

public static TEnvironment AddStandardComponents<TEnvironment>(this TEnvironment env)
    where TEnvironment : IHostEnvironment  // or HostEnvironmentBase, whichever we decide to expose ComponentCatalog on
{
    env.ComponentCatalog.Register(typeof(TermTransform).Assembly);  // ML.Data
    env.ComponentCatalog.Register(typeof(CategoricalHashTransform).Assembly);  // ML.Transforms
    env.ComponentCatalog.Register(typeof(SdcaMultiClassTrainer).Assembly);  // ML.StandardLearners
    return env;  // return the environment so the registrations can be chained
}

And then LightGBM would define a similar extension method:

public static TEnvironment AddLightGBMComponents<TEnvironment>(this TEnvironment env)
    where TEnvironment : IHostEnvironment  // or HostEnvironmentBase, whichever we decide to expose ComponentCatalog on
{
    env.ComponentCatalog.Register(typeof(LightGbmBinaryPredictor).Assembly);
    return env;  // return the environment so the registrations can be chained
}

@eerhardt
Member

New Proposal

After chatting with @TomFinley and @Zruty0, we've come up with a proposal that we think will address everyone's concerns. It is basically a hybrid of the above 2 proposals.

  1. We will move ComponentCatalog from being a static class to being an instance member on Environment. This has been a planned refactoring for ML.NET for a while, but hasn't been funded until now.
  2. We will completely remove any implicit scanning for components in ComponentCatalog itself. It will have public APIs to register components, but will not do any registering itself - neither by loading assemblies from disk, nor by scanning loaded assemblies.
  3. Other subsystems (like the GUI, command-line, Entry Points, and model loading) will be responsible for registering the components they require in the manner they require.
  4. During model saving, we will write the Assembly.FullName into the model file. We will then register that assembly with the env.ComponentCatalog when loading the model.
    • Any model that was saved with a previous version of ML.NET, and loaded using the API, will need to explicitly register the components before loading the model. (Or they will need to save the model again with a current version of ML.NET that will save the assembly names.)

Under normal circumstances, API users won't have to explicitly register components with the ComponentCatalog. Using the API to train models won't require looking up components from a catalog - you just create .NET objects like normal. Loading a trained model from disk will register the components inside of it by loading the Assembly and scanning it for LoadableClass assembly attributes.
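
A rough sketch of the model-loading flow in this proposal, assuming the model file stores an assembly full name next to each component's load name. `LoadableAttribute`, `InstanceCatalog`, `ModelLoader`, and `SketchTransform` are hypothetical stand-ins for the real LoadableClass attributes and catalog; only `Assembly.Load(AssemblyName)`, `GetCustomAttributes<T>()`, and `Activator.CreateInstance` are actual framework calls.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// The assembly advertises its components, keyed by a stable load name.
[assembly: Loadable("SketchTransform", typeof(SketchTransform))]

// Hypothetical stand-in for ML.NET's assembly-level LoadableClass attributes.
[AttributeUsage(AttributeTargets.Assembly, AllowMultiple = true)]
public sealed class LoadableAttribute : Attribute
{
    public string LoadName { get; }
    public Type ComponentType { get; }

    public LoadableAttribute(string loadName, Type componentType)
    {
        LoadName = loadName;
        ComponentType = componentType;
    }
}

public sealed class SketchTransform { }

// Instance catalog with no implicit scanning: it knows only what was registered.
public sealed class InstanceCatalog
{
    private readonly Dictionary<string, Type> _byLoadName = new Dictionary<string, Type>();
    private readonly HashSet<string> _scanned = new HashSet<string>();

    public void RegisterAssembly(Assembly assembly)
    {
        if (!_scanned.Add(assembly.FullName)) return; // already scanned
        foreach (LoadableAttribute attr in assembly.GetCustomAttributes<LoadableAttribute>())
            _byLoadName[attr.LoadName] = attr.ComponentType;
    }

    public bool TryGetType(string loadName, out Type type) =>
        _byLoadName.TryGetValue(loadName, out type);
}

public static class ModelLoader
{
    // assemblyFullName and loadName are the two strings assumed to be in the model file.
    public static object LoadComponent(InstanceCatalog catalog, string assemblyFullName, string loadName)
    {
        if (!catalog.TryGetType(loadName, out Type type))
        {
            // Resolve by name through the normal loader, never by file path.
            Assembly assembly = Assembly.Load(new AssemblyName(assemblyFullName));
            catalog.RegisterAssembly(assembly);
            if (!catalog.TryGetType(loadName, out type))
                throw new InvalidOperationException("Component '" + loadName + "' not found in " + assemblyFullName + ".");
        }
        return Activator.CreateInstance(type);
    }
}
```

Because the model stores the load name rather than the .NET type name, the component class could be renamed or moved without breaking saved models, as long as the attribute keeps the same load name.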

@jkotas
Member Author

jkotas commented Sep 17, 2018

During model saving, we will write the Assembly.FullName into the model file. We will then register that assembly with the env.ComponentCatalog when loading the model

This means that loading a model can trigger execution of arbitrary code that happens to be lying on disk. That is a potential security issue if the model can come from an untrusted place. Is it a problem?

@eerhardt
Member

I think, in general, loading a model from an untrusted place is going to be a problem. I wouldn't recommend it.

As for the issue of loading the assembly using the assembly name, the assembly is going to have to be loadable in the app for it to be loaded (e.g. in .NET Core the TPA/.deps.json, or a custom loader). We won't load the assembly using a file path.

So you'd have to get the malicious assembly on the disk, get it loadable by the app, and then send a malicious model to it.

@eerhardt eerhardt self-assigned this Sep 17, 2018
@eerhardt eerhardt added the API Issues pertaining the friendly API label Sep 17, 2018
@GalOshri
Contributor

Under normal circumstances, API users won't have to explicitly register components with the ComponentCatalog.

What are the abnormal circumstances where an API user will have to register components?

@eerhardt
Member

What are the abnormal circumstances where an API user will have to register components?

Whenever they are using strings to reference components and their arguments and try to instantiate them. For example, using the ComponentCreation APIs:

/// <summary>
/// Creates a data transform from the 'LoadName{settings}' string.
/// </summary>
public static IDataTransform CreateTransform(this IHostEnvironment env, string settings, IDataView source)
{
    Contracts.CheckValue(env, nameof(env));
    env.CheckValue(source, nameof(source));
    Type factoryType = typeof(IComponentFactory<IDataView, IDataTransform>);
    return CreateCore<IDataTransform>(env, factoryType, typeof(SignatureDataTransform), settings, source);
}

@jkotas
Member Author

jkotas commented Sep 17, 2018

What are the abnormal circumstances where an API user will have to register components?

Or when you want your app that references ML.NET to be IL Linker friendly.

@veikkoeeva
Contributor

Or when you want your app that references ML.NET to be IL Linker friendly.

A note to interested lurkers (like me): it looks to me like the use cases are constrained devices such as mobile phones and Raspberries, but perhaps cloud deployments also. One large use case for linking could be WASM components, now that Blazor is even deployed into production.

Dmitry-A pushed a commit to Dmitry-A/machinelearning that referenced this issue Apr 12, 2019
Dmitry-A added a commit that referenced this issue Apr 13, 2019
…ature branch (#3324)

* Initial commit

* ci test build

* forgot to save this one file

* Debug-Intrinsics isn't a valid config, trying windows-x64

* disabled tests for now

* disable tests attempt 2

* initial code push, no history, test project not in the build so is the internal client

* battling with warn as err

* test build

* test change

* make params for MLContext data extensions match ML.NET default names and values; update gitignore; nit rev for Benchmarking.cs (#5)

* Create README.md (#2)

* API folder changes (#6)

* comment out fast forest trainer, per discussion on ML.NET open issue #1983, for now, to run E2E w/o exceptions (#7)

* Make validation data param mandatory; remove GetFirstPipeline sample (#10)

* Make validation data param mandatory; remove GetFirstPipeline sample

* remove deprecated todo

* Create ISSUE_TEMPLATE.md & PULL_REQUEST_TEMPLATE.md (#12)

* Create ISSUE_TEMPLATE.md

* Create PULL_REQUEST_TEMPLATE.md

* NestedObject For pipeline (#14)

* add estimator extensions / catalog; add conversion from external to internal pipeline; transform clean-up; add back in test proj and fix build; refactor trainer ext name mappings (#15)

* Make validation data param mandatory; remove GetFirstPipeline sample

* remove deprecated todo

* add estimator extensions / catalog; add ability to go from external to internal pipeline; a lot of transform clean-up; add back in test proj and get it building; refactor trainer ext name mappings

* corrected the typo in readme (#16)

* make GetNextPipeline API w/ public Pipeline method on PipelineSuggester; write GetNextPipeline API test; fix public Pipeline object serialization; fix header inferencing bug; write test utils for fetching datasets (#18)

* get next pipeline API rev -- refactor API to consume column dimensions, purpose, type, and name instead of available trainers & transforms (#19)

* mark get next pipeline test as ignore for now (#20)

* fix dataview take util bug, add dataview skip util, add some UTs to increase code coverage (#21)

* fix dataview take util bug, add dataview skip util, add some UTs to increase code coverage

* add accuracy threshold on AutoFit test

* add null check to best pipeline on autofit result

* unit test additions (including user input validation testing); dead code removal for code coverage (including KDO & associated utils); misc fixes & revs (#22)

* add trainer extension tests, & misc fixes (#23)

* add estimator extension tests (#24)

* add conversions tests (#25)

* fix multiclass runs & add multiclass autofit UT (#27)

* add basic autofit regression test (#28)

* fix categorical transform bug (sometimes categorical features weren't concatenated to final features); add UT transforms; add PipelineNode equality & tests to serve as AutoML testing infra

* add example to readme (#26)

* add lightgbm args as nested properties (#33)

* fix bug where if one pipeline hyperparam optimization converges, run terminates (#36)

* add open-source headers to files; other nit clean-ups along the way (#35)

* Ungroup Columns in Column Inference (#40)

* Added sequential grouping of columns

* added ungrouping of column option

* reverted the file

* Misc fixes (#39)

* misc fixes -- fix bug where SMAC returning already-seen values; fix param encoding return bug in pipeline object model; nit clean-up AutoFit; return in pipeline suggester when sweeper has no next proposal; null ref fix in public object model pipeline suggester

* fix in BuildPipelineNodePropsLightGbm test, fix / use correct 'newTrainer' variable in PipelneSuggester

* SMAC perf improvement

* Removing the nuget.config and have build.props mention the nuget package sources. (#38)

* Added sequential grouping of columns

* removed nuget.config and have only props mentions the nuget sources

* reverted the file

* transform inferencing concat / ignore fixes (#41)

* make pipeline object model & other public classes internal (#43)

* handle SMAC exception when fewer trees were trained than requested (#44)

* Throw error on incorrect Label name in InferColumns API (#47)

* Added sequential grouping of columns

* reverted the file

* addded infer columns label name checking

* added column detection error

* removed unsed usings

* added quotes

* replace Where with Any clause

* replace Where with Any clause

* Set Nullable Auto params to null values (#50)

* Added sequential grouping of columns

* reverted the file

* added auto params as null

* change to the update fields method

* First public api propsal (#52)

* Includes following
1) Final proposal for 0.1 public API surface
2) Prefeaturization
3) Splitting train data into train and validate when validation data is null
4) Providing end to end samples one each for regression, binaryclassification and multiclass classification

* Incorporating code review feedbacks

* Revert "Set Nullable Auto params to null values" (#53)

* Revert "First public api propsal (#52)"

This reverts commit e4a64cf.

* Revert "Set Nullable Auto params to null values (#50)"

This reverts commit 41c663c.

* AutoFit return type is now an IEnumerable (#55)

AutoFit returns is now an IEnumerable - this enables many good things

Implementing variety of early stopping criteria (See sample)
Early discard of models that are no good. This improves memory usage efficiency. (See sample)
No need to implement a callback to get results back
Getting best score is now outside of API implementation. It is a simple math function to compare scores (See sample).

Also templatized the return type for better type safety through out the code.

* misc fixes & test additions, towards 0.1 release (#56)

* Enable UnitTests on build server (#57)

* 1) Making trainer name public (#62)

2) Fixing up samples to reflect it

*  Initial version of CLI tool for mlnet (#61)

* added global tool initial project

* removed unneccesary files, renamed files

* refactoring and added base abstract classes for trainer generator

* removed unused class

* Added classes for transforms

* added transform generate dummy classes

* more refactoring, added first transform

* more refactoring and added classes

* changed the project structure

* restructing added options class

* sln changes

* refactored options to different class:

* added more logic for code generation of class

* misc changes

* reverted file

* added commandline api package

* reverted sample

* added new command line api parser

* added normalization of column names

* Added command defaults and error message

* implementation of all trainers

* changed auto to null

* added all transform generators

* added error handling when args is empty and minor changes due to change in AutoML api names

* changed the name of param

* added new command line options and restructuring code

* renamed proj file and added solution

* Added code to generate usings, Fixed few bugs in the code

* added validation to the command line options

* changed project name

* Bug fixes due to API change in AutoML

* changed directory structure

* added test framework and basic tests

* added more tests

* added improvements to template and error handling

* renamed the estimator name

* fixed test case

* added comments

* added headers

* changed namespace and removed unnecessary properties from project

* Revert "changed namespace and removed unnecessary properties from project"

This reverts commit 9edae033e9845e910f663f296e168f1182b84f5f.

* fixed test cases and renamed namespaces

* cleaned up proj file

* added folder structure

* added symbols/tokens for strings

* added more tests

* review comments

* modified test cases

* review comments

* change in the exception message

* normalized line endings

* made method private static

* simplified range building /optimization

* minor fix

* added header

* added static methods in command where necessary

* nit picks

*  made few methods static

* review comments

* nitpick

* remove line pragmas

* fix test case

* Use better AutoFit overload and ignore Multiclass (#64)

* Upgrading CLI to produce ML.NET v0.10 APIs and a bunch of refactoring tasks (#65)

* Added sequential grouping of columns

* reverted the file

* upgrade to v0.10 and refactoring

* added null check

* fixed unit tests

* review comments

* removed the settings change

* added regions

* fixed unit tests

* Upgrade ML.NET package to 0.10.0 (#70)

* Change in template to accommodate new API of TextLoader (#72)

* Added sequential grouping of columns

* reverted the file

* changed to new API of Text Loader

* changed signature

* added params for taking additional settings

* changes to codegen params

* refactoring of templates and fixing errors

* Enable gated check for mlnet.tests (#79)

* Added sequential grouping of columns

* reverted the file

* changed to new API of Text Loader

* changed signature

* added params for taking additional settings

* changes to codegen params

* refactoring of templates and fixing errors

* added run-tests.proj and referred it in build.proj

* CLI tool - make validation dataset optional and support for crossvalidation in generated code (#83)

* Added sequential grouping of columns

* reverted the file

* bug fixes, more logic to templates to support cross-validate

* formatting and fix type in consolehelper

* Added logic in templates

* revert settings

* benchmarking related changes (#63)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* fix fast forest learner (don't sweep over learning rate) (#88)

* Made changes to Have non-calibrated scoring for binary classifiers (#86)

* Added sequential grouping of columns

* reverted the file

* added calibration workaround

* removed print probability

* reverted settings

* rev ColumnInference API: can take label index; rev output object types; add tests (#89)

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (#99)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (will probably need tweaks once I try to use the pipeline)

* publish nuget (#101)

* use dotnet-internal-temp agent for internal build

* use dotnet-internal feed

* Fix Codegen for columnConvert and ValueToKeyMapping transform and add individual transform tests (#95)

* Added sequential grouping of columns

* reverted the file

* fix usings for type convert

* added transforms tests

* review comments

* When generating usings choose only distinct usings directives (#94)

* Added sequential grouping of columns

* reverted the file

* Added code to have unique strings

* refactoring

* minor fix

* minor fix

* Autofit overloads + cancellation + progress callbacks

1) Introduce AutoFit overloads (basic and advanced)
2) AutoFit Cancellation
3) AutoFit progress callbacks

* Default the kfolds to value 5 in CLI generated code (#115)

* Added sequential grouping of columns

* reverted the file

* Set up CI with Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* remove file

* added kfold param and defaulted to value

* changed type

* added for regression

* Remove extra ; from generated code (#114)

* Added sequential grouping of columns

* reverted the file

* Set up CI with Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* removed extra ; from generated code

* removed file

* fix unit tests

* TimeoutInSeconds (#116)

Specifying timeout in seconds instead of minutes

* Added more command line args implementation to CLI tool and refactoring (#110)

* Added sequential grouping of columns

* reverted the file

* Set up CI with Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* added git status

* reverted change

* added codegen options and refactoring

* minor fixes

* renamed params, minor refactoring

* added tests for commandline and refactoring

* removed file

* added back the test case

* minor fixes

* Update src/mlnet.Test/CommandLineTests.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* review comments

*  capitalize the first character

* changed the name of test case

* remove unused directives

* Fail gracefully if unable to instantiate data view with swept parameters (#125)

* gracefully fail if fail to parse a datai

* rev

* validate AutoFit 'Features' column must be of type R4 (#132)

* Samples: exceptions / nits (#124)

* Logging support in CLI + Implementation of cmd args [--name,--output,--verbosity] (#121)

* added logging and helper methods

* fixing code after merge

* added resx files, added logger framework, added logging messages

* added new options

* added spacing

* minor fixes

* change command description

* rename option, add headers, include new param in test

* formatted

* build fix

*  changed option name

* Added NlogConfig file

* added back config package

* fix tests

* added correct validation check (#137)

* Use CreateTextLoader<T>(..)  instead of CreateTextLoader(..) (#138)

* added support to loaddata by class in the generated code

* fix tests

* changed CreateTextLoader to ReadFromTextFile method. (#140)

* changed textloader to readfromtextfile method

* formatting

* exception fixes (#136)

* infer purpose of hidden columns as 'ignore' (#142)

* Added approval tests and bunch of refactoring of code and normalizing namespaces (#148)

* changed textloader to readfromtextfile method

* formatting

* added approval tests and refactoring of code

* removed few comments

* API 2.0 skeleton (#149)

Incorporating API review feedback

* The CV code should come before the training when there is no test dataset in generated code (#151)

* reorder cv code

* build fix

* fixed structure

* Format the generated code + bunch of misc tasks (#152)

* added formatting and minor changes for reordering cv

* fixing the template

* minor changes

* formatting changes

* fixed approval test

* removed unused nuget

* added missing value replacing

* added test for new transform

* fix test

* Update src/mlnet/Templates/Console/MLCodeGen.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* Sanitize the column names in CLI (#162)

* added sanitization layer in CLI

* fix test

* changed exception.StackTrace to exception.ToString()

* fix package name (#168)

* Rev public API (#163)

* Rename TransformGeneratorBase .cs to TransformGeneratorBase.cs (#153)

* Fix minor version for the repository + remove Nlog config package (#171)

*  changed the minor version

* removed the nlog config package

* Added new test to columninfo and fixing up API (#178)

* Make optimizing metric customizable and add trainer whitelist functionality (#172)

* API rev (#181)

* propagate root MLContext thru AutoML (instead of creating our own) (#182)

* Enabling new command line args (#183)

* fix package name

* initial commit

* added more commandline args

* fixed tests

* added headers

* fix tests

* fix test

* rename 'AutoFitter' to 'Experiment' (#169)

* added tests (#187)

* rev InferColumns to accept ColumnInfo input param (#186)

* Implement argument --has-header and change usage of dataset (#194)

* added has header and fixed dataset and train dataset

* fix tests

* removed dummy command (#195)

* Fix bug for regression and sanitize input label from user (#198)

* removed dummy command

* sanitize label and fix template

* fix tests

* Do not generate code concatenating columns when the dataset has a single feature column (#191)

* Include some missed logging in the generated code.  (#199)

* added logging messages for generated code

* added log messages

* deleted file

* cleaning up proj files (#185)

* removed platform target

* removed platform target

* Some spaces and extra lines + bug in output path  (#204)

* nit picks

* nit picks

* fix test

* accept label from user input and provide in generated code (#205)

* Rev handling of weight / label columns (#203)

* migrate to private ML.NET nuget for latest bug fixes (#131)

* fix multiclass with nonstandard label (#207)

* Multiclass nondefault label test (#208)

* printing escaped chars + bug (#212)

* delete unused internal samples (#211)

* fix SMAC bug that causes multiclass sample to infinite loop (#209)

* Rev user input validation for new API (#210)

* added console message for exit and nit picks (#215)

* exit when exception encountered (#216)

* Seal API classes (and make EnableCaching internal) (#217)

* Suggested sample nits (feel free to ask for any of these to be reverted) (#219)

* User input column type validation (#218)

* upgrade commandline and renaming (#221)

* upgrade commandline and renaming

* renaming fields

* Make build.sh, init-tools.sh, & run.sh executable on OSX/Linux (#225)

*  CLI argument descriptions updated (#224)

* CLI argument descriptions updated

* No version in .csproj

* added flag to disable training code (#227)

* Exit if perfect model produced (#220)

* removed header (#228)

* removed header

* added auto generated header

* removed console read key (#229)

* Fix model path in generated file (#230)

* removed console read key

* fix model path

* fix test

* reorder samples (#231)

* remove rule that infers column purpose as categorical if # of distinct values is < 100 (#233)

* Null reference exception fix for finding best model when some runs have failed (#239)

* samples fixes (#238)

* fix for defaulting Averaged Perceptron # of iterations to 10 (#237)

* Bug bash feedback Feb 27. API changes and sample changes (#240)

* Bug bash feedback Feb 27. 
API changes 
Sample changes
Exception fix

* Samples / API rev from 2/27 bug bash feedback (#242)

* changed the directory structure for generated project (#243)

* changed the directory structure for generated project

* changed test

* upgraded commandline package

* Fix test file locations on OSX (#235)

* fix test file locations on OSX

* changing to Path.Combine()

* Additional Path.Combine()

* Remove ConsoleCodeGeneratorTests.GeneratedTrainCodeTest.received.txt

* Additional Path.Combine()

* add back in double comparison fix

* remove metrics agent NaN returns

* test fix

* test format fix

* mock out path

Thanks to @daholste for additional fixes!

* upgrade to latest ML.NET public surface (#246)

* Upgrade to ML.NET 0.11 (#247)

* initial changes

* fix lightgbm

* changed normalize method

* added tests

* fix tests

* fix test

* Private preview final API changes (#250)

* .NET framework design guidelines applied to public surface
* WhitelistedTrainers -> Trainers

* Add estimator to public API iteration result (#248)

* LightGBM pipeline serialization fix (#251)

* Change order that we search for TextLoader's parameters (#256)

* CLI IFileInfo null exception fix (#254)

* Averaged Perceptron pipeline serialization fix (#257)

* Upgrade command-line-api and default folder name change (#258)

* change in default folderName

* upgrade command line

* Update src/mlnet/Program.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* eliminate IFileInfo from CLI (#260)

* Rev samples towards private preview; ignored columns fix (#259)

* remove unused methods in consolehelper and nit picks in generated code (#261)

* nit picks

* change in console helper

* fix tests

* add space

* fix tests

* added nuget sources in generated csproj (#262)

* added nuget sources in csproj

* changed the structure in generated code

* space

* upgrade to mlnet 0.11 (#263)

* Formatting CLI metrics (#264)

Ensures space between printed metrics (also model counter). Right aligned metrics. Extended AUC to four digits.

* Add implementation of non -ova multi class trainers code gen (#267)

* added non ova multi class learners

* added tests

* test cases

* Add caching (#249)

* AdvancedExperimentSettings sample nits (#265)

* Add sampling key column (#268)

* Initial work for multi-class classification support for CLI (#226)

* Initial work for multi-class classification support for CLI

* String updates

* more strings

* Whitelist non-OVA multi-class learners

* Refactor the orchestration of AutoML calls (#272)

* Do not auto-group columns with suggested purpose = 'Ignore' (#273)

* Fix: during type inferencing, parse whitespace strings as NaN (#271)

* Printing additional metrics in CLI for binary classification (#274)

* Printing additional metrics in CLI for binary classification

* Update src/mlnet/Utilities/ConsolePrinter.cs

* Add API option to store models on disk (instead of in memory); fix IEstimator memory leak (#269)

* Print failed iterations in CLI (#275)

* change the type to float from double (#277)

* cache arg implementation in CLI (#280)

* cache implementation

* corrected the null case

* added tests for all cases

* Remove duplicate value-to-key mapping transform for multiclass string labels (#283)

* Add post-trainer transform SDK infra; add KeyToValueMapping transform to CLI; fix: for generated multiclass models, convert predicted label from key to original label column type (#286)

* Implement ignore columns command line arg (#290)

* normalize line endings

* added --ignore-columns

* null checks

* unit tests

* Print winning iteration and runtime in CLI (#288)

* Print best metric and runtime

* Print best metric and runtime

* Line endings in AutoMLEngine.cs

* Rename time column to duration to match Python SDK

* Revert to MicroAccuracy and MacroAccuracy spellings

* Revert spelling of BinaryClassificationMetricsAgent to BinaryMetricsAgent to reduce merge conflicts

* Revert spelling of MulticlassMetricsAgent to MultiMetricsAgent to reduce merge conflicts

* missed some files

* Fix merge conflict

* Update AutoMLEngine.cs

* Add MacOS & Linux to CI; MacOS & Linux test fixes (#293)

* MicroAccuracy as default for multi-class (#295)

Change default optimization metric for multi-class classification to MicroAccuracy (accuracy). Previously it was set to MacroAccuracy.

* Null exception for ignorecolumns in CLI (#294)

* Null exception for ignorecolumns in CLI

* Check if ignore-columns array has values (as the default is now an empty array)

* Emit caching flag in pipeline object model. (Includes SuggestedPipelineBuilder refactor & debug string fixes / refactor) (#296)

* removed sln (#297)

* Caching enabling in code gen part -2 (#298)

* add

* added caching codegen

* support comma separated values for --ignore-columns (#300)

* default initialization for ignore columns (#302)

* default initialization

* added null check

* Codegen for multiclass non-ova (#303)

* changes to template

* multiclass codegen

* test cases

* fix test cases

* Generated Project new structure. (#305)

* added new templates

* writing files to disk

* change path

* added new templates

* missing braces

* fix bugs

* format code

* added util methods for solution file creation and addition of projects to it

* added extra packages to project files

* new tests

* added correct path for sln

* build fix

* fix build

* include using system in prediction class (#307)

* added using

* fix test

* Random number generator is not thread safe (#310)

* Random number generator is not thread safe

* Another local random generator

* Missed a few references

* Referencing AutoMlUtils.random instead of a local RNG

* More refs to main RNG; remove Float as per #1669

* Missed Random.cs

* Fix multiclass code gen (#314)

* compile error in codegen

* removes scores printing

* fix bugs

* fix test

* Fix compile error in codegen project (#319)

* removed redundant code

* fix test case

* Rev OVA pipeline node SDK output: wrap binary trainers as children inside parent OVA node (#317)

* Ova Multi class codegen support (#321)

* dummy

* multiova implementation

* fix tests

* remove inclusion list

* fix tests and console helper

* Rev run result trainer name for OVA: output different trainer name for each OVA + binary learner combination (#322)

* Rev run result trainer name for Ova: output different trainer name for each Ova + binary learner combination

* test fixes

* Console helper bug in generated code for multiclass (#323)

* fix

* fix test

* looping perlogclass

* fix test

* Initial version of Progress bar impl and CLI UI experience (#325)

* progressbar

* added progressbar and refactoring

* reverted

* revert sign assembly

* added headers and removed exception rethrow

* Setting model directory to temp directory (#327)

* Suggested changes to progress bar (#335)

* progressbar

* added progressbar and refactoring

* reverted

* revert sign assembly

* added headers and removed exception rethrow

* bug fixes and updates to UI

* added friendly name printing for metric

* formatting

* Rev Samples (#334)

* Telemetry2 (#333)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (will probably need tweaks once I try to use the pipeline)

* tweak queue in vsts-ci.yml

* CLI telemetry implementation

* Telemetry implementation

* delete unnecessary file and change file size bucket to actually log log2 instead of nearest ceil value

* add headers, remove comments

* one more header missing

* Fix progress bar in linux/osx (#336)

* progressbar

* added progressbar and refactoring

* reverted

* revert sign assembly

* added headers and removed exception rethrow

* bug fixes and updates to UI

* added friendly name printing for metric

* formatting

* change from task to thread

* Update src/mlnet/CodeGenerator/CodeGenerationHelper.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* Mem leak fix (#328)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (will probably need tweaks once I try to use the pipeline)

* tweak queue in vsts-ci.yml

* there is still investigation to be done but this fix works and solves memory leak problems

* minor refactor

* Upgrade ML.NET package (#343)

* Add cross-validation (CV), and auto-CV for small datasets; push common API experiment methods into base class (#287)

* restore old yml for internal pipeline so we can publish nuget again to devdiv stream (#344)

* Polishing the CLI UI part-1 (#338)

* formatting of pbar message

* Polishing the UI

* optimization

* rename variable

* Update src/mlnet/AutoML/AutoMLEngine.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* Update src/mlnet/CodeGenerator/CodeGenerationHelper.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* new message

* changed http to https

* added iteration num + 1

* change string name and add color to artifacts

* change the message

* build errors

* added null checks

* added exception messages to log file

* added exception messages to log file

* CLI ML.NET version upgrade (#345)

* Sample revs; ColumnInformation property name revs; pre-featurizer fixes (#346)

* CLI -- consume logs from AutoML SDK (#349)

* Rename RunDetails --> RunDetail (#350)

* command line api upgrade and progress bar rendering bug (#366)

* added fix for all platforms progress bar

* upgrade nuget

* removed args from writeline

* change in the version (#368)

* fix few bugs in progressbar and verbosity (#374)

* fix few bugs in progressbar and verbosity

* removed unused name space

* Fix for folders with space in it while generating project (#376)

* support for folders with spaces

* added support for paths with space

* revert file

* change name of var

* remove spaces

* SMAC fix for minimizing metrics (#363)

* Formatting Regression metrics and progress bar display days. (#379)

* added progress bar day display and fix regression metrics

* fix formatting

* added total time

* formatted total time

* change command name and add pbar message (#380)

* change command name and add pbar message

* fix tests

* added aliases

* duplicate alias

* added another alias for task

* UI missing features (#382)

* added formatting changes

* added accuracy specifically

* downgrade the codepages (#384)

* Change in project structure (#385)

* initial changes

* Change in project structure

* correcting test

* change variable name

* fix tests

* fix tests

* fix more tests

* fix codegen errors

* added log file message

* changed name of args

* change variable names

* fix test

* FileSizeBuckets in correct units (#387)

* Minor telemetry change to log in correct units and make our life easier in the future

* Use Ceiling instead of Round

* changed order (#388)

* prep work to transfer to ml.net (#389)

* move test projects to top level test subdir

* rename some projects to make naming consistent and make it build again

* fix test project refs

* Add AutoML components to build, fix issues related to that so it builds
Dmitry-A pushed a commit to Dmitry-A/machinelearning that referenced this issue Aug 22, 2019
harishsk added a commit that referenced this issue Sep 6, 2019
* Fixed build errors resulting from upgrade to VS2019 compilers

* Added additional message describing the previous fix

* Syncing upstream fork (#10)

* Throw error on incorrect Label name in InferColumns API (#47)

* Added sequential grouping of columns

* reverted the file

* added infer columns label name checking

* added column detection error

* removed unused usings

* added quotes

* replace Where with Any clause

* replace Where with Any clause

* Set Nullable Auto params to null values (#50)

* Added sequential grouping of columns

* reverted the file

* added auto params as null

* change to the update fields method

* First public api propsal (#52)

* Includes following
1) Final proposal for 0.1 public API surface
2) Prefeaturization
3) Splitting train data into train and validate when validation data is null
4) Providing end to end samples one each for regression, binaryclassification and multiclass classification

* Incorporating code review feedbacks

* Revert "Set Nullable Auto params to null values" (#53)

* Revert "First public api propsal (#52)"

This reverts commit e4a64cf4aeab13ee9e5bf0efe242da3270241bd7.

* Revert "Set Nullable Auto params to null values (#50)"

This reverts commit 41c663cd14247d44022f40cf2dce5977dbab282d.

* AutoFit return type is now an IEnumerable (#55)

AutoFit returns is now an IEnumerable - this enables many good things

Implementing variety of early stopping criteria (See sample)
Early discard of models that are no good. This improves memory usage efficiency. (See sample)
No need to implement a callback to get results back
Getting best score is now outside of API implementation. It is a simple math function to compare scores (See sample).

Also templatized the return type for better type safety through out the code.

* misc fixes & test additions, towards 0.1 release (#56)

* Enable UnitTests on build server (#57)

* 1) Making trainer name public (#62)

2) Fixing up samples to reflect it

*  Initial version of CLI tool for mlnet (#61)

* added global tool initial project

* removed unneccesary files, renamed files

* refactoring and added base abstract classes for trainer generator

* removed unused class

* Added classes for transforms

* added transform generate dummy classes

* more refactoring, added first transform

* more refactoring and added classes

* changed the project structure

* restructing added options class

* sln changes

* refactored options to different class:

* added more logic for code generation of class

* misc changes

* reverted file

* added commandline api package

* reverted sample

* added new command line api parser

* added normalization of column names

* Added command defaults and error message

* implementation of all trainers

* changed auto to null

* added all transform generators

* added error handling when args is empty and minor changes due to change in AutoML api names

* changed the name of param

* added new command line options and restructuring code

* renamed proj file and added solution

* Added code to generate usings, Fixed few bugs in the code

* added validation to the command line options

* changed project name

* Bug fixes due to API change in AutoML

* changed directory structure

* added test framework and basic tests

* added more tests

* added improvements to template and error handling

* renamed the estimator name

* fixed test case

* added comments

* added headers

* changed namespace and removed unneccesary properties from project

* Revert "changed namespace and removed unneccesary properties from project"

This reverts commit 9edae033e9845e910f663f296e168f1182b84f5f.

* fixed test cases and renamed namespaces

* cleaned up proj file

* added folder structure

* added symbols/tokens for strings

* added more tests

* review comments

* modified test cases

* review comments

* change in the exception message

* normalized line endings

* made method private static

* simplified range building /optimization

* minor fix

* added header

* added static methods in command where necessary

* nit picks

*  made few methods static

* review comments

* nitpick

* remove line pragmas

* fix test case

* Use better AutiFit overload and ignore Multiclass (#64)

* Upgrading CLI to produce ML.NET V.10 APIs and bunch of Refactoring tasks (#65)

* Added sequential grouping of columns

* reverted the file

* upgrade to v .10 and refactoring

* added null check

* fixed unit tests

* review comments

* removed the settings change

* added regions

* fixed unit tests

* Upgrade ML.NET package to 0.10.0 (#70)

* Change in template to accomodate new API of TextLoader (#72)

* Added sequential grouping of columns

* reverted the file

* changed to new API of Text Loader

* changed signature

* added params for taking additional settings

* changes to codegen params

* refactoring of templates and fixing errors

* Enable gated check for mlnet.tests (#79)

* Added sequential grouping of columns

* reverted the file

* changed to new API of Text Loader

* changed signature

* added params for taking additional settings

* changes to codegen params

* refactoring of templates and fixing errors

* added run-tests.proj and referred it in build.proj

* CLI tool - make validation dataset optional and support for crossvalidation in generated code (#83)

* Added sequential grouping of columns

* reverted the file

* bug fixes, more logic to templates to support cross-validate

* formatting and fix type in consolehelper

* Added logic in templates

* revert settings

* benchmarking related changes (#63)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* fix fast forest learner (don't sweep over learning rate) (#88)

* Made changes to Have non-calibrated scoring for binary classifiers (#86)

* Added sequential grouping of columns

* reverted the file

* added calibration workaround

* removed print probability

* reverted settings

* rev ColumnInference API: can take label index; rev output object types; add tests (#89)

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (#99)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (will probably need tweaks once I try to use the pipleline)

* publish nuget (#101)

* use dotnet-internal-temp agent for internal build

* use dotnet-internal feed

* Fix Codegen for columnConvert and ValueToKeyMapping transform and add individual transform tests (#95)

* Added sequential grouping of columns

* reverted the file

* fix usings for type convert

* added transforms tests

* review comments

* When generating usings choose only distinct usings directives (#94)

* Added sequential grouping of columns

* reverted the file

* Added code to have unique strings

* refactoring

* minor fix

* minor fix

* Autofit overloads + cancellation + progress callbacks

1) Introduce AutoFit overloads (basic and advanced)
2) AutoFit Cancellation
3) AutoFit progress callbacks

* Default the kfolds to value 5 in CLI generated code (#115)

* Added sequential grouping of columns

* reverted the file

* Set up CI with Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* remove file

* added kfold param and defaulted to value

* changed type

* added for regression

* Remove extra ; from generated code (#114)

* Added sequential grouping of columns

* reverted the file

* Set up CI with Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* removed extra ; from generated code

* removed file

* fix unit tests

* TimeoutInSeconds (#116)

Specifying timeout in seconds instead of minutes

* Added more command line args implementation to CLI tool and refactoring (#110)

* Added sequential grouping of columns

* reverted the file

* Set up CI with Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* added git status

* reverted change

* added codegen options and refactoring

* minor fixes

* renamed params, minor refactoring

* added tests for commandline and refactoring

* removed file

* added back the test case

* minor fixes

* Update src/mlnet.Test/CommandLineTests.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* review comments

*  capitalize the first character

* changed the name of test case

* remove unused directives

* Fail gracefully if unable to instantiate data view with swept parameters (#125)

* gracefully fail if fail to parse a datai

* rev

* validate AutoFit 'Features' column must be of type R4 (#132)

* Samples: exceptions / nits (#124)

* Logging support in CLI + Implementation of cmd args [--name,--output,--verbosity] (#121)

* added logging and helper methods

* fixing code after merge

* added resx files, added logger framework, added logging messages

* added new options

* added spacing

* minor fixes

* change command description

* rename option, add headers, include new param in test

* formatted

* build fix

*  changed option name

* Added NlogConfig file

* added back config package

* fix tests

* added correct validation check (#137)

* Use CreateTextLoader<T>(..)  instead of CreateTextLoader(..) (#138)

* added support to loaddata by class in the generated code

* fix tests

* changed CreateTextLoader to ReadFromTextFile method. (#140)

* changed textloader to readfromtextfile method

* formatting

* exception fixes (#136)

* infer purpose of hidden columns as 'ignore' (#142)

* Added approval tests and bunch of refactoring of code and normalizing namespaces (#148)

* changed textloader to readfromtextfile method

* formatting

* added approval tests and refactoring of code

* removed few comments

* API 2.0 skeleton (#149)

Incorporating API review feedback

* The CV code should come before the training when there is no test dataset in generated code (#151)

* reorder cv code

* build fix

* fixed structure

* Format the generated code + bunch of misc tasks (#152)

* added formatting and minor changes for reordering cv

* fixing the template

* minor changes

* formatting changes

* fixed approval test

* removed unused nuget

* added missing value replacing

* added test for new transform

* fix test

* Update src/mlnet/Templates/Console/MLCodeGen.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* Sanitize the column names in CLI (#162)

* added sanitization layer in CLI

* fix test

* changed exception.StackTrace to exception.ToString()

* fix package name (#168)

* Rev public API (#163)

* Rename TransformGeneratorBase .cs to TransformGeneratorBase.cs (#153)

* Fix minor version for the repository + remove Nlog config package (#171)

*  changed the minor version

* removed the nlog config package

* Added new test to columninfo and fixing up API (#178)

* Make optimizing metric customizable and add trainer whitelist functionality (#172)

* API rev (#181)

* propagate root MLContext thru AutoML (instead of creating our own) (#182)

* Enabling new command line args (#183)

* fix package name

* initial commit

* added more commandline args

* fixed tests

* added headers

* fix tests

* fix test

* rename 'AutoFitter' to 'Experiment' (#169)

* added tests (#187)

* rev InferColumns to accept ColumnInfo input param (#186)

* Implement argument --has-header and change usage of dataset (#194)

* added has header and fixed dataset and train dataset

* fix tests

* removed dummy command (#195)

* Fix bug for regression and sanitize input label from user (#198)

* removed dummy command

* sanitize label and fix template

* fix tests

* Do not generate code concatenating columns when the dataset has a single feature column (#191)

* Include some missed logging in the generated code.  (#199)

* added logging messages for generated code

* added log messages

* deleted file

* cleaning up proj files (#185)

* removed platform target

* removed platform target

* Some spaces and extra lines + bug in output path  (#204)

* nit picks

* nit picks

* fix test

* accept label from user input and provide in generated code (#205)

* Rev handling of weight / label columns (#203)

* migrate to private ML.NET nuget for latest bug fixes (#131)

* fix multiclass with nonstandard label (#207)

* Multiclass nondefault label test (#208)

* printing escaped chars + bug (#212)

* delete unused internal samples (#211)

* fix SMAC bug that causes multiclass sample to infinite loop (#209)

* Rev user input validation for new API (#210)

* added console message for exit and nit picks (#215)

* exit when exception encountered (#216)

* Seal API classes (and make EnableCaching internal) (#217)

* Suggested sample nits (feel free to ask for any of these to be reverted) (#219)

* User input column type validation (#218)

* upgrade commandline and renaming (#221)

* upgrade commandline and renaming

* renaming fields

* Make build.sh, init-tools.sh, & run.sh executable on OSX/Linux (#225)

*  CLI argument descriptions updated (#224)

* CLI argument descriptions updated

* No version in .csproj

* added flag to disable training code (#227)

* Exit if perfect model produced (#220)

* removed header (#228)

* removed header

* added auto generated header

* removed console read key (#229)

* Fix model path in generated file (#230)

* removed console read key

* fix model path

* fix test

* reorder samples (#231)

* remove rule that infers column purpose as categorical if # of distinct values is < 100 (#233)

* Null reference exception fix for finding best model when some runs have failed (#239)

* samples fixes (#238)

* fix for defaulting Averaged Perceptron # of iterations to 10 (#237)

* Bug bash feedback Feb 27. API changes and sample changes (#240)

* Bug bash feedback Feb 27. 
API changes 
Sample changes
Exception fix

* Samples / API rev from 2/27 bug bash feedback (#242)

* changed the directory structure for generated project (#243)

* changed the directory structure for generated project

* changed test

* upgraded commandline package

* Fix test file locations on OSX (#235)

* fix test file locations on OSX

* changing to Path.Combine()

* Additional Path.Combine()

* Remove ConsoleCodeGeneratorTests.GeneratedTrainCodeTest.received.txt

* Additional Path.Combine()

* add back in double comparison fix

* remove metrics agent NaN returns

* test fix

* test format fix

* mock out path

Thanks to @daholste for additional fixes!

* upgrade to latest ML.NET public surface (#246)

* Upgrade to ML.NET 0.11 (#247)

* initial changes

* fix lightgbm

* changed normalize method

* added tests

* fix tests

* fix test

* Private preview final API changes (#250)

* .NET framework design guidelines applied to public surface
* WhitelistedTrainers -> Trainers

* Add estimator to public API iteration result (#248)

* LightGBM pipeline serialization fix (#251)

* Change order that we search for TextLoader's parameters (#256)

* CLI IFileInfo null exception fix (#254)

* Averaged Perceptron pipeline serialization fix (#257)

* Upgrade command-line-api and default folder name change (#258)

* change in default folderName

* upgrade command line

* Update src/mlnet/Program.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* eliminate IFileInfo from CLI (#260)

* Rev samples towards private preview; ignored columns fix (#259)

* remove unused methods in consolehelper and nit picks in generated code (#261)

* nit picks

* change in console helper

* fix tests

* add space

* fix tests

* added nuget sources in generated csproj (#262)

* added nuget sources in csproj

* changed the structure in generated code

* space

* upgrade to mlnet 0.11 (#263)

* Formatting CLI metrics (#264)

Ensures space between printed metrics (also model counter). Right aligned metrics. Extended AUC to four digits.

* Add implementation of non -ova multi class trainers code gen (#267)

* added non ova multi class learners

* added tests

* test cases

* Add caching (#249)

* AdvancedExperimentSettings sample nits (#265)

* Add sampling key column (#268)

* Initial work for multi-class classification support for CLI (#226)

* Initial work for multi-class classification support for CLI

* String updates

* more strings

* Whitelist non-OVA multi-class learners

* Refactor the orchestration of AutoML calls (#272)

* Do not auto-group columns with suggested purpose = 'Ignore' (#273)

* Fix: during type inferencing, parse whitespace strings as NaN (#271)

* Printing additional metrics in CLI for binary classification (#274)

* Printing additional metrics in CLI for binary classification

* Update src/mlnet/Utilities/ConsolePrinter.cs

* Add API option to store models on disk (instead of in memory); fix IEstimator memory leak (#269)

* Print failed iterations in CLI (#275)

* change the type to float from double (#277)

* cache arg implementation in CLI (#280)

* cache implementation

* corrected the null case

* added tests for all cases

* Remove duplicate value-to-key mapping transform for multiclass string labels (#283)

* Add post-trainer transform SDK infra; add KeyToValueMapping transform to CLI; fix: for generated multiclass models, convert predicted label from key to original label column type (#286)

* Implement ignore columns command line arg (#290)

* normalize line endings

* added --ignore-columns

* null checks

* unit tests

* Print winning iteration and runtime in CLI (#288)

* Print best metric and runtime

* Print best metric and runtime

* Line endings in AutoMLEngine.cs

* Rename time column to duration to match Python SDK

* Revert to MicroAccuracy and MacroAccuracy spellings

* Revert spelling of BinaryClassificationMetricsAgent to BinaryMetricsAgent to reduce merge conflicts

* Revert spelling of MulticlassMetricsAgent to MultiMetricsAgent to reduce merge conflicts

* missed some files

* Fix merge conflict

* Update AutoMLEngine.cs

* Add MacOS & Linux to CI; MacOS & Linux test fixes (#293)

* MicroAccuracy as default for multi-class (#295)

Change default optimization metric for multi-class classification to MicroAccuracy (accuracy). Previously it was set to MacroAccuracy.

* Null exception for ignorecolumns in CLI (#294)

* Null exception for ignorecolumns in CLI

* Check if ignore-columns array has values (as the default is now an empty array)

* Emit caching flag in pipeline object model. (Includes SuggestedPipelineBuilder refactor & debug string fixes / refactor) (#296)

* removed sln (#297)

* Caching enabling in code gen part -2 (#298)

* add

* added caching codegen

* support comma separated values for --ignore-columns (#300)

* default initialization for ignore columns (#302)

* default initialization

* added null check

* Codegen for multiclass non-ova (#303)

* changes to template

* multiclass codegen

* test cases

* fix test cases

* Generated Project new structure. (#305)

* added new templates

* writing files to disk

* change path

* added new templates

* missing braces

* fix bugs

* format code

* added util methods for solution file creation and addition of projects to it

* added extra packages to project files

* new tests

* added correct path for sln

* build fix

* fix build

* include using system in prediction class (#307)

* added using

* fix test

* Random number generator is not thread safe (#310)

* Random number generator is not thread safe

* Another local random generator

* Missed a few references

* Referencing AutoMlUtils.random instead of a local RNG

* More refs to main RNG; remove Float as per https://github.com/dotnet/machinelearning/issues/1669

* Missed Random.cs

* Fix multiclass code gen (#314)

* compile error in codegen

* removes scores printing

* fix bugs

* fix test

* Fix compile error in codegen project (#319)

* removed redundant code

* fix test case

* Rev OVA pipeline node SDK output: wrap binary trainers as children inside parent OVA node (#317)

* Ova Multi class codegen support (#321)

* dummy

* multiova implementation

* fix tests

* remove inclusion list

* fix tests and console helper

* Rev run result trainer name for OVA: output different trainer name for each OVA + binary learner combination (#322)

* Rev run result trainer name for Ova: output different trainer name for each Ova + binary learner combination

* test fixes

* Console helper bug in generated code for multiclass (#323)

* fix

* fix test

* looping perlogclass

* fix test

* Initial version of Progress bar impl and CLI UI experience (#325)

* progressbar

* added progressbar and refactoring

* reverted

* revert sign assembly

* added headers and removed exception rethrow

* Setting model directory to temp directory (#327)

* Suggested changes to progress bar (#335)

* progressbar

* added progressbar and refactoring

* reverted

* revert sign assembly

* added headers and removed exception rethrow

* bug fixes and updates to UI

* added friendly name printing for metric

* formatting

* Rev Samples (#334)

* Telemetry2 (#333)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (will probably need tweaks once I try to use the pipeline)

* tweak queue in vsts-ci.yml

* CLI telemetry implementation

* Telemetry implementation

* delete unnecessary file and change file size bucket to actually log log2 instead of nearest ceil value

* add headers, remove comments

* one more header missing

* Fix progress bar in linux/osx (#336)

* progressbar

* added progressbar and refactoring

* reverted

* revert sign assembly

* added headers and removed exception rethrow

* bug fixes and updates to UI

* added friendly name printing for metric

* formatting

* change from task to thread

* Update src/mlnet/CodeGenerator/CodeGenerationHelper.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* Mem leak fix (#328)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (will probably need tweaks once I try to use the pipeline)

* tweak queue in vsts-ci.yml

* there is still investigation to be done but this fix works and solves memory leak problems

* minor refactor

* Upgrade ML.NET package (#343)

* Add cross-validation (CV), and auto-CV for small datasets; push common API experiment methods into base class (#287)

* restore old yml for internal pipeline so we can publish nuget again to devdiv stream (#344)

* Polishing the CLI UI part-1 (#338)

* formatting of pbar message

* Polishing the UI

* optimization

* rename variable

* Update src/mlnet/AutoML/AutoMLEngine.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* Update src/mlnet/CodeGenerator/CodeGenerationHelper.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* new message

* changed http to https

* added iteration num + 1

* change string name and add color to artifacts

* change the message

* build errors

* added null checks

* added exception messages to log file

* added exception messages to log file

* CLI ML.NET version upgrade (#345)

* Sample revs; ColumnInformation property name revs; pre-featurizer fixes (#346)

* CLI -- consume logs from AutoML SDK (#349)

* Rename RunDetails --> RunDetail (#350)

* command line api upgrade and progress bar rendering bug (#366)

* added fix for all platforms progress bar

* upgrade nuget

* removed args from writeline

* change in the version (#368)

* fix few bugs in progressbar and verbosity (#374)

* fix few bugs in progressbar and verbosity

* removed unused name space

* Fix for folders with space in it while generating project (#376)

* support for folders with spaces

* added support for paths with space

* revert file

* change name of var

* remove spaces

* SMAC fix for minimizing metrics (#363)

* Formatting Regression metrics and progress bar display days. (#379)

* added progress bar day display and fix regression metrics

* fix formatting

* added total time

* formatted total time

* change command name and add pbar message (#380)

* change command name and add pbar message

* fix tests

* added aliases

* duplicate alias

* added another alias for task

* UI missing features (#382)

* added formatting changes

* added accuracy specifically

* downgrade the codepages (#384)

* Change in project structure (#385)

* initial changes

* Change in project structure

* correcting test

* change variable name

* fix tests

* fix tests

* fix more tests

* fix codegen errors

* added log file message

* changed name of args

* change variable names

* fix test

* FileSizeBuckets in correct units (#387)

* Minor telemetry change to log in correct units and make our life easier in the future

* Use Ceiling instead of Round

* changed order (#388)

* prep work to transfer to ml.net (#389)

* move test projects to top level test subdir

* rename some projects to make naming consistent and make it build again

* fix test project refs

* Add AutoML components to build, fix issues related to that so it builds

* fix test cases, remove AppInsights ref from AutoML (#3329)

* [AutoML] disable netfx build leg for now (#3331)

* disable netfx build leg for now

* disable netfx build leg for now.

* [AutoML] Add AutoML XML documentation to all public members; migrate AutoML projects & tests into ML.NET solution; AutoML test fixes (#3351)

* [AutoML] Rev AutoML public API; add required native references to AutoML projects (#3364)

* [AutoML] Minor changes to generated project in CLI based on feedback (#3371)

* nitpicks for generated project

* revert back the target framework

* [AutoML] Migrate AutoML back to its own solution, w/ NuGet dependencies (#3373)

* Migrate AutoML back to its own solution, w/ NuGet dependencies

* build project updates; parameter name revert

* dummy change

* Revert "dummy change"

This reverts commit 3e8574266f556a4d5b6805eb55b4d8b8b84cf355.

* [AutoML] publish AutoML package (#3383)

* publish AutoML package

* Only leave automl and mlnet tests to run

* publish AutoML package

* Only leave automl and mlnet tests to run

* fix build issues when ml.net is not building

* bump version to 0.3 since that's the one we're going to ship for build (#3416)

* [AutoML] temporarily disable all but x64 platforms -- don't want to do native builds and can't find a way around that with the current VSTS pipeline (#3420)

* disable steps but keep phases to keep vsts build pipeline happy (#3423)

* API docs for experimentation (#3484)

* fixed path bug and regression metrics correction (#3504)

* changed the casing of option alias as it conflicts with --help (#3554)

* [AutoML] Generated project - FastTree nuget package inclusion dynamically (#3567)

* added support for fast tree nuget pack inclusion in generated project

* fix testcase

* changed the tool name in telemetry message

* dummy commit

* remove space

* dummy commit to trigger build

* [AutoML] Add AutoML example code (#3458)

* AutoML PipelineSuggester: don't recommend pipelines from first-stage trainers that failed (#3593)

* InferColumns API: Validate all columns specified in column info exist in inferred data view (#3599)

* [AutoML] AutoML SDK API: validate schema types of input IDataView (#3597)

* [AutoML] If first three iterations all fail, short-circuit AutoML experiment (#3591)

* mlnet CLI nupkg creation/signing (#3606)

* mlnet CLI nupkg creation/signing

* remove includeinpackage from mlnet csproj

* address PR comments -- some minor reshuffling of stuff

* publish symbols for mlnet CLI

* fix case in NLog.config

* [AutoML] rename Auto to AutoML in namespace and nuget (#3609)

* mlnet CLI nupkg creation/signing

* [AutoML] take dependency on a specific ml.net version (#3610)

* take dependency on a specific ml.net version

* catch up to spelling fix for OptimizationTolerance

* force a specific ml.net nuget version, fix typo (#3616)

* [AutoML] Fix error handling in CLI.  (#3618)

* fix error handling

* renaming variables

* [AutoML] turn off line pragmas in .tt files to play nice with signing (#3617)

* turn off line pragmas in .tt files to play nice with signing

* dedupe tags

* change the param name (#3619)

* [AutoML]  return null instead of null ref crash on Model property accessor (#3620)

* return null instead of null ref crash on Model property accessor

* [AutoML] Handling label column names which have space and exception logging (#3624)

* fix case of label with space and exception logging

* final handler

* revert file

* use Name instead of FullName for telemetry filename hash (#3633)

* renamed classes (#3634)

* change ML.NET dependency to 1.0 (#3639)

[AutoML] undo pinning ML.NET dependency

* set exploration time default in CLI to half hour (#3640)

* [AutoML] step 2 of removing pinned nupkg versions (#3642)

* InferColumns API that consumes label column index -- Only rename label column to 'Label' for headerless files (#3643)

* [AutoML] Upgrade ml.net package in generated code (#3644)

* upgrade the mlnet package in gen code

* Update src/mlnet/Templates/Console/ModelProject.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* Update src/mlnet/Templates/Console/ModelProject.tt

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* added spaces

* [AutoML] Early stopping in CLI based on the exploration time (#3641)

* early stopping in CLI

* remove unused variables

* change back to thread

* remove sleep

* fix review comments

* remove unused usings

* format message

* collapse declaration

* remove unused param

* added environment.exit and removal of error message

* correction in message

* secs-> seconds

* exit code

* change value to 1

* reverse the declaration

* [AutoML] Change wording for CouldNotFinshOnTime message (#3655)

* set exploration time default in CLI to half hour

* [AutoML] Change wording for CouldNotFinshOnTime message

* [AutoML] Change wording for CouldNotFinshOnTime message

* even better wording for CouldNotFinshOnTime

* temp change to get around vsts publish failure (#3656)

* [AutoML] bump version to 0.4.0 (#3658)

* implement culture invariant strings (#3725)

* reset culture (#3730)

* [AutoML] Cross validation fixes; validate empty training / validation input data (#3794)

* [AutoML] Enable style cop rules & resolve errors (#3823)

* add task agnostic wrappers for autofit calls (#3860)

* [AutoML] CLI telemetry rev (#3789)

* delete automl .sln

* CLI -- regenerate templated CS files (#3954)

* [AutoML] Bump ML.NET package version to 1.2.0 in AutoML API and CLI; and AutoML package versions to 0.14.0 (#3958)

* Build AutoML NuGet package (#3961)

* Increment AutoML build version to 0.15.0 for preview. (#3968)

* added culture independent parsing (#3731)

* - convert tests to xunit
- take project level dependency on ML.NET components instead of nuget
- set up bestfriends relationship to ML.Core and remove some of the copies of util classes from AutoML.NET (more work needed to fully remove them, work item 4064)
- misc build script changes to address PR comments

* address issues only showing up in a couple configurations during CI build

* fix cut&paste error

* [AutoML] Bump version to ML.NET 1.3.1 in AutoML API and CLI and AutoML package version to 0.15.1 (#4071)

* bumped version

* change versions in nupkg

* revert version bump in branch props

* [AutoML] Fix for Exception thrown in cross val when one of the score equals infinity. (#4073)

* bumped version

* change versions in nupkg

* revert version bump in branch props

* added infinity fix

* changes signing (#4079)

* Addressed PR comments and build issues
- sync block on creating test data file (failed intermittently)
- removed classes we copied over from ML.Core and fixed their uses to de-dupe and use original ML.Core versions since we now have InternalsVisible and BestFriends
- Fixed nupkg creation to use projects instead of public nuget version for AutoML
- Fixed a bunch of unit tests that didn't actually test what they were supposed to test, while removing cut&paste code and dependencies.
- Few more misc small changes

* minor nit - removed unused folder ref

* Fix the .sln file for the right configurations.

* Fix mistake in .sln file

* test fixes and disable one test

* fix tests, re-add AutoML samples csproj

* bumped VS version to 16 in .sln, removed InternalsVisible for a dead assembly, removed unused references from AutoML test project

* Updated docs to include PredictedLabel member (#4107)

* Fixed build errors resulting from upgrade to VS2019 compilers

* Added additional message describing the previous fix

* Updated docs to include PredictedLabel member

* Added CODEOWNERS file in the .github/ folder. (#4140)

* Added CODEOWNERS file in the .github/ folder. This allows reviewers to review any changes in the machine learning repository

* Updated .github/CODEOWNERS with the team instead of individual reviewers

* Added AutoML team reviewers (#4144)

* Added CODEOWNERS file in the .github/ folder. This allows reviewers to review any changes in the machine learning repository

* Updated .github/CODEOWNERS with the team instead of individual reviewers

* Added AutoML team reviewers to files owned by AutoML team

* Added AutoML team reviewers to files owned by AutoML team

* Removed two files that don't exist for AutoML team in CODEOWNERS

* Build extension method to reload changes without specifying model name (#4146)

* Image classification preview 2. (#4151)

* Image classification preview 2.

* PR feedback.

* Add unit-test.

* Add unit-test.

* Add unit-test.

* Add unit-test.

* Use Path.Combine instead of Join.

* fix test dataset path.

* fix test dataset path.

* Improve test.

* Improve test.

* Increase epochs in tests.

* Disable test on Ubuntu.

* Move test to its own project.

* Move test to its own project.

* Move test to its own project.

* Move test to its own file.

* cleanup.

* Disable parallel execution of tensorflow tests.

* PR feedback.

* PR feedback.

* PR feedback.

* PR feedback.

* Prevent TF test to execute in parallel.

* PR feedback.

* Build error.

* clean up.

* Added export functionality for LpNormNormalizingTransformer

* Syncing upstream fork (#11)

* Throw error on incorrect Label name in InferColumns API (#47)

* Added sequential grouping of columns

* reverted the file

* added infer columns label name checking

* added column detection error

* removed unused usings

* added quotes

* replace Where with Any clause

* replace Where with Any clause

* Set Nullable Auto params to null values (#50)

* Added sequential grouping of columns

* reverted the file

* added auto params as null

* change to the update fields method

* First public api proposal (#52)

* Includes following
1) Final proposal for 0.1 public API surface
2) Prefeaturization
3) Splitting train data into train and validate when validation data is null
4) Providing end to end samples one each for regression, binaryclassification and multiclass classification

* Incorporating code review feedback

* Revert "Set Nullable Auto params to null values" (#53)

* Revert "First public api propsal (#52)"

This reverts commit e4a64cf4aeab13ee9e5bf0efe242da3270241bd7.

* Revert "Set Nullable Auto params to null values (#50)"

This reverts commit 41c663cd14247d44022f40cf2dce5977dbab282d.

* AutoFit return type is now an IEnumerable (#55)

AutoFit now returns an IEnumerable. This enables many good things:

Implementing variety of early stopping criteria (See sample)
Early discard of models that are no good. This improves memory usage efficiency. (See sample)
No need to implement a callback to get results back
Getting best score is now outside of API implementation. It is a simple math function to compare scores (See sample).

Also templatized the return type for better type safety throughout the code.

* misc fixes & test additions, towards 0.1 release (#56)

* Enable UnitTests on build server (#57)

* 1) Making trainer name public (#62)

2) Fixing up samples to reflect it

*  Initial version of CLI tool for mlnet (#61)

* added global tool initial project

* removed unnecessary files, renamed files

* refactoring and added base abstract classes for trainer generator

* removed unused class

* Added classes for transforms

* added transform generate dummy classes

* more refactoring, added first transform

* more refactoring and added classes

* changed the project structure

* restructuring, added options class

* sln changes

* refactored options to different class

* added more logic for code generation of class

* misc changes

* reverted file

* added commandline api package

* reverted sample

* added new command line api parser

* added normalization of column names

* Added command defaults and error message

* implementation of all trainers

* changed auto to null

* added all transform generators

* added error handling when args is empty and minor changes due to change in AutoML api names

* changed the name of param

* added new command line options and restructuring code

* renamed proj file and added solution

* Added code to generate usings, Fixed few bugs in the code

* added validation to the command line options

* changed project name

* Bug fixes due to API change in AutoML

* changed directory structure

* added test framework and basic tests

* added more tests

* added improvements to template and error handling

* renamed the estimator name

* fixed test case

* added comments

* added headers

* changed namespace and removed unnecessary properties from project

* Revert "changed namespace and removed unneccesary properties from project"

This reverts commit 9edae033e9845e910f663f296e168f1182b84f5f.

* fixed test cases and renamed namespaces

* cleaned up proj file

* added folder structure

* added symbols/tokens for strings

* added more tests

* review comments

* modified test cases

* review comments

* change in the exception message

* normalized line endings

* made method private static

* simplified range building /optimization

* minor fix

* added header

* added static methods in command where necessary

* nit picks

*  made few methods static

* review comments

* nitpick

* remove line pragmas

* fix test case

* Use better AutoFit overload and ignore Multiclass (#64)

* Upgrading CLI to produce ML.NET V.10 APIs and bunch of Refactoring tasks (#65)

* Added sequential grouping of columns

* reverted the file

* upgrade to v .10 and refactoring

* added null check

* fixed unit tests

* review comments

* removed the settings change

* added regions

* fixed unit tests

* Upgrade ML.NET package to 0.10.0 (#70)

* Change in template to accommodate new API of TextLoader (#72)

* Added sequential grouping of columns

* reverted the file

* changed to new API of Text Loader

* changed signature

* added params for taking additional settings

* changes to codegen params

* refactoring of templates and fixing errors

* Enable gated check for mlnet.tests (#79)

* Added sequential grouping of columns

* reverted the file

* changed to new API of Text Loader

* changed signature

* added params for taking additional settings

* changes to codegen params

* refactoring of templates and fixing errors

* added run-tests.proj and referred it in build.proj

* CLI tool - make validation dataset optional and support for crossvalidation in generated code (#83)

* Added sequential grouping of columns

* reverted the file

* bug fixes, more logic to templates to support cross-validate

* formatting and fix type in consolehelper

* Added logic in templates

* revert settings

* benchmarking related changes (#63)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* fix fast forest learner (don't sweep over learning rate) (#88)

* Made changes to Have non-calibrated scoring for binary classifiers (#86)

* Added sequential grouping of columns

* reverted the file

* added calibration workaround

* removed print probability

* reverted settings

* rev ColumnInference API: can take label index; rev output object types; add tests (#89)

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (#99)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (will probably need tweaks once I try to use the pipeline)

* publish nuget (#101)

* use dotnet-internal-temp agent for internal build

* use dotnet-internal feed

* Fix Codegen for columnConvert and ValueToKeyMapping transform and add individual transform tests (#95)

* Added sequential grouping of columns

* reverted the file

* fix usings for type convert

* added transforms tests

* review comments

* When generating usings choose only distinct usings directives (#94)

* Added sequential grouping of columns

* reverted the file

* Added code to have unique strings

* refactoring

* minor fix

* minor fix

* Autofit overloads + cancellation + progress callbacks

1) Introduce AutoFit overloads (basic and advanced)
2) AutoFit Cancellation
3) AutoFit progress callbacks

* Default the kfolds to value 5 in CLI generated code (#115)

* Added sequential grouping of columns

* reverted the file

* Set up CI with Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* remove file

* added kfold param and defaulted to value

* changed type

* added for regression

* Remove extra ; from generated code (#114)

* Added sequential grouping of columns

* reverted the file

* Set up CI with Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* removed extra ; from generated code

* removed file

* fix unit tests

* TimeoutInSeconds (#116)

Specifying timeout in seconds instead of minutes

* Added more command line args implementation to CLI tool and refactoring (#110)

* Added sequential grouping of columns

* reverted the file

* Set up CI with Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* added git status

* reverted change

* added codegen options and refactoring

* minor fixes

* renamed params, minor refactoring

* added tests for commandline and refactoring

* removed file

* added back the test case

* minor fixes

* Update src/mlnet.Test/CommandLineTests.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* review comments

*  capitalize the first character

* changed the name of test case

* remove unused directives

* Fail gracefully if unable to instantiate data view with swept parameters (#125)

* gracefully fail if fail to parse a datai

* rev

* validate AutoFit 'Features' column must be of type R4 (#132)

* Samples: exceptions / nits (#124)

* Logging support in CLI + Implementation of cmd args [--name,--output,--verbosity] (#121)

* added logging and helper methods

* fixing code after merge

* added resx files, added logger framework, added logging messages

* added new options

* added spacing

* minor fixes

* change command description

* rename option, add headers, include new param in test

* formatted

* build fix

*  changed option name

* Added NlogConfig file

* added back config package

* fix tests

* added correct validation check (#137)

* Use CreateTextLoader<T>(..)  instead of CreateTextLoader(..) (#138)

* added support to loaddata by class in the generated code

* fix tests

* changed CreateTextLoader to ReadFromTextFile method. (#140)

* changed textloader to readfromtextfile method

* formatting

* exception fixes (#136)

* infer purpose of hidden columns as 'ignore' (#142)

* Added approval tests and bunch of refactoring of code and normalizing namespaces (#148)

* changed textloader to readfromtextfile method

* formatting

* added approval tests and refactoring of code

* removed few comments

* API 2.0 skeleton (#149)

Incorporating API review feedback

* The CV code should come before the training when there is no test dataset in generated code (#151)

* reorder cv code

* build fix

* fixed structure

* Format the generated code + bunch of misc tasks (#152)

* added formatting and minor changes for reordering cv

* fixing the template

* minor changes

* formatting changes

* fixed approval test

* removed unused nuget

* added missing value replacing

* added test for new transform

* fix test

* Update src/mlnet/Templates/Console/MLCodeGen.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* Sanitize the column names in CLI (#162)

* added sanitization layer in CLI

* fix test

* changed exception.StackTrace to exception.ToString()

* fix package name (#168)

* Rev public API (#163)

* Rename TransformGeneratorBase .cs to TransformGeneratorBase.cs (#153)

* Fix minor version for the repository + remove Nlog config package (#171)

*  changed the minor version

* removed the nlog config package

* Added new test to columninfo and fixing up API (#178)

* Make optimizing metric customizable and add trainer whitelist functionality (#172)

* API rev (#181)

* propagate root MLContext thru AutoML (instead of creating our own) (#182)

* Enabling new command line args (#183)

* fix package name

* initial commit

* added more commandline args

* fixed tests

* added headers

* fix tests

* fix test

* rename 'AutoFitter' to 'Experiment' (#169)

* added tests (#187)

* rev InferColumns to accept ColumnInfo input param (#186)

* Implement argument --has-header and change usage of dataset (#194)

* added has header and fixed dataset and train dataset

* fix tests

* removed dummy command (#195)

* Fix bug for regression and sanitize input label from user (#198)

* removed dummy command

* sanitize label and fix template

* fix tests

* Do not generate code concatenating columns when the dataset has a single feature column (#191)

* Include some missed logging in the generated code.  (#199)

* added logging messages for generated code

* added log messages

* deleted file

* cleaning up proj files (#185)

* removed platform target

* removed platform target

* Some spaces and extra lines + bug in output path  (#204)

* nit picks

* nit picks

* fix test

* accept label from user input and provide in generated code (#205)

* Rev handling of weight / label columns (#203)

* migrate to private ML.NET nuget for latest bug fixes (#131)

* fix multiclass with nonstandard label (#207)

* Multiclass nondefault label test (#208)

* printing escaped chars + bug (#212)

* delete unused internal samples (#211)

* fix SMAC bug that causes multiclass sample to infinite loop (#209)

* Rev user input validation for new API (#210)

* added console message for exit and nit picks (#215)

* exit when exception encountered (#216)

* Seal API classes (and make EnableCaching internal) (#217)

* Suggested sample nits (feel free to ask for any of these to be reverted) (#219)

* User input column type validation (#218)

* upgrade commandline and renaming (#221)

* upgrade commandline and renaming

* renaming fields

* Make build.sh, init-tools.sh, & run.sh executable on OSX/Linux (#225)

*  CLI argument descriptions updated (#224)

* CLI argument descriptions updated

* No version in .csproj

* added flag to disable training code (#227)

* Exit if perfect model produced (#220)

* removed header (#228)

* removed header

* added auto generated header

* removed console read key (#229)

* Fix model path in generated file (#230)

* removed console read key

* fix model path

* fix test

* reorder samples (#231)

* remove rule that infers column purpose as categorical if # of distinct values is < 100 (#233)

* Null reference exception fix for finding best model when some runs have failed (#239)

* samples fixes (#238)

* fix for defaulting Averaged Perceptron # of iterations to 10 (#237)

* Bug bash feedback Feb 27. API changes and sample changes (#240)

* Bug bash feedback Feb 27. 
API changes 
Sample changes
Exception fix

* Samples / API rev from 2/27 bug bash feedback (#242)

* changed the directory structure for generated project (#243)

* changed the directory structure for generated project

* changed test

* upgraded commandline package

* Fix test file locations on OSX (#235)

* fix test file locations on OSX

* changing to Path.Combine()

* Additional Path.Combine()

* Remove ConsoleCodeGeneratorTests.GeneratedTrainCodeTest.received.txt

* Additional Path.Combine()

* add back in double comparison fix

* remove metrics agent NaN returns

* test fix

* test format fix

* mock out path

Thanks to @daholste for additional fixes!

* upgrade to latest ML.NET public surface (#246)

* Upgrade to ML.NET 0.11 (#247)

* initial changes

* fix lightgbm

* changed normalize method

* added tests

* fix tests

* fix test

* Private preview final API changes (#250)

* .NET framework design guidelines applied to public surface
* WhitelistedTrainers -> Trainers

* Add estimator to public API iteration result (#248)

* LightGBM pipeline serialization fix (#251)

* Change order that we search for TextLoader's parameters (#256)

* CLI IFileInfo null exception fix (#254)

* Averaged Perceptron pipeline serialization fix (#257)

* Upgrade command-line-api and default folder name change (#258)

* change in default folderName

* upgrade command line

* Update src/mlnet/Program.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* eliminate IFileInfo from CLI (#260)

* Rev samples towards private preview; ignored columns fix (#259)

* remove unused methods in consolehelper and nit picks in generated code (#261)

* nit picks

* change in console helper

* fix tests

* add space

* fix tests

* added nuget sources in generated csproj (#262)

* added nuget sources in csproj

* changed the structure in generated code

* space

* upgrade to mlnet 0.11 (#263)

* Formatting CLI metrics (#264)

Ensures space between printed metrics (also model counter). Right aligned metrics. Extended AUC to four digits.

* Add implementation of non-OVA multiclass trainers code gen (#267)

* added non ova multi class learners

* added tests

* test cases

* Add caching (#249)

* AdvancedExperimentSettings sample nits (#265)

* Add sampling key column (#268)

* Initial work for multi-class classification support for CLI (#226)

* Initial work for multi-class classification support for CLI

* String updates

* more strings

* Whitelist non-OVA multi-class learners

* Refactor the orchestration of AutoML calls (#272)

* Do not auto-group columns with suggested purpose = 'Ignore' (#273)

* Fix: during type inferencing, parse whitespace strings as NaN (#271)

* Printing additional metrics in CLI for binary classification (#274)

* Printing additional metrics in CLI for binary classification

* Update src/mlnet/Utilities/ConsolePrinter.cs

* Add API option to store models on disk (instead of in memory); fix IEstimator memory leak (#269)

* Print failed iterations in CLI (#275)

* change the type to float from double (#277)

* cache arg implementation in CLI (#280)

* cache implementation

* corrected the null case

* added tests for all cases

* Remove duplicate value-to-key mapping transform for multiclass string labels (#283)

* Add post-trainer transform SDK infra; add KeyToValueMapping transform to CLI; fix: for generated multiclass models, convert predicted label from key to original label column type (#286)

* Implement ignore columns command line arg (#290)

* normalize line endings

* added --ignore-columns

* null checks

* unit tests

* Print winning iteration and runtime in CLI (#288)

* Print best metric and runtime

* Print best metric and runtime

* Line endings in AutoMLEngine.cs

* Rename time column to duration to match Python SDK

* Revert to MicroAccuracy and MacroAccuracy spellings

* Revert spelling of BinaryClassificationMetricsAgent to BinaryMetricsAgent to reduce merge conflicts

* Revert spelling of MulticlassMetricsAgent to MultiMetricsAgent to reduce merge conflicts

* missed some files

* Fix merge conflict

* Update AutoMLEngine.cs

* Add MacOS & Linux to CI; MacOS & Linux test fixes (#293)

* MicroAccuracy as default for multi-class (#295)

Change default optimization metric for multi-class classification to MicroAccuracy (accuracy). Previously it was set to MacroAccuracy.

* Null exception for ignorecolumns in CLI (#294)

* Null exception for ignorecolumns in CLI

* Check if ignore-columns array has values (as the default is now an empty array)

* Emit caching flag in pipeline object model. (Includes SuggestedPipelineBuilder refactor & debug string fixes / refactor) (#296)

* removed sln (#297)

* Caching enabling in code gen part -2 (#298)

* add

* added caching codegen

* support comma separated values for --ignore-columns (#300)

* default initialization for ignore columns (#302)

* default initialization

* added null check

* Codegen for multiclass non-ova (#303)

* changes to template

* multiclass codegen

* test cases

* fix test cases

* Generated Project new structure. (#305)

* added new templates

* writing files to disk

* change path

* added new templates

* missing braces

* fix bugs

* format code

* added util methods for solution file creation and addition of projects to it

* added extra packages to project files

* new tests

* added correct path for sln

* build fix

* fix build

* include using system in prediction class (#307)

* added using

* fix test

* Random number generator is not thread safe (#310)

* Random number generator is not thread safe

* Another local random generator

* Missed a few references

* Referencing AutoMlUtils.random instead of a local RNG

* More refs to main RNG; remove Float as per https://github.com/dotnet/machinelearning/issues/1669

* Missed Random.cs

* Fix multiclass code gen (#314)

* compile error in codegen

* removes scores printing

* fix bugs

* fix test

* Fix compile error in codegen project (#319)

* removed redundant code

* fix test case

* Rev OVA pipeline node SDK output: wrap binary trainers as children inside parent OVA node (#317)

* Ova Multi class codegen support (#321)

* dummy

* multiova implementation

* fix tests

* remove inclusion list

* fix tests and console helper

* Rev run result trainer name for OVA: output different trainer name for each OVA + binary learner combination (#322)

* Rev run result trainer name for Ova: output different trainer name for each Ova + binary learner combination

* test fixes

* Console helper bug in generated code for multiclass (#323)

* fix

* fix test

* looping perlogclass

* fix test

* Initial version of Progress bar impl and CLI UI experience (#325)

* progressbar

* added progressbar and refactoring

* reverted

* revert sign assembly

* added headers and removed exception rethrow

* Setting model directory to temp directory (#327)

* Suggested changes to progress bar (#335)

* progressbar

* added progressbar and refactoring

* reverted

* revert sign assembly

* added headers and removed exception rethrow

* bug fixes and updates to UI

* added friendly name printing for metric

* formatting

* Rev Samples (#334)

* Telemetry2 (#333)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (will probably need tweaks once I try to use the pipleline)

* tweak queue in vsts-ci.yml

* CLI telemetry implementation

* Telemetry implementation

* delete unnecessary file and change file size bucket to actually log log2 instead of nearest ceil value

* add headers, remove comments

* one more header missing

* Fix progress bar in linux/osx (#336)

* progressbar

* added progressbar and refactoring

* reverted

* revert sign assembly

* added headers and removed exception rethrow

* bug fixes and updates to UI

* added friendly name printing for metric

* formatting

* change from task to thread

* Update src/mlnet/CodeGenerator/CodeGenerationHelper.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* Mem leak fix (#328)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (will probably need tweaks once I try to use the pipleline)

* tweak queue in vsts-ci.yml

* there is still investigation to be done but this fix works and solves memory leak problems

* minor refactor

* Upgrade ML.NET package (#343)

* Add cross-validation (CV), and auto-CV for small datasets; push common API experiment methods into base class (#287)

* restore old yml for internal pipeline so we can publish nuget again to devdiv stream (#344)

* Polishing the CLI UI part-1 (#338)

* formatting of pbar message

* Polishing the UI

* optimization

* rename variable

* Update src/mlnet/AutoML/AutoMLEngine.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* Update src/mlnet/CodeGenerator/CodeGenerationHelper.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* new message

* changed http to https

* added iteration num + 1

* change string name and add color to artifacts

* change the message

* build errors

* added null checks

* added exception messages to log file

* added exception messages to log file

* CLI ML.NET version upgrade (#345)

* Sample revs; ColumnInformation property name revs; pre-featurizer fixes (#346)

* CLI -- consume logs from AutoML SDK (#349)

* Rename RunDetails --> RunDetail (#350)

* command line api upgrade and progress bar rendering bug (#366)

* added fix for all platforms progress bar

* upgrade nuget

* removed args from writeline

* change in the version (#368)

* fix few bugs in progressbar and verbosity (#374)

* fix few bugs in progressbar and verbosity

* removed unused name space

* Fix for folders with space in it while generating project (#376)

* support for folders with spaces

* added support for paths with space

* revert file

* change name of var

* remove spaces

* SMAC fix for minimizing metrics (#363)

* Formatting Regression metrics and progress bar display days. (#379)

* added progress bar day display and fix regression metrics

* fix formatting

* added total time

* formatted total time

* change command name and add pbar message (#380)

* change command name and add pbar message

* fix tests

* added aliases

* duplicate alias

* added another alias for task

* UI missing features (#382)

* added formatting changes

* added accuracy specifically

* downgrade the codepages (#384)

* Change in project structure (#385)

* initial changes

* Change in project structure

* correcting test

* change variable name

* fix tests

* fix tests

* fix more tests

* fix codegen errors

* added log file message

* changed name of args

* change variable names

* fix test

* FileSizeBuckets in correct units (#387)

* Minor telemetry change to log in correct units and make our life easier in the future

* Use Ceiling instead of Round

* changed order (#388)

* prep work to transfer to ml.net (#389)

* move test projects to top level test subdir

* rename some projects to make naming consistent and make it build again

* fix test project refs

* Add AutoML components to build, fix issues related to that so it builds

* fix test cases, remove AppInsights ref from AutoML (#3329)

* [AutoML] disable netfx build leg for now (#3331)

* disable netfx build leg for now

* disable netfx build leg for now.

* [AutoML] Add AutoML XML documentation to all public members; migrate AutoML projects & tests into ML.NET solution; AutoML test fixes (#3351)

* [AutoML] Rev AutoML public API; add required native references to AutoML projects (#3364)

* [AutoML] Minor changes to generated project in CLI based on feedback (#3371)

* nitpicks for generated project

* revert back the target framework

* [AutoML] Migrate AutoML back to its own s…
harishsk added a commit that referenced this issue Sep 11, 2019
* Fixed build errors resulting from upgrade to VS2019 compilers

* Added additional message describing the previous fix

* Syncing upstream fork (#10)

* Throw error on incorrect Label name in InferColumns API (#47)

* Added sequential grouping of columns

* reverted the file

* added infer columns label name checking

* added column detection error

* removed unused usings

* added quotes

* replace Where with Any clause

* replace Where with Any clause

* Set Nullable Auto params to null values (#50)

* Added sequential grouping of columns

* reverted the file

* added auto params as null

* change to the update fields method

* First public api proposal (#52)

* Includes following
1) Final proposal for 0.1 public API surface
2) Prefeaturization
3) Splitting train data into train and validate when validation data is null
4) Providing end to end samples one each for regression, binaryclassification and multiclass classification

* Incorporating code review feedback

* Revert "Set Nullable Auto params to null values" (#53)

* Revert "First public api proposal (#52)"

This reverts commit e4a64cf4aeab13ee9e5bf0efe242da3270241bd7.

* Revert "Set Nullable Auto params to null values (#50)"

This reverts commit 41c663cd14247d44022f40cf2dce5977dbab282d.

* AutoFit return type is now an IEnumerable (#55)

AutoFit now returns an IEnumerable, which enables many good things:

Implementing a variety of early stopping criteria (See sample)
Early discard of models that are no good. This improves memory usage efficiency. (See sample)
No need to implement a callback to get results back
Getting the best score is now outside of the API implementation. It is a simple math function to compare scores (See sample).

Also templatized the return type for better type safety throughout the code.

* misc fixes & test additions, towards 0.1 release (#56)

* Enable UnitTests on build server (#57)

* 1) Making trainer name public (#62)

2) Fixing up samples to reflect it

*  Initial version of CLI tool for mlnet (#61)

* added global tool initial project

* removed unnecessary files, renamed files

* refactoring and added base abstract classes for trainer generator

* removed unused class

* Added classes for transforms

* added transform generate dummy classes

* more refactoring, added first transform

* more refactoring and added classes

* changed the project structure

* restructuring, added options class

* sln changes

* refactored options to different class:

* added more logic for code generation of class

* misc changes

* reverted file

* added commandline api package

* reverted sample

* added new command line api parser

* added normalization of column names

* Added command defaults and error message

* implementation of all trainers

* changed auto to null

* added all transform generators

* added error handling when args is empty and minor changes due to change in AutoML api names

* changed the name of param

* added new command line options and restructuring code

* renamed proj file and added solution

* Added code to generate usings, Fixed few bugs in the code

* added validation to the command line options

* changed project name

* Bug fixes due to API change in AutoML

* changed directory structure

* added test framework and basic tests

* added more tests

* added improvements to template and error handling

* renamed the estimator name

* fixed test case

* added comments

* added headers

* changed namespace and removed unnecessary properties from project

* Revert "changed namespace and removed unnecessary properties from project"

This reverts commit 9edae033e9845e910f663f296e168f1182b84f5f.

* fixed test cases and renamed namespaces

* cleaned up proj file

* added folder structure

* added symbols/tokens for strings

* added more tests

* review comments

* modified test cases

* review comments

* change in the exception message

* normalized line endings

* made method private static

* simplified range building /optimization

* minor fix

* added header

* added static methods in command where necessary

* nit picks

*  made few methods static

* review comments

* nitpick

* remove line pragmas

* fix test case

* Use better AutoFit overload and ignore Multiclass (#64)

* Upgrading CLI to produce ML.NET v0.10 APIs and a bunch of refactoring tasks (#65)

* Added sequential grouping of columns

* reverted the file

* upgrade to v0.10 and refactoring

* added null check

* fixed unit tests

* review comments

* removed the settings change

* added regions

* fixed unit tests

* Upgrade ML.NET package to 0.10.0 (#70)

* Change in template to accommodate new API of TextLoader (#72)

* Added sequential grouping of columns

* reverted the file

* changed to new API of Text Loader

* changed signature

* added params for taking additional settings

* changes to codegen params

* refactoring of templates and fixing errors

* Enable gated check for mlnet.tests (#79)

* Added sequential grouping of columns

* reverted the file

* changed to new API of Text Loader

* changed signature

* added params for taking additional settings

* changes to codegen params

* refactoring of templates and fixing errors

* added run-tests.proj and referred it in build.proj

* CLI tool - make validation dataset optional and support for crossvalidation in generated code (#83)

* Added sequential grouping of columns

* reverted the file

* bug fixes, more logic to templates to support cross-validate

* formatting and fix type in consolehelper

* Added logic in templates

* revert settings

* benchmarking related changes (#63)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* fix fast forest learner (don't sweep over learning rate) (#88)

* Made changes to Have non-calibrated scoring for binary classifiers (#86)

* Added sequential grouping of columns

* reverted the file

* added calibration workaround

* removed print probability

* reverted settings

* rev ColumnInference API: can take label index; rev output object types; add tests (#89)

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (#99)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (will probably need tweaks once I try to use the pipeline)

* publish nuget (#101)

* use dotnet-internal-temp agent for internal build

* use dotnet-internal feed

* Fix Codegen for columnConvert and ValueToKeyMapping transform and add individual transform tests (#95)

* Added sequential grouping of columns

* reverted the file

* fix usings for type convert

* added transforms tests

* review comments

* When generating usings choose only distinct usings directives (#94)

* Added sequential grouping of columns

* reverted the file

* Added code to have unique strings

* refactoring

* minor fix

* minor fix

* Autofit overloads + cancellation + progress callbacks

1) Introduce AutoFit overloads (basic and advanced)
2) AutoFit Cancellation
3) AutoFit progress callbacks

* Default the kfolds to value 5 in CLI generated code (#115)

* Added sequential grouping of columns

* reverted the file

* Set up CI with Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* remove file

* added kfold param and defaulted to value

* changed type

* added for regression

* Remove extra ; from generated code (#114)

* Added sequential grouping of columns

* reverted the file

* Set up CI with Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* removed extra ; from generated code

* removed file

* fix unit tests

* TimeoutInSeconds (#116)

Specifying timeout in seconds instead of minutes

* Added more command line args implementation to CLI tool and refactoring (#110)

* Added sequential grouping of columns

* reverted the file

* Set up CI with Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* added git status

* reverted change

* added codegen options and refactoring

* minor fixes

* renamed params, minor refactoring

* added tests for commandline and refactoring

* removed file

* added back the test case

* minor fixes

* Update src/mlnet.Test/CommandLineTests.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* review comments

*  capitalize the first character

* changed the name of test case

* remove unused directives

* Fail gracefully if unable to instantiate data view with swept parameters (#125)

* gracefully fail if fail to parse a datai

* rev

* validate AutoFit 'Features' column must be of type R4 (#132)

* Samples: exceptions / nits (#124)

* Logging support in CLI + Implementation of cmd args [--name,--output,--verbosity] (#121)

* added logging and helper methods

* fixing code after merge

* added resx files, added logger framework, added logging messages

* added new options

* added spacing

* minor fixes

* change command description

* rename option, add headers, include new param in test

* formatted

* build fix

*  changed option name

* Added NlogConfig file

* added back config package

* fix tests

* added correct validation check (#137)

* Use CreateTextLoader<T>(..)  instead of CreateTextLoader(..) (#138)

* added support to loaddata by class in the generated code

* fix tests

* changed CreateTextLoader to ReadFromTextFile method. (#140)

* changed textloader to readfromtextfile method

* formatting

* exception fixes (#136)

* infer purpose of hidden columns as 'ignore' (#142)

* Added approval tests and bunch of refactoring of code and normalizing namespaces (#148)

* changed textloader to readfromtextfile method

* formatting

* added approval tests and refactoring of code

* removed few comments

* API 2.0 skeleton (#149)

Incorporating API review feedback

* The CV code should come before the training when there is no test dataset in generated code (#151)

* reorder cv code

* build fix

* fixed structure

* Format the generated code + bunch of misc tasks (#152)

* added formatting and minor changes for reordering cv

* fixing the template

* minor changes

* formatting changes

* fixed approval test

* removed unused nuget

* added missing value replacing

* added test for new transform

* fix test

* Update src/mlnet/Templates/Console/MLCodeGen.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* Sanitize the column names in CLI (#162)

* added sanitization layer in CLI

* fix test

* changed exception.StackTrace to exception.ToString()

* fix package name (#168)

* Rev public API (#163)

* Rename TransformGeneratorBase .cs to TransformGeneratorBase.cs (#153)

* Fix minor version for the repository + remove Nlog config package (#171)

* changed the minor version

* removed the nlog config package

* Added new test to columninfo and fixing up API (#178)

* Make optimizing metric customizable and add trainer whitelist functionality (#172)

* API rev (#181)

* propagate root MLContext thru AutoML (instead of creating our own) (#182)

* Enabling new command line args (#183)

* fix package name

* initial commit

* added more commandline args

* fixed tests

* added headers

* fix tests

* fix test

* rename 'AutoFitter' to 'Experiment' (#169)

* added tests (#187)

* rev InferColumns to accept ColumnInfo input param (#186)

* Implement argument --has-header and change usage of dataset (#194)

* added has header and fixed dataset and train dataset

* fix tests

* removed dummy command (#195)

* Fix bug for regression and sanitize input label from user (#198)

* removed dummy command

* sanitize label and fix template

* fix tests

* Do not generate code concatenating columns when the dataset has a single feature column (#191)

* Include some missed logging in the generated code.  (#199)

* added logging messages for generated code

* added log messages

* deleted file

* cleaning up proj files (#185)

* removed platform target

* removed platform target

* Some spaces and extra lines + bug in output path  (#204)

* nit picks

* nit picks

* fix test

* accept label from user input and provide in generated code (#205)

* Rev handling of weight / label columns (#203)

* migrate to private ML.NET nuget for latest bug fixes (#131)

* fix multiclass with nonstandard label (#207)

* Multiclass nondefault label test (#208)

* printing escaped chars + bug (#212)

* delete unused internal samples (#211)

* fix SMAC bug that causes multiclass sample to infinite loop (#209)

* Rev user input validation for new API (#210)

* added console message for exit and nit picks (#215)

* exit when exception encountered (#216)

* Seal API classes (and make EnableCaching internal) (#217)

* Suggested sample nits (feel free to ask for any of these to be reverted) (#219)

* User input column type validation (#218)

* upgrade commandline and renaming (#221)

* upgrade commandline and renaming

* renaming fields

* Make build.sh, init-tools.sh, & run.sh executable on OSX/Linux (#225)

* CLI argument descriptions updated (#224)

* CLI argument descriptions updated

* No version in .csproj

* added flag to disable training code (#227)

* Exit if perfect model produced (#220)

* removed header (#228)

* removed header

* added auto generated header

* removed console read key (#229)

* Fix model path in generated file (#230)

* removed console read key

* fix model path

* fix test

* reorder samples (#231)

* remove rule that infers column purpose as categorical if # of distinct values is < 100 (#233)

* Null reference exception fix for finding best model when some runs have failed (#239)

* samples fixes (#238)

* fix for defaulting Averaged Perceptron # of iterations to 10 (#237)

* Bug bash feedback Feb 27. API changes and sample changes (#240)

* Bug bash feedback Feb 27. 
API changes 
Sample changes
Exception fix

* Samples / API rev from 2/27 bug bash feedback (#242)

* changed the directory structure for generated project (#243)

* changed the directory structure for generated project

* changed test

* upgraded commandline package

* Fix test file locations on OSX (#235)

* fix test file locations on OSX

* changing to Path.Combine()

* Additional Path.Combine()

* Remove ConsoleCodeGeneratorTests.GeneratedTrainCodeTest.received.txt

* Additional Path.Combine()

* add back in double comparison fix

* remove metrics agent NaN returns

* test fix

* test format fix

* mock out path

Thanks to @daholste for additional fixes!

* upgrade to latest ML.NET public surface (#246)

* Upgrade to ML.NET 0.11 (#247)

* initial changes

* fix lightgbm

* changed normalize method

* added tests

* fix tests

* fix test

* Private preview final API changes (#250)

* .NET framework design guidelines applied to public surface
* WhitelistedTrainers -> Trainers

* Add estimator to public API iteration result (#248)

* LightGBM pipeline serialization fix (#251)

* Change order that we search for TextLoader's parameters (#256)

* CLI IFileInfo null exception fix (#254)

* Averaged Perceptron pipeline serialization fix (#257)

* Upgrade command-line-api and default folder name change (#258)

* change in default folderName

* upgrade command line

* Update src/mlnet/Program.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* eliminate IFileInfo from CLI (#260)

* Rev samples towards private preview; ignored columns fix (#259)

* remove unused methods in consolehelper and nit picks in generated code (#261)

* nit picks

* change in console helper

* fix tests

* add space

* fix tests

* added nuget sources in generated csproj (#262)

* added nuget sources in csproj

* changed the structure in generated code

* space

* upgrade to mlnet 0.11 (#263)

* Formatting CLI metrics (#264)

Ensures space between printed metrics (also model counter). Right aligned metrics. Extended AUC to four digits.

* Add implementation of non -ova multi class trainers code gen (#267)

* added non ova multi class learners

* added tests

* test cases

* Add caching (#249)

* AdvancedExperimentSettings sample nits (#265)

* Add sampling key column (#268)

* Initial work for multi-class classification support for CLI (#226)

* Initial work for multi-class classification support for CLI

* String updates

* more strings

* Whitelist non-OVA multi-class learners

* Refactor the orchestration of AutoML calls (#272)

* Do not auto-group columns with suggested purpose = 'Ignore' (#273)

* Fix: during type inferencing, parse whitespace strings as NaN (#271)

* Printing additional metrics in CLI for binary classification (#274)

* Printing additional metrics in CLI for binary classification

* Update src/mlnet/Utilities/ConsolePrinter.cs

* Add API option to store models on disk (instead of in memory); fix IEstimator memory leak (#269)

* Print failed iterations in CLI (#275)

* change the type to float from double (#277)

* cache arg implementation in CLI (#280)

* cache implementation

* corrected the null case

* added tests for all cases

* Remove duplicate value-to-key mapping transform for multiclass string labels (#283)

* Add post-trainer transform SDK infra; add KeyToValueMapping transform to CLI; fix: for generated multiclass models, convert predicted label from key to original label column type (#286)

* Implement ignore columns command line arg (#290)

* normalize line endings

* added --ignore-columns

* null checks

* unit tests

* Print winning iteration and runtime in CLI (#288)

* Print best metric and runtime

* Print best metric and runtime

* Line endings in AutoMLEngine.cs

* Rename time column to duration to match Python SDK

* Revert to MicroAccuracy and MacroAccuracy spellings

* Revert spelling of BinaryClassificationMetricsAgent to BinaryMetricsAgent to reduce merge conflicts

* Revert spelling of MulticlassMetricsAgent to MultiMetricsAgent to reduce merge conflicts

* missed some files

* Fix merge conflict

* Update AutoMLEngine.cs

* Add MacOS & Linux to CI; MacOS & Linux test fixes (#293)

* MicroAccuracy as default for multi-class (#295)

Change default optimization metric for multi-class classification to MicroAccuracy (accuracy). Previously it was set to MacroAccuracy.

* Null exception for ignorecolumns in CLI (#294)

* Null exception for ignorecolumns in CLI

* Check if ignore-columns array has values (as the default is now an empty array)

* Emit caching flag in pipeline object model. (Includes SuggestedPipelineBuilder refactor & debug string fixes / refactor) (#296)

* removed sln (#297)

* Caching enabling in code gen part -2 (#298)

* add

* added caching codegen

* support comma separated values for --ignore-columns (#300)

* default initialization for ignore columns (#302)

* default initialization

* added null check

* Codegen for multiclass non-ova (#303)

* changes to template

* multiclass codegen

* test cases

* fix test cases

* Generated Project new structure. (#305)

* added new templates

* writing files to disk

* change path

* added new templates

* missing braces

* fix bugs

* format code

* added util methods for solution file creation and addition of projects to it

* added extra packages to project files

* new tests

* added correct path for sln

* build fix

* fix build

* include using system in prediction class (#307)

* added using

* fix test

* Random number generator is not thread safe (#310)

* Random number generator is not thread safe

* Another local random generator

* Missed a few references

* Referencing AutoMlUtils.random instead of a local RNG

* More refs to main RNG; remove Float as per https://github.com/dotnet/machinelearning/issues/1669

* Missed Random.cs

* Fix multiclass code gen (#314)

* compile error in codegen

* removes scores printing

* fix bugs

* fix test

* Fix compile error in codegen project (#319)

* removed redundant code

* fix test case

* Rev OVA pipeline node SDK output: wrap binary trainers as children inside parent OVA node (#317)

* Ova Multi class codegen support (#321)

* dummy

* multiova implementation

* fix tests

* remove inclusion list

* fix tests and console helper

* Rev run result trainer name for OVA: output different trainer name for each OVA + binary learner combination (#322)

* Rev run result trainer name for Ova: output different trainer name for each Ova + binary learner combination

* test fixes

* Console helper bug in generated code for multiclass (#323)

* fix

* fix test

* looping perlogclass

* fix test

* Initial version of Progress bar impl and CLI UI experience (#325)

* progressbar

* added progressbar and refactoring

* reverted

* revert sign assembly

* added headers and removed exception rethrow

* Setting model directory to temp directory (#327)

* Suggested changes to progress bar (#335)

* progressbar

* added progressbar and refactoring

* reverted

* revert sign assembly

* added headers and removed exception rethrow

* bug fixes and updates to UI

* added friendly name printing for metric

* formatting

* Rev Samples (#334)

* Telemetry2 (#333)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (will probably need tweaks once I try to use the pipeline)

* tweak queue in vsts-ci.yml

* CLI telemetry implementation

* Telemetry implementation

* delete unnecessary file and change file size bucket to actually log log2 instead of nearest ceil value

* add headers, remove comments

* one more header missing

* Fix progress bar in linux/osx (#336)

* progressbar

* added progressbar and refactoring

* reverted

* revert sign assembly

* added headers and removed exception rethrow

* bug fixes and updates to UI

* added friendly name printing for metric

* formatting

* change from task to thread

* Update src/mlnet/CodeGenerator/CodeGenerationHelper.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* Mem leak fix (#328)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (will probably need tweaks once I try to use the pipeline)

* tweak queue in vsts-ci.yml

* there is still investigation to be done but this fix works and solves memory leak problems

* minor refactor

* Upgrade ML.NET package (#343)

* Add cross-validation (CV), and auto-CV for small datasets; push common API experiment methods into base class (#287)

* restore old yml for internal pipeline so we can publish nuget again to devdiv stream (#344)

* Polishing the CLI UI part-1 (#338)

* formatting of pbar message

* Polishing the UI

* optimization

* rename variable

* Update src/mlnet/AutoML/AutoMLEngine.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* Update src/mlnet/CodeGenerator/CodeGenerationHelper.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* new message

* changed http to https

* added iteration num + 1

* change string name and add color to artifacts

* change the message

* build errors

* added null checks

* added exception messages to log file

* added exception messages to log file

* CLI ML.NET version upgrade (#345)

* Sample revs; ColumnInformation property name revs; pre-featurizer fixes (#346)

* CLI -- consume logs from AutoML SDK (#349)

* Rename RunDetails --> RunDetail (#350)

* command line api upgrade and progress bar rendering bug (#366)

* added fix for all platforms progress bar

* upgrade nuget

* removed args from writeline

* change in the version (#368)

* fix few bugs in progressbar and verbosity (#374)

* fix few bugs in progressbar and verbosity

* removed unused name space

* Fix for folders with space in it while generating project (#376)

* support for folders with spaces

* added support for paths with space

* revert file

* change name of var

* remove spaces

* SMAC fix for minimizing metrics (#363)

* Formatting Regression metrics and progress bar display days. (#379)

* added progress bar day display and fix regression metrics

* fix formatting

* added total time

* formatted total time

* change command name and add pbar message (#380)

* change command name and add pbar message

* fix tests

* added aliases

* duplicate alias

* added another alias for task

* UI missing features (#382)

* added formatting changes

* added accuracy specifically

* downgrade the codepages (#384)

* Change in project structure (#385)

* initial changes

* Change in project structure

* correcting test

* change variable name

* fix tests

* fix tests

* fix more tests

* fix codegen errors

* added log file message

* changed name of args

* change variable names

* fix test

* FileSizeBuckets in correct units (#387)

* Minor telemetry change to log in correct units and make our life easier in the future

* Use Ceiling instead of Round

* changed order (#388)

* prep work to transfer to ml.net (#389)

* move test projects to top level test subdir

* rename some projects to make naming consistent and make it build again

* fix test project refs

* Add AutoML components to build, fix issues related to that so it builds

* fix test cases, remove AppInsights ref from AutoML (#3329)

* [AutoML] disable netfx build leg for now (#3331)

* disable netfx build leg for now

* disable netfx build leg for now.

* [AutoML] Add AutoML XML documentation to all public members; migrate AutoML projects & tests into ML.NET solution; AutoML test fixes (#3351)

* [AutoML] Rev AutoML public API; add required native references to AutoML projects (#3364)

* [AutoML] Minor changes to generated project in CLI based on feedback (#3371)

* nitpicks for generated project

* revert back the target framework

* [AutoML] Migrate AutoML back to its own solution, w/ NuGet dependencies (#3373)

* Migrate AutoML back to its own solution, w/ NuGet dependencies

* build project updates; parameter name revert

* dummy change

* Revert "dummy change"

This reverts commit 3e8574266f556a4d5b6805eb55b4d8b8b84cf355.

* [AutoML] publish AutoML package (#3383)

* publish AutoML package

* Only leave automl and mlnet tests to run

* publish AutoML package

* Only leave automl and mlnet tests to run

* fix build issues when ml.net is not building

* bump version to 0.3 since that's the one we're going to ship for build (#3416)

* [AutoML] temporarily disable all but x64 platforms -- don't want to do native builds and can't find a way around that with the current VSTS pipeline (#3420)

* disable steps but keep phases to keep vsts build pipeline happy (#3423)

* API docs for experimentation (#3484)

* fixed path bug and regression metrics correction (#3504)

* changed the casing of option alias as it conflicts with --help (#3554)

* [AutoML] Generated project - FastTree nuget package inclusion dynamically (#3567)

* added support for fast tree nuget pack inclusion in generated project

* fix testcase

* changed the tool name in telemetry message

* dummy commit

* remove space

* dummy commit to trigger build

* [AutoML] Add AutoML example code (#3458)

* AutoML PipelineSuggester: don't recommend pipelines from first-stage trainers that failed (#3593)

* InferColumns API: Validate all columns specified in column info exist in inferred data view (#3599)

* [AutoML] AutoML SDK API: validate schema types of input IDataView (#3597)

* [AutoML] If first three iterations all fail, short-circuit AutoML experiment (#3591)

* mlnet CLI nupkg creation/signing (#3606)

* mlnet CLI nupkg creation/signing

* remove includeinpackage from mlnet csproj

* address PR comments -- some minor reshuffling of stuff

* publish symbols for mlnet CLI

* fix case in NLog.config

* [AutoML] rename Auto to AutoML in namespace and nuget (#3609)

* mlnet CLI nupkg creation/signing

* [AutoML] take dependency on a specific ml.net version (#3610)

* take dependency on a specific ml.net version

* catch up to spelling fix for OptimizationTolerance

* force a specific ml.net nuget version, fix typo (#3616)

* [AutoML] Fix error handling in CLI.  (#3618)

* fix error handling

* renaming variables

* [AutoML] turn off line pragmas in .tt files to play nice with signing (#3617)

* turn off line pragmas in .tt files to play nice with signing

* dedupe tags

* change the param name (#3619)

* [AutoML]  return null instead of null ref crash on Model property accessor (#3620)

* return null instead of null ref crash on Model property accessor

* [AutoML] Handling label column names which have space and exception logging (#3624)

* fix case of label with space and exception logging

* final handler

* revert file

* use Name instead of FullName for telemetry filename hash (#3633)

* renamed classes (#3634)

* change ML.NET dependency to 1.0 (#3639)

[AutoML] undo pinning ML.NET dependency

* set exploration time default in CLI to half hour (#3640)

* [AutoML] step 2 of removing pinned nupkg versions (#3642)

* InferColumns API that consumes label column index -- Only rename label column to 'Label' for headerless files (#3643)

* [AutoML] Upgrade ml.net package in generated code (#3644)

* upgrade the mlnet package in gen code

* Update src/mlnet/Templates/Console/ModelProject.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* Update src/mlnet/Templates/Console/ModelProject.tt

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* added spaces

* [AutoML] Early stopping in CLI based on the exploration time (#3641)

* early stopping in CLI

* remove unused variables

* change back to thread

* remove sleep

* fix review comments

* remove unused usings

* format message

* collapse declaration

* remove unused param

* added environment.exit and removal of error message

* correction in message

* secs-> seconds

* exit code

* change value to 1

* reverse the declaration

* [AutoML] Change wording for CouldNotFinshOnTime message (#3655)

* set exploration time default in CLI to half hour

* [AutoML] Change wording for CouldNotFinshOnTime message

* [AutoML] Change wording for CouldNotFinshOnTime message

* even better wording for CouldNotFinshOnTime

* temp change to get around vsts publish failure (#3656)

* [AutoML] bump version to 0.4.0 (#3658)

* implement culture invariant strings (#3725)

* reset culture (#3730)

* [AutoML] Cross validation fixes; validate empty training / validation input data (#3794)

* [AutoML] Enable style cop rules & resolve errors (#3823)

* add task agnostic wrappers for autofit calls (#3860)

* [AutoML] CLI telemetry rev (#3789)

* delete automl .sln

* CLI -- regenerate templated CS files (#3954)

* [AutoML] Bump ML.NET package version to 1.2.0 in AutoML API and CLI; and AutoML package versions to 0.14.0 (#3958)

* Build AutoML NuGet package (#3961)

* Increment AutoML build version to 0.15.0 for preview. (#3968)

* added culture independent parsing (#3731)

* - convert tests to xunit
- take project level dependency on ML.NET components instead of nuget
- set up bestfriends relationship to ML.Core and remove some of the copies of util classes from AutoML.NET (more work needed to fully remove them, work item 4064)
- misc build script changes to address PR comments

* address issues only showing up in a couple configurations during CI build

* fix cut&paste error

* [AutoML] Bump version to ML.NET 1.3.1 in AutoML API and CLI and AutoML package version to 0.15.1 (#4071)

* bumped version

* change versions in nupkg

* revert version bump in branch props

* [AutoML] Fix for Exception thrown in cross val when one of the score equals infinity. (#4073)

* bumped version

* change versions in nupkg

* revert version bump in branch props

* added infinity fix

* changes signing (#4079)

* Addressed PR comments and build issues
- sync block on creating test data file (failed intermittently)
- removed classes we copied over from ML.Core and fixed their uses to de-dupe and use original ML.Core versions since we now have InternalsVisible and BestFriends
- Fixed nupkg creation to use projects instead of public nuget version for AutoML
- Fixed a bunch of unit tests that didn't actually test what they were supposed to test, while removing cut&paste code and dependencies.
- Few more misc small changes

* minor nit - removed unused folder ref

* Fix the .sln file for the right configurations.

* Fix mistake in .sln file

* test fixes and disable one test

* fix tests, re-add AutoML samples csproj

* bumped VS version to 16 in .sln, removed InternalsVisible for a dead assembly, removed unused references from AutoML test project

* Updated docs to include PredictedLabel member (#4107)

* Fixed build errors resulting from upgrade to VS2019 compilers

* Added additional message describing the previous fix

* Updated docs to include PredictedLabel member

* Added CODEOWNERS file in the .github/ folder. (#4140)

* Added CODEOWNERS file in the .github/ folder. This allows reviewers to review any changes in the machine learning repository

* Updated .github/CODEOWNERS with the team instead of individual reviewers

* Added AutoML team reviewers (#4144)

* Added CODEOWNERS file in the .github/ folder. This allows reviewers to review any changes in the machine learning repository

* Updated .github/CODEOWNERS with the team instead of individual reviewers

* Added AutoML team reviewers to files owned by AutoML team

* Added AutoML team reviewers to files owned by AutoML team

* Removed two files that don't exist for AutoML team in CODEOWNERS

* Build extension method to reload changes without specifying model name (#4146)

* Image classification preview 2. (#4151)

* Image classification preview 2.

* PR feedback.

* Add unit-test.

* Add unit-test.

* Add unit-test.

* Add unit-test.

* Use Path.Combine instead of Join.

* fix test dataset path.

* fix test dataset path.

* Improve test.

* Improve test.

* Increase epochs in tests.

* Disable test on Ubuntu.

* Move test to its own project.

* Move test to its own project.

* Move test to its own project.

* Move test to its own file.

* cleanup.

* Disable parallel execution of tensorflow tests.

* PR feedback.

* PR feedback.

* PR feedback.

* PR feedback.

* Prevent TF tests from executing in parallel.

* PR feedback.

* Build error.

* clean up.

* Syncing upstream fork (#11)

* Throw error on incorrect Label name in InferColumns API (#47)

* Added sequential grouping of columns

* reverted the file

* added infer columns label name checking

* added column detection error

* removed unused usings

* added quotes

* replace Where with Any clause

* replace Where with Any clause

* Set Nullable Auto params to null values (#50)

* Added sequential grouping of columns

* reverted the file

* added auto params as null

* change to the update fields method

* First public api proposal (#52)

* Includes following
1) Final proposal for 0.1 public API surface
2) Prefeaturization
3) Splitting train data into train and validate when validation data is null
4) Providing end to end samples one each for regression, binaryclassification and multiclass classification

* Incorporating code review feedback

* Revert "Set Nullable Auto params to null values" (#53)

* Revert "First public api proposal (#52)"

This reverts commit e4a64cf4aeab13ee9e5bf0efe242da3270241bd7.

* Revert "Set Nullable Auto params to null values (#50)"

This reverts commit 41c663cd14247d44022f40cf2dce5977dbab282d.

* AutoFit return type is now an IEnumerable (#55)

AutoFit now returns an IEnumerable, which enables several things:

Implementing a variety of early stopping criteria (see sample).
Discarding poor models early, which improves memory usage efficiency (see sample).
No need to implement a callback to get results back.
Getting the best score now happens outside the API implementation; it is a simple comparison of scores (see sample).

Also templatized the return type for better type safety throughout the code.

* misc fixes & test additions, towards 0.1 release (#56)

* Enable UnitTests on build server (#57)

* 1) Making trainer name public (#62)

2) Fixing up samples to reflect it

* Initial version of CLI tool for mlnet (#61)

* added global tool initial project

* removed unnecessary files, renamed files

* refactoring and added base abstract classes for trainer generator

* removed unused class

* Added classes for transforms

* added transform generate dummy classes

* more refactoring, added first transform

* more refactoring and added classes

* changed the project structure

* restructuring, added options class

* sln changes

* refactored options to different class

* added more logic for code generation of class

* misc changes

* reverted file

* added commandline api package

* reverted sample

* added new command line api parser

* added normalization of column names

* Added command defaults and error message

* implementation of all trainers

* changed auto to null

* added all transform generators

* added error handling when args is empty and minor changes due to change in AutoML api names

* changed the name of param

* added new command line options and restructuring code

* renamed proj file and added solution

* Added code to generate usings, Fixed few bugs in the code

* added validation to the command line options

* changed project name

* Bug fixes due to API change in AutoML

* changed directory structure

* added test framework and basic tests

* added more tests

* added improvements to template and error handling

* renamed the estimator name

* fixed test case

* added comments

* added headers

* changed namespace and removed unnecessary properties from project

* Revert "changed namespace and removed unnecessary properties from project"

This reverts commit 9edae033e9845e910f663f296e168f1182b84f5f.

* fixed test cases and renamed namespaces

* cleaned up proj file

* added folder structure

* added symbols/tokens for strings

* added more tests

* review comments

* modified test cases

* review comments

* change in the exception message

* normalized line endings

* made method private static

* simplified range building /optimization

* minor fix

* added header

* added static methods in command where necessary

* nit picks

* made few methods static

* review comments

* nitpick

* remove line pragmas

* fix test case

* Use better AutoFit overload and ignore Multiclass (#64)

* Upgrading CLI to produce ML.NET V.10 APIs and bunch of Refactoring tasks (#65)

* Added sequential grouping of columns

* reverted the file

* upgrade to v .10 and refactoring

* added null check

* fixed unit tests

* review comments

* removed the settings change

* added regions

* fixed unit tests

* Upgrade ML.NET package to 0.10.0 (#70)

* Change in template to accommodate new API of TextLoader (#72)

* Added sequential grouping of columns

* reverted the file

* changed to new API of Text Loader

* changed signature

* added params for taking additional settings

* changes to codegen params

* refactoring of templates and fixing errors

* Enable gated check for mlnet.tests (#79)

* Added sequential grouping of columns

* reverted the file

* changed to new API of Text Loader

* changed signature

* added params for taking additional settings

* changes to codegen params

* refactoring of templates and fixing errors

* added run-tests.proj and referred it in build.proj

* CLI tool - make validation dataset optional and support for crossvalidation in generated code (#83)

* Added sequential grouping of columns

* reverted the file

* bug fixes, more logic to templates to support cross-validate

* formatting and fix type in consolehelper

* Added logic in templates

* revert settings

* benchmarking related changes (#63)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* fix fast forest learner (don't sweep over learning rate) (#88)

* Made changes to have non-calibrated scoring for binary classifiers (#86)

* Added sequential grouping of columns

* reverted the file

* added calibration workaround

* removed print probability

* reverted settings

* rev ColumnInference API: can take label index; rev output object types; add tests (#89)

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (#99)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (will probably need tweaks once I try to use the pipeline)

* publish nuget (#101)

* use dotnet-internal-temp agent for internal build

* use dotnet-internal feed

* Fix Codegen for columnConvert and ValueToKeyMapping transform and add individual transform tests (#95)

* Added sequential grouping of columns

* reverted the file

* fix usings for type convert

* added transforms tests

* review comments

* When generating usings choose only distinct usings directives (#94)

* Added sequential grouping of columns

* reverted the file

* Added code to have unique strings

* refactoring

* minor fix

* minor fix

* Autofit overloads + cancellation + progress callbacks

1) Introduce AutoFit overloads (basic and advanced)
2) AutoFit Cancellation
3) AutoFit progress callbacks

* Default the kfolds to value 5 in CLI generated code (#115)

* Added sequential grouping of columns

* reverted the file

* Set up CI with Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* remove file

* added kfold param and defaulted to value

* changed type

* added for regression

* Remove extra ; from generated code (#114)

* Added sequential grouping of columns

* reverted the file

* Set up CI with Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* removed extra ; from generated code

* removed file

* fix unit tests

* TimeoutInSeconds (#116)

Specifying timeout in seconds instead of minutes

* Added more command line args implementation to CLI tool and refactoring (#110)

* Added sequential grouping of columns

* reverted the file

* Set up CI with Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* added git status

* reverted change

* added codegen options and refactoring

* minor fixes

* renamed params, minor refactoring

* added tests for commandline and refactoring

* removed file

* added back the test case

* minor fixes

* Update src/mlnet.Test/CommandLineTests.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* review comments

* capitalize the first character

* changed the name of test case

* remove unused directives

* Fail gracefully if unable to instantiate data view with swept parameters (#125)

* gracefully fail if fail to parse a datai

* rev

* validate AutoFit 'Features' column must be of type R4 (#132)

* Samples: exceptions / nits (#124)

* Logging support in CLI + Implementation of cmd args [--name,--output,--verbosity] (#121)

* added logging and helper methods

* fixing code after merge

* added resx files, added logger framework, added logging messages

* added new options

* added spacing

* minor fixes

* change command description

* rename option, add headers, include new param in test

* formatted

* build fix

* changed option name

* Added NlogConfig file

* added back config package

* fix tests

* added correct validation check (#137)

* Use CreateTextLoader<T>(..)  instead of CreateTextLoader(..) (#138)

* added support to loaddata by class in the generated code

* fix tests

* changed CreateTextLoader to ReadFromTextFile method. (#140)

* changed textloader to readfromtextfile method

* formatting

* exception fixes (#136)

* infer purpose of hidden columns as 'ignore' (#142)

* Added approval tests and bunch of refactoring of code and normalizing namespaces (#148)

* changed textloader to readfromtextfile method

* formatting

* added approval tests and refactoring of code

* removed few comments

* API 2.0 skeleton (#149)

Incorporating API review feedback

* The CV code should come before the training when there is no test dataset in generated code (#151)

* reorder cv code

* build fix

* fixed structure

* Format the generated code + bunch of misc tasks (#152)

* added formatting and minor changes for reordering cv

* fixing the template

* minor changes

* formatting changes

* fixed approval test

* removed unused nuget

* added missing value replacing

* added test for new transform

* fix test

* Update src/mlnet/Templates/Console/MLCodeGen.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* Sanitize the column names in CLI (#162)

* added sanitization layer in CLI

* fix test

* changed exception.StackTrace to exception.ToString()

* fix package name (#168)

* Rev public API (#163)

* Rename TransformGeneratorBase .cs to TransformGeneratorBase.cs (#153)

* Fix minor version for the repository + remove Nlog config package (#171)

* changed the minor version

* removed the nlog config package

* Added new test to columninfo and fixing up API (#178)

* Make optimizing metric customizable and add trainer whitelist functionality (#172)

* API rev (#181)

* propagate root MLContext thru AutoML (instead of creating our own) (#182)

* Enabling new command line args (#183)

* fix package name

* initial commit

* added more commandline args

* fixed tests

* added headers

* fix tests

* fix test

* rename 'AutoFitter' to 'Experiment' (#169)

* added tests (#187)

* rev InferColumns to accept ColumnInfo input param (#186)

* Implement argument --has-header and change usage of dataset (#194)

* added has header and fixed dataset and train dataset

* fix tests

* removed dummy command (#195)

* Fix bug for regression and sanitize input label from user (#198)

* removed dummy command

* sanitize label and fix template

* fix tests

* Do not generate code concatenating columns when the dataset has a single feature column (#191)

* Include some missed logging in the generated code.  (#199)

* added logging messages for generated code

* added log messages

* deleted file

* cleaning up proj files (#185)

* removed platform target

* removed platform target

* Some spaces and extra lines + bug in output path  (#204)

* nit picks

* nit picks

* fix test

* accept label from user input and provide in generated code (#205)

* Rev handling of weight / label columns (#203)

* migrate to private ML.NET nuget for latest bug fixes (#131)

* fix multiclass with nonstandard label (#207)

* Multiclass nondefault label test (#208)

* printing escaped chars + bug (#212)

* delete unused internal samples (#211)

* fix SMAC bug that causes multiclass sample to infinite loop (#209)

* Rev user input validation for new API (#210)

* added console message for exit and nit picks (#215)

* exit when exception encountered (#216)

* Seal API classes (and make EnableCaching internal) (#217)

* Suggested sample nits (feel free to ask for any of these to be reverted) (#219)

* User input column type validation (#218)

* upgrade commandline and renaming (#221)

* upgrade commandline and renaming

* renaming fields

* Make build.sh, init-tools.sh, & run.sh executable on OSX/Linux (#225)

* CLI argument descriptions updated (#224)

* CLI argument descriptions updated

* No version in .csproj

* added flag to disable training code (#227)

* Exit if perfect model produced (#220)

* removed header (#228)

* removed header

* added auto generated header

* removed console read key (#229)

* Fix model path in generated file (#230)

* removed console read key

* fix model path

* fix test

* reorder samples (#231)

* remove rule that infers column purpose as categorical if # of distinct values is < 100 (#233)

* Null reference exception fix for finding best model when some runs have failed (#239)

* samples fixes (#238)

* fix for defaulting Averaged Perceptron # of iterations to 10 (#237)

* Bug bash feedback Feb 27. API changes and sample changes (#240)

* Bug bash feedback Feb 27. 
API changes 
Sample changes
Exception fix

* Samples / API rev from 2/27 bug bash feedback (#242)

* changed the directory structure for generated project (#243)

* changed the directory structure for generated project

* changed test

* upgraded commandline package

* Fix test file locations on OSX (#235)

* fix test file locations on OSX

* changing to Path.Combine()

* Additional Path.Combine()

* Remove ConsoleCodeGeneratorTests.GeneratedTrainCodeTest.received.txt

* Additional Path.Combine()

* add back in double comparison fix

* remove metrics agent NaN returns

* test fix

* test format fix

* mock out path

Thanks to @daholste for additional fixes!

* upgrade to latest ML.NET public surface (#246)

* Upgrade to ML.NET 0.11 (#247)

* initial changes

* fix lightgbm

* changed normalize method

* added tests

* fix tests

* fix test

* Private preview final API changes (#250)

* .NET framework design guidelines applied to public surface
* WhitelistedTrainers -> Trainers

* Add estimator to public API iteration result (#248)

* LightGBM pipeline serialization fix (#251)

* Change order that we search for TextLoader's parameters (#256)

* CLI IFileInfo null exception fix (#254)

* Averaged Perceptron pipeline serialization fix (#257)

* Upgrade command-line-api and default folder name change (#258)

* change in default folderName

* upgrade command line

* Update src/mlnet/Program.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* eliminate IFileInfo from CLI (#260)

* Rev samples towards private preview; ignored columns fix (#259)

* remove unused methods in consolehelper and nit picks in generated code (#261)

* nit picks

* change in console helper

* fix tests

* add space

* fix tests

* added nuget sources in generated csproj (#262)

* added nuget sources in csproj

* changed the structure in generated code

* space

* upgrade to mlnet 0.11 (#263)

* Formatting CLI metrics (#264)

Ensures space between printed metrics (also model counter). Right aligned metrics. Extended AUC to four digits.

* Add implementation of non -ova multi class trainers code gen (#267)

* added non ova multi class learners

* added tests

* test cases

* Add caching (#249)

* AdvancedExperimentSettings sample nits (#265)

* Add sampling key column (#268)

* Initial work for multi-class classification support for CLI (#226)

* Initial work for multi-class classification support for CLI

* String updates

* more strings

* Whitelist non-OVA multi-class learners

* Refactor the orchestration of AutoML calls (#272)

* Do not auto-group columns with suggested purpose = 'Ignore' (#273)

* Fix: during type inferencing, parse whitespace strings as NaN (#271)

* Printing additional metrics in CLI for binary classification (#274)

* Printing additional metrics in CLI for binary classification

* Update src/mlnet/Utilities/ConsolePrinter.cs

* Add API option to store models on disk (instead of in memory); fix IEstimator memory leak (#269)

* Print failed iterations in CLI (#275)

* change the type to float from double (#277)

* cache arg implementation in CLI (#280)

* cache implementation

* corrected the null case

* added tests for all cases

* Remove duplicate value-to-key mapping transform for multiclass string labels (#283)

* Add post-trainer transform SDK infra; add KeyToValueMapping transform to CLI; fix: for generated multiclass models, convert predicted label from key to original label column type (#286)

* Implement ignore columns command line arg (#290)

* normalize line endings

* added --ignore-columns

* null checks

* unit tests

* Print winning iteration and runtime in CLI (#288)

* Print best metric and runtime

* Print best metric and runtime

* Line endings in AutoMLEngine.cs

* Rename time column to duration to match Python SDK

* Revert to MicroAccuracy and MacroAccuracy spellings

* Revert spelling of BinaryClassificationMetricsAgent to BinaryMetricsAgent to reduce merge conflicts

* Revert spelling of MulticlassMetricsAgent to MultiMetricsAgent to reduce merge conflicts

* missed some files

* Fix merge conflict

* Update AutoMLEngine.cs

* Add MacOS & Linux to CI; MacOS & Linux test fixes (#293)

* MicroAccuracy as default for multi-class (#295)

Change default optimization metric for multi-class classification to MicroAccuracy (accuracy). Previously it was set to MacroAccuracy.

* Null exception for ignorecolumns in CLI (#294)

* Null exception for ignorecolumns in CLI

* Check if ignore-columns array has values (as the default is now an empty array)

* Emit caching flag in pipeline object model. (Includes SuggestedPipelineBuilder refactor & debug string fixes / refactor) (#296)

* removed sln (#297)

* Caching enabling in code gen part -2 (#298)

* add

* added caching codegen

* support comma separated values for --ignore-columns (#300)

* default initialization for ignore columns (#302)

* default initialization

* added null check

* Codegen for multiclass non-ova (#303)

* changes to template

* multiclass codegen

* test cases

* fix test cases

* Generated Project new structure. (#305)

* added new templates

* writing files to disk

* change path

* added new templates

* missing braces

* fix bugs

* format code

* added util methods for solution file creation and addition of projects to it

* added extra packages to project files

* new tests

* added correct path for sln

* build fix

* fix build

* include using system in prediction class (#307)

* added using

* fix test

* Random number generator is not thread safe (#310)

* Random number generator is not thread safe

* Another local random generator

* Missed a few references

* Referencing AutoMlUtils.random instead of a local RNG

* More refs to main RNG; remove Float as per https://github.com/dotnet/machinelearning/issues/1669

* Missed Random.cs

* Fix multiclass code gen (#314)

* compile error in codegen

* removes scores printing

* fix bugs

* fix test

* Fix compile error in codegen project (#319)

* removed redundant code

* fix test case

* Rev OVA pipeline node SDK output: wrap binary trainers as children inside parent OVA node (#317)

* Ova Multi class codegen support (#321)

* dummy

* multiova implementation

* fix tests

* remove inclusion list

* fix tests and console helper

* Rev run result trainer name for OVA: output different trainer name for each OVA + binary learner combination (#322)

* Rev run result trainer name for Ova: output different trainer name for each Ova + binary learner combination

* test fixes

* Console helper bug in generated code for multiclass (#323)

* fix

* fix test

* looping perlogclass

* fix test

* Initial version of Progress bar impl and CLI UI experience (#325)

* progressbar

* added progressbar and refactoring

* reverted

* revert sign assembly

* added headers and removed exception rethrow

* Setting model directory to temp directory (#327)

* Suggested changes to progress bar (#335)

* progressbar

* added progressbar and refactoring

* reverted

* revert sign assembly

* added headers and removed exception rethrow

* bug fixes and updates to UI

* added friendly name printing for metric

* formatting

* Rev Samples (#334)

* Telemetry2 (#333)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (will probably need tweaks once I try to use the pipeline)

* tweak queue in vsts-ci.yml

* CLI telemetry implementation

* Telemetry implementation

* delete unnecessary file and change file size bucket to actually log log2 instead of nearest ceil value

* add headers, remove comments

* one more header missing

* Fix progress bar in linux/osx (#336)

* progressbar

* added progressbar and refactoring

* reverted

* revert sign assembly

* added headers and removed exception rethrow

* bug fixes and updates to UI

* added friendly name printing for metric

* formatting

* change from task to thread

* Update src/mlnet/CodeGenerator/CodeGenerationHelper.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* Mem leak fix (#328)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (will probably need tweaks once I try to use the pipeline)

* tweak queue in vsts-ci.yml

* there is still investigation to be done but this fix works and solves memory leak problems

* minor refactor

* Upgrade ML.NET package (#343)

* Add cross-validation (CV), and auto-CV for small datasets; push common API experiment methods into base class (#287)

* restore old yml for internal pipeline so we can publish nuget again to devdiv stream (#344)

* Polishing the CLI UI part-1 (#338)

* formatting of pbar message

* Polishing the UI

* optimization

* rename variable

* Update src/mlnet/AutoML/AutoMLEngine.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* Update src/mlnet/CodeGenerator/CodeGenerationHelper.cs

Co-Authored-By: srsaggam <41802116+srsaggam@users.noreply.github.com>

* new message

* changed http to https

* added iteration num + 1

* change string name and add color to artifacts

* change the message

* build errors

* added null checks

* added exception messages to log file

* added exception messages to log file

* CLI ML.NET version upgrade (#345)

* Sample revs; ColumnInformation property name revs; pre-featurizer fixes (#346)

* CLI -- consume logs from AutoML SDK (#349)

* Rename RunDetails --> RunDetail (#350)

* command line api upgrade and progress bar rendering bug (#366)

* added fix for all platforms progress bar

* upgrade nuget

* removed args from writeline

* change in the version (#368)

* fix few bugs in progressbar and verbosity (#374)

* fix few bugs in progressbar and verbosity

* removed unused name space

* Fix for folders with space in it while generating project (#376)

* support for folders with spaces

* added support for paths with space

* revert file

* change name of var

* remove spaces

* SMAC fix for minimizing metrics (#363)

* Formatting Regression metrics and progress bar display days. (#379)

* added progress bar day display and fix regression metrics

* fix formatting

* added total time

* formatted total time

* change command name and add pbar message (#380)

* change command name and add pbar message

* fix tests

* added aliases

* duplicate alias

* added another alias for task

* UI missing features (#382)

* added formatting changes

* added accuracy specifically

* downgrade the codepages (#384)

* Change in project structure (#385)

* initial changes

* Change in project structure

* correcting test

* change variable name

* fix tests

* fix tests

* fix more tests

* fix codegen errors

* added log file message

* changed name of args

* change variable names

* fix test

* FileSizeBuckets in correct units (#387)

* Minor telemetry change to log in correct units and make our life easier in the future

* Use Ceiling instead of Round

* changed order (#388)

* prep work to transfer to ml.net (#389)

* move test projects to top level test subdir

* rename some projects to make naming consistent and make it build again

* fix test project refs

* Add AutoML components to build, fix issues related to that so it builds

* fix test cases, remove AppInsights ref from AutoML (#3329)

* [AutoML] disable netfx build leg for now (#3331)

* disable netfx build leg for now

* disable netfx build leg for now.

* [AutoML] Add AutoML XML documentation to all public members; migrate AutoML projects & tests into ML.NET solution; AutoML test fixes (#3351)

* [AutoML] Rev AutoML public API; add required native references to AutoML projects (#3364)

* [AutoML] Minor changes to generated project in CLI based on feedback (#3371)

* nitpicks for generated project

* revert back the target framework

* [AutoML] Migrate AutoML back to its own solution, w/ NuGet dependencies (#3373)

* Migrate AutoML back to its own solut…
@ghost ghost locked as resolved and limited conversation to collaborators Mar 30, 2022