Fixed the syntax of cited example. #2

zeahmed · 2018-05-04T18:12:55Z

No description provided.

sandyarmstrong · 2018-05-04T18:37:58Z

README.md

+var pipeline = new LearningPipeline();
+pipeline.Add(new TextLoader<SentimentData>(dataPath, separator: ","));
+pipeline.Add(new TextFeaturizer("Features", "SentimentText"));
+pipeline.Add(new FastTreeBinaryClassifier());


Good catch. Maybe add a newline after this, too, to break it up a bit?

eerhardt · 2018-05-04T19:49:13Z

@dotnet-bot test this please

glebuk · 2018-05-04T20:06:41Z

@glebuk is added to the review. #Closed


glebuk · 2018-05-04T20:16:07Z

README.md

-    .Add(new TextFeaturizer("Features", "SentimentText")
-    .Add(new FastTreeBinaryClassifier()
-     Add(new PredictedLabelColumnOriginalValueConverter(PredictedLabelColumn = "PredictedLabel"});
+var pipeline = new LearningPipeline();


Opened issue #7 to add support for the simplified syntax. #Pending





* Windows CI working (#5477) * ci testing changes * comments from pr * Added Linux & Mac changes for Arcade (#5479) * Initial Windows, Linux, Macos builds test * Add Linux/MacOS specific CI requirements * Run Arcade CI tests on MacOS/Linux * Fix final package building * Add benchmark download to benchmars .csporj file * Print detailed status of each unit test * Install CentOS & Ubuntu build dependencies * Use container names to differenciate between Ubuntu & CentOS * Remove sudo usage in CentOS * Fix Linux build dependencies * Add -y param to apt install * Remove installation of Linux dependencies * Minor additions * Rename Benchmarks to PerformanceTests for Arcade * Changes * Added benchmark doc changes * Pre-merge changes * Fixing failing Arcade Windows Builds (#5482) * Try Windows build single quote fix * Remove %20 * Added variable space value * Using variables for spacing * Added space values as job parameters * Try conditional variables again * fix official builds * Revert "fix official builds" This reverts commit 7dbbdc7b946f4f48db5452887ad9bf53616a37e8. * fixing tensorflow rebase issue * Fixes for many of the CI builds. 
(#5496) * yml log changes * Fix NetFX builds by ensuring assembly version is set correctly and not to Arcade default of 42.42.42.42 (#5503) * Fixed official builds for Arcade SDK (#5512) * Added fixes for official builds * Make .sh files executable * fix mkl nuget issue Co-authored-by: Frank Dong <frdong@microsoft.com> * fix code generator tests failure (#5520) * Added fixes for official builds * Make .sh files executable * fix mkl nuget issue * fix code generate test fails * only add necessary dependency Co-authored-by: Mustafa Bal <5262061+mstfbl@users.noreply.github.com> * Fixed memory leaks from OnnxTransformer (#5518) * Fixed memory leak from OnnxTransformer and related x86 build fixes * Reverting x86 build related fixes to focus only on the memory leaks * Updated docs * Reverted OnnxRuntimeOutputCatcher to private class * Addressed code review comments * Refactored OnnxTransform back to using MapperBase based on code review comments * Handle integration tests and nightly build testing (#5509) * Make -integrationTests work * Update .yml file * Added the TargetArchitecture properties * Try out -integrationTest * Missed -integrationTest flag * Renamed FunctionalTestBaseClass to IntegrationTestBaseClass * Missed rename * Modified tests to make them more stable * Fixed leak in object pool (#5521) Co-authored-by: frank-dong-ms <55860649+frank-dong-ms@users.noreply.github.com> Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com> Co-authored-by: Mustafa Bal <5262061+mstfbl@users.noreply.github.com> Co-authored-by: Frank Dong <frdong@microsoft.com> Co-authored-by: Michael Sharp <misharp@microsoft.com> Co-authored-by: Antonio Velázquez <38739674+antoniovs1029@users.noreply.github.com> * fix benchmark test timeout issue (#5530) * removed old build stuff (#5531) * Fixes Code Coverage in Arcade (#5528) * arcade code coverage changes * adding Michael's changes * updating path Co-authored-by: Keren Fuentes <kedejesu@microsoft.com> * Removed 
CODEOWNERS file to unify review process (#5535) * Fix publishing problems (#5538) * Removed our dependency to BuildTools by using the NugetCommand Azure Task. * We should publish a nuget named "SampleUtils", but we were publishing it with the name "SamplesUtils" * The naming conventions of our published nugets didn't match the ones described on arcade's docs: Versioning.md. I've also added the option so that when queuing the publishing build, we can pass the VERSIONKIND variable with value "release", so that it produces the nugets with arcade's conventions for "Release official build" nugets (as opposed to the "Daily official build" naming convention that's going to be used now by our CI that publishes nightly nugets). * Updated prerelease label (#5540) * Fix warnings from CI Build (#5541) * fix warnings * also add conditional copy asset to native.proj * test fix warnings * supress nuget warning 5118 * supress other warning * remove unnecessary change * put skip warning at Directory.Buil.props * Updated build instructions (#5534) * Updated build instructions * Adressed reviews * Reviews * removed the rest of the old pkg references: (#5537) * Perf improvement for TopK Accuracy and return all topK in Classification Evaluator (#5395) * Fix for issue 744 * cleanup * fixing report output * fixedTestReferenceOutputs * Fixed test reference outputs for NetCore31 * change top k acc output string format * Ranking algorithm now uses first appearance in dataset rather than worstCase * fixed benchmark * various minor changes from code review * limit TopK to OutputTopKAcc parameter * top k output name changes * make old TopK readOnly * restored old baselineOutputs since respecting outputTopK param means no topK in most test output * fix test fails, re-add names parameter * Clean up commented code * that'll teach me to edit from the github webpage * use existing method, fix nits * Slight comment change * Comment change / Touch to kick off build pipeline * fix whitespace * Added 
new test * Code formatting nits * Code formatting nit * Fixed undefined rankofCorrectLabel and trailing whitespace warning * Removed _numUnknownClassInstances and added test for unknown labels * Add weight to seenRanks * Nits * Removed FastTree import Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com> Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com> * Fixed Spelling on stopwords (#5524) * Changes to onnx export. (#5544) * Add back missing test project from running on arcade (#5545) * add back test result upload and add missing test project from running * fix identification * filter out performance test result files to avoid warnings * [CodeGenerator] Fix MLNet.CLI build error. (#5546) * upgrade to 3.1 * write inline data using invariantCulture * fix mlnet build error * Fixed AutoML CrossValSummaryRunner for TopKAccuracyForAllK (#5548) * Fixed bug * Tensorflow fix (#5547) * fix tensorflow issue on sample repo * add comments * Update to OnnxRuntime 1.6.0 and fixed bug with sequences outputs (#5529) * Use onnx prerelease * Upgrade to onnx 1.6.0 * Updated docs * Fixed problem with sequences * added in DcgTruncationLevel to AutoML api (#5433) * added in DcgTruncationLevel to automl api * changed default to 10 * updated basline output * fixed failing tests and baselines * Changes from PR comments. * Update src/Microsoft.ML.AutoML/Experiment/MetricsAgents/RankingMetricsAgent.cs Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com> * Changes based on PR comments. * Fix ranking test. 
Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com> * Created release notes for v1.5.3 (#5543) * Created release notes for v1.5.3 * Updated with review comments * Updated with review comments * Updated release notes with latest PRs * Fixed typo * Forward logs of Experiment's sub MLContexts to main MLContext (#5554) * Forward logs of Experiment's sub MLContexts to main MLContext * Adressed reviews * Update Stale docs (#5550) * Updated OnnxMl.md * Updated MlNetMklDeps docs * Typo * typo * continueOnError on Brew Workaround (#5555) * continueOnError:true * Fix publishing symbols (#5556) * Disable Portable PDB conversion * Push packages to artifacts * Fix symbols issues * Added note about Microsoft.ML.dll * try out just packing * Return Build=false, but actually use configuration * Added missing TargetArchitecture * add back tests * Added missing flags * Updated version to 1.5.4 (#5557) * Fixed version numbers in the right place (#5558) * Updated version to 1.5.4 * Updated version to 1.5.4 * eng (#5560) * Renamed release notes file (#5561) * Renamed release notes file * Updated version number in release notes * Add SymSgdNative reference to AutoML.Tests.csproj (#5559) * runSpecific in YAML * RunSpecific in test * Add SymSgdNative reference * Revert "RunSpecific in test" This reverts commit fed12b26ae71e7a95d2dd1f4703541138a780d75. * Revert "runSpecific in YAML" This reverts commit f9f328d52cd5b4281ad38b7a6af20c219dd0fd44. 
* Nuget.config url fix for roslyn compilers (#5584) * fixed nuget url, versions, and failing tests * changes from pr comments and MacOS changes * MacOS homebrew bug workaround * removed unnused nuget url * added in note that PredictionEngine is not thread safe (#5583) * Onnx Export for ValueMapping estimator (#5577) * Fixed Averaged Perceptron default value (#5586) * fixed missed averaged perceptron default value * fixed extension api * fixed test baselines * fixing official build (#5596) * Release/1.5.4 fix (#5599) * Nuget.config url fix for roslyn compilers (#5584) * fixed nuget url, versions, and failing tests * changes from pr comments and MacOS changes * MacOS homebrew bug workaround * removed unnused nuget url * fixing official build (#5596) * Remove references to Microsoft.ML.Scoring (#5602) This was the very first ONNX .NET bindings, it was replaced with Microsoft.ML.OnnxRuntime then Microsoft.ML.OnnxRuntime.Managed. * Make ColumnInference serializable (#5611) * upgrade to 3.1 * write inline data using invariantCulture * make column inference serializable * add test json * add approvaltests * fixerd nuget.config (#5614) * Fix issue in SRCnnEntireAnomalyDetector (#5579) * update * refine codes * update comments * update for nit Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com> * Offer suggestions for possibly mistyped label column names in AutoML (#5574) (#5624) * Offer suggestions for possibly mistyped label column names * review changes * TimeSeries - fix confidence parameter type for some detectors (#4058) (#5623) * TimeSeries - fix confidence parameter type for some detectors. 
- The public API exposed confidence parameters as int even though it's internally implemented as double - There was no workaround since all classes where double is used are internal - This caused major issues for software requiring high precision predictions - This change to API should be backwards compatible since int can be passed to parameter of type double * TimeSeries - reintroduce original methods with confidence parameter of type int (to not break the API). * TimeSeries - make catalog API methods with int confidence parameter deprecated. - Tests adjusted to not use the deprecated methods * Update Conversion.cs (#5627) * Documentation updates (#5635) * documentation updates * fixed spelling error * Update docs/building/unix-instructions.md Co-authored-by: Santiago Fernandez Madero <safern@microsoft.com> Co-authored-by: Santiago Fernandez Madero <safern@microsoft.com> * AutoML aggregate exception (#5631) * added check for aggregate exception * Update src/Microsoft.ML.AutoML/Experiment/Experiment.cs Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com> * Update src/Microsoft.ML.AutoML/Experiment/Experiment.cs Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com> * pulled message out to private variable so its not duplicated * Update src/Microsoft.ML.AutoML/Experiment/Experiment.cs Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com> Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com> Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com> * Treat TensorFlow output as non-batched. (#5634) * Can now not treat output as batched. * updated comments based on PR comments. * Fixing saving/loading with new parameter. 
* Updates based on PR comments * Update src/Microsoft.ML.TensorFlow/TensorflowUtils.cs Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com> * reverted accidental test changes * fixes based on PR comments Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com> * Added in release notes for 1.5.5 (#5639) * added in release notes * Update release-1.5.5.md Removed incorrect PR. * Update docs/release-notes/1.5.5/release-1.5.5.md Co-authored-by: Eric StJohn <ericstj@microsoft.com> * Update docs/release-notes/1.5.5/release-1.5.5.md Co-authored-by: Eric StJohn <ericstj@microsoft.com> * Update release-1.5.5.md Co-authored-by: Eric StJohn <ericstj@microsoft.com> * updating version after release (#5642) * Move DataFrame to machinelearning (#5641) * Change namespace to Microsoft.Data.Analysis (#2773) * Update namespace to Microsoft.Data.Analysis * Remove "DataFrame" from the test project name * APIs for reversed binary operators (#2769) * Support reverse binary operators * Fix file left behind in a rebase * Fix whitespace * Throw for incompatible inPlace (#2778) * Throw if inPlace is set and types mismatch * Unit test * Better error message * Remove empty lines * Version, Tags and Description for Nuget (#2779) * Version, Tags and Description for Nuget * sq * Flags for release (#2781) * Publish packages to artifacts * Flags for release * Fix the Description method to not throw (#2786) * Fix the Description method to not crash Adds an Info method * sq * Address feddback * Last round of feedback * Use dataTypes if it passed in to LoadCsv (#2791) * Fix LoadCsv to use dataType if it passed in * sq * Don't read the full file after guessRows lines have been read * Address feedback * Last round of feedback * Creating a `Rows` property, similar to `Columns` (#2794) * Rows collection, similar to Columns * Doc * Some minor clean up * Make DataFrameRow a view into the DataFrame * sq * Address feedback * Remove DataFrame.RowCount * More row count changes * sq * Address feedback * 
Merge upstream * DataFrame.LoadCsv throws an exception on projects targeting < netcore3.0 (#2797) Fixing by passing in an encoding and a default buffer size. Also, get our tests running on .NET Framework. Fix #2783 * Params constructor on DataFrame (#2800) * Params constructor on DataFrame * Delete redundant constructors * Remove `T : unmanaged` constraint from DataFrameColumn.BinaryOperations (#2801) * Remove T : unmanaged constraint from DataFrameColumn.BinaryOperations * Address feedback * Rename the value version of the APIs * sq * Fix build * Address feedback * Remove Value from the APIs * sq * Address feedback * Bump version to 0.2.0 (#2803) * Add Apply<TResult>method to PrimitiveDataFrameColumn (#2807) * Add Apply method to PrimitiveDataFrameColumn and its container * Add TestApply test * Remove unused df variable in DataFrameTests * Add xml doc comments to Apply method * Add additional tests for ReadCsv (#2811) * Add additional tests for ReadCsv * Update asserts * Add empty row and skip test pending another fix * Remove test for another issue * Added static factory methods to DataFrameColumn (#2808) * Added static factory methods to DataFrameColumn where they make sense (for the overloads where its possible to infer the column's type). * Remove regions * Update some parts of the unit tests to use static factory methods to create DataFrameColumns. * Remove errant {T} on StringDataFrameColumn. * PR feedback Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com> * Append rows to a DataFrame (#2823) * Append rows to a DataFrame * Unit test * Update unit tests and doc * Need to perfrom a type check every time * sq * Update unit test * Address comments * Move corefxlab to arcade (#2795) * Add eng folder * First cut of moving corefxlab to arcade * Move arcade symbol validation inside official buil * Move base yml file to root * Arcade will build, publish packages and symbols * UpdateXlf. 
Review this * Arcade Update to version 5.0.0-beta.19575.4 to include Experimental Channel * Remove property that was causing the build to fail * Moving global properties to the main Yaml instead of step in order to unblock publishing * Committing xlfs and changing the build script to not update Xlf on build * clean up corefxlab-base.yml * sq * Delete unused files and scripts * Get rid of all the xlf stuff * Remove UpdateXlfOnBuild for non-NT builds * Minor cleanup * More cleanup * update eng\build.sh permission * Rename to Nuget.config * sq * Remove the runtime spec from global.json * Don't publish test projs * Typo * Move version prefix to versions.props Change prereleaselabel to alpha * Increment version number to list as the latest package Increment version number of Microsoft.Experimental.Collections to list as the latest package Turn off graph generation * Update the Readme * Test removing the scripts folder * Touch readme to force a change * Address Jose's comments * Typo * Move versions to eng/versions.props * Benchmark.proj needs to refer to xunit * Clean up dependencies.props * Remove dependencies.props Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com> * Rename Sort to OrderBy (#2814) * Rename sort to orderby and add orderbydescending method * Add doc strings * Update bench mark test * Update tests * Update DataFrameColumn to use orderby * Update doc comment * Additions to sortby * Revert "Additions to sortby" This reverts commit 3931d4e2a72ce44a539be7c27b2592395f3efd35. * Revert "Update doc comment" This reverts commit 192f7797fe2b77625486637badf77046162fedbf. * Revert "Update DataFrameColumn to use orderby" This reverts commit 8f94664c5fd18570cd2b601535e816ca5dd5e3c4. 
* Explode column types and generate converters (#2857) * Explode column types and generate converters * Clean this * sq * sq * Cherry pick for next commit * sq * Undo unnecessary change * Address remaining concerns from the 2nd DataFrame API Review (#2861) * Move string indexer to Columns * API changes from the 2nd API review * Unit tests * Address comments * Add binary operations and operators on the exploded columns (#2867) * Generate combinations of binary operations and Add * Numeric Converters and CloneAsNumericColumns * Binary, Comparison and Shift operations * Clean up and bug fix * Fix the binary op apis to not be overridden * Internal constructors for exploded types * Proper return types for exploded types * Update unit tests * Update csproj * Revert "Fix the binary op apis to not be overridden" This reverts commit 2dc2240c9449930139c1492d1388d5e1f8ba5fa1. * Bug fix and unit test * Constructor that takes in a container * Unit tests * Call the implementation where possible * Review sq * sq * Cherry pick for next commit * sq * Undo unnecessary change * Rename to the system namespace column types * Address comments * Push to pull locally * Mimic C#'s arithmetic grammar in DataFrame * Address feedback * Reduce the number of partial column definitions * Address feedback * Add APIs to get the strongly typed columns from a DataFrame (#2878) * CP * sq * sq * Improve docs * Enable xml docs for Data.Analysis (#2882) * Enable xml docs for Data.Analysis * Fix /// summary around inheritdoc * Minor doc changes * sq * sq * Address feedback * Add Apply to ArrowStringDataFrameColumn (#2889) * Support for Exploded columns types in Arrow and IO scenarios (#2885) * Support for Exploded columns types in Arrow and IO scenarios * Unit tests * Address feedback * Bump version (#2890) * Fix versioning to allow for individual stable packages (#2891) * Fix versioning to allow for individual stable packages * sq * Bump Microsoft.Data.Analysis version to 0.4.0 (#2892) * Bump 
Microsoft.Data.Analysis version to 0.4.0 * Fix https://github.com/dotnet/corefxlab/issues/2906 (#2907) * Fix https://github.com/dotnet/corefxlab/issues/2906 * Improvements and unit tests * sq * Better fix * sq * Improve LoadCsv to handle null values when deducing the column types (#2916) * Unit test to repro * Fix https://github.com/dotnet/corefxlab/issues/2915 Append a null value to a column when encountering it instead of changing the column type to a StringDataFrameColumn * Update src/Microsoft.Data.Analysis/DataFrame.IO.cs Co-authored-by: Günther Foidl <gue@korporal.at> * Update src/Microsoft.Data.Analysis/DataFrame.cs Co-authored-by: Günther Foidl <gue@korporal.at> * Feedback Co-authored-by: Günther Foidl <gue@korporal.at> * Create a 0.4.0 package (#2918) * Revert "Create a 0.4.0 package (#2918)" (#2919) This reverts commit 0bef531289744274ab97e8bbb9e5694b0d855689. * Produce a 0.4.0 build (#2920) * Default Length for StringDataFrameColumn (#2921) (#2923) * Increment version and stop producing stable packages (#2922) * Increment version and stop producing stable packages * Add DataFrame object formatter. (#2931) * Add DataFrame object formatter. * Update nuget dependencies. * Apply CR fixes. * Fix a bug in InsertColumn * Add Microsoft.Data.Analysis.nuget project (#2933) * Add DataFrame object formatter. * Update nuget dependencies. * Apply CR fixes. * Remove ReferenceOutputAssembly added to from Microsoft.Data.Analysys.csproj. * Add Microsoft.Data.Analysis.nuget project. * Move project to src. Fix nuget project settings. * Remove NoBuild property from project. * Remove IncludeBuildOutput and IncludeSymbols from project. * Add VersionPrefix to project. * Add IncludeBuildOutput property. * Add unit tests. 
* Downgrade from netcoreapp3.1 to netcoreapp3.0 * Upgrade from netcoreapp3.0 to netcoreapp3.1 (dotnet interactive is not compatible with 3.0) * Add netcoreapp3.1 to global settings * Add dotnet 3.1.5 runtime to global settings * Build fixes * Moving MDAI into interactive-extensions folder of the package * Minor refactoring * Respond to PR feedback Co-authored-by: Prashanth Govindarajan <prgovi@microsoft.com> Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com> Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com> * ColumnName indexer on DataFrame (#2959) * ColumnName indexer on DataFrame Fixes https://github.com/dotnet/corefxlab/issues/2934 * Unit tests * Null column name * Implement FillNulls() for ArrowStringDataFrameColumn with inPlace: false (#2956) * implement FillNulls method for ArrowStringDataFrameColumn * additional asserts for testcase * Prevent DataFrame.Sample() method from returning duplicated rows (#2939) * resolves #2806 * replace forloop with ArraySegment<T> * reduce shuffle loop operations from O(Rows.Count) to O(numberOfRows) * Add WriteCsv plus unit tests. (#2947) * Add WriteCsv plus unit tests. * Add CultureInfo to WriteCsv. Remove index column param. Update unit tests. * Add CR changes. CultureInfo. Separator. * Format decimal types individually. Fix culture info. Fix unit tests. * Format decimal types individually. Fix culture info. Fix unit tests. 
* Missing values default to a `StringDataFrameColumn` (#2982) * Make LoadCsv more robust * Test empty string column * Retain prev guess where possible * Update FromArrowRecordBatches for dotnet-spark (#2978) * Support for RecordBatches with StructArrays * Sq * Address comments * Nits * Nits * Implement DataFrame.LoadCsvFromString (#2988) * Implement DataFrame.LoadCsvFromString * Address comments * Part 1 of porting the csv reader (#2997) * Move to the test folder * Suppress warnings * Move extensions reference out of props Make MDA.test use the props defined TFM Comment out 2 unit tests * Address feedback * Address feedback * Default to preview version * Update nuget.config Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com> Co-authored-by: Haytam Zanid <34218324+zHaytam@users.noreply.github.com> Co-authored-by: Jon Wood <jwood803@users.noreply.github.com> Co-authored-by: Sam <1965570+MgSam@users.noreply.github.com> Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com> Co-authored-by: Günther Foidl <gue@korporal.at> Co-authored-by: Rhys Parry <rhys@i-think22.net> Co-authored-by: daniel costea <dcostea@users.noreply.github.com> Co-authored-by: Ramon <56896136+RamonWill@users.noreply.github.com> * Update to the latest Microsoft.DotNet.Interactive (#5710) * Update to the latest Microsoft.DotNet.Interactive * Add System.CommandLine nuget feed * Fix Data.Analysis.Interactive test * added main branch to yml files (#5715) * Renamed master to main (#5717) * renamed master to main * Update vsts-ci.yml * updated urls * renamed master to main (#5719) * IDataView to DataFrame (#5712) * IDataView -> DataFrame Implement the virtual function * More APIs and unit tests * ANother unit test * Address feedback * Last bit of feedback * Fix some stuff and unit tests * sq * Move RowCursor back * Remove unused param Docs maxRows More unit tests Fixed ArrowStringDataFrameColumn construction in the unit test * Improve csv parsing (#5711) * Part 2 of TextFieldParser. 
Next up is hooking up ReadCsv to use TextFieldParser * Make LoadCsv use TextFieldParser * More unit tests * cleanup * Address feedback * Last bit of feedback * Remove extra var * Remove duplicate file * Rename strings.resx to Strings.resx * rename the designer.cs file too * Fix doc markdown (#5732) Fixed documentation markdown remarks for * MulticlassClassificationMetrics.LogLoss * MulticlassClassificationMetrics.LogLossReduction Signed-off-by: Robin Windey <ro.windey@gmail.com> * Use Official package for SharpZipLib (#5735) Co-authored-by: Xiaoyun Zhang <bigmiao.zhang@gmail.com> Co-authored-by: BigBigMiao <BigBigMiao@github.com> Co-authored-by: Keren Fuentes <dkeren@seas.upenn.edu> Co-authored-by: Keren Fuentes <kedejesu@microsoft.com> Co-authored-by: Yuanxiang Ying <yingyuanxiang34@sina.com> Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com> Co-authored-by: Antonio Velázquez <38739674+antoniovs1029@users.noreply.github.com> Co-authored-by: Mustafa Bal <5262061+mstfbl@users.noreply.github.com> Co-authored-by: Piotr Telman <ptelman@users.noreply.github.com> Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com> Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com> Co-authored-by: frank-dong-ms <55860649+frank-dong-ms@users.noreply.github.com> Co-authored-by: Harish Kulkarni <harishsk@users.noreply.github.com> Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com> Co-authored-by: Frank Dong <frdong@microsoft.com> Co-authored-by: Michael Sharp <misharp@microsoft.com> Co-authored-by: Jason DeBoever <github@deboever.us> Co-authored-by: Leo Gaunt <36968548+LeoGaunt@users.noreply.github.com> Co-authored-by: Keren Fuentes <kerenfuentes313@gmail.com> Co-authored-by: Eric StJohn <ericstj@microsoft.com> Co-authored-by: Ivan Agarský <agarskyivan@gmail.com> Co-authored-by: Andrej Kmetík <akmetik@gmail.com> Co-authored-by: Phan Tấn Tài <37982283+4201104140@users.noreply.github.com> Co-authored-by: Santiago Fernandez 
Madero <safern@microsoft.com> Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com> Co-authored-by: Prashanth Govindarajan <prgovi@microsoft.com> Co-authored-by: Haytam Zanid <34218324+zHaytam@users.noreply.github.com> Co-authored-by: Jon Wood <jwood803@users.noreply.github.com> Co-authored-by: Sam <1965570+MgSam@users.noreply.github.com> Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com> Co-authored-by: Günther Foidl <gue@korporal.at> Co-authored-by: Rhys Parry <rhys@i-think22.net> Co-authored-by: daniel costea <dcostea@users.noreply.github.com> Co-authored-by: Ramon <56896136+RamonWill@users.noreply.github.com> Co-authored-by: Robin Windey <ro.windey@gmail.com> * Actually merge from main (#2) * update tensorflow.net to 0.20.0 (#5404) * upgrade to 3.1 * write inline data using invariantCulture * upodate tensorflow * update Microsoft.ML.Vision * fix test && comment * udpate tensorflow.net to 0.20.1 * update tf major version * downgrade tf runtime to 1.14.1 * Update Dependencies.props * Update Dependencies.props * update tffact to stop running test on linux with glibc < 2.3) * fix TensorFlowTransformInputShapeTest * use tf.v1 api * fix comment: * fix building error * fix test * fix nit * remove linq Co-authored-by: BigBigMiao <BigBigMiao@github.com> * ProduceWordBags Onnx Export Fix (#5435) * fix for issue * fix documentation * aligning test * adding back line * aligning fix Co-authored-by: Keren Fuentes <kedejesu@microsoft.com> * [SrCnnEntireAnomalyDetector] Upgrade boundary calculation and expected value calculation (#5436) * adjust expected value * update boundary calculation * fix boundary * adjust default values * fix percent case * fix error in anomaly score calculation Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com> * Update OnnxRuntime to 1.5.2 (#5439) * Added prerelease feed and updated to 1.5.2 * Remove prerelease feed * Updated docs * Update doc * Fixed MacOS CI Pipeline builds (#5457) * Added MacOS Homebrew bug fix * nit 
fix * Improving error message (#5444) * better error fix * revisions Co-authored-by: Keren Fuentes <kedejesu@microsoft.com> * Fixed MacOS daily & nightly builds due to Homebrew bug (#5467) * Fixed MacOS nightly builds due to Homebrew bug * Edit workaround * Remove untapping of python2 * Nit edit * Remove installation of mono-libgdiplus * try installing mono-libgdiplus * unlink python 3.8 * Auto.ML: Fix issue when parsing float string fails on pl-PL culture set using Regression Experiment (#5163) * Fix issue when parsing float string fails on pl-PL culture set * Added InvariantCulture float parsing as per CodeReview request * Update src/Microsoft.ML.AutoML/Sweepers/SweeperProbabilityUtils.cs Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com> * Update Parameters.cs * Added PL test * Added multiple cultures * debugging CI failure * Debug runSpecific * Revert "Debug runSpecific" This reverts commit 95b728099415cacbe8cf3819ec51ce50cec94eb2. * Removed LightGBM and addressed comments * Increased time * Increase time * Increased time Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com> Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com> * handle exception during GetNextPipeline for AutoML (#5455) * handle exception during GetNextPipeline for AutoML * take comments * Changing LoadRawImages Sample (#5460) replacing example Co-authored-by: Keren Fuentes <kedejesu@microsoft.com> * Use Timer and ctx.CancelExecution() to fix AutoML max-time experiment bug (#5445) * Use ctx.CalncelExecution() to fix AutoML max-time experiment bug * Added unit test for checking canceled experiment * Nit fix * Different run time on Linux * Review * Testing four ouput * Used reflection to test for contexts being canceled * Reviews * Reviews * Added main MLContext listener-timer * Added PRNG on _context, held onto timers for avoiding GC * Addressed reviews * Unit test edits * Increase run time of experiment to guarantee probabilities * Edited unit test to 
check produced schema of next run model's predictions * Remove scheme check as different CI builds result in varying schemas * Decrease max experiment time unit test time * Added Timers * Increase second timer time, edit unit test * Added try catch for OperationCanceledException in Execute() * Add AggregateException try catch to slow unit tests for parallel testing * Reviews * Final reviews * Added LightGBMFact to binary classification test * Removed extra Operation Stopped exception try catch * Add back OperationCanceledException to Experiment.cs * fix issue 5020, allow ML.NET to load tf model with primitive input and output column (#5468) * handle exception during GetNextPipeline for AutoML * take comments * Enable TesnflowTransformer take primitive type as input column * undo unnecessary changes * add test * update on test * remove unnecessary line * take comments * maxModels instead of time for AutoML unit test (#5471) Uses the internal `maxModels` parameter instead of `MaxExperimentTimeInSeconds` for the exit criteria of AutoML. This is to increase the test stability in case the test is run on a slower machine. 
* Disabling AutoFitMaxExperimentTimeTest Disabling AutoFitMaxExperimentTimeTest * Fix AutoFitMaxExperimentTimeTest (#5506) *Fixed test Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com> * Fix SR anomaly score calculation at beginning (#5502) * adjust expected value * update boundary calculation * fix boundary * adjust default values * fix percent case * fix error in anomaly score calculation * adjust score calculation for first & second points * fix sr do not report anomaly at beginning * fix a issue in batch process * remove a unused parameter Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com> * Merge arcade to master (#5525) * Initial commit for Arcade migration * Added omitted files * Changed strong name signing to use the same key for shipping and test assemblies * arcade linux build (#5423) * arcade linux build * put file execution permission change into source control * The `-test` command for windows. Nuget packages (#5464) * working on testing * testing updates * tests almost working * build changes * all tests should be working * changes from PR comments * fixes for .net 3.1 * Fixed extension check. Removed <PackageId> where not needed * Removed pkg folder and updated paths. * Added test key. (#5475) * Added test key. * Update PublicKey.cs Removed extra newline. * Update ComponentCatalog.cs Fixed 3 spaces to 4. 
* Windows CI working (#5477) * ci testing changes * comments from pr * Added Linux & Mac changes for Arcade (#5479) * Initial Windows, Linux, Macos builds test * Add Linux/MacOS specific CI requirements * Run Arcade CI tests on MacOS/Linux * Fix final package building * Add benchmark download to benchmars .csporj file * Print detailed status of each unit test * Install CentOS & Ubuntu build dependencies * Use container names to differenciate between Ubuntu & CentOS * Remove sudo usage in CentOS * Fix Linux build dependencies * Add -y param to apt install * Remove installation of Linux dependencies * Minor additions * Rename Benchmarks to PerformanceTests for Arcade * Changes * Added benchmark doc changes * Pre-merge changes * Fixing failing Arcade Windows Builds (#5482) * Try Windows build single quote fix * Remove %20 * Added variable space value * Using variables for spacing * Added space values as job parameters * Try conditional variables again * fix official builds * Revert "fix official builds" This reverts commit 7dbbdc7b946f4f48db5452887ad9bf53616a37e8. * fixing tensorflow rebase issue * Fixes for many of the CI builds. 
(#5496) * yml log changes * Fix NetFX builds by ensuring assembly version is set correctly and not to Arcade default of 42.42.42.42 (#5503) * Fixed official builds for Arcade SDK (#5512) * Added fixes for official builds * Make .sh files executable * fix mkl nuget issue Co-authored-by: Frank Dong <frdong@microsoft.com> * fix code generator tests failure (#5520) * Added fixes for official builds * Make .sh files executable * fix mkl nuget issue * fix code generate test fails * only add necessary dependency Co-authored-by: Mustafa Bal <5262061+mstfbl@users.noreply.github.com> * Fixed memory leaks from OnnxTransformer (#5518) * Fixed memory leak from OnnxTransformer and related x86 build fixes * Reverting x86 build related fixes to focus only on the memory leaks * Updated docs * Reverted OnnxRuntimeOutputCatcher to private class * Addressed code review comments * Refactored OnnxTransform back to using MapperBase based on code review comments * Handle integration tests and nightly build testing (#5509) * Make -integrationTests work * Update .yml file * Added the TargetArchitecture properties * Try out -integrationTest * Missed -integrationTest flag * Renamed FunctionalTestBaseClass to IntegrationTestBaseClass * Missed rename * Modified tests to make them more stable * Fixed leak in object pool (#5521) Co-authored-by: frank-dong-ms <55860649+frank-dong-ms@users.noreply.github.com> Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com> Co-authored-by: Mustafa Bal <5262061+mstfbl@users.noreply.github.com> Co-authored-by: Frank Dong <frdong@microsoft.com> Co-authored-by: Michael Sharp <misharp@microsoft.com> Co-authored-by: Antonio Velázquez <38739674+antoniovs1029@users.noreply.github.com> * fix benchmark test timeout issue (#5530) * removed old build stuff (#5531) * Fixes Code Coverage in Arcade (#5528) * arcade code coverage changes * adding Michael's changes * updating path Co-authored-by: Keren Fuentes <kedejesu@microsoft.com> * Removed 
CODEOWNERS file to unify review process (#5535) * Fix publishing problems (#5538) * Removed our dependency to BuildTools by using the NugetCommand Azure Task. * We should publish a nuget named "SampleUtils", but we were publishing it with the name "SamplesUtils" * The naming conventions of our published nugets didn't match the ones described on arcade's docs: Versioning.md. I've also added the option so that when queuing the publishing build, we can pass the VERSIONKIND variable with value "release", so that it produces the nugets with arcade's conventions for "Release official build" nugets (as opposed to the "Daily official build" naming convention that's going to be used now by our CI that publishes nightly nugets). * Updated prerelease label (#5540) * Fix warnings from CI Build (#5541) * fix warnings * also add conditional copy asset to native.proj * test fix warnings * supress nuget warning 5118 * supress other warning * remove unnecessary change * put skip warning at Directory.Buil.props * Updated build instructions (#5534) * Updated build instructions * Adressed reviews * Reviews * removed the rest of the old pkg references: (#5537) * Perf improvement for TopK Accuracy and return all topK in Classification Evaluator (#5395) * Fix for issue 744 * cleanup * fixing report output * fixedTestReferenceOutputs * Fixed test reference outputs for NetCore31 * change top k acc output string format * Ranking algorithm now uses first appearance in dataset rather than worstCase * fixed benchmark * various minor changes from code review * limit TopK to OutputTopKAcc parameter * top k output name changes * make old TopK readOnly * restored old baselineOutputs since respecting outputTopK param means no topK in most test output * fix test fails, re-add names parameter * Clean up commented code * that'll teach me to edit from the github webpage * use existing method, fix nits * Slight comment change * Comment change / Touch to kick off build pipeline * fix whitespace * Added 
new test * Code formatting nits * Code formatting nit * Fixed undefined rankofCorrectLabel and trailing whitespace warning * Removed _numUnknownClassInstances and added test for unknown labels * Add weight to seenRanks * Nits * Removed FastTree import Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com> Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com> * Fixed Spelling on stopwords (#5524) * Changes to onnx export. (#5544) * Add back missing test project from running on arcade (#5545) * add back test result upload and add missing test project from running * fix identification * filter out performance test result files to avoid warnings * [CodeGenerator] Fix MLNet.CLI build error. (#5546) * upgrade to 3.1 * write inline data using invariantCulture * fix mlnet build error * Fixed AutoML CrossValSummaryRunner for TopKAccuracyForAllK (#5548) * Fixed bug * Tensorflow fix (#5547) * fix tensorflow issue on sample repo * add comments * Update to OnnxRuntime 1.6.0 and fixed bug with sequences outputs (#5529) * Use onnx prerelease * Upgrade to onnx 1.6.0 * Updated docs * Fixed problem with sequences * added in DcgTruncationLevel to AutoML api (#5433) * added in DcgTruncationLevel to automl api * changed default to 10 * updated basline output * fixed failing tests and baselines * Changes from PR comments. * Update src/Microsoft.ML.AutoML/Experiment/MetricsAgents/RankingMetricsAgent.cs Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com> * Changes based on PR comments. * Fix ranking test. 
Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com> * Created release notes for v1.5.3 (#5543) * Created release notes for v1.5.3 * Updated with review comments * Updated with review comments * Updated release notes with latest PRs * Fixed typo * Forward logs of Experiment's sub MLContexts to main MLContext (#5554) * Forward logs of Experiment's sub MLContexts to main MLContext * Adressed reviews * Update Stale docs (#5550) * Updated OnnxMl.md * Updated MlNetMklDeps docs * Typo * typo * continueOnError on Brew Workaround (#5555) * continueOnError:true * Fix publishing symbols (#5556) * Disable Portable PDB conversion * Push packages to artifacts * Fix symbols issues * Added note about Microsoft.ML.dll * try out just packing * Return Build=false, but actually use configuration * Added missing TargetArchitecture * add back tests * Added missing flags * Updated version to 1.5.4 (#5557) * Fixed version numbers in the right place (#5558) * Updated version to 1.5.4 * Updated version to 1.5.4 * eng (#5560) * Renamed release notes file (#5561) * Renamed release notes file * Updated version number in release notes * Add SymSgdNative reference to AutoML.Tests.csproj (#5559) * runSpecific in YAML * RunSpecific in test * Add SymSgdNative reference * Revert "RunSpecific in test" This reverts commit fed12b26ae71e7a95d2dd1f4703541138a780d75. * Revert "runSpecific in YAML" This reverts commit f9f328d52cd5b4281ad38b7a6af20c219dd0fd44. 
* Nuget.config url fix for roslyn compilers (#5584) * fixed nuget url, versions, and failing tests * changes from pr comments and MacOS changes * MacOS homebrew bug workaround * removed unnused nuget url * added in note that PredictionEngine is not thread safe (#5583) * Onnx Export for ValueMapping estimator (#5577) * Fixed Averaged Perceptron default value (#5586) * fixed missed averaged perceptron default value * fixed extension api * fixed test baselines * fixing official build (#5596) * Release/1.5.4 fix (#5599) * Nuget.config url fix for roslyn compilers (#5584) * fixed nuget url, versions, and failing tests * changes from pr comments and MacOS changes * MacOS homebrew bug workaround * removed unnused nuget url * fixing official build (#5596) * Remove references to Microsoft.ML.Scoring (#5602) This was the very first ONNX .NET bindings, it was replaced with Microsoft.ML.OnnxRuntime then Microsoft.ML.OnnxRuntime.Managed. * Make ColumnInference serializable (#5611) * upgrade to 3.1 * write inline data using invariantCulture * make column inference serializable * add test json * add approvaltests * fixerd nuget.config (#5614) * Fix issue in SRCnnEntireAnomalyDetector (#5579) * update * refine codes * update comments * update for nit Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com> * Offer suggestions for possibly mistyped label column names in AutoML (#5574) (#5624) * Offer suggestions for possibly mistyped label column names * review changes * TimeSeries - fix confidence parameter type for some detectors (#4058) (#5623) * TimeSeries - fix confidence parameter type for some detectors. 
- The public API exposed confidence parameters as int even though it's internally implemented as double - There was no workaround since all classes where double is used are internal - This caused major issues for software requiring high precision predictions - This change to API should be backwards compatible since int can be passed to parameter of type double * TimeSeries - reintroduce original methods with confidence parameter of type int (to not break the API). * TimeSeries - make catalog API methods with int confidence parameter deprecated. - Tests adjusted to not use the deprecated methods * Update Conversion.cs (#5627) * Documentation updates (#5635) * documentation updates * fixed spelling error * Update docs/building/unix-instructions.md Co-authored-by: Santiago Fernandez Madero <safern@microsoft.com> Co-authored-by: Santiago Fernandez Madero <safern@microsoft.com> * AutoML aggregate exception (#5631) * added check for aggregate exception * Update src/Microsoft.ML.AutoML/Experiment/Experiment.cs Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com> * Update src/Microsoft.ML.AutoML/Experiment/Experiment.cs Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com> * pulled message out to private variable so its not duplicated * Update src/Microsoft.ML.AutoML/Experiment/Experiment.cs Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com> Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com> Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com> * Treat TensorFlow output as non-batched. (#5634) * Can now not treat output as batched. * updated comments based on PR comments. * Fixing saving/loading with new parameter. 
* Updates based on PR comments * Update src/Microsoft.ML.TensorFlow/TensorflowUtils.cs Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com> * reverted accidental test changes * fixes based on PR comments Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com> * Added in release notes for 1.5.5 (#5639) * added in release notes * Update release-1.5.5.md Removed incorrect PR. * Update docs/release-notes/1.5.5/release-1.5.5.md Co-authored-by: Eric StJohn <ericstj@microsoft.com> * Update docs/release-notes/1.5.5/release-1.5.5.md Co-authored-by: Eric StJohn <ericstj@microsoft.com> * Update release-1.5.5.md Co-authored-by: Eric StJohn <ericstj@microsoft.com> * updating version after release (#5642) * Move DataFrame to machinelearning (#5641) * Change namespace to Microsoft.Data.Analysis (#2773) * Update namespace to Microsoft.Data.Analysis * Remove "DataFrame" from the test project name * APIs for reversed binary operators (#2769) * Support reverse binary operators * Fix file left behind in a rebase * Fix whitespace * Throw for incompatible inPlace (#2778) * Throw if inPlace is set and types mismatch * Unit test * Better error message * Remove empty lines * Version, Tags and Description for Nuget (#2779) * Version, Tags and Description for Nuget * sq * Flags for release (#2781) * Publish packages to artifacts * Flags for release * Fix the Description method to not throw (#2786) * Fix the Description method to not crash Adds an Info method * sq * Address feddback * Last round of feedback * Use dataTypes if it passed in to LoadCsv (#2791) * Fix LoadCsv to use dataType if it passed in * sq * Don't read the full file after guessRows lines have been read * Address feedback * Last round of feedback * Creating a `Rows` property, similar to `Columns` (#2794) * Rows collection, similar to Columns * Doc * Some minor clean up * Make DataFrameRow a view into the DataFrame * sq * Address feedback * Remove DataFrame.RowCount * More row count changes * sq * Address feedback * 
Merge upstream * DataFrame.LoadCsv throws an exception on projects targeting < netcore3.0 (#2797) Fixing by passing in an encoding and a default buffer size. Also, get our tests running on .NET Framework. Fix #2783 * Params constructor on DataFrame (#2800) * Params constructor on DataFrame * Delete redundant constructors * Remove `T : unmanaged` constraint from DataFrameColumn.BinaryOperations (#2801) * Remove T : unmanaged constraint from DataFrameColumn.BinaryOperations * Address feedback * Rename the value version of the APIs * sq * Fix build * Address feedback * Remove Value from the APIs * sq * Address feedback * Bump version to 0.2.0 (#2803) * Add Apply<TResult>method to PrimitiveDataFrameColumn (#2807) * Add Apply method to PrimitiveDataFrameColumn and its container * Add TestApply test * Remove unused df variable in DataFrameTests * Add xml doc comments to Apply method * Add additional tests for ReadCsv (#2811) * Add additional tests for ReadCsv * Update asserts * Add empty row and skip test pending another fix * Remove test for another issue * Added static factory methods to DataFrameColumn (#2808) * Added static factory methods to DataFrameColumn where they make sense (for the overloads where its possible to infer the column's type). * Remove regions * Update some parts of the unit tests to use static factory methods to create DataFrameColumns. * Remove errant {T} on StringDataFrameColumn. * PR feedback Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com> * Append rows to a DataFrame (#2823) * Append rows to a DataFrame * Unit test * Update unit tests and doc * Need to perfrom a type check every time * sq * Update unit test * Address comments * Move corefxlab to arcade (#2795) * Add eng folder * First cut of moving corefxlab to arcade * Move arcade symbol validation inside official buil * Move base yml file to root * Arcade will build, publish packages and symbols * UpdateXlf. 
Review this * Arcade Update to version 5.0.0-beta.19575.4 to include Experimental Channel * Remove property that was causing the build to fail * Moving global properties to the main Yaml instead of step in order to unblock publishing * Committing xlfs and changing the build script to not update Xlf on build * clean up corefxlab-base.yml * sq * Delete unused files and scripts * Get rid of all the xlf stuff * Remove UpdateXlfOnBuild for non-NT builds * Minor cleanup * More cleanup * update eng\build.sh permission * Rename to Nuget.config * sq * Remove the runtime spec from global.json * Don't publish test projs * Typo * Move version prefix to versions.props Change prereleaselabel to alpha * Increment version number to list as the latest package Increment version number of Microsoft.Experimental.Collections to list as the latest package Turn off graph generation * Update the Readme * Test removing the scripts folder * Touch readme to force a change * Address Jose's comments * Typo * Move versions to eng/versions.props * Benchmark.proj needs to refer to xunit * Clean up dependencies.props * Remove dependencies.props Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com> * Rename Sort to OrderBy (#2814) * Rename sort to orderby and add orderbydescending method * Add doc strings * Update bench mark test * Update tests * Update DataFrameColumn to use orderby * Update doc comment * Additions to sortby * Revert "Additions to sortby" This reverts commit 3931d4e2a72ce44a539be7c27b2592395f3efd35. * Revert "Update doc comment" This reverts commit 192f7797fe2b77625486637badf77046162fedbf. * Revert "Update DataFrameColumn to use orderby" This reverts commit 8f94664c5fd18570cd2b601535e816ca5dd5e3c4. 
* Explode column types and generate converters (#2857) * Explode column types and generate converters * Clean this * sq * sq * Cherry pick for next commit * sq * Undo unnecessary change * Address remaining concerns from the 2nd DataFrame API Review (#2861) * Move string indexer to Columns * API changes from the 2nd API review * Unit tests * Address comments * Add binary operations and operators on the exploded columns (#2867) * Generate combinations of binary operations and Add * Numeric Converters and CloneAsNumericColumns * Binary, Comparison and Shift operations * Clean up and bug fix * Fix the binary op apis to not be overridden * Internal constructors for exploded types * Proper return types for exploded types * Update unit tests * Update csproj * Revert "Fix the binary op apis to not be overridden" This reverts commit 2dc2240c9449930139c1492d1388d5e1f8ba5fa1. * Bug fix and unit test * Constructor that takes in a container * Unit tests * Call the implementation where possible * Review sq * sq * Cherry pick for next commit * sq * Undo unnecessary change * Rename to the system namespace column types * Address comments * Push to pull locally * Mimic C#'s arithmetic grammar in DataFrame * Address feedback * Reduce the number of partial column definitions * Address feedback * Add APIs to get the strongly typed columns from a DataFrame (#2878) * CP * sq * sq * Improve docs * Enable xml docs for Data.Analysis (#2882) * Enable xml docs for Data.Analysis * Fix /// summary around inheritdoc * Minor doc changes * sq * sq * Address feedback * Add Apply to ArrowStringDataFrameColumn (#2889) * Support for Exploded columns types in Arrow and IO scenarios (#2885) * Support for Exploded columns types in Arrow and IO scenarios * Unit tests * Address feedback * Bump version (#2890) * Fix versioning to allow for individual stable packages (#2891) * Fix versioning to allow for individual stable packages * sq * Bump Microsoft.Data.Analysis version to 0.4.0 (#2892) * Bump 
Microsoft.Data.Analysis version to 0.4.0 * Fix https://github.com/dotnet/corefxlab/issues/2906 (#2907) * Fix https://github.com/dotnet/corefxlab/issues/2906 * Improvements and unit tests * sq * Better fix * sq * Improve LoadCsv to handle null values when deducing the column types (#2916) * Unit test to repro * Fix https://github.com/dotnet/corefxlab/issues/2915 Append a null value to a column when encountering it instead of changing the column type to a StringDataFrameColumn * Update src/Microsoft.Data.Analysis/DataFrame.IO.cs Co-authored-by: Günther Foidl <gue@korporal.at> * Update src/Microsoft.Data.Analysis/DataFrame.cs Co-authored-by: Günther Foidl <gue@korporal.at> * Feedback Co-authored-by: Günther Foidl <gue@korporal.at> * Create a 0.4.0 package (#2918) * Revert "Create a 0.4.0 package (#2918)" (#2919) This reverts commit 0bef531289744274ab97e8bbb9e5694b0d855689. * Produce a 0.4.0 build (#2920) * Default Length for StringDataFrameColumn (#2921) (#2923) * Increment version and stop producing stable packages (#2922) * Increment version and stop producing stable packages * Add DataFrame object formatter. (#2931) * Add DataFrame object formatter. * Update nuget dependencies. * Apply CR fixes. * Fix a bug in InsertColumn * Add Microsoft.Data.Analysis.nuget project (#2933) * Add DataFrame object formatter. * Update nuget dependencies. * Apply CR fixes. * Remove ReferenceOutputAssembly added to from Microsoft.Data.Analysys.csproj. * Add Microsoft.Data.Analysis.nuget project. * Move project to src. Fix nuget project settings. * Remove NoBuild property from project. * Remove IncludeBuildOutput and IncludeSymbols from project. * Add VersionPrefix to project. * Add IncludeBuildOutput property. * Add unit tests. 
* Downgrade from netcoreapp3.1 to netcoreapp3.0 * Upgrade from netcoreapp3.0 to netcoreapp3.1 (dotnet interactive is not compatible with 3.0) * Add netcoreapp3.1 to global settings * Add dotnet 3.1.5 runtime to global settings * Build fixes * Moving MDAI into interactive-extensions folder of the package * Minor refactoring * Respond to PR feedback Co-authored-by: Prashanth Govindarajan <prgovi@microsoft.com> Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com> Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com> * ColumnName indexer on DataFrame (#2959) * ColumnName indexer on DataFrame Fixes https://github.com/dotnet/corefxlab/issues/2934 * Unit tests * Null column name * Implement FillNulls() for ArrowStringDataFrameColumn with inPlace: false (#2956) * implement FillNulls method for ArrowStringDataFrameColumn * additional asserts for testcase * Prevent DataFrame.Sample() method from returning duplicated rows (#2939) * resolves #2806 * replace forloop with ArraySegment<T> * reduce shuffle loop operations from O(Rows.Count) to O(numberOfRows) * Add WriteCsv plus unit tests. (#2947) * Add WriteCsv plus unit tests. * Add CultureInfo to WriteCsv. Remove index column param. Update unit tests. * Add CR changes. CultureInfo. Separator. * Format decimal types individually. Fix culture info. Fix unit tests. * Format decimal types individually. Fix culture info. Fix unit tests. * Missing values default to a `StringDataFrameColumn` (#2982) * Make LoadCsv more robust * Test empty string column * Retain prev guess where possible * Update FromArrowRecordBatches for dotnet-spark (#2978) * Support for RecordBatches with StructAr…

Fixed the syntax of cited example.

8412e47

KrzysztofCwalina approved these changes May 4, 2018

sandyarmstrong reviewed May 4, 2018

shauheen requested a review from glebuk May 4, 2018 19:45

glebuk approved these changes May 4, 2018

glebuk reviewed May 4, 2018

eerhardt merged commit 972f623 into dotnet:master May 4, 2018

danmoseley deleted the fix_readme_md branch May 4, 2018 23:34

zeahmed referenced this pull request in zeahmed/machinelearning May 24, 2018

Merge pull request #2 from dotnet/master

bd83d91

Update local fork

TomFinley mentioned this pull request Jul 18, 2018

Fixed the TextTransform bug where chargrams were being computed differently when used with/without the word tokenizer. #548

Merged

ashahabov mentioned this pull request Aug 15, 2018

Predict similar scheme #680

Closed

ericstj pushed a commit to ericstj/machinelearning that referenced this pull request Aug 24, 2018

Merge pull request dotnet#2 from dotnet/master

7d0ea81

Latest dotnet/master

ericstj mentioned this pull request Sep 13, 2018

Enable TensorFlowTransform to work with pre-trained models that are not frozen #853

Merged

eerhardt mentioned this pull request Sep 14, 2018

ComponentCatalog design issues #208

Closed

shmoradims pushed a commit to shmoradims/machinelearning that referenced this pull request Oct 22, 2018

Addressed PR comments dotnet#2

1b18db5

sfilipi mentioned this pull request Oct 26, 2018

Adding training statistics for LR in the HAL learners package. #1392

Merged

This was referenced Oct 29, 2018

Replaces ChooseColumnsTransform and DropColumnsTransform with SelectColumnsTransform #1371

Merged

SelectColumnsTransform drops columns that it should not drop #1504

Closed

abgoswam mentioned this pull request Jan 23, 2019

Multiple feature columns in FFM #2205

Merged

abgoswam mentioned this pull request Jan 30, 2019

Creation of components through MLContext: advanced options and other feedback #1798

Closed

Ivanidzo4ka mentioned this pull request Apr 2, 2019

ImageLoadingTransformer hides exceptions in Mapper.MakeGetter #3154

Open

Dmitry-A referenced this pull request in Dmitry-A/machinelearning Apr 12, 2019

Create README.md (#2)

d65315f

sayanshaw24 pushed a commit to sayanshaw24/machinelearning that referenced this pull request Aug 14, 2019

Merge pull request dotnet#2 from Oceania2018/tftransferlearning

8830af9

fix TensorflowUtil.GetModelSchema

cyberkoolman mentioned this pull request Sep 17, 2019

PFI (Permutation Feature Importance) API needs to be simpler to use #4216

Closed

MaxAkbar mentioned this pull request Oct 28, 2019

TextCatalog.ApplyWordEmbedding to KMeans Trainer generates IndexOutOfRangeException #4397

Closed

artemiusgreat mentioned this pull request Dec 30, 2019

lib_lightgbm.dll is not getting loaded while running benchmarks on .NetFramework #1945

Closed

harishsk mentioned this pull request Feb 13, 2020

Added Done() call in BaseTestBaseline.Cleanup and added related fixes #4823

Merged

artemiusgreat mentioned this pull request Mar 2, 2020

Dynamic number of features for the trainer / schema #4903

Closed

frank-dong-ms-zz mentioned this pull request Jun 23, 2020

predictionEngine breaks after saving/loading a Model #4226

Closed

emilylawton mentioned this pull request Jul 14, 2020

Unable to load shared library 'CpuMathNative' or one of its dependencies. #5299

Closed

ankitasankars mentioned this pull request Aug 31, 2021

Removing Logging Line from ch.Info (#5598) #5920

Closed

ghost locked as resolved and limited conversation to collaborators Mar 31, 2022
