Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checking in the samples generated during bug bash for MissingNa, Repl… #2960

Merged
merged 2 commits into from
Mar 15, 2019

Conversation

sfilipi
Copy link
Member

@sfilipi sfilipi commented Mar 14, 2019

Towards #1209

Gathering the work of PR: #2814, #2779 and #2773


var samples = new List<DataPoint>()
{
new DataPoint(){ Label = 3, Features = new float[3] {1, 1, 0} },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Label column is not used in the following transform, we can remove it completely.

Copy link
Member

@wschin wschin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have in-memory class definition, in-memory data set creation, in-memory prediction. What else I can ask?

// 'true' where the value in the input column is NaN. This value can be used
// to replace missing values with other values.

IEstimator<ITransformer> pipeline = mlContext.Transforms.IndicateMissingValues("MissingIndicator", "Features");
Copy link
Contributor

@rogancarr rogancarr Mar 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IEstimat [](start = 12, length = 8)

Blank line above #Resolved


// a small printing utility
Func<object[], string> vectorPrinter = (object[] vector) =>
{
Copy link
Contributor

@rogancarr rogancarr Mar 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Break out of main code path and into a helper. #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the main logic is above. Breaking out would just change the order of what comes first to the attention of the users: the definition of printing or printing itself..


In reply to: 265789218 [](ancestors = 265789218)

// And finally, we can write out the rows of the dataset, looking at the columns of interest.
foreach (var row in rowEnumerable)
{
Console.WriteLine($"Label: {row.Label} Features: {vectorPrinter(row.Features.Cast<object>().ToArray())} MissingIndicator: {vectorPrinter(row.MissingIndicator.Cast<object>().ToArray())}");
Copy link
Contributor

@rogancarr rogancarr Mar 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.Cast().ToArray() [](start = 92, length = 25)

This is a bit confusing for a sample, IMHO. Maybe better to just have two helper functions? #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels self-explanatory since it casts, than ToArray. Addign yet another sample that does the same thing might make the sample look less professional.


In reply to: 265789625 [](ancestors = 265789625)

{
// Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging,
// as well as the source of randomness.
var ml = new MLContext();
Copy link
Contributor

@rogancarr rogancarr Mar 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ml [](start = 16, length = 2)

mlContext #Resolved

Copy link
Contributor

@rogancarr rogancarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nits!
:shipit:

@codecov
Copy link

codecov bot commented Mar 14, 2019

Codecov Report

Merging #2960 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2960      +/-   ##
==========================================
+ Coverage   72.29%   72.29%   +<.01%     
==========================================
  Files         796      796              
  Lines      142349   142349              
  Branches    16051    16051              
==========================================
+ Hits       102905   102908       +3     
+ Misses      35063    35062       -1     
+ Partials     4381     4379       -2
Flag Coverage Δ
#Debug 72.29% <ø> (ø) ⬆️
#production 68.01% <ø> (ø) ⬆️
#test 88.48% <ø> (ø) ⬆️
Impacted Files Coverage Δ
src/Microsoft.ML.Transforms/CategoricalCatalog.cs 100% <ø> (ø) ⬆️
src/Microsoft.ML.Transforms/ExtensionsCatalog.cs 75% <ø> (ø) ⬆️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 85.69% <0%> (+0.16%) ⬆️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 84.9% <0%> (+0.2%) ⬆️
...StandardTrainers/Standard/LinearModelParameters.cs 60.9% <0%> (+0.26%) ⬆️

@@ -6,7 +6,7 @@ internal static class Program
{
static void Main(string[] args)
{
CustomMapping.Example();
ReplaceMissingValues.Example();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReplaceMissingValues [](start = 12, length = 20)

please don't change this file. It creates unnecessary merge conflicts.

Copy link

@shmoradims shmoradims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@sfilipi sfilipi merged commit 9cd9a8c into dotnet:master Mar 15, 2019
@sfilipi sfilipi deleted the bugBashSamples branch March 15, 2019 04:02
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
documentation Related to documentation of ML.NET
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants