OneHotEncoding sample #2779

Closed
wants to merge 4 commits into from
@@ -0,0 +1,62 @@
using System;
using System.Collections.Generic;
using Microsoft.ML.Data;
using static Microsoft.ML.Transforms.OneHotEncodingTransformer;

namespace Microsoft.ML.Samples.Dynamic
{
    public static class OneHotEncodingTransform

OneHotEncodingTransform

Please rename this to OneHotEncoding (identical to the API extension method that it corresponds to).

    {
        public static void Example()

Does it come from an existing test?

Example

Please link it to the extension it documents through the node. See the XML docs of the other extension methods.

        {
            // Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging,
            // as well as the source of randomness.
            var ml = new MLContext();

            // Get a small dataset as an IEnumerable and convert it to an IDataView.
            IEnumerable<SamplesUtils.DatasetUtils.SampleInfertData> data = SamplesUtils.DatasetUtils.GetInfertData();

IEnumerable<SamplesUtils.DatasetUtils.SampleInfertData>

Use var instead.

GetInfertData

I think we want to deprecate the infert dataset from the samples, as it is a sensitive one. Is it possible to use one of the other dataset snippets?

You can use the Adult dataset that exists in DatasetUtils.

In reply to: 261049116

            var trainData = ml.Data.LoadFromEnumerable(data);

            // Preview of the data.
            //
            // Age  Case  Education  Induced  Parity  PooledStratum  RowNum  ...
            // 26   1     0-5yrs     1        6       3              1       ...
            // 42   1     0-5yrs     1        1       1              2       ...
            // 39   1     0-5yrs     2        6       4              3       ...
            // 34   1     0-5yrs     2        4       2              4       ...
            // 35   1     6-11yrs    1        3       32             5       ...

            // A pipeline for one hot encoding the Education column.
            var pipeline = ml.Transforms.Categorical.OneHotEncoding("EducationOneHotEncoded", "Education", OutputKind.Bag);
@wschin (Member), Feb 28, 2019

Suggested change
var pipeline = ml.Transforms.Categorical.OneHotEncoding("EducationOneHotEncoded", "Education", OutputKind.Bag);
var pipeline = ml.Transforms.Categorical.OneHotEncoding("EducationOneHotEncoded", "Education", OutputKind.Bag);

Need an empty line here.

            // Fit to data.
            var transformer = pipeline.Fit(trainData);

            // Get transformed data
            var transformedData = transformer.Transform(trainData);

            // Getting the data of the newly created column, so we can preview it.
            var encodedColumn = transformedData.GetColumn<float[]>(ml, "EducationOneHotEncoded");

            // A small printing utility.
            Action<string, IEnumerable<float[]>> printHelper = (colName, column) =>
            {
                foreach (var row in column)
                {
                    for (var i = 0; i < row.Length; i++)
                        Console.Write($"{row[i]} ");
                    Console.WriteLine();
                }
            };

Consider moving this to SampleUtils.ConsoleUtils
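
A minimal sketch of what such a shared helper might look like, assuming it lives in a static utility class. The class name ConsoleUtilsSketch and the method names FormatRows and PrintColumn are hypothetical and are not an existing SamplesUtils API; only standard .NET calls are used.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical utility class; the names here are assumptions made for illustration.
public static class ConsoleUtilsSketch
{
    // Turns each row of a vector column into one line of space-separated values.
    public static IEnumerable<string> FormatRows(IEnumerable<float[]> column)
    {
        foreach (var row in column)
            yield return string.Join(" ", row);
    }

    // Writes the column name followed by one formatted line per row.
    public static void PrintColumn(string colName, IEnumerable<float[]> column)
    {
        Console.WriteLine(colName);
        foreach (var line in FormatRows(column))
            Console.WriteLine(line);
    }
}
```

With a helper along these lines, the sample could replace the inline lambda with a single call such as ConsoleUtilsSketch.PrintColumn("EducationOneHotEncoded", encodedColumn). Separating formatting from printing keeps the formatting part easy to check on its own.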


            printHelper("Education", encodedColumn);

            // data column obtained post-transformation.

EducationOneHotEncoded data column

            // 1 0 0 0 ...
            // 1 0 0 0 ...
            // 1 0 0 0 ...
            // 1 0 0 0 ...
            // 0 1 0 0 ...
            // ....
        }
    }
}
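
For readers unfamiliar with the transform, the idea behind the encoded column can be sketched in plain C# without ML.NET. This is an illustration only, not the library's implementation: the class OneHotSketch and its Encode method are hypothetical, and the sketch assumes each distinct category gets a slot in order of first occurrence and that the input column holds a single value per row.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of one hot encoding; not part of ML.NET.
public static class OneHotSketch
{
    // Returns one indicator vector per input value. Slots are assigned in
    // order of first occurrence (an assumption made for this sketch).
    public static List<float[]> Encode(IReadOnlyList<string> values)
    {
        // First pass: build the category-to-slot dictionary.
        var slots = new Dictionary<string, int>();
        foreach (var value in values)
        {
            if (!slots.ContainsKey(value))
                slots.Add(value, slots.Count);
        }

        // Second pass: emit a vector with a single 1 at the category's slot.
        var encoded = new List<float[]>(values.Count);
        foreach (var value in values)
        {
            var row = new float[slots.Count];
            row[slots[value]] = 1f;
            encoded.Add(row);
        }
        return encoded;
    }
}
```

Calling OneHotSketch.Encode on the Education values from the data preview gives every "0-5yrs" row a 1 in the first slot and the "6-11yrs" row a 1 in the second slot. The vectors in this sketch are only as long as the number of distinct categories passed in.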

Please also add a sample for the overload that takes ColumnOptions, and call it OneHotEncodingWithOptions.cs.