Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding documentation about the rest of the classes involved on generating the CSharpAPI #529

Merged
merged 19 commits into from
Jul 18, 2018

Conversation

sfilipi
Copy link
Member

@sfilipi sfilipi commented Jul 13, 2018

Resolves #389
Added more documentation and examples about mostly transforms components. (A few learners as well.)

The documentation for the classes involved in generating the entry points lives in the doc.xml documents, since it needs to be referenced from two locations, for the most part, and since the CSharpApi is auto-generated.

sfilipi added 11 commits July 9, 2018 10:39
…les.

For the summary text that is common between several learners, the examples will be added on a separate node.
An example of how that will look like is in the LogisticRegressionBinaryClassifier and LogisticRegressionClassifier.
changing the root of the XML documents from docs -> doc, since its only one.
Switching all <see href /> to the valid <see cref />
@sfilipi
Copy link
Member Author

sfilipi commented Jul 13, 2018

I would suggest to mostly review the text in the doc.xml files.

I have marked this as WIP, because i need to do a final pass on extracting the examples separately from the nodes for the xml that is shared between two classes with different names, and only reference the example from the class with the corresponding name.

I will update the WIP tag, and close this comment once done.
#Resolved

// * Each column builds/uses exactly one "vocabulary" (dictionary).
// * Output columns are KeyType-valued.
// * The Key value is the one-based index of the item in the dictionary.
// * Not found is assigned the value zero.
Copy link
Contributor

@TomFinley TomFinley Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll have to be careful here. Conceptually, the key-values are logically starting at 0, but physically valid values start at 1. I feel like this might not be the best place to talk about key-values unless you're really going to go into them, since otherwise it may be confusing. #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving it as is, since it is just a code comment for us.


In reply to: 202454960 [](ancestors = 202454960)

@@ -10,6 +10,7 @@
<ProjectReference Include="..\Microsoft.ML.Core\Microsoft.ML.Core.csproj" />
<ProjectReference Include="..\Microsoft.ML.CpuMath\Microsoft.ML.CpuMath.csproj" />
<ProjectReference Include="..\Microsoft.ML.Data\Microsoft.ML.Data.csproj" />
<ProjectReference Include="..\Microsoft.ML.FastTree\Microsoft.ML.FastTree.csproj" />
Copy link
Contributor

@TomFinley TomFinley Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[](start = 4, length = 84)

Is this necessary? Is this due to some sort of indirect dependency we have via depedency injection or somesuch? #Closed

Copy link
Member Author

@sfilipi sfilipi Jul 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's actually PKPD referencing FastTree in the documentation through a that adds this project reference.
Removing the reference, and the see tag. Let me know if the opposite is desired.


In reply to: 202456059 [](ancestors = 202456059)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I'd prefer to not have things like this result in dependencies. :)


In reply to: 202741801 [](ancestors = 202741801,202456059)

@TomFinley
Copy link
Contributor

TomFinley commented Jul 13, 2018

      The optimization technique used for LogisticRegression Classifier is the limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS).

Well, partially. The L1 regularization is actually a technique related but distinct from L-BFGS called OWL-QN. #Pending


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/doc.xml:13 in db2592d. [](commit_id = db2592d, deletion_comment = False)

@TomFinley
Copy link
Contributor

TomFinley commented Jul 13, 2018

      This learner can use elastic net regularization: a linear combination of L1 (lasso) and L2 (ridge) regularizations.

LASSO is an acronym so I wonder if this should be capitalized? #Closed


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/doc.xml:20 in db2592d. [](commit_id = db2592d, deletion_comment = False)

</summary>
<remarks>
In machine learning​ it is a pretty common and powerful approach to utilize the already trained model in the process of defining features.
<para>A most obvious example could be to use the model's scores as features to downstream models. For example, we might run clustering on the original features,
Copy link
Contributor

@TomFinley TomFinley Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A most obvious example [](start = 14, length = 22)

I feel like the words "most obvious" aren't adding to the description here. #Closed

<para>A most obvious example could be to use the model's scores as features to downstream models. For example, we might run clustering on the original features,
and use the cluster distances as the new feature set.
Instead of consuming the model's output, we could go deeper, and extract the 'intermediate outputs' that are used to produce the final score. </para>
There's a number of famous or popular examples of this technique:
Copy link
Contributor

@TomFinley TomFinley Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a number of famous or popular examples [](start = 8, length = 46)

Should be "There are ... examples" not "There is ... examples". #Closed

</remarks>
<example>
<code>
pipeline.Add(new TreeLeafFeaturizer())
Copy link
Contributor

@TomFinley TomFinley Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pipeline.Add(new TreeLeafFeaturizer()) [](start = 10, length = 38)

This is very specific to the CSharpApi.cs code, and will be going away. We're going to need to remember to nuke it. With this going away, ought we to even bother especially since this doesn't seem terribly informative? #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment would apply to all the examples. Keeping it, since leaving the documentation without any example would not look good.


In reply to: 202459319 [](ancestors = 202459319)

can be different from LightGbmClassifier, which develops a multi-class classifier directly.
</para>
</remarks>
<seealso cref='Microsoft.ML.Trainers.LogisticRegressionClassifier'>LogisticRegressionClassifier</seealso>
Copy link
Contributor

@TomFinley TomFinley Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LogisticRegressionClassifier [](start = 6, length = 105)

My expectation with <seealso tags is that they take the form <seealso cref="Foo"/> not <seealso cref="Foo">Foo</seealso>. That latter form, even if it were supported, I would discourage us from using because it will not track renames of the elements. #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to this when i needed the absolute name, because all the namespace repetition looked lousy. Agree that the renaming is more important, thought. switching back.


In reply to: 202460733 [](ancestors = 202460733)

<para>
This transform uses a set of aggregators to count the number of non-default values for each slot and
instantiates a <see cref="Microsoft.ML.Runtime.Data.DropSlotsTransform"/> to actually drop the slots.
This transform is useful when applied together with a CategoricalHashOneHotVectorizer.
Copy link
Contributor

@TomFinley TomFinley Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CategoricalHashOneHotVectorizer [](start = 64, length = 31)

Should this have a <see tag somehow? #Resolved

Copy link
Member Author

@sfilipi sfilipi Jul 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, but the compiler was complaining of not being able to find a reference to that class, although the cref of the see tag had absolute reference to the CategoricalHashOneHotVectorizer, starting from the namespace.
I'll check the proj references. Makes no sense to add a reference to the dll containing the C# API, though (if that will fix it ).


In reply to: 202461085 [](ancestors = 202461085)

Copy link
Member

@sharwell sharwell Jul 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sfilipi There are a few ways to address this if you want the link to resolve but don't want to mess up code references. My favorite tends to be declaring an assembly reference but specifying <Aliases> to ensure the reference can't be readily used in code. This form allows for a compiler-validated cref that doesn't result in an assembly reference in the compiled assembly. I'll try to find an example.

A secondary way to resolve this is using a fully-expanded cref that the compiler simply passes through without validation, but that documentation tools know how to parse:

<!-- This is inferior because the compiler will not validate it, but works as a last resort -->
<see cref="T:Fully.Qualified.Path.To.TypeName" />
``` #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what was not working, actually.
I didn't specify the T: in the cref. I'll try that.
Thanks for the suggestion, though.


In reply to: 202780648 [](ancestors = 202780648)

// Imputation modes are supported for vectors both by slot and across all slots.
// </summary>
// REVIEW: May make sense to implement the transform template interface.
/// <include file='doc.xml' path='doc/members/member[@name="NAReplace"]/*' />
Copy link
Contributor

@TomFinley TomFinley Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the XML comments have to be tripple /// indented.

I think this // REVIEW comment might be messing up the XML parsing. Maybe it should go somewhere else (or even away, it's not adding much info). My suspicion is that this comment isn't actually being compiled into the docs. #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh... wait were you trying to make it no longer an XML doc comment?

In that case you might want to take our those <summary> tags.


In reply to: 202461616 [](ancestors = 202461616)


<member name="CharacterTokenizer">
<summary>
Character-oriented tokenizer where text is considered a sequence of characters.
Copy link
Contributor

@TomFinley TomFinley Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Character-oriented [](start = 8, length = 18)

There's something that seems a bit off about this description. "Character-oriented" is a strange description, in any case text is as far as I know always considered a sequence of characters, and at the end of this I don't really have a clear idea of what it's doing.

Maybe, "breaks text into individual tokens consisting of each individual character" might be a good description? #Closed

</item>
<item>
<description>L-p vector normalization.​</description>
</item>
Copy link
Contributor

@TomFinley TomFinley Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Later in this document and I think in other documents you tended to do

<item><description>Foo</description></item>

vs.

<item>
    <description>Foo</description>
</item>

The first form might make this list a little less voluminous and easier to read. #Closed

These learner will request normalization from the data pipeline if the
classifier indicates it would benefit from it. Note that even if the
classifier indicates that it does not need caching, OVA will always
request caching, as it will be performing multiple passes over the data set.
Copy link
Contributor

@Zruty0 Zruty0 Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be misplaced #Closed

<example name="MultiClassNaiveBayesTrainer">
<example>
<code language="csharp">
pipeline.Add(new NaiveBayesClassifier(){ NormalizeFeatures = NormalizeOption.Auto, Caching = CachingOptions.Memory });
Copy link
Contributor

@Zruty0 Zruty0 Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same note about consistency between examples #Closed

</example>

<member name="OVA">
<summary>
Copy link
Contributor

@Zruty0 Zruty0 Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

summary [](start = 7, length = 7)

I think the text below would make a great first paragraph of the remarks section, and summary should be one sentence along the lines of:

Trains a one-versus-all multi-class classifier on top of specified binary classifier. #Closed

Prediction is then performed by running these binary classifiers, and choosing the prediction with the highest confidence score.
</summary>
<remarks>
<para>This algorithm can be treated as a wrapper for all the binary classifiers in ML.NET.
Copy link
Contributor

@Zruty0 Zruty0 Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be treated as a wrapper for all [](start = 29, length = 35)

'can be used with any of' sounds more appropriate #Closed

@@ -31,6 +31,29 @@ namespace Microsoft.ML.Runtime.Learners
using TDistPredictor = IDistPredictorProducing<Float, Float>;
using CR = RoleMappedSchema.ColumnRole;

/// <summary>
Copy link
Contributor

@Zruty0 Zruty0 Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

summary [](start = 9, length = 7)

Why here? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no entry point for it. it is only needed here.


In reply to: 203200111 [](ancestors = 203200111)

Here p(x,y) is the joint probability density function of X and Y, p(x) and p(y) are the marginal probability density functions of X and Y respectively.
In general, a higher mutual information between the dependent variable (or label) and an independent variable (or feature) means
that the label has higher mutual dependence over that feature.
The mutual information feature selection mode selects the features based on the mutual information.
Copy link
Contributor

@Zruty0 Zruty0 Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a tautology to me. #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not adding to the understanding either. Removing it.


In reply to: 203200715 [](ancestors = 203200715)

</example>
</example>

<member name="MutualInformationFeatureSelection">
Copy link
Contributor

@Zruty0 Zruty0 Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MutualInformationFeatureSelection [](start = 18, length = 33)

Duplicate? #Closed

</remarks>
<example>
<code language="csharp">
pipeline.Add(new FeatureSelectorByMutualInformation() { Column = new[]{ &quot;Feature1&quot;}, SlotsInOutput = 6 });
Copy link
Contributor

@Zruty0 Zruty0 Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new[] [](start = 75, length = 5)

does it have to be an array? That's unfortunate #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a string[]


In reply to: 203201036 [](ancestors = 203201036)

</member>

<member name="OptionalColumnTransform">
<summary>
Copy link
Contributor

@Zruty0 Zruty0 Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

summary [](start = 7, length = 7)

this should probably go into remarks?
The summary could then be

Creates a new column with the specified type and default values.
``` #Closed

</para>
<para>
When computing the mean/max/min value, there is also an option to compute it over the whole column instead of per slot.
This option has a default value of true for variable length vectors, and false for known length vectors.
Copy link
Contributor

@Zruty0 Zruty0 Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooooh, I remember that... Not a source of confusion at all... #Closed

Un-groups vector columns into sequences of rows, inverse of Group transform.
</summary>
<remarks>
<para>This can be thought of as an inverse of the CombinerByContiguousGroupId.
Copy link
Contributor

@Zruty0 Zruty0 Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CombinerByContiguousGroupId [](start = 58, length = 27)

link? #Closed

</example>

<member name="KeyToText">
<summary>
Copy link
Contributor

@Zruty0 Zruty0 Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

summary [](start = 7, length = 7)

I suggest

Transforms keys in a key column into the corresponding values

The fact that it is using the metadata looks like implementation detail to me #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good


In reply to: 203202293 [](ancestors = 203202293)

The resulting data will have all the group key columns preserved,
and the aggregated columns will become variable-length vectors of the original types.
<para>This transform essentially performs the following SQL-like operation:</para>
<para>GroupKey1, GroupKey2, ... GroupKeyK, LIST(Value1), LIST(Value2), ... LIST(ValueN)</para>
Copy link
Contributor

@Zruty0 Zruty0 Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

para [](start = 8, length = 4)

I think you missed the SELECT keyword #Closed

//
// All metadata is preserved for the retained columns. For 'unrolled' columns, all known metadata
// except slot names is preserved.
/// <include file='doc.xml' path='doc/members/member[@name="Ungroup"]/*' />
Copy link
Contributor

@Zruty0 Zruty0 Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include [](start = 9, length = 7)

Why do we have both text and include here? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left the comment where they were in the original places, just made them comments instead of documentation - style comments, to avoid having someone working on the code having to open the doc and read through it.

The text in the doc was better worded that the actual comment in the code.


In reply to: 203202675 [](ancestors = 203202675)

</example>

<member name="WordTokenizer">
<summary>
Copy link
Contributor

@Zruty0 Zruty0 Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

summary [](start = 7, length = 7)

again, this would make a good first paragraph in remarks, and the summary could be

Splits the text into words using the separator character(s).
``` #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the changes done, only one sentence is gone.


In reply to: 203203181 [](ancestors = 203203181)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for catching.


In reply to: 203513343 [](ancestors = 203513343,203203181)

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

@Zruty0
Copy link
Contributor

Zruty0 commented Jul 18, 2018

    This classifier is a trainer based on the Stochastic DualCoordinate Ascent(SDCA) method, a state-of-the-art optimization technique for convex objective functions.

Sorry, I wasn't clear. DualCoordinate should be replaced with dual coordinate. convex objective functions is good.


In reply to: 405745390 [](ancestors = 405745390)


Refers to: src/Microsoft.ML.StandardLearners/Standard/doc.xml:10 in 6f45e69. [](commit_id = 6f45e69, deletion_comment = False)

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@sfilipi sfilipi merged commit 839bd6d into dotnet:master Jul 18, 2018
@sfilipi sfilipi deleted the transformsDocs2 branch July 18, 2018 23:45
eerhardt pushed a commit to eerhardt/machinelearning that referenced this pull request Jul 27, 2018
…ting the CSharpAPI (dotnet#529)

* Moving from xml strings to having the documentation details in xml files.
For the summary text that is common between several learners, the examples will be added on a separate node.
An example of how that will look like is in the LogisticRegressionBinaryClassifier and LogisticRegressionClassifier.

* fixing the aftermath of renaming the XML file.

* removing the Desc from the EntryPoint attribute is a bad idea.

* removing the XML docs from the doc folder, and added them under the respective projects.

* Some OS get picky about casing.

* file name should be vanilla

* Adding documentation for the first group of transforms

* adding more documentation.
changing the root of the XML documents from docs -> doc, since its only one.
Switching all <see href /> to the valid <see cref />

* formatting tweaks, and adressing most of the code comments.

* Extracted the examples outside of the member nodes in the xml, so that they only appear in the CSharpApi classes, and not on the runtime classes.

* small fixes

* addressing code comments

* addressing Pete's comments.

* Fixing language around the CharTokenizer description.

Closes dotnet#389
@ghost ghost locked as resolved and limited conversation to collaborators Mar 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants