Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update VBuffer documentation #3136

Merged
merged 6 commits into from
Apr 1, 2019
Merged

Update VBuffer documentation #3136

merged 6 commits into from
Apr 1, 2019

Conversation

TomFinley
Copy link
Contributor

Towards #3095, specifically about VBuffers. Changes documentation to reflect the changes made by @eerhardt in his refactoring of #1580.

/// <param name="ivDst">The starting index of <paramref name="dst"/> at which to start copying</param>
/// <param name="defaultValue">The value to fill in for the implicit sparse entries. This is a potential exception to
/// general expectation of sparse <see cref="VBuffer{T}"/> that the implicit sparse entries have the default value
/// of <typeparamref name="T"/>.</param>
public void CopyTo(Span<T> dst, int ivDst, T defaultValue = default(T))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ivDst [](start = 44, length = 5)

Looks like we still have some acronyms and whatever this is lurking around. Not sure if care enough to change the API for v1...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that changing a parameter name after you ship an API is a breaking change. So it's now or never.

Copy link
Contributor Author

@TomFinley TomFinley Mar 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well... I was hoping to not make this one of the cherry picked PRs. Let me do this, I'll do the review comments in one commit, then do the changes to parameters in an isolated commit, and then we can decide what we want to do. /cc @shauheen

Adding label "shiproom-review" in light of this... sigh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so, last commit titled Update parameter names to avoid abbreviations, e.g., src => source and suchlike. is up, I updated the parameter names in VBuffer. Names in VBufferEditor were fine. I also updated the documenation in the .md file, but I did not update the actual code that inspired the example since that is in the internal API. (But, the documentation uses the idea merely as a simple, easy to understand example, so I think that's fine.) I took some queues from things like Array.Copy, but without going to far as calling the parameters VBuffer<T> sourceBuffer since I felt that might be going a bit too far, I just named them things like source rather than go quasi-hungarian w.r.t. the names, so, just source, destination, etc., as I believe you also did in VBufferEditor @eerhardt (e.g., your Create methods take destination, not destinationBuffer).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW on the same subject of renamings, there was one place @eerhardt pointed out where an index was being named slot for some reason (since I guess we conceptually separate the concepts of "columns" and "slots" in data view, but, for VBuffer specifically, this should have just been index).

@TomFinley TomFinley added the documentation Related to documentation of ML.NET label Mar 29, 2019
@glebuk
Copy link
Contributor

glebuk commented Mar 29, 2019

    /// The logical length of the buffer.

perhaps make it a bit clearer:
The logical length of the buffer. The value would include counts of both sparse and dense values.


Refers to: src/Microsoft.ML.DataView/VBuffer.cs:49 in ce4618d. [](commit_id = ce4618d, deletion_comment = False)

@glebuk
Copy link
Contributor

glebuk commented Mar 29, 2019

A VBuffer<T> is a generic type that supports both dense and sparse vectors

Perhaps, it should mention how to represent multi-dimensional data, such as multi-color pixels? (either howto or mention the limitation.) #WontFix


Refers to: docs/code/VBufferCareFeeding.md:9 in ce4618d. [](commit_id = ce4618d, deletion_comment = False)

@TomFinley
Copy link
Contributor Author

Perhaps, it should mention how to represent multi-dimensional data, such as multi-color pixels? (either howto or mention the limitation.)

Hmmm. I agree that's worth mentioning, but that really has nothing to do with VBuffer<T>, but VectorDataViewType. (I mean, the concept of multiple dimensions does not appear anywhere in any method related to VBuffer itself as far as I can see, even once.)

For that reason, would you mind if I saved this clarification to the point when I am working on updating the IDataViewTypeSystem.md document, where I'd also presumably change the documentation on some of the DataViewType XML comments? I'm not sure this change belongs on VBuffer, since again by itself VBuffer knows literally not one thing about multiple dimensions.

@TomFinley
Copy link
Contributor Author

    /// The logical length of the buffer.

perhaps make it a bit clearer:
The logical length of the buffer. The value would include counts of both sparse and dense values.

Refers to: src/Microsoft.ML.DataView/VBuffer.cs:49 in ce4618d. [](commit_id = ce4618d, deletion_comment = False)

Sounds good. I also took the opportunity to reinforce the idea of implicit and explicit values.

* Improve the XML documentation for VBuffer/VBufferEditor.
* Update the "best practices" documentation to reflect recent changes.
@TomFinley
Copy link
Contributor Author

Thanks @glebuk, I updated it, let me know if you're happy with it yet.

@codecov
Copy link

codecov bot commented Mar 29, 2019

Codecov Report

Merging #3136 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3136      +/-   ##
==========================================
- Coverage   72.52%   72.51%   -0.01%     
==========================================
  Files         808      808              
  Lines      144665   144665              
  Branches    16198    16198              
==========================================
- Hits       104912   104906       -6     
- Misses      35342    35348       +6     
  Partials     4411     4411
Flag Coverage Δ
#Debug 72.51% <ø> (-0.01%) ⬇️
#production 68.11% <ø> (-0.01%) ⬇️
#test 88.81% <ø> (ø) ⬆️
Impacted Files Coverage Δ
...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs 79.48% <0%> (-20.52%) ⬇️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 84.7% <0%> (-0.21%) ⬇️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 86.1% <0%> (-0.16%) ⬇️
...StandardTrainers/Standard/LinearModelParameters.cs 60.31% <0%> (+0.26%) ⬆️
src/Microsoft.ML.Maml/MAML.cs 26.21% <0%> (+1.45%) ⬆️

@codecov
Copy link

codecov bot commented Mar 29, 2019

Codecov Report

Merging #3136 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3136      +/-   ##
==========================================
+ Coverage   72.52%   72.52%   +<.01%     
==========================================
  Files         808      808              
  Lines      144665   144740      +75     
  Branches    16198    16202       +4     
==========================================
+ Hits       104912   104977      +65     
- Misses      35342    35352      +10     
  Partials     4411     4411
Flag Coverage Δ
#Debug 72.52% <ø> (ø) ⬆️
#production 68.11% <ø> (-0.01%) ⬇️
#test 88.82% <ø> (+0.01%) ⬆️
Impacted Files Coverage Δ
...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs 79.48% <0%> (-20.52%) ⬇️
.../Microsoft.ML.Tests/TrainerEstimators/SdcaTests.cs 97.26% <0%> (-2.74%) ⬇️
...rc/Microsoft.ML.StaticPipe/SdcaStaticExtensions.cs 81.72% <0%> (-0.61%) ⬇️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 84.7% <0%> (-0.21%) ⬇️
...oft.ML.StandardTrainers/Standard/SdcaRegression.cs 95.83% <0%> (ø) ⬆️
...crosoft.ML.StandardTrainers/Standard/SdcaBinary.cs 72.95% <0%> (+0.17%) ⬆️
...StandardTrainers/Standard/LinearModelParameters.cs 60.31% <0%> (+0.26%) ⬆️
...c/Microsoft.ML.SamplesUtils/SamplesDatasetUtils.cs 24.46% <0%> (+0.73%) ⬆️
...StandardTrainers/Standard/StochasticTrainerBase.cs 93.02% <0%> (+4.65%) ⬆️

Copy link
Contributor

@glebuk glebuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I just had some random thoughts/comments. Feel free to address as you see appropriate.

@TomFinley TomFinley merged commit e5cbca7 into dotnet:master Apr 1, 2019
shauheen pushed a commit to shauheen/machinelearning that referenced this pull request Apr 2, 2019
* Improve VBuffer documentation.
* Improve the XML documentation for VBuffer/VBufferEditor.
* Update the "best practices" documentation to reflect recent changes.
* Update parameter names to avoid abbreviations, e.g., src => source and suchlike.
@TomFinley TomFinley deleted the DocVector branch April 19, 2019 20:13
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
documentation Related to documentation of ML.NET
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants