Clean up the SchemaDefinition class #2995

Merged: 7 commits merged on Mar 20, 2019


yaeldekel

Fixes #2978.


```csharp
public Column(IExceptionContext ectx, string memberName, DataViewType columnType,
    string columnName = null, IEnumerable<AnnotationInfo> annotationInfos = null, Delegate generator = null)
public Column(string memberName, DataViewType columnType,
```
Author


On `public`:

I would like to make this internal as well. Is there any scenario where users would want to add new columns to the SchemaDefinition generated from the type?

Contributor

TomFinley commented Mar 18, 2019


I think so. In fact, I think that's one of the major scenarios, probably the central reason why people need this structure, isn't it? You start with an empty schema definition, then you add columns to it to describe how you want fields mapped to columns, and so on. This object exists in large part for those situations where you don't, or somehow can't, rely purely on the reflection-based mechanism.



Author


I was thinking you start with a schema that's auto-generated from the type, then modify the columns that are already in it. The schema generated from the type already contains all the possible columns, so why would we need to add new columns (as opposed to modifying existing ones)?


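A minimal C# sketch of the generate-then-modify workflow described in this reply, assuming the public ML.NET 1.x API (`SchemaDefinition.Create`, the `SchemaDefinition` name indexer with a settable `ColumnType`, and `MLContext.Data.LoadFromEnumerable`), which may differ from the code under review in this PR. The `DataPoint` row type and the `Example.Build` helper are hypothetical, introduced only for illustration.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical row type: a float[] member with no [VectorType] attribute,
// so reflection alone infers a vector column of unknown size.
public class DataPoint
{
    public float[] Features { get; set; }
}

public static class Example
{
    // Hypothetical helper: loads two rows and declares "Features"
    // as a fixed-size vector of 3 floats.
    public static IDataView Build(MLContext ml)
    {
        var points = new[]
        {
            new DataPoint { Features = new[] { 1f, 2f, 3f } },
            new DataPoint { Features = new[] { 4f, 5f, 6f } },
        };

        // Start from the definition generated from the type...
        SchemaDefinition schemaDef = SchemaDefinition.Create(typeof(DataPoint));

        // ...then modify a column that is already in it, rather than adding a new one.
        schemaDef["Features"].ColumnType = new VectorDataViewType(NumberDataViewType.Single, 3);

        return ml.Data.LoadFromEnumerable(points, schemaDef);
    }

    public static void Main()
    {
        IDataView data = Build(new MLContext(seed: 42));
        var type = (VectorDataViewType)data.Schema["Features"].Type;
        System.Console.WriteLine(type.Size);
    }
}
```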


codecov bot commented Mar 18, 2019

Codecov Report

Merging #2995 into master will increase coverage by 0.02%.
The diff coverage is 90.47%.

```diff
@@            Coverage Diff             @@
##           master    #2995      +/-   ##
==========================================
+ Coverage   72.41%   72.43%   +0.02%
==========================================
  Files         803      804       +1
  Lines      143851   143916      +65
  Branches    16173    16173
==========================================
+ Hits       104171   104250      +79
+ Misses      35258    35250       -8
+ Partials     4422     4416       -6
```

| Flag | Coverage Δ | |
|---|---|---|
| #Debug | 72.43% <90.47%> (+0.02%) | ⬆️ |
| #production | 68.1% <67.85%> (+0.01%) | ⬆️ |
| #test | 88.63% <98.7%> (+0.02%) | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| test/Microsoft.ML.Tests/Scenarios/Api/TestApi.cs | 97.63% <100%> (+0.01%) | ⬆️ |
| ...soft.ML.Data/DataView/DataViewConstructionUtils.cs | 85.27% <100%> (-0.9%) | ⬇️ |
| ...osoft.ML.Data/DataView/InternalSchemaDefinition.cs | 56.94% <50%> (ø) | ⬆️ |
| src/Microsoft.ML.Data/Data/SchemaDefinition.cs | 69.87% <71.42%> (+1.01%) | ⬆️ |
| ...osoft.ML.Functional.Tests/SchemaDefinitionTests.cs | 98.46% <98.46%> (ø) | |
| src/Microsoft.ML.Transforms/Text/LdaTransform.cs | 89.26% <0%> (-0.63%) | ⬇️ |
| ...ML.Transforms/Text/StopWordsRemovingTransformer.cs | 85.78% <0%> (+0.15%) | ⬆️ |
| ...ML.Transforms/MutualInformationFeatureSelection.cs | 78.9% <0%> (+0.54%) | ⬆️ |

... and 4 more


```csharp
namespace Microsoft.ML.Functional.Tests
{
    public class PredictionEngineScenarios : BaseTestClass
```
Member

eerhardt commented Mar 18, 2019


(nit) I would give the file and the class the same name. #Resolved

```csharp
base.Initialize();

_ml = new MLContext(42);
_ml.AddStandardComponents();
```
Member

eerhardt commented Mar 18, 2019


Do you need to call AddStandardComponents? That should only be necessary when you are doing things like using the MAML syntax. When you are strictly using the API, it shouldn't be necessary.

Author


I added this line back because of issue #2996.



Member

eerhardt left a comment


:shipit:

```csharp
        }
    }
}
```

```csharp
private SchemaDefinition()
```
Contributor


On `SchemaDefinition`:

So, why is this private? I'm thinking about how I'd like to use it: I have my class, I create a new, empty schema definition, then I populate the mapping. Do I have any other way to create an empty one of these guys?

Member


It appears that not only is there no way to create an empty one, but there is no longer a way to add a new column to it, since the column constructor is now internal.

Are we sure there are no scenarios that need to do this?

@TomFinley
Contributor

```csharp
public static SchemaDefinition Create(Type userType, Direction direction = Direction.Both)
```

Is this actually the only way to create a schema definition? As far as I can see, this auto-populates everything, which most of the time is roughly the opposite of what someone actually trying to use this structure would want, since the entire reason to create one is to be explicit about the mapping to data view types.


Refers to: src/Microsoft.ML.Data/Data/SchemaDefinition.cs:326 in de633a3.

@TomFinley
Contributor


Maybe if you hadn't made the constructor private, that would be sufficient.





Contributor

TomFinley left a comment


Hi @yaeldekel, thanks for working on this. The primary reason this thing exists is to let people be explicit about the mapping of columns. While many of the internalizations and cleanups were appropriate, we took out the ability to construct an empty one, which is the primary use case. So we ought to add that back in.

Contributor

TomFinley left a comment


Thanks @yaeldekel!

yaeldekel merged commit 807d813 into dotnet:master on Mar 20, 2019
yaeldekel deleted the schemadefinition branch on March 20, 2019 17:01
ghost locked as resolved and limited conversation to collaborators on Mar 23, 2022
4 participants