Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

F# Records not compatible with ML .NET #180

Closed
isaacabraham opened this issue May 17, 2018 · 23 comments
Closed

F# Records not compatible with ML .NET #180

isaacabraham opened this issue May 17, 2018 · 23 comments
Labels
F# Support of F# language

Comments

@isaacabraham
Copy link

isaacabraham commented May 17, 2018

Perhaps related to #92. Apparently ML .NET doesn't work with F# records! Is this perhaps to do with the fact that F# Records use a different naming convention to Classes for backing fields, and the ML .NET Library. If you use records, the library simply can't find the columns and complains of missing columns. If you port from records to classes with mutable properties, it doesn't fail with that error any longer.

However, mutable classes are pretty much non-idiomatic in F# - no one uses them for the sorts of data-bound workloads that you'll see with ML.

Instead of this:

type SentimentData =
    { [<Column(ordinal = "0")>] SentimentText : string
      [<Column(ordinal = "1", name = "Label")>] Sentiment : float }

[<CLIMutable>]
type SentimentPrediction =
    { [<ColumnName "PredictedLabel">] Sentiment : bool }

You'll have to use something like this monstrosity.

type SentimentData() =
    [<Column(ordinal = "0"); DefaultValue>]
    val mutable SentimentText : string
    [<Column(ordinal = "1", name = "Label"); DefaultValue>]
    val mutable Sentiment : float32

type SentimentPrediction() =
    [<ColumnName "PredictedLabel"; DefaultValue>]
    val mutable Sentiment : bool

You can't even use the member val shorthand syntax that F# provides for mutable get / set properties since the ColumnName attribute doesn't work with them. Plus, you lose all the standard features that Records bring such as lightweight syntax, easy creation, copy-and-update, immutability and structural equality.

I strongly recommend adding support for them by ensuring that whatever internal hydration logic that is currently coupled to classes supports records as well.

@forki
Copy link
Contributor

forki commented May 17, 2018

should this be part of unittests and samples? How can it bind to backing fields in the first place!?

@isaacabraham
Copy link
Author

isaacabraham commented May 17, 2018

The "backing fields" bit is an assumption on my part - normally [<CLIMutable>] does the trick for libraries that want to use a default constructor and then set the properties using e.g. reflection, but it doesn't work here. So I assume that it's using some field-based reflection.

Having said that, if we want a truly F#-friendly API then even [<CLIMutable>] shouldn't be required.

@shauheen shauheen added the F# Support of F# language label May 17, 2018
@asthana86
Copy link
Contributor

@cartermp

@isaacabraham
Copy link
Author

@asthana86 This is a common issue when APIs are designed for use with C# and use e.g. reflection. I'm not sure if @cartermp will have any answers here - you probably need to look into the ML .NET codebase to see where and how it does the hydration.

@terrajobst
Copy link
Member

terrajobst commented Jun 6, 2018

Assuming F# records are just producing get/set properties it’s not even just a problem for F# — public fields aren’t canonical C# code either. We gave the same feedback in the API review but it didn’t make the cut yet.

Presumably #254 would fix that then.

@cartermp
Copy link

cartermp commented Jun 6, 2018

In practice, the trick for something like working with an ORM or equivalent piece of machinery that wants to do things to properties with reflection is ye olde [<CLIMutable>]:

[<CLIMutable>]
type Hoopty =
    { Doopty: int }

Which effectively turns it into a property with get/set so the underlying thingie can go to town on it.

So I would expect #254 to fix this. That said, it would be nice if we didn't need to set that somehow.

@isaacabraham
Copy link
Author

@terrajobst They produce getter-only properties (with backing fields that are generated with a different convention to C# backing fields).

You can avoid the need for [<CLIMutable>] completely, it's not that difficult - F# Records without the attribute always have a public constructor with an argument for each property, and this is (AFAIK) done in a deterministic order (although I can't remember what that order is!).

@terrajobst
Copy link
Member

I see. Supporting immutable types might be harder in the engine. @glebuk would know for sure.

@isaacabraham
Copy link
Author

I would hope that the library doesn't modify data that you supply to it! :-)

@isaacabraham
Copy link
Author

isaacabraham commented Jun 7, 2018

To be honest something else we discussed was to allow the caller to supply data to the library rather than the library hydrating data itself. This is kind of a prerequisite for in memory data support anyway. At that point it should not matter whether it's a record or not.

But that's kind of a separate thing.

@voronoipotato
Copy link

Even if it was "harder" I don't understand why you couldn't construct a mutable from the record, then construct a new record from the final result of the mutation.

record -> mutable ~> mutable -> record

Creating a kind of "sh#t sandwich" . No judgement intended it's just a euphemism for wrapping something you don't want, with two things you do.

@isaacabraham
Copy link
Author

@voronoipotato that would probably be even more work as it would still mean creating initial records with default values for all the fields and then modifying them later manually by creating new records. I think either use the CLIMutable approach, which is probably easier but sub-optimal, or create it based on the ctor.

@dsyme
Copy link
Contributor

dsyme commented Jul 30, 2018

#254 is definitely the underlying problem here.

@asthana86 Who is the expert on the schema mapping code? If we wanted to allow get/set properties (rather than just public fields) as the basis for schema mappings then who should we talk to? Thanks

@dsyme
Copy link
Contributor

dsyme commented Jul 30, 2018

@terrajobst re this:

Assuming F# records are just producing get/set properties it’s not even just a problem for F# — public fields aren’t canonical C# code either. We gave the same feedback in the API review but it didn’t make the cut yet.

I could go ahead and do the work to allow get/set properties but I'd need it carefully reviewed. Do you know who is the right person in the ML.NET team to talk to about this? Thanks

@dsyme
Copy link
Contributor

dsyme commented Jul 30, 2018

I searched for uses of "GetFields" and here's what I see:

Relevant:

Not relevant:

@isaacabraham
Copy link
Author

Any chance of getting ctor support as well to avoid the need for CLIMutable?

@dsyme
Copy link
Contributor

dsyme commented Jul 30, 2018

Any chance of getting ctor support as well to avoid the need for CLIMutable?

@isaacabraham I'm not sure. First things first.

@dsyme
Copy link
Contributor

dsyme commented Jul 31, 2018

@isaacabraham Some notes on immutable F# records:

For immutable records, a schema mapper or serialization framework really has no choice but to work with the "internal" representation of the record, i.e. the hidden, private fields. This is what binary serialization does, for example. This is because there are no setter properties on those immutable records, so there's really no other choice: the IDataView schema mapper works by setting fields one at a time on target objects and there is no way we could convert it to use the immutable object constructor for example.

However inputs to ML.NET currently uses field/property names in string form, e.g. "SentimentText" here:

    pipeline.Add(
        TextFeaturizer(
            "Features", [| "SentimentText" |],...

So, if the ML.NET IDataView schema mapper is dealing with internal fields (e.g. private field SentimentText@ used by F#), then it would also have to know about the relationship between internal and external names, i.e.

public property "SentimentText" <-> private field "SentimentText@"

It's possible but F# emits no specific metadata about this relationship, so the rule would have to be F#-specific. It's a stretch but not impossible. However would require more subtle coding than I'm willing to try for now, so CLIMutable is a reasonable stepping point.

@isaacabraham
Copy link
Author

Indeed, CLIMutable is a great step forward. But constructor support seems like it's a no-go because of how schema mapping in ML .NET works internally i.e. setting fields one at a time? Which, as you say, leaves just some custom "F# field mapping" based on convention as the only way to remove CLIMutable.

It's definitely not the end of the world leaving it there, but it would be nice to not require it.

@dsyme
Copy link
Contributor

dsyme commented Aug 2, 2018

Fixed in #616

@dsyme dsyme closed this as completed Aug 2, 2018
@charlesroddie
Copy link

charlesroddie commented Feb 2, 2019

type SentimentData =
    { [<Column(ordinal = "0")>] SentimentText : string
      [<Column(ordinal = "1", name = "Label")>] Sentiment : float }

[<CLIMutable>]
type SentimentPrediction =
    { [<ColumnName "PredictedLabel">] Sentiment : bool }

To me this also looks monstrous. Magic strings everywhere, even ints represented as strings, loss of compiler helpfulness with the reliance on annotations.

Can this be done in ML.Net in a way that is not reflection-based?

@isaacabraham
Copy link
Author

It's been a while since I looked at this but can you now actually access ML net without being coupled to the data access layer? If so, that would ideally remove all this stuff and we could just worry about that ourselves.

@charlesroddie
Copy link

Two ways to do this!

@ghost ghost locked as resolved and limited conversation to collaborators Mar 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
F# Support of F# language
Projects
None yet
Development

No branches or pull requests

9 participants