Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ML.Net and Tensorflow integration demo. #3

Open
wants to merge 19 commits into
base: master
Choose a base branch
from
Open

Conversation

zeahmed
Copy link
Owner

@zeahmed zeahmed commented Aug 10, 2018

In this demo, I created a TensorflowTransform that takes following as input and creates column based on output name provided. The output column can be used as features for further training in the ML.Net pipeline.

  • Model Directory
  • Output Name (which become column in ML.Net)
  • Input Name (it should be same as name in the model and name of the column in ML.Net)

@@ -6,6 +6,10 @@
<DefineConstants>CORECLR</DefineConstants>
</PropertyGroup>

<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|AnyCPU'">
<NoWarn>CS8002;1591</NoWarn>
Copy link

@ericstj ericstj Aug 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting that it isn't strong named. That will cause issues on desktop. We could ask for it to be strong named, or build our own copy.

Also this can be

<NoWarn>$(NoWarn);CS8002</NoWarn>

And placed in the main PropertyGroup @ line 7. #Resolved

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI this is fixed with my changes #4


In reply to: 209299533 [](ancestors = 209299533)

/// <summary>
/// Tensorflow session object
/// </summary>
private readonly TFSession _session;
Copy link

@Zruty0 Zruty0 Aug 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_session [](start = 35, length = 8)

Is TFSession thread-safe? #Closed

Copy link
Owner Author

@zeahmed zeahmed Aug 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFSeesion and Graph seems to be threadsafe. I will look into Run method in python tensorflow code to see how they are ensuring thread safety.


In reply to: 209305362 [](ancestors = 209305362)


return (ref VBuffer<float> dst) =>
{
var runner = _session.GetRunner();
Copy link

@Zruty0 Zruty0 Aug 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runner [](start = 24, length = 6)

Can you have multiple runners at the same time from the same session?

It feels to me that the runner should be external to the getter delegate, and reused between calls. If this can not be done, explain why. #Resolved

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we can not have single runner mostly because of implementation of Runner which currently does not allow to set the input value once it has been set (using AddInput method). This is the limitation of TensorflowSharp.


In reply to: 209305767 [](ancestors = 209305767)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will still need to think of what functionality can be extracted out of the delegate.
For example, input.Schema.GetColumnName(info.SrcIndices[i]) is not going to change from call to call


In reply to: 209413407 [](ancestors = 209413407,209305767)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can and should have multiple Runners, the Runner isn't really a heavy object, it's just a holder for the state of a run. As such it doesn't make much sense to try to share it between runs.


In reply to: 209663760 [](ancestors = 209663760,209413407,209305767)


public override void Save(ModelSaveContext ctx)
{
throw new NotImplementedException();
Copy link

@Zruty0 Zruty0 Aug 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw [](start = 12, length = 5)

Have you given some thought on how we are planning to save/load such transforms? #Closed

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look into it. I haven't thought about it yet. Maybe I will look into CNTK implementation in TLC to see what they are doing.


In reply to: 209306191 [](ancestors = 209306191)

var status = new TFStatus();

return new TFSession().FromSavedModel(
sessionOptions,
Copy link

@Zruty0 Zruty0 Aug 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to remove the TF model files after the session has read them? Or does the session defer reading till some later time? #Closed

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, session loads the file immediately upon calling this method.


In reply to: 209306586 [](ancestors = 209306586)

b = new[] { 3.0f, 3.0f,
3.0f, 3.0f } } }));

var trans = new TensorflowTransform(env, loader, model_location, "c", "a", "b");
Copy link

@ericstj ericstj Aug 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would hope that we don't need to explicitly declare this but can instead read it from model. We should try to auto-map the IDV column names to TF input names. Only when we can't map them should we require some user input. #Pending

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this will completely preclude any compile-time validation.


In reply to: 209317833 [](ancestors = 209317833)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how listing strings without type information provides compile time validation. Sure if the pigsty tuples are used then I'd expect those to show up and just flow through the pipeline and work assuming remappings don't need to happen. I'd only expect a caller to explicitly list column names here if they need to change them from tuples that have already been defined in the data.


In reply to: 209324258 [](ancestors = 209324258,209317833)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is, if you make transform work with no parameters whatsoever, and ONLY transform the columns it finds in the model, then Pigsty has nothing to 'attach to'.


In reply to: 209329418 [](ancestors = 209329418,209324258,209317833)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think you're right, one doesn't preclude the other.


In reply to: 209374964 [](ancestors = 209374964,209329418,209324258,209317833)

Copy link
Owner Author

@zeahmed zeahmed Aug 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eric, it could be but its not straight forward. Because there could be many variables in the tensorflow model and its difficult to identify which ones are inputs and need values to be fed.

I will investigate it further if its possible. Because when saving model in tensorflow for serving, it ask for input and ouput. Let see if TFSharp has any convience to get it.


In reply to: 209375203 [](ancestors = 209375203,209374964,209329418,209324258,209317833)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sonoma does this FYI, you can look at how they did it.


In reply to: 209413718 [](ancestors = 209413718,209375203,209374964,209329418,209324258,209317833)

var result = new float[size];
Marshal.Copy(data, result, 0, size);
return result;
}
Copy link
Owner Author

@zeahmed zeahmed Aug 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying output buffer from TFTensor to managed memory. #Resolved

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is OK as a temporary hack, put in a big comment here (or some asserts) about how this only works for row-major float tensors.


In reply to: 209413803 [](ancestors = 209413803)

using TensorFlow;

[assembly: LoadableClass(TensorflowTransform.Summary, typeof(TensorflowTransform), typeof(TensorflowTransform.Arguments), typeof(TensorflowTransform),
TensorflowTransform.UserName, "CopyColumns", "CopyColumnsTransform", TensorflowTransform.ShortName, DocName = "transform/CopyColumnsTransform.md")]
Copy link

@Zruty0 Zruty0 Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CopyColumns [](start = 35, length = 11)

by the way, you need to fix the attribute #Resolved


namespace Microsoft.ML.Transforms
{
public sealed class TensorflowTransform : RowToRowMapperTransformBase, IDisposable
Copy link

@Zruty0 Zruty0 Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDisposable [](start = 75, length = 11)

That is incorrect. Our transforms should never be disposable. #Resolved

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a look at DnnImageFeaturizerTransform and it just lets GC finalize disposable CNTK.Function. Also checked CntkPredictorBase and it does the same for its public Function Model property.


In reply to: 209664938 [](ancestors = 209664938)

@Zruty0
Copy link

Zruty0 commented Aug 13, 2018

@YaelD is added to the review. #Closed

@Zruty0
Copy link

Zruty0 commented Aug 13, 2018

@yaeldekel is added to the review. #Closed

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 139, 253, 190, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 190, 253, 70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 35, 241, 225, 160, 108, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81, 240, 253, 253, 119, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 45, 186, 253, 253, 150, 27, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 93, 252, 253, 187, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 249, 253, 249, 64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 46, 130, 183, 253, 253, 207, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 148, 229, 253, 253, 253, 250, 182, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 114, 221, 253, 253, 253, 253, 201, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 66, 213, 253, 253, 253, 253, 198, 81, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 171, 219, 253, 253, 253, 253, 195, 80, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 55, 172, 226, 253, 253, 253, 253, 244, 133, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 136, 253, 253, 253, 212, 135, 132, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }
};
Copy link
Owner Author

@zeahmed zeahmed Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Serialized the TF model in ML.NET which enables us to use the pipeline for scoring and prediction now...:blush: #Resolved

{
dims[k] = (int)(shape[k] == -1 ? batchSize : shape[k]);
}
// Output tensor is expected to be of shape [Batch, Data]
Copy link
Owner Author

@zeahmed zeahmed Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Output tensor is expected to be of shape [Batch, Data] [](start = 16, length = 57)

Not any more. I will remove the comment. #Resolved

/// <summary>
/// Tensorflow session object
/// </summary>
private TFSession _session;
Copy link
Owner Author

@zeahmed zeahmed Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_session [](start = 26, length = 8)

Need to find out a way to dispose this safely. #Resolved

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of implementing IDisposable, you can dispose of this resource in the finalizer of the transform.


In reply to: 210042898 [](ancestors = 210042898)

ctx.SaveBinaryStream("TFModel", w =>
{
w.WriteByteArray(buffer.ToArray());
});
Copy link
Owner Author

@zeahmed zeahmed Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check for success here. #Resolved

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToGraphDef might throw exceptions, but otherwise, could this not succeed?


In reply to: 210043097 [](ancestors = 210043097)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me comment shows at different line number after the several commits. Originally, I commented while loading at following code.

byte[] data = null;
ctx.TryLoadBinaryStream("TFModel", r =>
{
       data = r.ReadByteArray();
});

In reply to: 210651836 [](ancestors = 210651836,210043097)


if (tfType == TFDataType.Int32)
{
return CreateTensorValueGetter<DvInt4>(input, type, colIndex, tfShape);
Copy link
Owner Author

@zeahmed zeahmed Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DvInt4 [](start = 47, length = 6)

ML.Net types cannot be passed to TF right now. #Pending

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is work going on on removing ML.Net types and using pure .net types. That will resolve this issue.


In reply to: 210172882 [](ancestors = 210172882)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as this work is not done yet, we should either not support these types, or use the ML.NET types and convert to .net types before passing to TF. this method gets the getters of the inputs, and will throw exception if called with Int64, bool or string.


In reply to: 210456041 [](ancestors = 210456041,210172882)

{
var scalar = default(T);
_srcgetter(ref scalar);
return (TFTensor)_ctor.Invoke(null, new object[] { scalar });
Copy link
Owner Author

@zeahmed zeahmed Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(TFTensor)_ctor.Invoke(null, new object[] { scalar }); [](start = 23, length = 54)

This reflection call is not intended here. We need a method in TFSharp where we can pass in generic data to TF. Currently, there are methods in TFSharp to do that but those are not exposed. I and @eric will work on that. #Resolved

_srcgetter(ref tmpBuf);
var dense = default(VBuffer<T>);
tmpBuf.CopyToDense(ref dense);
return (TFTensor)_method.Invoke(null, new object[] { _tfShape, dense.Values, 0, dense.Length });
Copy link
Owner Author

@zeahmed zeahmed Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return (TFTensor)_method.Invoke(null, new object[] { _tfShape, dense.Values, 0, dense.Length }); [](start = 16, length = 96)

This reflection call is not intended here. We need a method in TFSharp where we can pass in generic data to TF. Currently, there are methods in TFSharp to do that but those are not exposed. I and @eric will work on that. #Resolved

This is just a stub to begin iterating on.

To follow:
1. Pull over a copy of TF# source
2. Tree-shake unused code.
3. Rewrite / Rename TF# API as appropriate.
4. Come up with a strategy for providing TF bins (using TF#'s copy now, merely as a starting point).
I've fixed some minor rule issues and disabled the naming analyzers since those churn the source too much.
For now it is BYOTF (bring your own TensorFlow) so the tests are grabbing the TF bits for TF#.

if (tfType == TFDataType.Int64)
{
return Utils.MarshalInvoke(MakeGetter<Int64>, outInfo.ItemType.RawType, input, iinfo);
Copy link

@yaeldekel yaeldekel Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Int64 [](start = 54, length = 5)

Do you have any tests testing this (and int, bool and string)? #Pending

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet because ML.Net does not support dotnet types it uses DvInt etc. Once ZeeshanS changes are there I will be able to test.


In reply to: 210684078 [](ancestors = 210684078)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know when Zeeshan's changes will be done, but for now I don't think this would work. If you don't want to handle these types using DvInt,DvBool etc., then I think we should not support them until the type system changes go in.


In reply to: 210718189 [](ancestors = 210718189,210684078)

{
var inputName = tfInfo.InputColNames[i];
var type = info.SrcTypes[i];
runner.AddInput(inputName, srcTensorGetters[i].GetTensor());
Copy link

@yaeldekel yaeldekel Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetTensor [](start = 71, length = 9)

Is there a way to make this method not allocate a new array on every call? #Resolved

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we need to keep track of the allocations and dispose of them by overriding the RowCursor's Dispose() method (IRowCursors are IDisposable).


In reply to: 210685450 [](ancestors = 210685450)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wont allocate new array. It takes the handle of the .net array and pass it on to TF.


In reply to: 210688784 [](ancestors = 210688784,210685450)

runner.AddInput(inputName, srcTensorGetters[i].GetTensor());
}

var tensors = runner.Fetch(_bindings.OutputColNames[iinfo]).Run();
Copy link

@yaeldekel yaeldekel Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run [](start = 80, length = 3)

Same question for the outputs. #Resolved

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For output, its unmanage array that is returned and we make a copy of it after this line.


In reply to: 210685620 [](ancestors = 210685620)


Contracts.Assert(tensors.Length > 0);

var output = FetchData<T>(tensors[0].Data, _bindings.OutputCols[iinfo].VectorSize);
Copy link

@yaeldekel yaeldekel Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FetchData [](start = 33, length = 9)

FetchData should use the values array of dst instead of creating a new one (you can check to see the common pattern in other transforms: if the length of dst.Values is not sufficient, we create a new array.) #Resolved

return result;
}

private Delegate MakeGetter(IRow input, int iinfo)
Copy link

@yaeldekel yaeldekel Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MakeGetter [](start = 25, length = 10)

Same comment as for CreateTensorValueGetterVec. #Resolved

Copy link
Owner Author

@zeahmed zeahmed Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!


In reply to: 210687297 [](ancestors = 210687297)

runner.AddInput(inputName, srcTensorGetters[i].GetTensor());
}

var tensors = runner.Fetch(_bindings.OutputColNames[iinfo]).Run();
Copy link

@yaeldekel yaeldekel Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tensors [](start = 24, length = 7)

Does TF know how to output values from multiple layers in one call? If this is the case, we should not make this transform ManyToOne, it should be many-to-many. #Pending

Copy link
Owner Author

@zeahmed zeahmed Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can get multiple outputs in one call. I am not sure right now how to do it in ML.Net. This is the next thing I am going to focus on.


In reply to: 210687722 [](ancestors = 210687722)

{
if (!activeInfos(iinfo))
continue;
getters[iinfo] = MakeGetter(input, iinfo);
Copy link

@yaeldekel yaeldekel Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MakeGetter [](start = 37, length = 10)

Add an out parameter of a disposer to MakeGetter, this way it can be used here, and by the RowCursor. #Resolved


namespace Microsoft.ML.Transforms.TensorFlow
{
/// <summary>
Copy link

@yaeldekel yaeldekel Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[](start = 0, length = 1)

Change the tabs to spaces. (in the other files from TFSharp as well). #Resolved


private static DataKind Tf2MlNetType(TFDataType type)
{
if(type == TFDataType.Float)
Copy link

@yaeldekel yaeldekel Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if [](start = 16, length = 2)

Can this be changed to a switch statement? #Resolved


if (tfType == TFDataType.String)
{
return Utils.MarshalInvoke(MakeGetter<string>, outInfo.ItemType.RawType, input, iinfo);
Copy link

@yaeldekel yaeldekel Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string [](start = 54, length = 6)

This should be DvText. #Resolved


if (tfType == TFDataType.String)
{
return CreateTensorValueGetter<string>(input, type, colIndex, tfShape);
Copy link

@yaeldekel yaeldekel Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string [](start = 47, length = 6)

Text will continue to be handled using DvText even after we move to .net types. #Resolved

/// <summary>
/// Creates a tensor representing type T[].
/// T[] will be pinned and wrapped in a tensor.
/// This method does not support jagged arrays since those require copies.
Copy link

@ericstj ericstj Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method does not support jagged arrays since those require copies. [](start = 12, length = 70)

This comment is no longer relevant since you only accept single dimensional arrays. #Resolved

/// </summary>
/// <typeparam name="T">.NET type of tensor to create</typeparam>
/// <param name="data">value of tensor</param>
public static TFTensor Create<T>(T data)
Copy link

@ericstj ericstj Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create [](start = 31, length = 6)

Consider changing this to createscalar. #Resolved


internal static TFDataType MlNet2TfType(Type type)
{
if (type == typeof(float))
Copy link

@ericstj ericstj Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

( [](start = 30, length = 1)

This could be code-gen'ed by the TT file. #Resolved

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that too. Let me see...:)


In reply to: 210747481 [](ancestors = 210747481)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually there is already a method in TFSharp. I copied it.


In reply to: 210747970 [](ancestors = 210747970,210747481)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still plan on deleting all this TF# code and code-gen'ing it. It's all boiler-plate copy/paste that can be driven off the type mapping.


In reply to: 210758078 [](ancestors = 210758078,210747970,210747481)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed!


In reply to: 210760965 [](ancestors = 210760965,210758078,210747970,210747481)

@@ -103,8 +103,7 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Microsoft.ML.CpuMath.UnitTe
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Microsoft.ML.CpuMath.UnitTests.netcoreapp", "test\Microsoft.ML.CpuMath.UnitTests.netcoreapp\Microsoft.ML.CpuMath.UnitTests.netcoreapp.csproj", "{5F81A2A4-73AD-494C-B387-07D605EC8826}"
EndProject

Project("{F2A71F9B-5D33-465A-A702-920D77279786}") = "Microsoft.ML.FSharp.Tests", "test\Microsoft.ML.FSharp.Tests\Microsoft.ML.FSharp.Tests.fsproj", "{802233D6-8CC0-46AD-9F23-FEE1E9AED9B3}"
Project("{6EC3EE1D-3C4E-46DD-8F32-0CC8E7565705}") = "Microsoft.ML.FSharp.Tests", "test\Microsoft.ML.FSharp.Tests\Microsoft.ML.FSharp.Tests.fsproj", "{802233D6-8CC0-46AD-9F23-FEE1E9AED9B3}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6EC3EE1D-3C4E-46DD-8F32-0CC8E7565705 [](start = 10, length = 36)

Why did this change?

namespace Microsoft.ML.Transforms.TensorFlow
{

/// <summary>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[](start = 0, length = 1)

Convert tabs to spaces.


namespace Microsoft.ML.Transforms.TensorFlow
{
internal static partial class NativeBinding

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[](start = 0, length = 1)

Convert tabs to spaces

_bindings.Save(ctx);
}

public void DisposeTFSession()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DisposeTFSession [](start = 20, length = 16)

I think this method can be removed.

{
public sealed class TensorflowTransform : RowToRowMapperTransformBase
{
public sealed class Column : ManyToOneColumn

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ManyToOneColumn [](start = 37, length = 15)

Consider not using this base class (or the ManyToOneColumnBindingsBase class). These are intended for applying the transform simultaneously to multiple sets of input columns. It is advantageous in case the transform requires training (such as ngram transform) because defining multiple transforms would require multiple passes over the data.

}
}

public sealed class Arguments : TransformInputBase

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguments [](start = 28, length = 9)

I would change the structure of these arguments to be an array of strings for the inputs, and a string for the output.

Later we may want to change the arguments to have a mapping from the IDV column names to the model input names, and/or from the model output names to IDV output column names.

public readonly ColumnType[] OutputCols;
public readonly TFDataType[] OutputTFTypes;

public Bindings(Column[] columns, ISchema schemaInput, TensorflowTransform parent)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bindings [](start = 19, length = 8)

Change this class to have just one private ctor, and two public ctors/create methods that call the private one. This way if we make any changes to this class we won't need to remember to make them in two places.

_session.DeleteSession();
}

private ValueGetter<T> GetSrcGetter<T>(IRow input, int iinfo, int isrc)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetSrcGetter [](start = 31, length = 12)

Is this used anywhere?

private ITensorValueGetter CreateTensorValueGetterVec(IRow input, TFDataType tfType, ColumnType columnType, int colIndex, TFShape tfShape)
{
var type = TFTensor.TypeFromTensorType(tfType);
if (type != null)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type [](start = 16, length = 4)

We should be able to assert here that type != null.

return new TensorValueGetter<T>(input, colIndex);
}

private ITensorValueGetter CreateTensorValueGetterVec(IRow input, TFDataType tfType, ColumnType columnType, int colIndex, TFShape tfShape)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vec [](start = 58, length = 3)

Can the "Vec" suffix be removed? I don't see a method that is for "NotVec".

Contracts.Assert(tensors.Length > 0);

var values = dst.Values;
if (Utils.Size(values) != _bindings.OutputCols[iinfo].VectorSize)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!= [](start = 43, length = 2)

Doesn't need to be exactly equal. < is enough here.

values = new T[_bindings.OutputCols[iinfo].VectorSize];

TensorflowUtils.FetchData<T>(tensors[0].Data, values);
dst = new VBuffer<T>(values.Length, values);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VBuffer [](start = 30, length = 7)

Pass dst.Indices to the new VBuffer as well.

zeahmed pushed a commit that referenced this pull request Jan 3, 2019
* Added placeholder

* Cleaned up Infos (replaced with ColumnPairs)

* Added ColumnInfo

* Added all the Create() methods.

* Added Mapper

* Commented out the EntryPoint

* Added PcaEstimator2

* PcaWorkout test passes

* Added pigsty api

* Fixed EntryPoint

* Fixed the arguments

* Fixed tests and added pigsty test

* Deleted Wrapped PCA transform

* Float -> float

* Cleaned docstrings

* Removed some unnecessary checks

* Simplified unnecessary code

* Moved some fields to ColumnInfo for simplifications

* Simplified weight columns

* Address PR comments #1

* Addressed PR comments #2

* Moved the static test

* PR comments #3

* Moved schema related information out of ColumnInfo and into Mapper.ColumnSchemaInfo.

* PR comments

* PR comments

* Updated manifest for entrypoint PcaCalculator

* Fixed schema exceptions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants