
Added support for reading pretrained graph definition files #3

Merged · 1 commit merged on May 15, 2018

Conversation

anshuman23
Owner

This PR adds support for reading a pretrained Graph into Tensorflex. This has been made possible by porting and adding three important functions (creating a new empty graph is also required, but that was already added earlier as Tensorflex.new_graph):

  • Tensorflex.new_import_graph_def_opts: This is a wrapper around the Tensorflow function TF_NewImportGraphDefOptions() and is needed when a pretrained graph has to be read into Tensorflow/Tensorflex
  • Tensorflex.read_file('path_to_graphdef_file'): This function is not actually present in Tensorflow itself but was written to read the pretrained graph definition protobuf file in as a Tensorflow TF_Buffer datatype
  • Tensorflex.graph_import_graph_def(graph, graph_def, graph_opts): This function takes a new empty graph, the graph_def obtained using read_file, and the graph_opts obtained using the first function mentioned here

For clarity, an example has also been added to the README which reads in the Inception graph definition created by Google.

@anshuman23 anshuman23 merged commit 1355198 into master May 15, 2018
@josevalim
Contributor

This is great @anshuman23! Can we run something very simple against the graph? I want us to have a minimal story such as this: load the graph and run a query against it and then we will deliver this feature from beginning to end. This means also docs and tests.

static ERL_NIF_TERM graph_import_graph_def(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
{
    TF_Graph **graph;
    enif_get_resource(env, argv[0], graph_resource, (void *) &graph);
Contributor

My C is limited but I think we don't want to rely on such resources because they are global? This means I can't import two graphs at the same time? Shouldn't we build those resources only when necessary?

Owner Author

You are right. I will be making changes to allocate resources only when they are needed. This actually did cross my mind some time back, but then I forgot about it.

    return 1;
}
else {
    fprintf(stderr, "Successfully imported graph\n");
Contributor

Let's return atoms here. Something like :ok if the import works, {:error, reason_string} otherwise.

Owner Author

Alright! This makes more sense. I will modify this as well.

    memcpy((void *) graph_opts_resource_alloc, (void *) &new_graph_opts, sizeof(TF_ImportGraphDefOptions *));
    ERL_NIF_TERM graph_opts = enif_make_resource(env, graph_opts_resource_alloc);
    enif_release_resource(graph_opts_resource_alloc);
    return graph_opts;
Contributor

My understanding is that you don't want to return a resource that you have just released because it doesn't point to anything after this moment.

What you want to do is to call enif_make_resource and return that. The result of enif_make_resource will be passed to another NIF that will call enif_get_resource to get the pointer. Once you are done you deallocate it.

You are able to get away without doing this because you are relying on graph_opts_resource as a global to pass data between C functions, which means it won't work concurrently. You need to treat those C functions as "pure" as much as possible. They need to return all the data that is necessary to be used later.

Owner Author

I understand the issue. I'm going to make the required changes.

@josevalim
Contributor

I have added some comments. We should avoid globals in C; we need to get all of the data from NIFs as resources, pass them to new functions, and then release them. Basically, we should not rely on the loader feature of NIFs at all.

Let's give this a try and make this API great before adding new features. :)

PS: I forgot to watch the project, so I wasn't getting any notification. But now I am watching it so I should be able to review everything in time.

@anshuman23
Owner Author

Understood! I'll be working on the comments first, and once those are done I will move on to the 'story' aspect you mentioned after loading the graph. Thank you for all the inputs so far. I will ask if I come across any problems. :)

@josevalim
Contributor

Right, think about it as a file resource. You need to open the file and then you need to make sure to close it.

The other thing we should discuss is whether it makes sense to expose a function like new_import_graph_def_opts. Can we change those options using the tensorflow API in any way? Because if not, all we are doing is sending the Elixir code a reference that it cannot change in any way, so I would rather streamline this into a read_graph API that calls a couple of functions in tensorflow directly.

PS: I know we have said that initially we want to provide low-level bindings for all C functions, but maybe we should skip some of those if there is nothing we can do with their output besides passing it to another function.

@anshuman23
Owner Author

That is actually what I had in mind initially when I thought of sweeping over some of the low-level Tensorflow functionality. As far as I have used/seen, TF_NewImportGraphDefOptions isn't usually used in any way other than how I have currently used it. It would make sense to hide it under a read_graph function that runs in C. In any case, if someone does need to use it, we can expose it later on if required.

There are many other functions such as this, especially ones that require a TF_Status type to be passed into them. For the time being, I think we can do with having them remain in C and not exposing them to Elixir. They can be exposed later on as well.

So just to make sure we're on the same page, I should not expose functions that do not do any explicit work from the user end in Elixir, right? I can start writing just a read_graph function then.

@josevalim
Contributor

So just to make sure we're on the same page, I should not expose functions that do not do any explicit work from the user end in Elixir, right? I can start writing just a read_graph function then.

We are on the same page.

We can have this guideline: if all a NIF returns is a reference that can only be exclusively passed to another function that will use and deallocate that reference, then we don't need that NIF.

Also, generally speaking, we want to avoid references, as we want to avoid mutable structures in an immutable language. We may need to use them in some places though, such as a tensorflow session (I am guessing).

@anshuman23
Owner Author

Got it! This makes a lot of sense. Thanks for the help :)

I have one more question which could be rather silly. The way I was 'globally' defining the resource types and allocating them is the only way I have seen resources used in NIF-based projects so far. I am a little unsure of how I would declare a resource type 'locally' inside a function without any help from the NIF loader, which links it to its destructor. Looking at the erl_nif docs, the function which links a resource to its destructor, enif_open_resource_type(), can only be called from the NIF loader function. How would I be able to do this locally inside any other function then?

If there are any examples/projects that might shed some light or some documentation you could point me to, it would help a lot. Sorry for the inconvenience!

@josevalim
Contributor

@anshuman23 you are correct. You need to define the resource type in the loader. As long as the allocation is not global we should be good. It seems the convention in C is to use SCREAMING_CASE and mark those as static, maybe we should do the same?

@anshuman23
Owner Author

anshuman23 commented May 16, 2018

As long as the allocation is not global we should be good.

So then the way I was writing the NIFs before was fine, right? If it is fine, I'll move to writing the read_graph function as discussed and then writing some 'actions' we could do with the loaded graph to showcase the uses.

It seems the convention in C is to use SCREAMING_CASE and mark those as static, maybe we should do the same?

I was unaware of this convention. I'll make the required changes

@josevalim
Contributor

So then the way I was writing the NIFs before was fine then, right?

Almost, I think. The places where you return a released resource look suspicious to me.

https://github.com/anshuman23/tensorflex/pull/3/files#diff-b4f59c9c67fc463dd1c8989a24fae9beR103

You need to keep the resources around and release them after use. But that leads to a very unnatural way of programming in Elixir (where we need to track resources). And that brings us back to the earlier point: we should avoid them as much as we can.

@anshuman23
Owner Author

@josevalim I think that enif_release_resource() shouldn't be a problem. According to the erl_nif docs, after enif_release_resource() has been called for the allocated resource, the 'term' being returned will still persist because it corresponds to enif_make_resource(). It can only be removed by the garbage collector. So I think we are fine.

But I understand your concerns and I'll write functions according to the guideline we decided. I'll submit the next PR with read_graph and some other changes.

Thanks!

@josevalim
Contributor

@josevalim I think that enif_release_resource() shouldn't be a problem. According to the erl_nif docs, after enif_release_resource() has been called for the allocated resource, the 'term' being returned will still persist because it corresponds to enif_make_resource(). It can only be removed by the garbage collector. So I think we are fine.

You are 100% correct. The release is about the C side of things; after that it is up to the garbage collector. Apologies for the confusion. I would say this was indeed correct after all.

It seems the convention in C is to use SCREAMING_CASE and mark those as static, maybe we should do the same?

The SCREAMING_CASE convention is actually about const/macros.

@anshuman23
Owner Author

anshuman23 commented May 17, 2018

You are 100% correct. The release is about the C side of things, then it means it is up to the garbage collector. Apologies for the confusion. I would say this was indeed correct after all.

Considering this, do you think it would be prudent to have these low-level functions available to users along with high-level functions like read_graph, or should we now decide to have only the high-level functions? If someone wants to play around with the low-level variants of the tensorflow functions, they would have the freedom to use their direct Elixir variants, while someone who just wants ease of use (like in Keras) would have the high-level functions as their go-to choice.

I will move forward with implementation according to whatever we decide here.

@josevalim
Contributor

@anshuman23 I think the question is: can we do anything with the C API on the resource returned by new_import_graph_def_opts? Because if not, then I don't see a reason to expose it. In this case it is really a support function.

@anshuman23
Owner Author

Yes, that makes sense. Alright, I'll move towards the improvements in the next PR. Thanks again for taking the time out to help! :)
