
Added support for reading pretrained graph definition files #3

Merged · 1 commit merged on May 15, 2018

Conversation

anshuman23
Owner

This PR adds support for reading a pretrained Graph into Tensorflex. This has been made possible by porting and adding three important functions (creating a new empty graph is also required, but that was already added earlier as Tensorflex.new_graph):

  • Tensorflex.new_import_graph_def_opts: This is a wrapper around the Tensorflow function TF_NewImportGraphDefOptions() and is needed when a pretrained graph has to be read into Tensorflow/Tensorflex
  • Tensorflex.read_file('path_to_graphdef_file'): This function is not actually present in Tensorflow itself but was written to read the pretrained graph definition protobuf file in as a Tensorflow TF_Buffer datatype
  • Tensorflex.graph_import_graph_def(graph, graph_def, graph_opts): This function takes a new empty graph, the graph_def obtained using read_file, and the graph_opts obtained using the first function mentioned here

For clarity, an example has also been added to the README which reads in the Inception graph definition created by Google.

@anshuman23 anshuman23 merged commit 1355198 into master May 15, 2018
@josevalim
Contributor

This is great @anshuman23! Can we run something very simple against the graph? I want us to have a minimal story such as this: load the graph and run a query against it and then we will deliver this feature from beginning to end. This means also docs and tests.

static ERL_NIF_TERM graph_import_graph_def(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
{
    TF_Graph **graph;
    enif_get_resource(env, argv[0], graph_resource, (void *) &graph);
Contributor

My C is limited but I think we don't want to rely on such resources because they are global? This means I can't import two graphs at the same time? Shouldn't we build those resources only when necessary?

Owner Author

You are right. I will be making changes to allocate resources only when they are needed. This actually did cross my mind some time back, but then I forgot about it.

    return 1;
}
else {
    fprintf(stderr, "Successfully imported graph\n");
Contributor

Let's return atoms here. Something like :ok if the import works, {:error, reason_string} otherwise.

Owner Author

Alright! This makes more sense. I will modify this as well.

    memcpy((void *) graph_opts_resource_alloc, (void *) &new_graph_opts, sizeof(TF_ImportGraphDefOptions *));
    ERL_NIF_TERM graph_opts = enif_make_resource(env, graph_opts_resource_alloc);
    enif_release_resource(graph_opts_resource_alloc);
    return graph_opts;
Contributor

My understanding is that you don't want to return a resource that you have just released because it doesn't point to anything after this moment.

What you want to do is to call enif_make_resource and return that. The result of enif_make_resource will be passed to another NIF that will call enif_get_resource to get the pointer. Once you are done you deallocate it.

You are able to get away without doing this because you are relying on graph_opts_resource as a global to pass data between C functions, which means it won't work concurrently. You need to treat those C functions as "pure" as much as possible. They need to return all the data that is necessary to be used later.

Owner Author

I understand the issue. I'm going to make the required changes.

@josevalim
Contributor

I have added some comments. We should avoid globals in C; we need to get all of the data from NIFs as resources, pass them to new functions, and then release them. Basically, we should not rely on the loader feature of NIFs at all.

Let's give this a try and make this API great before adding new features. :)

PS: I forgot to watch the project, so I wasn't getting any notification. But now I am watching it so I should be able to review everything in time.

@anshuman23
Owner Author

Understood! I'll be working on the comments first, and once those are done I will move on to the 'story' aspect you mentioned after loading the graph. Thank you for all the inputs so far. I will ask if I come across any problems. :)

@josevalim
Contributor

Right, think about it as a file resource. You need to open the file and then you need to make sure to close it.

The other thing we should discuss is whether it makes sense to expose a function like new_import_graph_def_opts. Can we change those options using the tensorflow API in any way? Because if not, all we are doing is sending the Elixir code a reference that it cannot change in any way, so I would rather streamline this into a read_graph API that calls a couple of functions in tensorflow directly.

PS: I know we have said that initially we want to provide low-level bindings for all C functions, but maybe we should skip some of those if there is nothing we can do with their output besides passing it to another function.

@anshuman23
Owner Author

That is actually what I had in mind initially when I thought of sweeping over some of the low-level Tensorflow functionality. As far as I have used/seen, TF_NewImportGraphDefOptions isn't usually used in any way other than how I have currently used it. It would make sense to hide it under a read_graph function that runs in C. In any case, if someone does need to use it, we can expose it later on if required.

There are many other functions such as this, especially ones that require a TF_Status type to be passed into them. For the time being, I think we can do with having them remain in C and not exposing them to Elixir. They can be exposed later on as well.

So just to make sure we're on the same page, I should not expose functions that do not do any explicit work from the user end in Elixir, right? I can start writing just a read_graph function then.

@josevalim
Contributor

So just to make sure we're on the same page, I should not expose functions that do not do any explicit work from the user end in Elixir, right? I can start writing just a read_graph function then.

We are on the same page.

We can have this guideline: if all a NIF returns is a reference that can only be exclusively passed to another function that will use and deallocate that reference, then we don't need that NIF.

Also, generally speaking, we want to avoid references, as we want to avoid mutable structures in an immutable language. We may need to use them in some places though, such as a tensorflow session (I am guessing).

@anshuman23
Owner Author

Got it! This makes a lot of sense. Thanks for the help :)

I have one more question which could be rather silly. The way I was 'globally' defining the resource types and allocating them is the only way I have seen resources used in NIF-based projects so far. I am a little unsure of how I would declare a resource type 'locally' inside a function without any help from the NIF loader, which links it to its destructor. Looking at the erl_nif docs, the function which links a resource to its destructor, enif_open_resource_type(), can only be called from the NIF loader function. How would I be able to do this locally inside any other function then?

If there are any examples/projects that might shed some light or some documentation you could point me to, it would help a lot. Sorry for the inconvenience!

@josevalim
Contributor

@anshuman23 you are correct. You need to define the resource type in the loader. As long as the allocation is not global we should be good. It seems the convention in C is to use SCREAMING_CASE and mark those as static, maybe we should do the same?

@anshuman23
Owner Author

anshuman23 commented May 16, 2018

As long as the allocation is not global we should be good.

So then the way I was writing the NIFs before was fine, right? If it is fine, I'll move to writing the read_graph function as discussed and then writing some 'actions' we could do with the loaded graph to showcase the uses.

It seems the convention in C is to use SCREAMING_CASE and mark those as static, maybe we should do the same?

I was unaware of this convention. I'll make the required changes

@josevalim
Contributor

So then the way I was writing the NIFs before was fine then, right?

Almost, I think. The places where you return a released resource look suspicious to me.

https://github.com/anshuman23/tensorflex/pull/3/files#diff-b4f59c9c67fc463dd1c8989a24fae9beR103

You need to keep the resources around and release them after use. But that leads to a very unnatural way of programming in Elixir (where we need to track resources). And that brings us back to the earlier point: we should avoid them as much as we can.

@anshuman23
Owner Author

@josevalim I think that enif_release_resource() shouldn't be a problem. According to the erl_nif docs, after enif_release_resource() has been called for the allocated resource, the 'term' being returned will still persist because it corresponds to enif_make_resource(). It can only be removed by the garbage collector. So I think we are fine.

But I understand your concerns and I'll write functions according to the guideline we decided. I'll submit the next PR with read_graph and some other changes.

Thanks!

@josevalim
Contributor

@josevalim I think that enif_release_resource() shouldn't be a problem. According to the erl_nif docs, after enif_release_resource() has been called for the allocated resource, the 'term' being returned will still persist because it corresponds to enif_make_resource(). It can only be removed by the garbage collector. So I think we are fine.

You are 100% correct. The release is about the C side of things; after that it is up to the garbage collector. Apologies for the confusion. I would say this was indeed correct after all.

It seems the convention in C is to use SCREAMING_CASE and mark those as static, maybe we should do the same?

The SCREAMING_CASE convention is actually about const/macros.

@anshuman23
Owner Author

anshuman23 commented May 17, 2018

You are 100% correct. The release is about the C side of things, then it means it is up to the garbage collector. Apologies for the confusion. I would say this was indeed correct after all.

Considering this, do you think it would be prudent to have these low-level functions available to users along with high-level functions like read_graph, or should we now decide to have only the high-level functions? If someone wants to play around with the low-level variants of the tensorflow functions, they would have the freedom to use their direct Elixir variants, while someone who just wants ease of use (like in Keras) would have the high-level functions as their go-to choice.

I will move forward with implementation according to whatever we decide here.

@josevalim
Contributor

@anshuman23 I think the question is: can we do anything with the C API on the resource returned by new_import_graph_def_opts? Because if not, then I don't see a reason to expose it. In this case it is really a support function.

@anshuman23
Owner Author

Yes, that makes sense. Alright, I'll move towards the improvements in the next PR. Thanks again for taking the time out to help! :)
