
Backends prototyping #35

Closed
wants to merge 47 commits into from

Conversation

@nietras (Collaborator) commented Oct 24, 2017

Creating a PR for the backends stuff to more easily see the changes.

cc: @mdabros

mada@ihfood.dk added 5 commits October 23, 2017 20:39
…project has to use the old project format because of how the CNTK nuget package is made. When the package is fixed, the project can be migrated to the new format and .net standard 2.0.
@nietras (Collaborator Author) left a comment

Minor comments. Backends are perhaps primarily for learning/training; how can we separate and handle evaluation-only scenarios with minimal dependencies?
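
As an illustration of the kind of split being asked about, here is a rough sketch with hypothetical interface names (nothing like this exists in the PR yet): an evaluation-only surface that a consumer could depend on without pulling in training machinery.

using System;

// Hypothetical sketch only: these interface names do not exist in the PR; they just
// illustrate one possible way to keep evaluation-only consumers free of training dependencies.
public interface ITensor { }

// An evaluation-only consumer would depend on just this surface.
public interface IEvaluationBackend : IDisposable
{
    ITensor Forward(ITensor input);
}

// Training adds gradients and parameter updates on top of evaluation.
public interface ITrainingBackend : IEvaluationBackend
{
    void Backward(ITensor lossGradient);
    void UpdateParameters(float learningRate);
}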

</Reference>
<Reference Include="System" />
<Reference Include="System.Core" />
<Reference Include="System.Xml.Linq" />
@nietras (Collaborator Author):

Clean up includes; just the core ones should be needed.

@@ -0,0 +1,4 @@
<?xml version="1.0" encoding="utf-8"?>
<packages>
<package id="CNTK.GPU" version="2.2.0" targetFramework="net461" />
@nietras (Collaborator Author):

Can we support both GPU and CPU by using both packages? Would we need two projects for that? Does CNTK not have a separate package independent of device?

@mdabros (Owner):

The CNTK.GPU package supports both CPU and GPU, while CNTK.CPUOnly supports only CPU. As far as I know, the CPUOnly package was made only to minimize the package size by avoiding all the GPU dependencies, for CPU-only use. So we should be able to support both devices using only the GPU package.

This might be different with TensorFlow, which to my knowledge has separate builds for CPU and GPU.
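
For reference, a minimal sketch of runtime device selection with the CNTK C# API (assuming the CNTK.GPU package), which is what makes a single package workable for both devices:

using CNTK;

// Minimal sketch (assuming the CNTK.GPU package): the device is chosen at runtime,
// so one binary and one package can serve both CPU and GPU.
public static class DeviceSelection
{
    public static DeviceDescriptor Select(bool useGpu)
    {
        // GPUDevice(0) selects the first GPU; CPUDevice needs no id.
        return useGpu ? DeviceDescriptor.GPUDevice(0) : DeviceDescriptor.CPUDevice;
    }
}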

@nietras (Collaborator Author) Oct 24, 2017

OK, then we would have the same issue if we wanted to minimize size and run CPU only. It is a bit unfortunate that the "runtime" hasn't been separated from the "type definitions/API", so that only the project that executes the code would depend on the "runtime". Anyway, for now the focus is on training so we have to live with it, but we should keep it in mind when defining types.

// datasource prefetcher

// for iterations
// Tuple<ITensorSymbol> batch = source.NextBatch(batchSize: 64); // load next minibatch
@nietras (Collaborator Author):

We have to be very careful here: symbols cannot be direct representations of loaded data. This is important. Symbols are not actual "instances". TF couples instances with symbols by binding placeholder symbols to actual instances, if I remember correctly.

The same applies to everything below. The suggestion is based on symbols, but it is unclear whether it actually intends to use concrete instances or to define some kind of operation via symbols.

No doubt this will get more concrete when we start actually working on it. The notion of forward/backward is interesting, although I have not seen this divide made explicit in TF, for example.

Also, I don't like the use of "Tuple<>" ;)
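
To make the symbol-versus-instance distinction concrete, a rough sketch with hypothetical types (not the proposed API): a symbol only describes dtype and shape, and concrete data is bound to symbols at execution time, similar to feeding TF placeholders.

using System.Collections.Generic;

// Hypothetical sketch, not the proposed API. DataType stands in for the enum used elsewhere in the PR.
public enum DataType { Single, Double }

// A symbol is only a description; no data lives here.
public interface ITensorSymbol
{
    DataType DataType { get; }
    int[] Shape { get; }
}

// Concrete values live in a separate instance type.
public interface ITensorData
{
    float[] Buffer { get; }
}

// Data is coupled to symbols only for a single execution, via explicit bindings.
public interface IGraphRunner
{
    ITensorData Run(ITensorSymbol output,
                    IReadOnlyDictionary<ITensorSymbol, ITensorData> bindings);
}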

@mdabros (Owner):

Yes, this is rather superficial, but it does serve as a starting point to choose a direction from. When we start to implement types, it will become more concrete, and then we can decide which parts should be handled via symbols and which parts should be concrete instances.

I agree about the Tuple<> use; it is just a placeholder until we decide on the real types.
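
As a possible direction once the real types are decided, a tiny sketch of a named minibatch type instead of Tuple<>, using the ITensorSymbol from the snippet above (name and members are illustrative only):

// Hypothetical placeholder type; name and members are illustrative only.
public sealed class MiniBatch
{
    public MiniBatch(ITensorSymbol features, ITensorSymbol targets, int size)
    {
        Features = features;
        Targets = targets;
        Size = size;
    }

    public ITensorSymbol Features { get; }
    public ITensorSymbol Targets { get; }
    public int Size { get; }
}

// Usage would then read: MiniBatch batch = source.NextBatch(batchSize: 64);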

@nietras (Collaborator Author) commented Jan 15, 2018

@mdabros I did a bit of updating and cleanup of the projects. The TensorFlow backend project needs to target net461 since TensorFlowSharp is not .NET Standard compatible. Updated it to 1.4.0 too, and also updated CNTK.

The CNTK project should be changed to the SDK project type, but for now I am just continuing to try to get minimal backends in place.

@nietras (Collaborator Author) commented Jan 15, 2018

@mdabros I should note that the way these wrapper libraries for TensorFlow and CNTK have been made makes them annoying to use, due to the fixed targeting of x64 and net461. The problem is that the packages are poorly factored: there is no split between the API surface (P/Invoke) and the runtime, like we have done for some of our native stuff.

@nietras (Collaborator Author) commented Jan 15, 2018

We should consider adding a separate Backends solution containing only these projects; otherwise the solution is too heavy to work with, especially with regard to tests. Right now I simply unload all the other projects.

using (var b = new CntkBackend(deviceType))
using (var g = b.CreateGraph())
{
g.Placeholder(DataType.Single, new int[] { 1, 2, 3 }, "p");
@nietras (Collaborator Author):

This fails with:

Message: Test method SharpLearning.Backend.Cntk.Test.CntkBackendTest.CntkBackendTest_CreateGraph_Placeholder threw exception: 
System.TypeInitializationException: The type initializer for 'CNTK.CNTKLibPINVOKE' threw an exception. ---> System.TypeInitializationException: The type initializer for 'SWIGExceptionHelper' threw an exception. ---> System.DllNotFoundException: Unable to load DLL 'Cntk.Core.CSBinding-2.3.1.dll': The specified module could not be found. (Exception from HRESULT: 0x8007007E)

@mdabros have you seen this before? The NuGet package has been added to the project...

@nietras (Collaborator Author):

The DLL can be found at:

SharpLearning\packages\CNTK.GPU.2.3.1\support\x64\Release

but it is not copied to the output directory.

@nietras (Collaborator Author):

The CNTK package definition is really weird... it seems to be ill-defined.

@mdabros (Owner):

No, I have not experienced that error before. But I do know that there are some issues with the current CNTK package: microsoft/CNTK#2527 and microsoft/CNTK#2807

However, I am not sure if that relates to the error above.

@mdabros (Owner):

As discussed, I tried changing the SharpLearning.Backend.Cntk.Test test project to .NET Framework to see if this was causing the issue. Sadly, this still results in the same error.

@mdabros (Owner):

It seems that the only way to get this running is to add an x64 solution platform and run the test using this specific platform. Using AnyCPU together with x64 test settings is not enough.

I guess this ties back to the packaging you mentioned earlier. This might also be another indication that the backend projects should have their own solution.

@nietras (Collaborator Author):

@mdabros please do not add x64 as a solution platform; that is not necessary.

@nietras (Collaborator Author):

@mdabros no doubt this is because the CNTK targets file does some logic based on the Platform attribute of the project. So basically, you just have to change the Cntk.Test project from AnyCPU to x64. You can try making a new branch from backends and creating a PR back to it again. The devil lies in trying not to mess up all the project files, which can quickly become a mess ;)

@nietras (Collaborator Author):

You don't need to change the solution platform in order to do this. We might still consider whether we should in fact have just an x64 solution platform, since a lot is tied to this. But hopefully CNTK will become more cross-platform over time, and TF already is.

@mdabros (Owner):

@nietras Good warning, I will change only the Cntk.Test project to x64. On a side note, it seems we still have to change the Cntk.Test project format to .NET Framework; if not, it is not possible to use CNTK directly from the project. I think this relates to the two issues I linked to above.

}

// Helper class used to load and work with the Mnist data set
public class Mnist
@mdabros (Owner):

@nietras This might not be the ideal way to represent and load the MNIST dataset, but I guess it will suffice for initial testing.

@nietras (Collaborator Author):

@mdabros nah, it is a pretty quick-and-dirty way of loading, as I said.
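
For reference, a minimal sketch of reading the raw MNIST IDX image file directly (illustrative only, not the Mnist helper class in this PR); the IDX format is a big-endian int32 header (magic, count, rows, cols) followed by one byte per pixel.

using System.IO;

// Illustrative sketch only, not the Mnist helper class in this PR.
public static class MnistIdx
{
    static int ReadBigEndianInt32(BinaryReader r)
    {
        var b = r.ReadBytes(4);
        return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3];
    }

    public static byte[][] ReadImages(string path)
    {
        using (var r = new BinaryReader(File.OpenRead(path)))
        {
            int magic = ReadBigEndianInt32(r);
            if (magic != 2051) throw new InvalidDataException("Not an IDX image file.");
            int count = ReadBigEndianInt32(r);
            int rows = ReadBigEndianInt32(r);
            int cols = ReadBigEndianInt32(r);
            var images = new byte[count][];
            for (int i = 0; i < count; i++)
            {
                images[i] = r.ReadBytes(rows * cols);   // one byte per pixel, row-major
            }
            return images;
        }
    }
}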

@nietras (Collaborator Author) commented Jan 23, 2018

@mdabros could you merge master into this, so my TensorFlow PR doesn't show these changes?

Mads Dabros and others added 3 commits January 24, 2018 09:26
@mdabros (Owner) commented Jan 25, 2018

@nietras Master has been merged into the backends branch, with the additional changes since you merged, so you can now merge the backends branch into the TensorFlow PR to remove the changes shown.

nietras and others added 6 commits January 25, 2018 11:50
TensorFlow mnist simple example and python inspiration. Python and C# mnist simple are close to identical.
* cleanup

* more cleanup

* Start on mnist deep in tensorflow C#.

* Make C# and python mnist simple TF example identical.

* Add more python inspiration and docs. Trying to export graph from C# but cannot load this in tensorboard. More work on mnist deep.

* More tf deep.

* minor

* change to 64 batch size

* Make MnistDeep python deterministic with seeds and no data shuffling.

* Change python mnist deep to use gradient descent, so we can test C# version.

* Most code for mnist deep complete, but dropout fails...

* TF Mnist Deep runs, but does not give the same as python.

* Fix mnist simple target, after changing batch size.
* Add .txt to avoid adding cntk downloaded training files.

* Add simple mnist example from microsoft cntk examples. Including mnist download utils

* Updated readme with installation guide

* Update gitignore to include pyc files and __pycache__

* Only download and write train/test if they do not already exist

* Formatting

* Add .dnn to gitignore

* Add cntk conv net example. Almost direct copy from original cntk example:
https://github.com/Microsoft/CNTK/blob/master/Examples/Image/Classification/ConvNet/Python/ConvNet_MNIST.py

* Modified SimpleMNIST CNTK example to be more similar to tensorflow example [WIP]

* Remove use of training_session to make the example more similar to the tensorflow equivalent

* Add MnistCnnExampleTest

* Update python simple mnist example

* Updated CNTK CSharp simple mnist example to be similar to python equivalent

* Added expected error from python implementation. Currently the CSharp version has a lower error rate. But this is most likely caused by the two scripts running on different data, in training and in test, due to the loader differences

* Move cntk layers to separate file

* Remove unused methods

* Refactor MnistCnnExampleTest to be more similar to python equivalent

* Add comment about multiplication

* Add comments

* Add dropout

* Add Convolution to layers [WIP]

* Add parameter checks

* Set randomization to false to have same behaviour as C#

* Move mnist loader to separate file. Split Simple and CNN example to separate files.

* Update MnistSimpleTest_MinibatchSource to make randomization: false more clear

* Add load/Save to simple mnist C# examples. Currently fails. Loaded model returns 0 error while the trained model has around 20% error.

* Update saved model names

* Add comments

* Fix error in TestModel_Mnist_Loader and refactor simple C# cntk examples to test on the same data using the same method.

* Remove validation set from mnist loader to get same order and number of samples in C# cntk minibatch source and mnist loader examples. For the same reason, also disabled normalization in the loader. If normalization is required, this can be added as an operation on the input of the network

* Update python simple mnist example to test using 1 sample at a time to make it more similar to the C# example. Also to go through all samples. This was not done with the previous batch size of 1024

* Updated the simple C# mnist examples to use the correct python simple mnist error rate as the assertion.
Python and C# cntk cnn examples now return the same error. This is done using the CPU device. GPU is currently not deterministic; however, a special deterministic mode might be available. Examples are also made to be as similar as possible between cntk in C# and cntk in python
@mdabros (Owner) commented Aug 11, 2018

Since work on this has currently stopped, I am closing the pull request. We can always re-open it if it becomes relevant.

@mdabros closed this Aug 11, 2018