Math.NET Matrices #6

Open
nmfisher opened this issue Sep 11, 2017 · 4 comments

Comments

@nmfisher

This is a really great library. Was there a specific reason why you chose to roll your own Matrix class, rather than leveraging Math.NET?

Ideally I'd like to marry the two (not only for consistency with modules I've already written, but even for smaller things like using Matrix rather than Matrix). Before I jump in and start changing anything, though, I thought I'd check with the author to see if there was a specific reason behind it.

If I do proceed with integrating the two, more than happy to submit back a PR too, just let me know.

@mdabros
Owner

mdabros commented Sep 12, 2017

Hi nmfisher,

Thanks for your interest in SharpLearning, I am glad you find it useful.

Regarding using Math.Net matrices as a replacement for the SharpLearning matrices, it is a bit of a long story with several considerations.

Initially I created my own matrix class to avoid dependencies on other libraries. The main consideration was that without dependencies it is easier to change direction, for instance, to support .NET Core and .NET Standard for multiplatform support.

This changed when I added the SharpLearning.Neural project, which is using Math.Net for the matrix operations to utilize MKL/OpenBLAS. Overall, I think Math.Net is a great library, and switching from SharpLearning’s matrix implementation to Math.Net matrices is something I have considered more than once.

However, I have been a bit reluctant to take the step for a few reasons:

  • Most of the code in SharpLearning simply uses the matrix class as a container (no arithmetic or similar). So, in most cases Math.Net would be a large dependency just to get a container. This might indicate that the SharpLearning matrix class should rather be replaced by a multidimensional array with some extension methods.
  • Microsoft will soon release CNTK (their deep learning toolkit) with support for evaluation and training in C#. I am planning to adapt SharpLearning.Neural to use CNTK as its backend, since this will provide all the building blocks for learning and evaluating neural nets. SharpLearning.Neural is currently the only project that uses the Math.Net arithmetic, so CNTK will probably replace that dependency.
  • I am considering introducing a Tensor class and changing the interface of the learners from Learn(F64Matrix observations, double[] targets) to Learn(Tensor observations, Tensor targets). This would make it possible to support more types of problems, for instance, regression with multiple targets. The matrix (be it F64Matrix or a Math.Net matrix) and the vector (be it double[] or a Math.Net vector) should be implicitly convertible to a Tensor.
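
As a rough illustration of the last point, here is a minimal sketch of how implicit conversions to a tensor type could look. Everything in it is hypothetical: the `Tensor` class, its `Data` and `Shape` members, and the `LearnerSketch` helper are invented for this example and are not part of SharpLearning, Math.Net, or CNTK. A plain `double[,]` stands in for F64Matrix.

```csharp
using System;

// Hypothetical sketch only: not SharpLearning's or Microsoft's Tensor type.
public sealed class Tensor
{
    public double[] Data { get; }   // flat storage, row-major for rank 2
    public int[] Shape { get; }     // e.g. { rows, cols } or { length }

    public Tensor(double[] data, params int[] shape)
    {
        int expected = 1;
        foreach (int dim in shape) expected *= dim;
        if (expected != data.Length)
            throw new ArgumentException("Shape does not match data length.");
        Data = data;
        Shape = shape;
    }

    // A vector becomes a rank-1 tensor that shares the caller's array (no copy).
    public static implicit operator Tensor(double[] vector)
        => new Tensor(vector, vector.Length);

    // A 2D array becomes a rank-2 tensor; elements are copied in row-major order.
    public static implicit operator Tensor(double[,] matrix)
    {
        int rows = matrix.GetLength(0), cols = matrix.GetLength(1);
        var data = new double[rows * cols];
        Buffer.BlockCopy(matrix, 0, data, 0, data.Length * sizeof(double));
        return new Tensor(data, rows, cols);
    }
}

public static class LearnerSketch
{
    // Stand-in for a Learn(Tensor observations, Tensor targets) signature.
    // It only reports how many target values each observation has.
    public static int TargetsPerObservation(Tensor observations, Tensor targets)
    {
        if (observations.Shape[0] != targets.Shape[0])
            throw new ArgumentException("Row counts differ.");
        return targets.Shape.Length == 1 ? 1 : targets.Shape[1];
    }
}
```

With conversions like these, a caller could pass a `double[]` for single-target problems or a `double[,]` for multi-target regression to the same method without any explicit wrapping.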

Personally, I would prefer to hold off on changing the matrix implementation until a decision has been made on introducing a tensor class (yes or no), since the two changes will overlap a lot.

However, if you go ahead and integrate SharpLearning and Math.Net in a fork, there are a few things to be aware of:

  • SharpLearning F64Matrix has a row-wise layout, while Math.Net matrices have a column-wise layout.
  • SharpLearning F64Matrix supports a few unmanaged operations to create views over the matrix without copying the memory. This is primarily used in the TreeBuilders from SharpLearning.DecisionTrees. I believe Math.Net has some similar operations, but you might be able to avoid them completely.
  • I would recommend running some timing benchmarks on a reasonably sized data set before and after the introduction of the Math.Net matrix, to ensure that the training time of the learners is similar (or better) after the change.
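
To make the layout difference and the benchmarking advice concrete, here is a small dependency-free sketch. The `LayoutSketch` class and its method names are hypothetical and use neither the SharpLearning nor the Math.Net API. It shows the index mapping between row-wise and column-wise storage, plus a simple Stopwatch-based timing helper of the kind that could be used for a before/after comparison.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of SharpLearning or Math.Net.
public static class LayoutSketch
{
    // Element (r, c) lives at r * cols + c in row-major storage
    // and at c * rows + r in column-major storage.
    public static double[] RowMajorToColumnMajor(double[] rowMajor, int rows, int cols)
    {
        if (rowMajor.Length != rows * cols)
            throw new ArgumentException("Length does not match rows * cols.");
        var columnMajor = new double[rowMajor.Length];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                columnMajor[c * rows + r] = rowMajor[r * cols + c];
        return columnMajor;
    }

    // Runs the action several times and returns the best wall-clock time,
    // which reduces noise from JIT warm-up and other processes.
    public static TimeSpan BestOf(int runs, Action action)
    {
        var best = TimeSpan.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            var watch = Stopwatch.StartNew();
            action();
            watch.Stop();
            if (watch.Elapsed < best) best = watch.Elapsed;
        }
        return best;
    }
}
```

For example, the 2x3 row-major array { 1, 2, 3, 4, 5, 6 } maps to the column-major array { 1, 4, 2, 5, 3, 6 }. The timing helper could wrap a learner's training call before and after the matrix change, for instance `LayoutSketch.BestOf(5, () => learner.Learn(observations, targets))`, where `learner`, `observations`, and `targets` are whatever objects the benchmark already has at hand.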

I am keeping the issue open, and if you do proceed with integrating the two, I will be interested in following your progress.

@nmfisher
Author

Thanks for the detailed response. Definitely appreciate how much effort you've gone to.

I understand your desire to limit dependencies, particularly as the discussion re .NET Standard support in Math.NET Numerics is ongoing.

Your point about CNTK is interesting. Personally, I am using Math.NET to implement various neural word embedding models (e.g. word2vec) in C#. I know MS is planning to expand CNTK support to .NET/C#, but I don't have the luxury of waiting to see if I can leverage this. This means I'll have to implement these models manually (though knowing my luck, CNTK .NET support will be released the exact same day I finish my own implementation).

In other words, if I could standardize on CNTK, I would. Unfortunately it seems I don't have that option yet.

For the time being, I will hold off on forking to support Math.NET. Best to wait to see what's in store for .NET/C# CNTK support first.

A Tensor interface does sound reasonable, though, so if/when you decide to proceed with this, I would also be happy to help test.

@mdabros
Owner

mdabros commented Sep 13, 2017

I believe Microsoft will release the initial/preliminary training API for C# during September/October. At least, it is included in their August – September iteration plan: microsoft/CNTK#2194

So hopefully, we will have something in the not so distant future :-)

Regarding introducing a Tensor interface, you are more than welcome to help contribute and test. If I proceed with it, I need to figure out a design so it will integrate efficiently with both the existing matrix implementations and with CNTK. When this is done, I will try to add issues describing what needs to be done. This should make it easier to pick up tasks and help improve the library.

In general, I have a lot of ideas for new features and improvements to SharpLearning. Currently, I have these stored in a private backlog, but they might as well be added as issues for the project, making it easier to help contribute.

You are also more than welcome to add issues and contribute if you find something lacking or have ideas for new features.

@mdabros
Owner

mdabros commented Jan 8, 2018

It seems like Microsoft will eventually add support for a Tensor type. So if a tensor class is to be introduced in SharpLearning, it would probably make sense to use the Microsoft implementation.

This is also relevant for: #20
