Tensors in .NET #98323
I hope we can take #89730 into consideration together, as it's critical for achieving high-performance linear algebra operations on arbitrary ND tensors. Such a feature has been fundamental to libraries like the native implementation of PyTorch.
Tagging subscribers to this area: @dotnet/area-meta, @dotnet/area-system-numerics, @dotnet/area-system-numerics-tensors

Issue Details
Goal: Provide a type for use as both exchange and interop that represents multi-dimensional data of a single primitive Type. Implement arithmetic and linear algebra operations so that the type can serve as a sufficient basis for data preparation and as an input and output to neural networks.
I found my original comments somehow drifted away from my actual intention. The key point is not ET, lazy evaluation, or any specific technique. It is the ability to eliminate intermediate arrays (known as deforestation, or fusion: https://en.wikipedia.org/wiki/Deforestation_(computer_science)). Users can always use low-level primitives to achieve this manually, but if the high-level API cannot handle it, its usage will be largely limited.

Original comments: For the arithmetic and linear algebra operations, I suggest considering the "expression template" (ET) technique commonly used in the C++ world (e.g. Eigen3 and Blaze), which is a kind of lazy evaluation that can reduce the allocation of temporary objects. This is especially helpful for a GC language. For a quick impression: last month, as part of an attempt at rebasing my research from C++ to .NET, I made an experiment with ET in .NET via generics: https://github.com/Shuenhoy/DotnetETExp . The (not sufficient but illustrative) results look promising and suggest ET may also be helpful in the .NET world. However, there are several barriers that prevent me from investigating further:
// ...
member (*.) (left: 'scalar & #INumberBase<'scalar>, right: 'matExp & #IMatExp<'matExp, 'scalar>) = // ...
member (*.) (left: 'matExp & #IMatExp<'matExp, 'scalar>, right: 'scalar & #INumberBase<'scalar>) = // ...

There is a workaround to use a wrapper struct like
Some other features like existential types (dotnet/csharplang#5556) may also be useful, but I cannot provide more information for now. In case someone is interested, I have uploaded my attempts with F# here: https://github.com/Shuenhoy/Furzn/ . These feature requests have existed for a while, and some may require changes in the runtime and even the metadata. I do not expect they can all be implemented any time soon, but I hope the .NET Foundation can take a closer look at them for a potentially better representation of Tensors and linear algebra in .NET.
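To make the intermediate-array point concrete, here is a minimal C# sketch (hypothetical `Add`/`Scale` helpers, not part of any proposed API) contrasting an eager style, which allocates one temporary array per operation, with a hand-fused loop that produces the same result in a single pass:

```csharp
using System;

static class FusionSketch
{
    // Eager style: each helper allocates a fresh intermediate array.
    static double[] Add(double[] a, double[] b)
    {
        var r = new double[a.Length];
        for (int i = 0; i < a.Length; i++) r[i] = a[i] + b[i];
        return r;
    }

    static double[] Scale(double s, double[] a)
    {
        var r = new double[a.Length];
        for (int i = 0; i < a.Length; i++) r[i] = s * a[i];
        return r;
    }

    static void Main()
    {
        double[] a = { 1, 2, 3 }, b = { 4, 5, 6 }, c = { 7, 8, 9 };

        // Eager: 2 * (a + b + c) allocates two temporaries before the final result.
        var eager = Scale(2, Add(Add(a, b), c));

        // Fused ("deforested"): the same result computed in one pass, no temporaries.
        var fused = new double[a.Length];
        for (int i = 0; i < a.Length; i++)
            fused[i] = 2 * (a[i] + b[i] + c[i]);

        Console.WriteLine(string.Join(", ", eager)); // 24, 30, 36
        Console.WriteLine(string.Join(", ", fused)); // 24, 30, 36
    }
}
```

Expression templates automate exactly this rewrite: the operators build a lightweight expression object, and the single fused loop only runs when the expression is finally assigned or evaluated.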
Graph optimizations like that are a separate/independent concern and are not something that should be part of the default type experience. None of the major tensor libraries force such handling. They all allow trivial direct usage and provide a separate way to do lazy evaluation over an expression tree in order to allow dynamic code generation for additional performance.
Thanks for your reply!
At least, almost all commonly used libraries in the C++ world use this: Eigen3, Blaze, xtensor, etc.
In most cases, the users should not experience any difference between an ET library and a "direct usage" library:

VectorXf x = m * (a + b + c);
var y = m * (a + b + c); // only with type inference will the expression types be exposed to the user

In fact, this is a major advantage of ET: it allows users to write high-performance code as naturally as "direct usage" code. I propose ET because it's the most commonly used technique, to the best of my knowledge, though it is currently not possible in .NET, as described in my previous comment. I understand something may be out of the initial scope of the design goal, and I do not expect this can be solved at the current stage. However, intermediate array evaluation is definitely a fundamental problem and should be considered. Of course, techniques other than ET can be considered if they can be just as natural to write and as performant (probably with the help of the JIT?).
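As a rough illustration of the technique being proposed, here is a minimal C# sketch (hypothetical `IVecExpr`/`SumExpr`/`Vec` names; struct generics stand in for C++ templates, and since the operator syntax asked for above is exactly what .NET cannot yet express, plain methods are used instead):

```csharp
using System;

// A vector expression exposes its length and the value at each index.
interface IVecExpr
{
    int Length { get; }
    double this[int i] { get; }
}

readonly struct Vec : IVecExpr
{
    private readonly double[] _data;
    public Vec(double[] data) => _data = data;
    public int Length => _data.Length;
    public double this[int i] => _data[i];
}

// Deferred element-wise sum of two sub-expressions; nothing is allocated here.
readonly struct SumExpr<TL, TR> : IVecExpr
    where TL : IVecExpr
    where TR : IVecExpr
{
    private readonly TL _l;
    private readonly TR _r;
    public SumExpr(TL l, TR r) { _l = l; _r = r; }
    public int Length => _l.Length;
    public double this[int i] => _l[i] + _r[i];
}

static class VecExpr
{
    public static SumExpr<TL, TR> Sum<TL, TR>(TL l, TR r)
        where TL : IVecExpr where TR : IVecExpr => new(l, r);

    // Evaluation walks the expression once, producing a single output array.
    public static double[] Evaluate<T>(T expr) where T : IVecExpr
    {
        var result = new double[expr.Length];
        for (int i = 0; i < expr.Length; i++) result[i] = expr[i];
        return result;
    }
}

static class Demo
{
    static void Main()
    {
        var a = new Vec(new double[] { 1, 2, 3 });
        var b = new Vec(new double[] { 4, 5, 6 });
        var c = new Vec(new double[] { 7, 8, 9 });

        // a + b + c is fused into one loop at Evaluate time; no temporaries.
        double[] y = VecExpr.Evaluate(VecExpr.Sum(VecExpr.Sum(a, b), c));
        Console.WriteLine(string.Join(", ", y)); // 12, 15, 18
    }
}
```

Because `SumExpr` composes by value through generic type parameters, the JIT can specialize and inline the whole expression, and only the final `Evaluate` call allocates.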
I think we have different classifications of "major" here. C++ has several, but they tend to see far less usage than things like PyTorch, NumPy, TensorFlow, JAX, etc. The C++ libraries you called out notably depend on templating very heavily and don't really fit into the broader "framework design guidelines" that .NET has for its APIs.
This itself makes several assumptions, including which features the consuming language supports and which coding style developers use in their codebase. Neither is something we can rely on for something we're shipping from dotnet/runtime.

A good design here is going to end up following the tried and true API design guidelines we have for .NET. It is going to consider how it integrates into the broader .NET ecosystem and how languages like C# and F# will consume it, and it will be appropriately layered to correctly balance ease of use, extensibility, versioning, and performance.

I expect that this will ultimately come in the general shape of an … This then gives a solid foundation on which it can be extended to support additional features. For example, it should be possible to design a …

By properly considering the core needs and the layering considerations, and by ensuring our tensor types can appropriately expose the underlying memory or cheaply wrap other memory with the correct layout, we get a very robust and extensible system that follows the framework design guidelines and doesn't leave anything on the table.

I'm working on the general design doc currently and hope to have more to share in the coming weeks.
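Purely as an illustration of the "cheaply wrap other memory with the correct layout" point, and not the actual design being referenced, a minimal sketch of what a strided exchange type could look like (all names hypothetical):

```csharp
using System;

// Hypothetical exchange type: flat backing memory plus shape/strides metadata,
// so existing buffers can be wrapped without copying.
readonly struct NdTensor<T>
{
    private readonly Memory<T> _buffer;
    private readonly int[] _shape;
    private readonly int[] _strides;

    public NdTensor(Memory<T> buffer, int[] shape, int[] strides)
    {
        _buffer = buffer;
        _shape = shape;
        _strides = strides;
    }

    public ReadOnlySpan<int> Shape => _shape;

    // Index computation: dot product of indices and strides.
    public T this[params int[] indices]
    {
        get
        {
            int offset = 0;
            for (int d = 0; d < indices.Length; d++)
                offset += indices[d] * _strides[d];
            return _buffer.Span[offset];
        }
    }
}

static class TensorDemo
{
    static void Main()
    {
        // Wrap an existing flat array as a 2x3 row-major tensor; no copy occurs.
        var data = new float[] { 1, 2, 3, 4, 5, 6 };
        var t = new NdTensor<float>(data, shape: new[] { 2, 3 }, strides: new[] { 3, 1 });
        Console.WriteLine(t[1, 2]); // 6
    }
}
```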
A lot of the lazy evaluation could be handled by returning an ITensor interface instance and optimizing which lazily-evaluated version of ITensor is returned, LINQ-style. Would that be too much of a performance hit?
I think it's a question of problem area. I am not sure whether this new tensor library is ML-specific or intends to be more general. For ML, of course you will see a lot of PyTorch etc. But there are also physics-based simulation, geometry processing, numerical optimization, graphics, etc., which also need a tensor/linear algebra library and may have different usage patterns. These areas are definitely smaller than ML, so in total there is more usage of PyTorch etc. Anyway, I proposed ET here only because it's the only technique I am aware of. It will be great if there are other ways better suited to .NET to handle it, like the
From my experiments ( https://github.com/Shuenhoy/DotnetETExp ), using interfaces seems to be even slower than eager evaluation. For LINQ, the operations themselves are usually heavy enough that the overhead of boxing and dynamic dispatch through interfaces can be ignored. But for tensor operations: 1) the dimensions may not be large enough for the interface overhead to be negligible, and 2) the operations are usually more frequent; it is very common for something like
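A small sketch of the overhead being described (hypothetical `IVecSource`/`ArraySource` names, crude Stopwatch timing rather than a proper benchmark): when a struct expression is passed as an interface it is boxed and every element access is a virtual call, whereas passing it through a generic type parameter lets the JIT specialize, devirtualize, and inline the accessor:

```csharp
using System;
using System.Diagnostics;

interface IVecSource
{
    int Length { get; }
    double this[int i] { get; }
}

readonly struct ArraySource : IVecSource
{
    private readonly double[] _data;
    public ArraySource(double[] data) => _data = data;
    public int Length => _data.Length;
    public double this[int i] => _data[i];
}

static class DispatchCost
{
    // The struct is boxed at the call site and every src[i] is a virtual call.
    static double SumViaInterface(IVecSource src)
    {
        double s = 0;
        for (int i = 0; i < src.Length; i++) s += src[i];
        return s;
    }

    // With a struct type argument the JIT emits specialized code, so the
    // constrained interface calls can be devirtualized and inlined.
    static double SumViaGeneric<T>(T src) where T : IVecSource
    {
        double s = 0;
        for (int i = 0; i < src.Length; i++) s += src[i];
        return s;
    }

    static void Main()
    {
        var src = new ArraySource(new double[1_000]);

        var sw = Stopwatch.StartNew();
        for (int n = 0; n < 100_000; n++) SumViaInterface(src);
        Console.WriteLine($"interface: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        for (int n = 0; n < 100_000; n++) SumViaGeneric(src);
        Console.WriteLine($"generic:   {sw.ElapsedMilliseconds} ms");
    }
}
```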
There are MANY hundreds of books written in Python which TorchSharp can leverage. When dealing with Tensors in .NET, only very advanced users understand what to do with them in .NET. I urge more discussion using TorchSharp as context, so we can see and then share what Tensors in .NET is attempting to achieve. What kinds of gaps and use cases demand Tensors in .NET? The whole concept of why we need to do this, and which goals are not possible with e.g. TorchSharp, is not very clear. The .NET community is SO FAR behind compared to Python. Introducing something so that the .NET community has something to play with, without helping the community to see THE WHOLE, is making us nervous.
Moving the remaining work here to .NET 10. We made significant progress in .NET 9.