This issue was moved to a discussion.

[Design question] I don't like factory. Why don't we just use modules? #1975

Closed
cshaa opened this issue Sep 26, 2020 · 38 comments
Comments

@cshaa
Collaborator

cshaa commented Sep 26, 2020

I'm working on this PR and the current dependency system of math.js seems really frustrating to me.

[Screenshot: the first lines of a more complicated script look like this with factory]

To be more precise, I don't understand why we don't use ES6 modules more, instead of the factory function, which to me seems worse in many ways (the code isn't DRY at all, IntelliSense is worse, and there are problems with circular dependencies).

I have a vague understanding that math.js has custom bundling support and some package-wide settings and that's the reason why factory was invented, but I think these should be doable with modules too.

So what's the reason why we use factory again? 😁️

cshaa changed the title from "[Design question] factory is terrible. Why don't we just use modules?" to "[Design question] I don't like factory. Why don't we just use modules?" on Sep 26, 2020

@josdejong
Owner

Good question. Thanks for asking.

One side-note about the screenshot you shared for createEigs: maybe there you should inject realSymmetric and complex instead of creating them inside createEigs and having to manage their dependencies too in createEigs?

To explain the context: What I wanted to achieve with mathjs is an environment where you can do calculations with mixed data types, like multiplying a regular number by a Complex number or a BigNumber, and work with all of those in matrices. And I wanted to make it possible to add a new data type, like say BigInt, with little effort.

The solution that we have in mathjs now is a combination of two things:

  • typed-function, which makes it easier to (dynamically) create and extend a single function with new data types, automatically do type conversions on function inputs, etc. So, if you create function multiply for two numbers, you can extend it with support for multiplying two BigInts, and if you define a conversion from BigInt to number, the typed-function will automatically allow you to multiply a BigInt with a number.

  • Dependency injection using factory. When I have extended my function multiply with support for BigInt, thanks to the dependency injection, the function prod and many others will automatically support BigInt too, since it uses multiply under the hood. This also works the other way around: if I don't need the heavyweight multiply (which supports BigNumbers, matrices, etc), and I just need plain and simple number support, I can use a lightweight implementation of multiply for numbers, and inject that in prod and other functions.
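
The dependency injection described in the second bullet can be sketched roughly as follows. This is not the actual mathjs factory API: createProd, multiplyNumber and multiplyMixed are hypothetical names, and BigInt stands in for any extra data type.

```javascript
// Hypothetical factory: prod receives multiply as an injected dependency.
function createProd ({ multiply }) {
  return function prod (values) {
    return values.reduce((acc, value) => multiply(acc, value))
  }
}

// Lightweight implementation: plain numbers only.
const multiplyNumber = (a, b) => a * b

// Heavier implementation that also understands BigInt (illustrative only).
const multiplyMixed = (a, b) =>
  (typeof a === 'bigint' || typeof b === 'bigint')
    ? BigInt(a) * BigInt(b)
    : a * b

// The same prod source, wired to two different multiply implementations.
const prodNumber = createProd({ multiply: multiplyNumber })
const prodMixed = createProd({ multiply: multiplyMixed })

console.log(prodNumber([2, 3, 4]))   // 24
console.log(prodMixed([2n, 3, 4n]))  // 24n
```

Note that prod itself never mentions BigInt; it gains or loses that support purely through the multiply it is given.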

The dependency injection indeed complicates the code hugely. If you or anyone can come up with a simpler approach, I would love to hear it! Maybe we can do something smart during a compile/pre-build step instead.

@cshaa
Collaborator Author

cshaa commented Sep 28, 2020

Thanks for your reply!

One side-note about the screenshot you shared for createEigs: maybe there you should inject realSymmetric and complex instead of creating them inside createEigs and having to manage their dependencies too in createEigs?

You mean I should bundle complex and createEigs using factory in the same way as eg. identity is bundled? I was thinking about that... but it would make them visible for the end users, right? And that in turn would mean that the functions have to have a reasonable, user-proof API, which isn't the case right now 😁️ Or are you talking about a different type of injection?

typed-function, which makes it easier to (dynamically) create and extend a single function with new data types, automatically do type conversions on function inputs, etc.

I think I understand typed quite well already. It seems pretty smart to me 👍️ and there doesn't seem to be much space for further improvement... except maybe adding TypeScript typings to it once I figure out how to implement #1779

Dependency injection using factory. When I have extended my function multiply with support for BigInt, thanks to the dependency injection, the function prod and many others will automatically support BigInt too, since it uses multiply under the hood. [...] Maybe we can do something smart during a compile/pre-build step instead or so.

Hmm... I was convinced that it was possible to add call signatures to a typed function, but I couldn't find that anywhere in the examples. Okay, I take back my last statement, maybe there are things to improve in typed after all 😁️. Specifically, if something like this were possible:

const fn4 = typed({
  'number': function (a) {
    return 'a is a number';
  }
});

fn4.addSignature({
  'number, number': function (a, b) {
    return 'a is a number, b is a number';
  }
})

fn4(1,2) // a is a number, b is a number

Then the use_bigint example could be rewritten into modules as simply as:

import math from 'mathjs'


math.typed.addType({
  name: 'BigInt',
  test: (x) => typeof x === 'bigint'
})

math.bigint = (x) => BigInt(x)

math.add.addSignature({
  'BigInt, BigInt': (a, b) => a + b
})

math.pow.addSignature({
   'BigInt, BigInt': (a, b) => a ** b
})

export default math

This is arguably much simpler than the original, and no compile-time magic was needed – just vanilla modules.

This also works the other way around: if I don't need the heavyweight multiply (which supports BigNumbers, matrices, etc), and I just need plain and simple number support, I can use a lightweight implementation of multiply for numbers, and inject that in prod and other functions.

This one sounds a lot more difficult to achieve with modules. Generally speaking, it's impossible to make a module "unlearn" some dependency – at least without some compile-time magic. But at the same time, it seems quite impractical to manually remove dependencies on something like BigNumbers... Since more than half of the code directly mentions them, one would have to rewrite almost all functions to remove the dependency – I struggle to understand why anyone would do that. Are there any more practical examples of where such "unlearning" might be used?

@josdejong
Owner

Hm, good point. We should probably not expose realSymmetric publicly. On the other hand... maybe that's no problem at all? We could think about being able to inject a function but not having it exposed publicly, but in my experience if you want to internally create realSymmetric as a separate function, such a function can often be useful for others too. I.e. there should be nothing to hide.

So far, typed-function is created in an immutable way. So you can merge typed functions like const fnMerged = typed(fn1, fn2) into new ones, but not change an existing typed function like fn4.addSignature. You could of course do something like fn4 = typed(fn4, typed({ ...new signatures.... })).

The world would be simple if we had a single mathjs instance and could extend functions there like you describe, and also change the config there. However, suppose you're using two libraries in your application, library A and library B, and both use mathjs. If those two libraries need different config and extend functions in different ways, they would conflict with each other. Therefore, I think it's important that you can create your own instance of mathjs with your own config and extensions, and the global instance should be immutable.
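
To make the library A / library B scenario concrete, here is a minimal sketch. It does not use the mathjs API: createInstance, format and the precision key are hypothetical and only illustrate independent, frozen instances.

```javascript
// Hypothetical create(): every call returns an independent, frozen instance.
function createInstance (config = {}) {
  const settings = Object.freeze({ precision: 4, ...config })
  const instance = {
    config: settings,
    // round to the number of significant digits configured for THIS instance
    format: (x) => Number(x.toPrecision(settings.precision))
  }
  return Object.freeze(instance)
}

const mathA = createInstance({ precision: 2 }) // used by library A
const mathB = createInstance({ precision: 6 }) // used by library B

console.log(mathA.format(Math.PI)) // 3.1
console.log(mathB.format(Math.PI)) // 3.14159
```

Neither library can change the other's behaviour, because there is no shared mutable state to change.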

Are there any more practical examples of where such "unlearning" might be used?

I suppose you mean loading light-weight functions with just number support instead of the full package? Reasons are performance (typed-function does introduce overhead) and bundle size for the browser. Currently, functions like add contain implementations for each supported data type. One idea I have is to rewrite this to split the implementation per data type. That would result in a huge amount of tiny functions, which are merged together when loading a specific selection of functions and data types.
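
A rough sketch of that "many tiny functions, merged on load" idea. The dispatcher below is a toy stand-in and not typed-function; buildFunction, addNumber and addBigint are hypothetical names.

```javascript
// Toy dispatcher (NOT typed-function): picks an implementation by the
// runtime types of the arguments, e.g. 'number,number'.
function buildFunction (name, ...signatureMaps) {
  // merging creates a new map; the per-type modules are never mutated
  const signatures = Object.assign({}, ...signatureMaps)
  return function (...args) {
    const key = args.map((x) => typeof x).join(',')
    const impl = signatures[key]
    if (!impl) throw new TypeError(`${name}: no signature for (${key})`)
    return impl(...args)
  }
}

// One tiny module per data type (hypothetical layout).
const addNumber = { 'number,number': (a, b) => a + b }
const addBigint = { 'bigint,bigint': (a, b) => a + b }

// A lightweight bundle loads only what it needs; a full bundle loads more.
const addLight = buildFunction('add', addNumber)
const addFull = buildFunction('add', addNumber, addBigint)

console.log(addLight(1, 2))   // 3
console.log(addFull(1n, 2n))  // 3n
```

Here addLight never references the bigint implementation, so a bundler is free to drop it.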

@cshaa
Collaborator Author

cshaa commented Oct 6, 2020

Huh, I didn't realize mathjs is immutable now and making it mutable could break some real world code... This makes things infinitely more complicated. After some time of listening to music and thinking hard, I've come up with this design idea. It's probably still full of holes, so if you notice something that wouldn't work, doubt that something is a good idea, or simply don't understand my explanation, please do comment on that. It's definitely more complicated than “just exporting and importing” as I initially hoped, so please be patient with my explanation 😅️

Proposal v0

In any mathjs bundle, according to this proposal, there would always be two abstract types:

  • Real
    • all number types that correspond to real numbers
    • default subtypes: number, BigNumber, Fraction
  • Scalar
    • all “tensors of order zero”, ie. elements of a division ring, ie. things you can multiply a vector with
    • default subtypes: Real, Complex, Unit

The subtypes of these can be modified (you can remove some, or add your own), but these two abstract types shall always remain.

In any mathjs bundle, the math object will always have these core methods:

  • addScalar, subtractScalar, multiplyScalar, divideScalar, squareScalar
  • absScalar (returns a Real (what about Unit?))
  • compareReal, equalReal, largerReal, largerEqReal, smallerReal, smallerEqReal, unequalReal
  • expReal, logReal, powReal, sqrtReal, signReal
  • roundReal, floorReal and other rounding functions
  • sinReal, cosReal and other trigonometric functions

If a user wants to add their own Real type or Scalar type, they should provide these methods – for the best results they should provide all of them.

Ordinary methods (like dotPow, gcd, factorial, ...) are in individual files, exported, only wrapped in typed (no factory). If such a method depends on core methods, it can access them using this, eg. this.addScalar. Similarly the configuration can be accessed with this.config. If the method depends on a more complicated/niche function, it imports it directly from the corresponding file. If the method has overloads that are reasonable to include in some bundles, but exclude from others, those overloads should be in separate files – for example multiply should be split into multiply_base, multiply_DenseMatrix, multiply_SparseMatrix.

Pseudo-code to showcase this convention:

// file: multiply_DenseMatrix.js

import { typed } from "../core/typed.js"
import { DenseMatrix, pointwise } from "../type/matrix/DenseMatrix.js"

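export const multiply_DenseMatrix = typed('multiply', {
  // regular functions instead of arrow functions, so that `this` can be bound to the math instance
  'Scalar, DenseMatrix': function (a, B) { return pointwise(B, e => this.multiplyScalar(a, e)) },
  'DenseMatrix, DenseMatrix': function (A, B) { return DenseMatrix([ /* actual multiplication code */ ]) }
})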

There are various pre-made bundles like all.js or number/all.js that set up the two abstract types, import all the important (hehe) methods and re-export them in one math object. This object is not yet ready to be served to the user, as it contains various overloads separated into different methods (multiply_base, multiply_DenseMatrix, ...) and doesn't contain any configuration. It is the job of the create method to do this.

This is a pseudo-code implementation of create:

export function create (methods, config = {})
{
  const math = { config }

  for (const fname of keysOf(methods))
  {
    if (!fname.includes('_'))
    {
      math[fname] = bind(methods[fname], math)
    }
    else
    {
      // merge multiply_base, multiply_DenseMatrix, ... into a single multiply
      const name = fname.split('_')[0]
      const method = mergeAllOverloads(name, methods)
      math[name] = bind(method, math)
    }
  }

  return math
}

The result of create(all) is the math object the user wants.
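
For anyone who wants to play with it, here is a runnable toy version of create. The helpers keysOf, bind and mergeAllOverloads are not defined in the proposal, so the versions below are assumptions; in particular, overloads are modelled as plain { test, run } objects instead of typed-functions, and multiply_Array is a made-up overload.

```javascript
// Assumed helpers; the proposal leaves them undefined.
const keysOf = (obj) => Object.keys(obj)
const bind = (fn, self) => fn.bind(self)

// Toy merge: the first overload whose test accepts the arguments wins.
// Real code would merge typed-functions instead.
function mergeAllOverloads (name, methods) {
  const overloads = keysOf(methods)
    .filter((key) => key.startsWith(name + '_'))
    .map((key) => methods[key])
  return function (...args) {
    const match = overloads.find((o) => o.test(...args))
    if (!match) throw new TypeError(`${name}: unsupported arguments`)
    return match.run.apply(this, args) // forward the bound math instance
  }
}

function create (methods, config = {}) {
  const math = { config }
  for (const fname of keysOf(methods)) {
    if (!fname.includes('_')) {
      math[fname] = bind(methods[fname], math)
    } else {
      const name = fname.split('_')[0]
      // merge only once per name, however many overload files exist
      if (!(name in math)) math[name] = bind(mergeAllOverloads(name, methods), math)
    }
  }
  return math
}

const bundle = {
  multiplyScalar: (a, b) => a * b,
  multiply_base: {
    test: (a, b) => typeof a === 'number' && typeof b === 'number',
    run (a, b) { return this.multiplyScalar(a, b) }
  },
  multiply_Array: {
    test: (a, B) => typeof a === 'number' && Array.isArray(B),
    run (a, B) { return B.map((e) => this.multiplyScalar(a, e)) }
  }
}

const math = create(bundle)
console.log(math.multiply(2, 3))          // 6
console.log(math.multiply(2, [1, 2, 3]))  // [ 2, 4, 6 ]
```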

How a user uses this new API

  1. import { math } from 'mathjs' – use the default (immutable) instance of the default bundle
  2. import { create, all } from 'mathjs'; const math = create(all) – make your own instance of the default bundle
  3. make an instance of a different pre-made bundle
  4. put together your own bundle and instantiate it
  5. create a bundle with custom extensions and instantiate it
    • one can either extend the two abstract types and thus modify the behavior of existing methods
    • or create new methods and new overloads for existing methods, which adds new features while leaving the pre-existing functionality unchanged
    • or combine these two approaches

Pros & Cons

  • Cons
    • less flexible – originally you had the power to remove any one method and extend any method including its internal use
    • the code might internally use methods that aren't in the math object – the user has less control over the “size” of the bundle
    • the change requires a lot of refactoring
  • Pros
    • there's a clear division between which internally used functions can and cannot be altered by the user
    • no compile-time magic needed
    • dependency between methods is more clear – if you're extending eg. multiply to handle some different type but you also need the original multiply as a dependency, you put the new overload in a new file and import from the existing file
    • problems with circular dependency solved
    • the act of extending mathjs is hopefully more tangible and requires less knowledge of its internal workings
    • the standard module structure makes it possible to automatically generate TypeScript definitions

@josdejong
Owner

josdejong commented Oct 7, 2020

Mathjs currently has a complicated hybrid: the lowest building blocks, functions, are immutable. On a higher level, you can create a mathjs instance and change configuration. That will result in all functions being re-created with the new configuration.

Interesting idea to separate required core functions (like addScalar) from other functions like dotPow, and having a similar separation between the data types. It's a good distinction to know what functions to implement when adding a new data type. I'm not sure though whether there really is such a clear distinction into two groups: the higher-level functions not only depend on "core" functions, but also on other higher-level functions.

So if I understand your idea correctly, when creating a mathjs instance, the functions are simply bound to the new math instance. So they would not be immutable but rely on say this.config to get the right config, and could use this.addScalar etc? Something like this:

function add (a, b) {
  return a + b
}

function sum (values) {
  let total = 0
  values.forEach(value => {
    total = this.add(total, value) // <-- here we use 'this'
  })
  return total
}

const instance = {}
instance.add = add.bind(instance)
instance.sum = sum.bind(instance)

console.log('sum', instance.sum([1, 2, 3]))

Relying on a this context would indeed remove the need for dependency injection and make things much simpler (except the dynamic binding may be tricky here and there). We've had this solution in the past, where all functions simply used their dependencies from math.*. The difficulty with that was that you could not see how functions depended on other functions, making it very hard to cherry-pick functions.

It would be nice if it would be possible to export already bound functions which can be used directly, in such a way that tree-shaking would work and you should be able to re-bind the function to a different context later. Something like this works (but having to wrap it is still ugly):

// because we import all dependencies here, tree-shaking and cherry picking a function just works
import { add } from './add.js'
 
const sum = (function () {
  // default binding of all dependencies
  this.add = add

  return function sum (values) {
    let total = 0
    values.forEach(value => {
      total = this.add(total, value)
    })
    return total
  }
})()

Which can be used like:

import { sum } from './sum.js'

console.log('sum', sum([1, 2, 3]))

And you can bind it to a different context like:

const instance = {}
instance.add = // ... some different implementation
instance.sum = sum.bind(instance)
// now, instance.sum uses the new add function

I think though that it is not possible to bind instance.sum again to a different context, so maybe this is not the right approach. It would be great though if we can find a solution to allow for automatic resolving of the dependencies somehow. An alternative is indeed simply not allowing cherry picking functions, but offering a couple of 'bundles' where we make sure the bundles do work and contain all dependencies.
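
The re-binding limitation is standard JavaScript behaviour and can be checked in isolation; whoAmI, first and second below are just illustrative names.

```javascript
function whoAmI () {
  return this.name
}

const first = { name: 'first' }
const second = { name: 'second' }

const boundOnce = whoAmI.bind(first)
// Binding an already bound function cannot replace its `this`.
const boundTwice = boundOnce.bind(second)

console.log(boundOnce())   // first
console.log(boundTwice())  // first

// Re-binding only works if the unbound original is still available.
const rebound = whoAmI.bind(second)
console.log(rebound())     // second
```

So any design based on bind would need to keep the unbound function around and always bind from that.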

Really interesting to think this through 😎

@cshaa
Collaborator Author

cshaa commented Mar 31, 2021

Bump! I'm interested in continuing this discussion and possibly making a few prototypes to test the possibilities!

Regarding your last comment: I think I have an idea how to solve this. I'll assume we'll be using TypeScript, but the idea would work without it too.

  • Each “area” of math.js (by that I mean matrix, parse & symbolic expressions, statistics ...) would have a fixed set of minimalistic functions that are needed for pretty much everything in that area. Eg. for matrix it would be multiply, transpose, inv, column... From now on, I will call these necessary sets of functions “an essential bundle” of that specific area.
  • Every function in an area would then have a typed this, which contains all these essential functions in its type. If someone wants to cherry-pick functions from different areas, they'd have to include at least this essential bundle, or else they get a descriptive compile-time error (or a runtime error if they use plain JS). This way, all functions from the essential bundles will be automatically extensible (or “rewritable”), because they are in this, and not imported directly from a specific file. This would also mean that we wouldn't have to import all the essential functions again and again in our codebase – the code will be shorter and easier to read.
  • If your function depends on a more specialized method from its area, a decision has to be made: would the code work even with an extended version of that function provided by the user, or is the algorithm closely tied to the datatypes you expect? If you don't think an extended function for a different datatype would work out-of-the-box, you can just import the function directly. If you think the function should be extensible, you can do something like this.add = this.add || add.
  • A more sophisticated way to have extensible non-essential methods would be to construct an extras object where we either copy a function from this or fall back to an imported one. This way we wouldn't have to mutate this and we could know its exact properties at compile time. An example in the (pseudo)code below:
// file: matrix/essential.ts
export { matrix } from './matrix'
export { add } from './add'
export { multiply } from './multiply'
export { transpose } from './transpose'
...
// file: matrix/solve.ts
import * as core from '../core'
import * as essential from './essential'
import { createExtras } from '../utils/createExtras'

// non-essential functions; for the sake of example, only one of them is made extensible
import { lup } from './lup' // extensible
import { usolve } from './usolve' // not extensible
import { lsolve } from './lsolve' // not extensible

export function solve(this: core & essential, M: Matrix, b: Vector) {
  const extras = createExtras(this, { lup }) // only creates the object once, then caches it

  const { L, U, P } = extras.lup(M)

  const c = usolve(U, b)
  const d = lsolve(L, c)

  return d
}

Then if you cherry-pick solve for your custom bundle:

import { create } from 'mathjs/custom'
import * as core from 'mathjs/core'
import * as matrix from 'mathjs/matrix/essential'
import { solve } from 'mathjs/matrix'

const fn = { ...core, ...matrix, solve, lup: ()=>{ my code }, usolve: ()=>{ my code }  }
const math = create(fn)

math.lup === fn.lup // true
math.usolve === fn.usolve //true

math.solve([[1]], [1])
// uses your custom lup
// but doesn't use your custom usolve
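
One possible shape for createExtras, written as plain JavaScript. This is only a sketch under my own assumptions (a WeakMap cache keyed by the instance, and string return values standing in for real matrix code); nothing like it exists in mathjs.

```javascript
// Hypothetical helper: one cached extras object per instance.
const extrasCache = new WeakMap()

function createExtras (self, defaults) {
  let extras = extrasCache.get(self)
  if (!extras) {
    extras = {}
    extrasCache.set(self, extras)
  }
  for (const name of Object.keys(defaults)) {
    if (!(name in extras)) {
      // prefer the version on the instance, fall back to the imported one
      extras[name] = typeof self[name] === 'function' ? self[name] : defaults[name]
    }
  }
  return extras
}

// Stand-ins for the imported default implementations.
const lup = (M) => 'default lup'
const usolve = (U, b) => 'default usolve'

function solve (M, b) {
  const extras = createExtras(this, { lup }) // lup is extensible
  return [extras.lup(M), usolve(null, b)]    // usolve is always the import
}

const custom = { lup: () => 'custom lup', usolve: () => 'custom usolve', solve }
const plain = { solve }

console.log(custom.solve([[1]], [1])) // uses the custom lup, default usolve
console.log(plain.solve([[1]], [1]))  // uses the default lup, default usolve
```

One caveat of this sketch: the function copied from the instance is called without a this, so it would have to be bound already if it needs one.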

What do you think about the proposal? Should I make a prototype repo to test this in practice?

@cshaa
Collaborator Author

cshaa commented Apr 1, 2021

An argument for designing a new architecture: In the current system, creating one function means modifying five different files in different parts of the codebase. I think this number should go down :^)

@josdejong
Owner

Your proposal can be interesting @m93a . I guess a main difference with the proposal I made before (see #1975 (comment)) is that you're relying on a create function again to "bind and create" the functions. I'm not sure whether the function solve(this: core & essential,...) syntax would work out nicely, but that's something to try out in a prototype, right? 😄 . So the groups like essential and core are just for convenience, right? So you don't have to import a lot of individual functions?

I'll reply on the other issues in mathjs that you commented on soon but I'm not managing to keep up with all the issues at this moment.

@gwhitney
Collaborator

gwhitney commented Mar 25, 2022

I have a tiny working prototype that uses many of the ideas from this discussion. It does not manipulate "this" at all, and it takes a more simple-minded approach to bundle-gathering than the "areas" and "createExtras" proposed later in the conversation, but it nevertheless seems as though it may well scale to the size of mathjs. See: https://code.studioinfinity.org/glen/picomath

@josdejong
Owner

Oohh, I'm really curious to have a look at your PoC 😎 ! Will be next week I expect though.

@gwhitney
Collaborator

Great. I also went ahead and (in an additional feat/lazy branch, so as not to complicate the base PoC) implemented invalidation and lazy reloading to handle changes to a math instance's global configuration object. I feel that also worked out pretty smoothly (although probably the API for an implementation registering itself for reloading could be smoothed out somewhat). Looking forward to your thoughts!

@gwhitney
Collaborator

Unfortunately the picomath proof-of-concept relies on a possible version of typed-function in which typed-functions are mutable (e.g. they can have additional signatures added to them after initial creation, without changing object identity). That is possible, but the experiments in josdejong/typed-function#138 indicate that it comes at too heavy a performance cost. So I think that approach is a dead end for now.

@josdejong
Owner

Thanks for adding the pointer here to the experiments and benchmarking that you did in this regard. That was indeed quite a bummer. We have to keep experimenting and trying out new ideas :)

@gwhitney
Collaborator

Yes, I have a new concept: in typed-function v3, we allow implementations to flag that they refer to another signature of themselves (or their entire self). I am imagining a variant of typed-function in which individual implementations can flag that they refer to a signature of any other typed-function (or the entire other typed-function) as well. Then we just load modules gathering up all implementations of all typed-functions (creating a huge directed graph among implementations, that is hopefully acyclic). But no actual typed-functions are instantiated yet, because we don't know if any will get more implementations as we load more modules. Then just before we actually start to compute (maybe triggered by the first attempt to call a typed function, rather than define one), the whole web of definitions is swept through in topological sort order, finally instantiating all of the typed-functions, so that all of their references have been instantiated as well, and they can all be compiled down to code that doesn't need to call functions through possibly changing variables, which is the slow operation.
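
To illustrate the "gather first, instantiate later in topological order" idea, here is a small sketch. It works on whole functions instead of individual typed-function signatures, and register and instantiateAll are hypothetical names, so treat it as a simplification of the concept, not as the planned implementation.

```javascript
// Hypothetical registry: name -> { deps, build }. Nothing is built on registration.
const registry = new Map()

function register (name, deps, build) {
  registry.set(name, { deps, build })
}

// Depth-first topological sweep: dependencies are built before dependents.
function instantiateAll () {
  const built = Object.create(null)
  const visiting = new Set()

  function visit (name) {
    if (name in built) return
    if (!registry.has(name)) throw new Error(`missing dependency: ${name}`)
    if (visiting.has(name)) throw new Error(`cyclic dependency at: ${name}`)
    visiting.add(name)
    const { deps, build } = registry.get(name)
    deps.forEach(visit)
    // build receives finished functions and can capture them directly
    built[name] = build(built)
    visiting.delete(name)
  }

  for (const name of registry.keys()) visit(name)
  return built
}

// Registration order is deliberately "wrong": subtract comes before its dependencies.
register('subtract', ['add', 'negate'], ({ add, negate }) => (a, b) => add(a, negate(b)))
register('add', [], () => (a, b) => a + b)
register('negate', [], () => (a) => -a)

const fns = instantiateAll()
console.log(fns.subtract(5, 3)) // 2
```

Because subtract closes over the already built add and negate, calls afterwards do not go through any lookup that could change.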

This would have the likely effect of slowing down initialization (but perhaps it can be done incrementally so that this burden is spread out) but hopefully keeping computational performance once everything is initialized as high or higher than it currently is.

This description may be a bit vague; as time permits I will do a "pocomath" proof of concept and post here.

@gwhitney
Collaborator

OK, the new proof-of-concept Pocomath is working. It does everything the original picomath did, and more. (I have not implemented the lazy-reloading of functions when a configuration option changes. I think it's quite clear from what's there that Pocomath can easily handle this capability, but I would be happy to do a specific reference implementation of it if anyone would like.)

Specifically, it uses the current, non-mutable, typed functions of typed-function v3, but allows gathering of implementations solely via module imports, it has no factory functions, and it should easily allow tree-shaking.

To show that this latest architecture makes it very easy to implement new types and more generic types, I have implemented a bigint type in Pocomath and in combination with the Complex type there it gives you Gaussian integers (ie. Complex numbers whose real and imaginary parts are bigints) automatically.

I am very encouraged by this proof-of-concept, and would be delighted for everyone interested to look it over. I humbly submit that adopting this basic architecture would make organizing the mathjs code as desired and adding new types and functions much easier. (In particular on @m93a's criterion of the number of files you have to touch to add a new operation.) Looking forward to your thoughts, @josdejong.

@josdejong
Owner

I love it @gwhitney 😎. I love the simple, public API, which I think boils down to:

const math = new PocomathInstance('math')
math.install({
  [functionName]: {
    [functionSignature] : [dependencyNames, dependencies => function],
    ...
  },
  ...
})

This is really perfect to easily compose all kinds of functions and data types. The lazy collection of signatures, and only building as soon as you use a function, ensures there is no double work done creating typed-functions (which is relatively slow). And this way we end up with typed-functions that are the same and should perform the same as the current mathjs instance. It is amazing how little code is needed in PocomathInstance! (of course, it's a POC, and things will grow, but still...).

I think what we lose here compared to picomath is the ability to just pick a function, like const {add, subtract} = myPocoMath, and have these functions automatically up to date when importing new signatures in myPocoMath afterwards. So you'll have to reference them as myPocoMath.add, or only create a reference after you've imported all you wanted. I don't think this is a problem in practice, it's only a feature that I liked about picomath, but that came with a performance drawback.
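
To make the lazy building and the stale-reference trade-off tangible, here is a toy model. It is not Pocomath's code: TinyInstance and its typeof-based dispatch are my own assumptions, and only the shape of the install argument follows the API summary above.

```javascript
class TinyInstance {
  constructor () {
    this._specs = {} // name -> { signature: [dependencyNames, factory] }
    this._built = {} // cache of assembled functions
  }

  install (ops) {
    for (const [name, signatures] of Object.entries(ops)) {
      this._specs[name] = { ...this._specs[name], ...signatures }
      if (!Object.prototype.hasOwnProperty.call(this, name)) {
        // getter: the function is assembled on first access, not on install
        Object.defineProperty(this, name, { get: () => this._build(name) })
      }
    }
    this._built = {} // anything built before this install may be outdated
  }

  _build (name) {
    if (this._built[name]) return this._built[name]
    const impls = {}
    for (const [signature, [depNames, factory]] of Object.entries(this._specs[name])) {
      const deps = {}
      // dependencies are looked up on the instance at call time
      for (const dep of depNames) deps[dep] = (...args) => this[dep](...args)
      impls[signature] = factory(deps)
    }
    const fn = (...args) => {
      const key = args.map((x) => typeof x).join(',')
      if (!impls[key]) throw new TypeError(`${name}: no signature (${key})`)
      return impls[key](...args)
    }
    this._built[name] = fn
    return fn
  }
}

const math = new TinyInstance()
math.install({
  add: { 'number,number': [[], () => (a, b) => a + b] },
  negate: { number: [[], () => (a) => -a] },
  subtract: {
    'number,number': [['add', 'negate'], ({ add, negate }) => (a, b) => add(a, negate(b))]
  }
})
console.log(math.subtract(5, 3)) // 2

const { add } = math // reference taken before the next install
math.install({ add: { 'bigint,bigint': [[], () => (a, b) => a + b] } })
console.log(math.add(1n, 2n)) // 3n, math.add was rebuilt with the new signature
// add(1n, 2n) would throw: the destructured copy does not know about bigint
```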

it has no factory functions

It has factory functions, but the dependency injection now moved from the function level to the signature level. That is an interesting take, and the magic ingredient of pocomath I think, making this composable solution possible and very straightforward! We can discuss again about whether to use the array notation [dependencies, callback] or something like referTo(dependencies, callback) but that's an implementation detail 😉. We could think though if it has to be mandatory or optional. Having to define dependencies per signature may result in extra boilerplate, I'm not sure how that will work out in practice. It would also be interesting to see how we can facilitate referring to another signature of the function itself, maybe that works out of the box with typed.referTo of typed-function@3.

About tree shaking: this indeed works, but with this approach we leave it up to the user to import all stuff needed to satisfy all dependencies. The generic subtract function requires two functions: add, and negate. In mathjs, the chain of dependencies is much larger. How can we ensure that when you import a function like subtract, you end up with a valid instance where all dependencies are resolved? (without going through an endless trial-and-error of trying to load, getting an error about a missing dependency, importing the missing dependency too, repeat). My feeling is that we should come up with ready-made modules with sets of functions that work out of the box. What do you think?
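
One way to avoid the trial-and-error loop would be to report every unresolved dependency in one go before anything is built. A sketch, assuming the same { functionName: { signature: [dependencyNames, factory] } } shape as above; findMissingDependencies is a hypothetical helper, not part of Pocomath.

```javascript
// Collect every dependency name that no installed function provides.
function findMissingDependencies (ops) {
  const missing = new Set()
  for (const signatures of Object.values(ops)) {
    for (const [depNames] of Object.values(signatures)) {
      for (const dep of depNames) {
        if (!Object.prototype.hasOwnProperty.call(ops, dep)) missing.add(dep)
      }
    }
  }
  return [...missing].sort()
}

// A cherry-picked selection in which negate was forgotten.
const picked = {
  subtract: {
    'number,number': [['add', 'negate'], ({ add, negate }) => (a, b) => add(a, negate(b))]
  },
  add: { 'number,number': [[], () => (a, b) => a + b] }
}

console.log(findMissingDependencies(picked)) // [ 'negate' ]
```

Ready-made bundles could run such a check in their tests, so that users of a bundle never see a missing dependency at all.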

I see the use of dynamic imports in extendToComplex.mjs. Just noting here that this will not work when bundling code for the browser. This code is not part of the core of pocomath though, so no problem at all, you can use such an approach if you want but you don't have to.

One practical issue: Windows cannot handle two files with the same case-insensitive name: /complex/Complex.mjs and /complex/complex.mjs. Can you rename one of the two or put one in a sub directory or so?

@gwhitney
Collaborator

OK, there was one major bug in Pocomath (which I have fixed): each PocomathInstance needs to have its own typed-function instance (if for example no functions on Complex numbers have been installed in one instance, it should have no notion of a Complex type). So for the purposes of the proof-of-concept I just made the "Types" operator of a PocomathInstance be interpreted specially as the list of types to be added to the typed-function universe of the instance. That all seems to work fine, although in a full production version there could be a somewhat more elegant way of expressing the Type dependencies.

This work surfaced one really desirable extension to typed-function and one outright bug in it that I was able to work around; I have filed them in its repo (josdejong/typed-function#154 and josdejong/typed-function#155).

I've also implemented a bunch of changes/extensions to Pocomath based on your great feedback, thanks! Specific comments interpolated below:

the simple, public API, which I think boils down to:

const math = new PocomathInstance('math')
math.install({
  [functionName]: {
    [functionSignature] : [dependencyNames, dependencies => function],
    ...
  },
  ...
})

Yes, that's captured very nicely.

This is really perfect to easily compose all kinds of functions and data types. The lazy collection of signatures, and only building as soon as you use a function, ensures there is no double work done creating typed-functions (which is relatively slow).

Well, to be clear, if you install some signatures to math.add, say, and then call it, and then install some more, and then call it again, two typed functions will be constructed and the first will be thrown away when the second is made.

And this way we end up with typed-functions that are the same and should perform the same as the current mathjs instance.

I agree. Once all the installs are done and the functions you need are instantiated, the resulting collection of typed functions should be as performant as current mathjs, with the potential to do slightly better if the feature allowing direct reference to a specific signature is easy enough to use and applicable enough that it gets substantial use (once implemented).

It is amazing how little code is needed in PocomathInstance! (of course, it's a POC, and things will grow, but still...).

I do think this is a fair amount of what would be the core of a full production version. It is pleasant that there doesn't seem to be a lot of infrastructure beyond what typed-function provides.

I think what we lose here compared to picomath is the ability to just pick a function, like const {add, subtract} = myPocoMath, and have these functions automatically up to date when importing new signatures in myPocoMath afterwards.

Ahh, this excellent comment gave me the idea to use getter functions to produce the typed-functions instead of temporary self-removing values for the operations defined in the instance, so that nobody could be storing a fake function that doesn't do anything. That doesn't eliminate your point, though; but on the other hand, math.js currently has this issue: If you save math.add from math.js and then do an import that overrides add, your old copy of math.add is stale and won't do the new behavior.

So you'll have to reference them as myPocoMath.add, or only create a reference after you've imported all you wanted. I don't think this is a problem in practice, it's only a feature that I liked about picomath, but that came with a performance drawback.

Yeah, because in JavaScript you can teach an old function a new trick, but only at a high price in speed.
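
A minimal sketch of the stale-reference effect with a getter-backed operation follows. The object layout (`pm`, `implementations`, `install`) is assumed for illustration and is not the actual Pocomath code.

```javascript
// Sketch of the "stale reference" effect with a getter-backed operation.
// The object layout is an assumption for illustration only.
const implementations = { number: n => -n }
let built = null

const pm = {
  get negate () {
    if (!built) {
      const table = { ...implementations } // snapshot at build time
      built = x => {
        const impl = table[typeof x]
        if (!impl) throw new TypeError(`No negate(${typeof x})`)
        return impl(x)
      }
    }
    return built
  },
  install (signatures) {
    Object.assign(implementations, signatures)
    built = null // the next read of pm.negate builds a fresh function
  }
}

const savedNegate = pm.negate        // reference taken before the install
pm.install({ boolean: b => !b })

const viaInstance = pm.negate(true)  // uses the rebuilt function
let staleFailed = false
try { savedNegate(true) } catch (e) { staleFailed = e instanceof TypeError }
```

Here `pm.negate` always yields a working, current function, while `savedNegate` keeps working for numbers but never learns about the boolean signature.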

it has no factory functions

It has factory functions, but the dependency injection now moved from the function level to the signature level.

OK, I understand that perspective. But you don't have to think about writing factory functions, you just export your semantics decorated with what they depend on.

That is an interesting take, and the magic ingredient of pocomath I think, making this composable solution possible and very straightforward! We can discuss again about whether to use the array notation [dependencies, callback] or something like referTo(dependencies, callback) but that's an implementation detail. We could think though if it has to be mandatory or optional.

I agree there are lots of possible notations and the dependencies could be optional. I implemented a bunch of different options (I don't think we would actually want to allow them all in practice) and used a different one for each of the types that are currently implemented in Pocomath so that you could look at the possibilities visually and see which one or ones you liked to have in the end.

Having to define dependencies per signature may result in extra boilerplate; I'm not sure how that will work out in practice.

Yes, I am slightly worried that functions that have a long list of dependencies will be a pain, but the basics are pretty smooth so I am hopeful that won't be too bad.

It would also be interesting to see how we can facilitate referring to another signature of the function itself; maybe that works out of the box with typed.referTo of typed-function@3.

As you probably saw right now, you can already just say you depend on self, which definitely seems smoother than the current interface of typed-function; and my plan for referring to a specific signature was to say that you depend on self(number,string), say. The drawback of this option is that currently you would have to write the dependency and implementation as

[['self(number,string)'], ref => actual_arg => some_expression(ref['self(number,string)'](99, actual_arg.toString()))]

which looks a bit ugly/cumbersome. One thought I had is that if you said you depended on NS=self(number,string), then you'd be allowed to write instead:

[['NS=self(number,string)'], ref => actual_arg => some_expression(ref.NS(99, actual_arg.toString()))]

which looks cleaner to me. If you have reactions/thoughts on this scheme or other approaches to make the notation convenient, I am very interested.

About tree shaking: this indeed works, but with this approach we leave it up to the user to import all stuff needed to satisfy all dependencies. The generic subtract function requires two functions: add, and negate. In mathjs, the chain of dependencies is much larger. How can we ensure that when you import a function like subtract, you end up with a valid instance where all dependencies are resolved? (without going through an endless trial-and-error of trying to load, getting an error about a missing dependency, importing the missing dependency too, repeat).

I've implemented two possibilities for this and given examples in Pocomath. First, for generic functions like subtract, we could have corresponding 'concrete' modules that also import enough other stuff that you can actually run subtract, at least on numbers, say. So if you only load 'concrete' modules you'll always be in a state that runs, at least in a minimal way. Second, I added a function to the core that (dynamically) loads all of the files you need to compute all of the functions you've installed so far on a list of types you specify. (We could add a diagnostic parameter that would also make it display all of the modules you need, so you could at least cut and paste to make a static import for what you desire.)

My feeling is that we should come up with ready-made modules with sets of functions that work out of the box. What do you think?

And yes, as a third but I think the most usual method, the idea was that modules like number/all.mjs or complex/all.mjs are examples of self-contained chunks that have all of the functions you need to work with those types, that you can install as you need. And presumably in the future some of the "topic areas" like algebra or combinatorics would also have corresponding "off the shelf" modules.

I see the use of dynamic imports in extendToComplex.mjs. Just noting here that this will not work when bundling code for the browser.

Are you sure? webpack and vite both say they work with dynamic imports. But I have exactly zero experience with building/bundling JavaScript, so "working" may mean something different than I think...

This code is not part of the core of pocomath though, so no problem at all, you can use such an approach if you want but you don't have to.

Right, and at least we can always enhance the dynamic loaders with options to emit the corresponding list of static imports you'd need to make a static version, which would then presumably lead to a nice lean bundle.

One practical issue: Windows cannot handle two files with the same case-insensitive name: /complex/Complex.mjs and /complex/complex.mjs. Can you rename one of the two or put one in a sub directory or so?

Done, and especially it made sense to move these files to distinct places when switching to the 'Types' pseudo-operation to specify the types that some functions would need to have around. I also added a nifty little check against future case-collisions.


In sum, do you think there's potential here for a route toward a reorganization of actual mathjs that will make it easier to add further functions and types (and possibly to supply other collections of functions and types as somewhat independent add-ons to mathjs)? What would be the next features/facilities you'd like to see Pocomath grow to have to be sure it could overcome the hurdles for full production use? Thanks very much for your encouragement and feedback.

@gwhitney
Collaborator

gwhitney commented Jul 23, 2022

Update: after implementing all of the various possibilities for dependency/implementation notation yesterday, an idea for how it might be very nice to do occurred to me, which was encouraged and refined by a StackExchange answer by James Drew (see https://stackoverflow.com/a/41525264). Under the new scheme, the generic implementation of subtract becomes:

export const subtract = {
   'any,any': ({add, negate}) => (x,y) => add(x, negate(y))
}

and that's it. PocomathInstance can obtain the names of the dependencies from the keys in the destructuring parameter of the (outer) function -- thanks to James Drew's trick -- plus that destructuring makes the resulting referred-to functions super easy to use in the body of the implementation. For signature-specific references (when they are implemented), the notation will be:

export const foo = {
   string: () => s => 'Got' + s,
   any: ({'self(string)': selfS, 'concat(string,string)': cat}) => arg => selfS(cat(arg.toString(), '[converted]'))
}

This notation just seemed so clearly better than all the other options that I went ahead and switched Pocomath to use only this one. (You can rewind to the previous commit if you want to look at all the other options.) Its only tiny drawback is that for functions without any dependencies, I could not find any approach to make the initial () => optional. But that bit is so syntactically lightweight and I do not think it is so bad to always explicitly flag 'this implementation has no dependencies' that I just went ahead with this. I think the other advantages are well worth this aspect of having () => for all dependency-free implementations. (In practice I think the vast bulk of implementations have dependencies, anyway.)
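
One way to recover dependency names from a destructuring parameter is to call the outer function once with a recording Proxy. The sketch below assumes that approach; whether it matches the linked answer or Pocomath's internals in detail is not confirmed here, and `dependencyNames` is a hypothetical helper name.

```javascript
// Sketch: discover which keys a producer destructures by handing it a Proxy
// that records every property read. dependencyNames is a hypothetical helper.
function dependencyNames (producer) {
  const seen = new Set()
  const recorder = new Proxy({}, {
    get (_target, key) {
      if (typeof key === 'string') seen.add(key)
      return undefined // the returned implementation is never invoked here
    }
  })
  producer(recorder) // only the outer (dependency-taking) function runs
  return [...seen]
}

const subtract = {
  'any,any': ({ add, negate }) => (x, y) => add(x, negate(y))
}

const foo = {
  string: () => s => 'Got' + s,
  any: ({ 'self(string)': selfS, 'concat(string,string)': cat }) =>
    arg => selfS(cat(arg.toString(), '[converted]'))
}

const subtractDeps = dependencyNames(subtract['any,any'])
const fooStringDeps = dependencyNames(foo.string)
const fooAnyDeps = dependencyNames(foo.any)
```

With this sketch, `subtractDeps` lists `add` and `negate`, `fooStringDeps` is empty, and `fooAnyDeps` lists the two signature-specific references. It also shows why the leading `() =>` cannot be optional in this scheme: the infrastructure always has to call the outer function.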

@gwhitney
Collaborator

gwhitney commented Jul 24, 2022

Update no. 2: I wasn't quite happy with the way types were being specified, so I fixed that up a bit and added the feature that signatures that use unknown types are simply left out of the typed-function. That way you can install an operation with definitions for lots of types, and the ones for (currently) undefined types will be ignored, but the operation will be rebundled whenever that type becomes defined. This would be convenient for operations where the definitions for all types are collected into a single file, as opposed to the type-segregated approach I have been using so far; likely eventual uses of this approach might mix the two styles of organization.
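
The filtering of signatures by known types might look roughly like this. It is a sketch under assumptions: `usableSignatures` is a hypothetical helper, and signatures are split naively on commas.

```javascript
// Sketch: keep only the signatures whose types are all currently defined.
// usableSignatures is a hypothetical helper, not a Pocomath function.
function usableSignatures (implementations, knownTypes) {
  const usable = {}
  for (const [signature, producer] of Object.entries(implementations)) {
    const types = signature.split(',').map(t => t.trim())
    if (types.every(t => knownTypes.has(t))) usable[signature] = producer
  }
  return usable
}

// One operation defined for several types in a single place.
const negateImplementations = {
  number: () => n => -n,
  bigint: () => b => -b,
  Complex: ({ 'negate(number)': neg }) => z => ({ re: neg(z.re), im: neg(z.im) })
}

const known = new Set(['number', 'bigint'])
const beforeComplex = Object.keys(usableSignatures(negateImplementations, known))

known.add('Complex') // once the type is defined, a re-bundle picks it up
const afterComplex = Object.keys(usableSignatures(negateImplementations, known))
```

A misspelled type name would be filtered out just as silently as a not-yet-installed one in this sketch.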

There is one slight concern about this relaxed approach to unknown types: it won't catch typos in type names. I think those could be caught another way; I'm filing that as an issue in the Pocomath repository.

@gwhitney
Collaborator

Update no. 3: I implemented reference to specific signatures (of another function or of yourself). Most of the work revolved around the fact that Pocomath is lazy about types not yet being defined, while typed-function insists they all be there already, but it wasn't too bad. I also realized that lazy reloading of functions upon a config change came almost for free with the basic framework, so I added that. Used these things to implement sqrt for number and Complex types. The definition should be easily extensible to bigint and the resulting Gaussian integers, I will check tomorrow. But I think most all of the features that would be needed to extend this PoC to a full mathjs-size production version are now there.

@gwhitney
Collaborator

Update no. 4: There was a remaining problem with the type system. Since all types were being put as keys in the value of the single identifier Types, it was not possible for them to be collected up from different modules. The latest commit fixes this by putting the type info on different identifiers that start with Type_, so we have Type_number and Type_Complex and Type_bigint, etc. With that, sqrt is extended to bigint and indeed sqrt for Gaussian integers comes along for the ride.
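
The difference between a single `Types` key and per-type `Type_` identifiers can be illustrated with a shallow merge. The module contents and the `{ test }` shape of a type specification below are assumptions made for the example, and `collectTypes` is a hypothetical helper.

```javascript
// Hypothetical module contents; the { test } shape of a type spec is assumed.
const numberModule = {
  Type_number: { test: x => typeof x === 'number' }
}
const bigintModule = {
  Type_bigint: { test: x => typeof x === 'bigint' }
}

// With a single shared key, a shallow merge keeps only the last module's value.
const clobbered = Object.assign(
  {},
  { Types: { number: numberModule.Type_number } },
  { Types: { bigint: bigintModule.Type_bigint } }
)

// With one identifier per type, the same shallow merge keeps them all.
function collectTypes (...modules) {
  const types = {}
  for (const [key, spec] of Object.entries(Object.assign({}, ...modules))) {
    if (key.startsWith('Type_')) types[key.slice('Type_'.length)] = spec
  }
  return types
}

const types = collectTypes(numberModule, bigintModule)
```

In the first merge only `bigint` survives under `Types`; in the second, both `number` and `bigint` are collected.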

There is an outstanding architecture issue, which you can see at https://code.studioinfinity.org/glen/pocomath/issues/29 about a similar phenomenon with merging different signatures of an operation. I would be happy to have any feedback you might have about a preferred route to go with this. All else equal, I am leaning toward option 4 in that issue. From my current perspective, when that point is ironed out, the architecture for Pocomath is reasonably developed and looking scalable.

Thanks, and looking forward to whether this could have a real positive impact on mathjs.

@josdejong
Owner

Thanks for the updates.

Well, to be clear, if you install some signatures to math.add, say, and then call it, and then install some more, and then call it again, two typed functions will be constructed and the first will be thrown away when the second is made.

Indeed after importing a new signature, all dependent functions need to be re-created with typed-function. In general I think we do not need to worry. According to the load.js benchmark, creating all typed functions of mathjs takes like 40ms right now. And typically you import all you want upfront. We can just document the behavior so people understand how to optimize.

Good points on how to satisfy dependencies, with something like "concrete modules". The dynamic loading is indeed cool, though it can only be used in a node.js environment I think. In the current mathjs, I created scripts to generate files like "dependenciesAdd.generated.js", we could do something similar.

webpack and vite both say they work with dynamic imports

Let's just try it out to know for sure :). A dynamic import like await import('./myfile.js') indeed is supported, but importing a dynamic path like await import(`./${name}.mjs`) requires the code to be executed. As far as I know bundlers do not (yet?) do that.

In sum, do you think there's potential here for a route toward a reorganization of actual mathjs

Yes, I like this approach. I would love to have it at the core of mathjs (and get rid of some complex, old backward compatible mechanisms too).

What would be the next features/facilities you'd like to see Pocomath grow to have to be sure it could overcome the hurdles for full production use?

Let me think. Some things that pop in my mind:

  1. We need to think through how to work with config, that is shared by all functions.
  2. For example Chain uses a listener that gets informed when something new is imported.
  3. To be sure of equally good performance, we need to test whether the new approach of resolving dependencies on every individual signature instead of on the function as a whole doesn't have a negative impact. We could do a benchmark with a few of the functions that we have.

About the dependency injection: that new notation is really interesting. The destructuring is indeed very neat. What I am wondering though is: in my head it should be possible to either provide a regular function if you do not need to inject dependencies, or provide a factory function. With the new notation like ({add, negate}) => (x,y) => add(x, negate(y)) there is no explicit way to distinguish whether dealing with a factory function or a regular function I think? Or do we enforce you to always provide a factory function? I'm hesitant to use smart Javascript tricks to make the core API work, these tricks tend to bite.

I do love your idea to use strings like 'self(string)' to inject references to yourself and specific signatures, that is nifty!

I still have to check out the latest version of pocomath, will keep you posted.

@gwhitney
Collaborator

Good points on how to satisfy dependencies, with something like "concrete modules". The dynamic loading is indeed cool, though it can only be used in a node.js environment I think.

I am pretty sure it also works in a browser if one makes all of the files available at the relative URLs corresponding to the path names being imported.

In the current mathjs, I created scripts to generate files like "dependenciesAdd.generated.js", we could do something similar.

Absolutely.

webpack and vite both say they work with dynamic imports
Let's just try it out to know for sure :).

Agreed.

I would love to have it at the core of mathjs (and get rid of some complex, old backward compatible mechanisms too).

Great news, makes me feel like the time I have spent pondering this puzzle has been worth it.

What would be the next features/facilities you'd like to see Pocomath grow to have to be sure it could overcome the hurdles for full production use?

1. We need to think through how to work with config, that is shared by all functions.

Already done in the latest revision, config is just another property on the PocomathInstance that operations can depend on, and if they do they are passed it in their implementation producer, and they will be lazily reconstructed if any properties of config change.
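
A rough sketch of config as an ordinary dependency follows. It assumes a toy `makeInstance` helper and a hypothetical `sqrt` whose behaviour depends on a `predictable` flag; the invalidation is deliberately coarse (every cached operation is dropped on any change), which is simpler than rebuilding only the operations that depend on config.

```javascript
// Toy instance: producers receive { config }. The invalidation here is coarse
// (every cached operation is dropped); finer-grained tracking is not shown.
function makeInstance (producers, initialConfig) {
  let config = { ...initialConfig }
  const cache = {}
  const instance = {
    configure (changes) {
      config = { ...config, ...changes }
      for (const name of Object.keys(cache)) delete cache[name]
    }
  }
  for (const [name, producer] of Object.entries(producers)) {
    Object.defineProperty(instance, name, {
      get () {
        if (!cache[name]) cache[name] = producer({ config }) // built lazily
        return cache[name]
      }
    })
  }
  return instance
}

const pm = makeInstance({
  // Hypothetical sqrt: the "predictable" flag decides what a negative input gives.
  sqrt: ({ config }) => n =>
    n < 0 && !config.predictable
      ? { re: 0, im: Math.sqrt(-n) }
      : Math.sqrt(n)
}, { predictable: true })

const before = pm.sqrt(-4)            // NaN while predictable is true
pm.configure({ predictable: false })
const after = pm.sqrt(-4)             // rebuilt on demand: { re: 0, im: 2 }
```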

2. For example `Chain` uses a listener that gets informed when something new is imported.

OK, there are already enough functions to make a Chain type, will add a PR for it and let you know when it's implemented.

3. To be sure of equally good performance, we need to test whether the new approach of resolving dependencies on every individual signature instead of on the function as a whole doesn't have a negative impact. We could do a benchmark with a few of the functions that we have.

Should actually have a zero or positive impact, I think. If you want to point me to a particular benchmark or two in the mathjs collection of benchmarks, I will implement the same computation in Pocomath so we can have a race ;-)

About the dependency injection: that new notation is really interesting. The destructuring is indeed very neat. What I am wondering though is: in my head it should be possible to either provide a regular function if you do not need to inject dependencies, or provide a factory function. With the new notation like ({add, negate}) => (x,y) => add(x, negate(y)) there is no explicit way to distinguish whether dealing with a factory function or a regular function I think? Or do we enforce you to always provide a factory function? I'm hesitant to use smart Javascript tricks to make the core API work, these tricks tend to bite.

Trying to see if I understand your concern. Are you worried about, say, the number-type implementation of say sqrt, where you'd like to be able to write:

export const sqrt = {
  number: Math.sqrt
}

Indeed, with the current architecture I could not find a way to make this work. Currently, the price of this "accumulate implementations and regenerate typed-functions whenever necessary" architecture is that you have to say:

export const sqrt = {
  number: () => n => Math.sqrt(n)
}

I don't think a dozen or so characters is a big price to pay, and the upside is that for any implementations that have no dependencies (which frankly I think are a fairly small minority), we are explicitly stating that they are dependency-free with the initial () => part.

That said, if you would like I could certainly add a notation something like

export const sqrt = {
  number: {plain: Math.sqrt}
}

for when you want the implementation just to be a plain function. We can use any value for the signature key in this case that you want that is not a function entity at the top level itself, even [Math.sqrt] if you want the most compact notation I can think of. The point is that if it's a function, the infrastructure is going to have to call it, and so then it had better be a function of the form

({<dependencies>}) => (<operation arglist>) => value

or the infrastructure is going to blow up.

Or is your concern that for some operations you don't want a typed-function at all, just to assign a regular JavaScript function that will be used for all arguments, like say a logger that writes whatever it gets to a log file or the console or something? I could easily support something like

export function logger(...stuff) {
  for (const item of stuff) console.log(item)
}

in a Pocomath module and then when that module is installed in a PocomathInstance pm, the given function becomes the definition of pm.logger (replacing any previous such definition and causing any typed-function implementations that depend on the logger to be lazily re-bundled as needed). The point is that the "usual" value of an identifier in a Pocomath module is a plain object mapping signatures to implementation producers, so the infrastructure can notice a function entity and handle it differently. Would you like me to do that?
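
The distinction drawn here, function entity versus plain object of signatures, could be handled along these lines. This is a sketch only: `install` is a hypothetical stand-alone helper, dispatch is by `typeof`, and dependencies are resolved eagerly, so they must already be installed.

```javascript
// Sketch: an installer that treats function values as ready-made operations and
// plain objects as signature -> producer maps. Names are illustrative only.
function install (instance, module) {
  for (const [name, value] of Object.entries(module)) {
    if (typeof value === 'function') {
      instance[name] = value // plain function: used as-is for all arguments
      continue
    }
    const impls = {}
    for (const [signature, producer] of Object.entries(value)) {
      impls[signature] = producer(instance) // producer form: deps => implementation
    }
    instance[name] = (...args) => {
      const signature = args.map(a => typeof a).join(',')
      if (!(signature in impls)) throw new TypeError(`No ${name}(${signature})`)
      return impls[signature](...args)
    }
  }
}

const logged = []
const pm = {}
install(pm, {
  logger: (...stuff) => { for (const item of stuff) logged.push(item) },
  sqrt: { number: () => n => Math.sqrt(n) },
  noisySqrt: { number: ({ logger, sqrt }) => n => { logger(n); return sqrt(n) } }
})

const root = pm.noisySqrt(9) // records 9 via the plain logger, returns 3
```

The same branch would also cover functions taken from an external library, since they arrive as plain function values.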

I do love your idea to use strings like 'self(string)' to inject references to yourself and specific signatures, that is nifty!

Glad you like it. I find it more convenient than typed-function's current notation.

OK, sounds like this PoC is well on its way. I have some issues on it that you can see that I am going to implement, to which I will add the Chain type, as well as one or more of the "plain function" options discussed above if you think they would be useful in addition to the "standard" implementation-producers currently used. I will check back in here when all that's done and we can plan next steps.

@josdejong
Owner

I've been looking through the latest version of pocomath. I have to say I really love how it works out. It starts itching to get this approach implemented for real in mathjs 😄 . The solution to reference self, a specific signature like abs(Complex), and config makes a lot of sense. I love it that it is all in one place and that there are no "special" constructs needed for it. That makes the API plain and simple.

Trying to see if I understand your concern.

Yes you indeed understand what I mean. I've been thinking more about this, that it is now always necessary to pass a factory function for every signature. In cases like export const negate = {number: () => n => -n} this is overkill. I think though that in the majority of cases you will have some dependencies, so it probably isn't that much overhead. And it again keeps the API very simple and consistent if this is just the only way to do things. So let's just go for always requiring a factory function. Indeed we can create a util function for it. I do not really like number: {plain: Math.sqrt} which is a special syntax, but you could do number: create(Math.sqrt) or so (implemented as const create = fn => () => fn). But, it isn't really simpler than number: () => Math.sqrt so I'm not sure if it is really needed. Shall we only implement something for that when the need arises?

Or is your concern that for some operations you don't want a typed-function at all

That was a second concern actually :). The use case I'm thinking about is that it is very powerful if you can just import some statistics or numerical library from npm straight into pocomath. These are no typed-functions. Here is a mathjs example demonstrating that in mathjs: https://github.com/josdejong/mathjs/blob/develop/examples/import.js

One other thought: in mathjs, over the years, there grew a need to have a distinction between functions that operate on scalars vs the functions that accept matrices too. So now you have addScalar and add, divideScalar and divide, equalScalar and equal, and a few more. It was needed for performance reasons mostly, and in some cases to prevent circular references. It may be worth thinking through how to identify a type as a scalar (i.e. number, bigint, BigNumber, Fraction, Complex, Unit, string, boolean). I'm not sure if a special provisioning would be needed or helpful to deal with this. So, maybe the function add could have a signature Matrix,Matrix which requires a dependency like add(scalar,scalar) or so. Any ideas?

@gwhitney
Collaborator

But, it isn't really simpler than number: () => Math.sqrt so I'm not sure if it is really needed. Shall we only implement something for that when the need arises?

Sure, I am perfectly happy with a "wait and see" approach on an "adapter for a plain-function implementation".

Or is your concern that for some operations you don't want a typed-function at all

That was a second concern actually :). The use case I'm thinking about is that it is very powerful if you can just import some statistics or numerical library from npm straight into pocomath. These are no typed-functions. Here is a mathjs example demonstrating that in mathjs: https://github.com/josdejong/mathjs/blob/develop/examples/import.js

Yes, it does seem powerful to import external libraries. I will add an issue for installing ordinary JavaScript functions to the list I am implementing in Pocomath.

One other thought: in mathjs, over the years, there grew a need to have a distinction between functions that operate on scalars vs the functions that accept matrices too. So now you have addScalar and add, divideScalar and divide, equalScalar and equal, and a few more. It was needed for performance reasons mostly, and in some cases to prevent circular references. It may be worth thinking through how to identify a type as a scalar (i.e. number, bigint, BigNumber, Fraction, Complex, Unit, string, boolean). I'm not sure if a special provisioning would be needed or helpful to deal with this. So, maybe the function add could have a signature Matrix,Matrix which requires a dependency like add(scalar,scalar) or so. Any ideas?

This is an excellent point. It is not yet clear to me if the reasons that addScalar is needed in the current architecture will arise with a Pocomath-based architecture. One question that has been kicking around in my head is why is there a Matrix type in mathjs in addition to Arrays? I assume the primary answer is so there can be both DenseMatrix and SparseMatrix. But it has also been kicking around in my head that another item that makes Matrix powerful is that it is (or at least can be) type-homogeneous, i.e., all entries are the same type. And if this is part of the sauce, then it seems to me that it would be pleasant if Matrix<number> were actually a distinct type from Matrix<Complex> -- in other words, for Matrix to be a template type in the usual meaning of that.

It is indeed possible to build a concept of template types on top of typed-function as it exists (or eventually it could perhaps be incorporated internally therein). Then we might have something like (in pseudo-code):

export const add = {
  'Matrix<T>,Matrix<T>': ({'add(T,T)': eltAdd}) => (M,N) => {
    // check that dimensions match
    const result = M.clone()
    for (const index of result.indices) {
      result.set(index, eltAdd(result.at(index), N.at(index)))
    }
    return result
  }
}

(Hence, an item to implement templates is next up on my list of things to try in Pocomath.) I think an approach like this might make it unlikely that the need for a dependency on add(scalar,scalar) would come up.
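
One conceivable way to layer templates on top of string signatures is plain substitution of the parameter `T` for each installed type. The sketch below assumes that approach and a hypothetical `instantiateTemplate` helper; it says nothing about how templates will actually be implemented in Pocomath.

```javascript
// Sketch: expand a template signature and its dependency names for each type.
// instantiateTemplate is hypothetical; T is matched as a whole word only.
function instantiateTemplate (signature, dependencyNames, types) {
  const substitute = (text, type) => text.replace(/\bT\b/g, type)
  return types.map(type => ({
    signature: substitute(signature, type),
    dependencies: dependencyNames.map(name => substitute(name, type))
  }))
}

const instances = instantiateTemplate(
  'Matrix<T>,Matrix<T>',
  ['add(T,T)'],
  ['number', 'Complex']
)
```

Each expanded entry asks for exactly one concrete signature of add, such as add(number,number), so no scalar-only variant of the function is involved.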

On the other hand, if we find ourselves needing a way to refer to "add for scalar types", maybe to deal with potentially inhomogeneous collections like Arrays, I think it would be possible to introduce essentially "typedefs" into Pocomath in which we install a type 'scalar' into an instance with a designation that it simply means number | Complex | Fraction | etc. etc. and a way to construct a new typed-function on the fly (with something like a dependency on add(scalar, scalar)) that filters the installed implementations of add to include only those that mention types included in scalar.

(After all, the current version of Pocomath always filters implementations to include only those that mention a defined type, in order to allow operation-centric files like mathjs has where a single operation is defined for numerous types, but only the implementations for types that have actually been installed in the instance are employed. Pocomath is trying very deliberately to be completely agnostic as to how operations and their implementations are organized into source files, even though the demo so far uses one operation for one type per file. I plan to add an example of many operations for one type in a single file, and I will also add an example of one operation for many types in a single file just to verify it works.)

However, this feels like it may end up being a little complicated so I would definitely put this concept on the same list as plain-function-implementation-adapters, i.e., things to be implemented if the actual need arises.

@josdejong
Owner

It is not yet clear to me if the reasons that addScalar is needed in the current architecture will arise with a Pocomath-based architecture.

I'm not sure either, but I think the same issues would pop up, it helps for performance and also better error messaging to have functions that only work for scalars. There was one (quite) specific circular dependency: divide needs inv, and inv needs divide. It may be possible to implement that differently though.

The reasons for a Matrix shell around a nested Array were:

  • Create support for one interface Matrix and multiple implementations (DenseMatrix, SparseMatrix).
  • In the constructor you can do validation checks once, so you don't have to do consecutive checks on all operations. Calculate the size, validate that all rows/cols have the same size.
  • In the constructor, you can once determine whether the matrix is type-homogeneous, and if so, select a specific operation for that type. I.e. if you know that two matrices only contain numbers, and you want to add them, you can pick the signature addScalar(number, number) and use that instead of using the slower typed function addScalar.
  • Have a nice, user friendly API with methods like .size(), and be able to hide stuff behind the interface.
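
The constructor-time checks and the signature selection from the list above can be sketched as follows. `TinyMatrix`, `addSignatures`, and the `genericCalls` counter are toy stand-ins invented for the example; they are not the mathjs Matrix classes or typed-function API.

```javascript
// Toy stand-ins: addSignatures plays the role of per-signature implementations,
// addScalar the role of the slower generic dispatcher.
const addSignatures = {
  'number,number': (a, b) => a + b,
  'bigint,bigint': (a, b) => a + b
}
let genericCalls = 0
function addScalar (a, b) {
  genericCalls++
  const impl = addSignatures[`${typeof a},${typeof b}`]
  if (!impl) throw new TypeError(`No addScalar(${typeof a},${typeof b})`)
  return impl(a, b)
}

class TinyMatrix {
  constructor (rows) {
    // Validate once, at construction, instead of on every operation.
    const width = rows.length ? rows[0].length : 0
    if (!rows.every(row => row.length === width)) {
      throw new RangeError('All rows must have the same length')
    }
    this.rows = rows
    this._size = [rows.length, width]
    // Determine once whether all entries share a type.
    const kinds = new Set(rows.flat().map(v => typeof v))
    this.datatype = kinds.size === 1 ? [...kinds][0] : undefined
  }

  size () { return [...this._size] }
}

function addMatrices (A, B) {
  const [rowsA, colsA] = A.size()
  const [rowsB, colsB] = B.size()
  if (rowsA !== rowsB || colsA !== colsB) throw new RangeError('Dimension mismatch')
  // Homogeneous matrices go straight to one signature; others dispatch per element.
  const add = addSignatures[`${A.datatype},${B.datatype}`] || addScalar
  return new TinyMatrix(A.rows.map((row, i) => row.map((v, j) => add(v, B.rows[i][j]))))
}

const sum = addMatrices(new TinyMatrix([[1, 2], [3, 4]]), new TinyMatrix([[10, 20], [30, 40]]))
const callsAfterHomogeneous = genericCalls
const mixed = new TinyMatrix([[1, 2n]])
const mixedSum = addMatrices(mixed, mixed)
```

In the homogeneous case the generic dispatcher is never called; in the mixed case it runs once per element.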

When rethinking matrices, here are some thoughts:

  1. The current "hybrid" solution of Matrix and Array is not ideal, it is a lot of back and forth. Life would be much simpler if we would just have Arrays and no ceremony around it. But like I explained above, there are/were good reasons for it.
  2. Supporting a matrix with mixed contents may be needlessly complex. Maybe we should not support that at all, and like you say, only support Matrix<number> and Matrix<Complex>. Maybe it would be better to convert the full matrix with numbers to complex numbers as soon as one of its values becomes complex. It's easier to reason about and optimize. I like your idea of 'Matrix<T>,Matrix<T>': ({'add(T,T)': eltAdd}) => (M,N) => { ...}, that makes sense indeed! And it would make the whole reason for addScalar redundant.
  3. I regularly got feedback that using nested arrays to hold the contents of a DenseMatrix is not ideal. If we would instead store the contents as a flat list with values and an index, all kind of algorithms would be much simpler and faster. And many implementations in say C/C++ use that structure, so it would be very easy to port them. This is something I would love to figure out further.
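
For the third point, a flat layout is commonly done in row-major order, where element (i, j) lives at offset i * cols + j. The sketch below assumes that convention; `FlatMatrix` and `addFlat` are hypothetical names, not existing mathjs classes.

```javascript
// Sketch of a dense matrix stored as one flat, row-major array.
class FlatMatrix {
  constructor (rows, cols, data) {
    this.rows = rows
    this.cols = cols
    this.data = data || new Array(rows * cols).fill(0)
    if (this.data.length !== rows * cols) {
      throw new RangeError('Data length must equal rows * cols')
    }
  }

  offset (i, j) { return i * this.cols + j } // row-major position of (i, j)
  get (i, j) { return this.data[this.offset(i, j)] }
  set (i, j, value) { this.data[this.offset(i, j)] = value }
}

// Elementwise operations become a single pass over the flat data.
function addFlat (A, B) {
  if (A.rows !== B.rows || A.cols !== B.cols) throw new RangeError('Dimension mismatch')
  return new FlatMatrix(A.rows, A.cols, A.data.map((v, k) => v + B.data[k]))
}

const a = new FlatMatrix(2, 3, [1, 2, 3, 4, 5, 6])
const b = new FlatMatrix(2, 3)
b.set(1, 2, 10)
const c = addFlat(a, b)
```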

[...] construct a new typed-function on the fly (with something like a dependency on add(scalar, scalar))

That is a very interesting idea 🤔. So basically, that would allow not only to inject a function, or a single function signature, or self reference, but it would allow you to filter a set of signatures out of all of the signatures of a function and create a new, optimized function. It sounds very cool. I agree with you though, that something like this could open a lot of tricky complexities, so maybe best to keep it as an idea for now.

@gwhitney
Collaborator

On scalar functions:

it helps for performance and also better error messaging to have functions that only work for scalars.

I am thinking/hoping that the template implementations will be as good or better on both of these counts.

There was one (quite) specific circular dependency: divide needs inv, and inv needs divide. It may be possible to implement that differently though.

The Pocomath-style of infrastructure has no problem with co-recursion (as long as it bottoms out at some point). For example, the exact circular dependency you mention currently exists in Pocomath as it stands right now: the generic implementation of divide is via the invert operation (invert and multiply), and Complex relies on the generic, supplying a definition of invert that in turn uses divide (on its components). No special handling/syntax/anything is needed for such references, they are all just resolved at "bundling" time, as I call it (constructing a typed-function from the accumulated implementations for different signatures via the _bundle method, nothing to do with build-time bundling of JavaScript with webpack etc.) So that aspect will not require separate scalar functions.
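
The reason such co-recursion is harmless can be shown with late-bound dependencies: each producer receives references that are only looked up when called. The sketch below is a simplified model under assumptions (a `lateBound` Proxy, `typeof` checks instead of typed-function dispatch, complex numbers as `{ re, im }` objects) and is not the `_bundle` implementation.

```javascript
// Sketch: divide depends on invert, and invert (for complex values) on divide.
// Dependencies are late-bound, so the cycle resolves at call time.
const isNumber = x => typeof x === 'number'

const producers = {
  multiply: () => (x, y) => isNumber(x)
    ? x * y
    : { re: x.re * y.re - x.im * y.im, im: x.re * y.im + x.im * y.re },
  // Generic divide: invert and multiply.
  divide: ({ multiply, invert }) => (x, y) => multiply(x, invert(y)),
  // Complex invert uses divide on its (number) components; numbers bottom out.
  invert: ({ divide }) => z => {
    if (isNumber(z)) return 1 / z
    const norm2 = z.re * z.re + z.im * z.im
    return { re: divide(z.re, norm2), im: divide(-z.im, norm2) }
  }
}

const ops = {}
const lateBound = new Proxy({}, {
  get: (_target, name) => (...args) => ops[name](...args) // looked up when called
})
for (const [name, producer] of Object.entries(producers)) {
  ops[name] = producer(lateBound)
}

const real = ops.divide(1, 4)
const quotient = ops.divide({ re: 1, im: 0 }, { re: 0, im: 2 }) // 1 / (2i)
```

Here `real` is 0.25 and `quotient` has imaginary part -0.5, even though neither divide nor invert existed when the other's producer was called.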

@gwhitney
Collaborator

On Matrix and friends:
Sounds to me like that reconsideration is close to orthogonal to switching to a Pocomath-style infrastructure. Presuming that goes through, it should not be too hard to add a third StrideMatrix implementation of the Matrix interface that is strictly type-homogeneous, dense, and stores all matrices linearly. Then if it in fact works better and appears adequate for mathjs needs, eventually the current DenseMatrix could be dropped. I think Pocomath-style infrastructure would certainly make such a switch no harder, and might well ease the implementation and transition. But just to set expectations, I doubt such efforts in Matrix would be something I would take on. But hopefully we can leave all of the tools in place for someone who came along interested in carrying that torch.

@gwhitney
Collaborator

gwhitney commented Aug 1, 2022

OK, template implementations (but not yet template types) are working in Pocomath now. I've switched as many of the generic operations to use them now as I could figure out how to. Also, there's a nifty built-in implementation of typeOf in every PocomathInstance that comes basically for free -- I definitely think at least this should be moved into typed-function, and possibly other features if it looks like this is the direction mathjs is headed.

@gwhitney
Collaborator

gwhitney commented Aug 1, 2022

And now I have added an example of defining the 'floor' function in an operation-centric way. Note I am not proposing that in a putative reorganization of mathjs to use a Pocomath-like core that both operation-centric and type-centric code layouts would be mixed, I just wanted to demonstrate that the Pocomath core is completely agnostic as to the organization of the code. (Except for importDependencies which assumes a certain layout, but it shouldn't really be in the core, it's more of a tool, and a different version could be written for a different layout.)

@gwhitney
Collaborator

gwhitney commented Aug 1, 2022

Pocomath now has a working Chain type (with the methods only being chainified when called through a chain and updating when they are modified on the underlying Pocomath instance). However, the chain methods are not quite as "lazily" updated as they could be: at every extraction of a method, this implementation of Chain checks whether the underlying operation has changed on the Pocomath instance. The alternative would be for the Pocomath instance to invalidate the chainified version when it updates the underlying operation, so that the chainified version can simply be called directly whenever it's valid, without checking for an update. If you think the check is enough of a performance problem to be worth it, I can modify it to work that way as well. The only cost is somewhat tighter coupling between the instance and the Chain. (Right now, the instance provides a place for Chains to store their chainified functions, since of course the underlying items that are being chainified could vary from instance to instance. So that repository has to be associated with the specific instance. But the contents of that place are completely managed by Chain. To make things even lazier, Chain and PocomathInstance would have to have shared knowledge of the format used to store chainified functions in the repository so that PocomathInstance could do the invalidation directly when it was changing one of its operations.) Let me know if it seems OK for the proof of concept or if you'd like the invalidations to be "pushed" from the instance.
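
The check-on-extraction scheme described above can be sketched roughly as follows. The `Instance` and `Chain` classes, the `chainRepository` field, and the `{ source, chainified }` storage format are all assumptions made for illustration, not Pocomath's actual code:

```javascript
// Hypothetical stand-in for a Pocomath instance.
class Instance {
  constructor () {
    this.ops = {}
    // Storage provided by the instance, but managed entirely by Chain.
    this.chainRepository = {}
  }

  install (name, fn) { this.ops[name] = fn }
}

class Chain {
  constructor (instance, value) {
    this.instance = instance
    this.value = value
    // Unknown string properties are treated as operation names.
    return new Proxy(this, {
      get: (target, name) => {
        if (typeof name !== 'string' || name in target) return target[name]
        return target.method(name)
      }
    })
  }

  // Called at every extraction of a method: re-chainify only if the
  // underlying operation on the instance has changed since last time.
  method (name) {
    const instance = this.instance
    const current = instance.ops[name]
    if (current === undefined) throw new Error(`Unknown operation ${name}`)
    const repo = instance.chainRepository
    if (!repo[name] || repo[name].source !== current) {
      repo[name] = {
        source: current,
        // Must be invoked as a method (chain.add(3)) so `this` is the chain.
        chainified: function (...args) {
          return new Chain(instance, current(this.value, ...args))
        }
      }
    }
    return repo[name].chainified
  }
}

const inst = new Instance()
inst.install('add', (a, b) => a + b)
const chain = new Chain(inst, 2)
const first = chain.add
const sameWhileUnchanged = chain.add === first // cached version is reused
const sumBefore = chain.add(3).value
inst.install('add', (a, b) => a + b + 100) // operation modified on instance
const refreshed = chain.add !== first // detected at the next extraction
const sumAfter = chain.add(3).value
```

In the "pushed" variant, `install` would delete `chainRepository[name]` itself, which is exactly the shared knowledge of the storage format mentioned above.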

@gwhitney
Collaborator

gwhitney commented Aug 2, 2022

Import of plain functions is working now.

@gwhitney
Collaborator

gwhitney commented Aug 7, 2022

Template operations and template types are both working in the prototype (Pocomath) now. There is a type-homogeneous vector type Tuple<T> where all operations are componentwise, and Complex has been templatized into Complex<T> to force its real and imaginary parts to have the same type. I even added a cute little demo that this scheme provides quaternions with absolutely no additional code, as Complex<Complex<number>> (for the components to be of number type).
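
As a rough illustration of why nesting can produce quaternions, here is a sketch that builds `Complex<T>` operations from the operations on `T` using the Cayley-Dickson product. The names `numberOps` and `complexOver` are hypothetical, and this is only one way to get the effect; it is not claimed to be how Pocomath implements `Complex<T>`:

```javascript
// Operations on plain numbers: the base case.
const numberOps = {
  zero: 0,
  add: (a, b) => a + b,
  sub: (a, b) => a - b,
  mul: (a, b) => a * b,
  conj: a => a
}

// Given operations on T, build operations on Complex<T> = { re: T, im: T }.
const complexOver = T => ({
  zero: { re: T.zero, im: T.zero },
  add: (z, w) => ({ re: T.add(z.re, w.re), im: T.add(z.im, w.im) }),
  sub: (z, w) => ({ re: T.sub(z.re, w.re), im: T.sub(z.im, w.im) }),
  conj: z => ({ re: T.conj(z.re), im: T.sub(T.zero, z.im) }),
  // Cayley-Dickson product: (a, b)(c, d) = (ac - conj(d) b, da + b conj(c)).
  // For T = number, conj is the identity and this is ordinary complex
  // multiplication.
  mul: (z, w) => ({
    re: T.sub(T.mul(z.re, w.re), T.mul(T.conj(w.im), z.im)),
    im: T.add(T.mul(w.im, z.re), T.mul(z.im, T.conj(w.re)))
  })
})

const C = complexOver(numberOps) // Complex<number>
const H = complexOver(C) // Complex<Complex<number>>

// Helper: the quaternion w + xi + yj + zk as a nested complex value.
const quat = (w, x, y, z) => ({ re: { re: w, im: x }, im: { re: y, im: z } })

const i = quat(0, 1, 0, 0)
const j = quat(0, 0, 1, 0)
const ij = H.mul(i, j) // equals k
const ji = H.mul(j, i) // equals -k: the product is not commutative
```

No quaternion-specific code appears anywhere: the non-commutative product falls out of applying the same generic definition twice.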

@gwhitney
Collaborator

gwhitney commented Aug 7, 2022

And now there is an adapter that just sucks in fraction.js (well actually bigfraction.js, because I think it's clear mathjs and fraction.js should move to the bigint version) with no additional code added elsewhere in Pocomath.

With that, my feeling is that Pocomath is "feature-complete" as a proof-of-concept: it demonstrates all of the aspects that I would be proposing to bring to mathjs with a reorganization along these lines. There are a couple of very minor concerns mentioned on the issues page, but I think they would easily come out in the wash in an integration with typed-function/mathjs so I don't feel they are worth running down now.

So I think the only remaining "proof" the concept needs for evaluation is some benchmarks, to ensure that the additions to typed-function do not bog the system down. I believe they won't, because in fact Pocomath tries very hard to bypass typed-function dispatch as much as possible and bundle in direct references to individual implementations rather than to typed-function entry points. So please just point me to some benchmarks for mathjs that you think might be reasonable for evaluating Pocomath in this regard, and I will duplicate them in Pocomath and post the results. Thanks!

@josdejong
Owner

The Pocomath-style of infrastructure has no problem with co-recursion

just 😎

Sounds to me like that reconsideration is close to orthogonal to switching to a Pocomath-style infrastructure.

Yes you're right. I should better separate the different discussions (and at the same time, the discussions related to TypeScript and the pocomath architecture are a bit scattered right now).

template implementations (but not yet template types) are working in Pocomath now

The generics are impressive, and they make sense. The Complex<T> example is very powerful. I think we should maybe just call this "generics" instead of "templates"?

I just wanted to demonstrate that the Pocomath core is completely agnostic as to the organization of the code.

👍

Pocomath now has a working Chain type

😎 again. I think how to implement the lazy or less lazy updating of functions is an implementation detail. I think we can just start with the current approach, and do some benchmarks to see if there is a performance issue in the first place, and if so try out alternative solutions.

I even added a cute little demo that this scheme provides quaternions with absolutely no additional code, as Complex<Complex<number>>

🤯 ...Shortcut in my head... 😂

And now there is an adapter that just sucks in fraction.js

yeeees, that's where we want to go, easily import a new data type that has a set of built-in functions/methods. Fraction.js, Complex.js, BigNumber.js, UnitMath, ...

Optional dependencies
I also see the sqrt function now being able to return either number output or mixed number/complex output depending on the config. It also checks whether there is a complex implementation (if (config.predictable || !cplx) { ... }). I think in general it would be good to throw an exception when a dependency is not satisfied; currently pocomath is just silent, which can make debugging hard. Currently in mathjs you can explicitly mark a dependency as optional with a ? notation, like '?complex(number,number)': cplx. What do you think?
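
A minimal sketch of that behavior, assuming a hypothetical `resolveDependencies` helper (neither the mathjs nor the Pocomath implementation): required dependencies throw when missing, while names prefixed with `?` are silently skipped.

```javascript
// Hypothetical resolver: `names` may carry a leading '?' to mark a
// dependency as optional; `available` maps names to implementations.
function resolveDependencies (names, available) {
  const resolved = {}
  for (const raw of names) {
    const optional = raw.startsWith('?')
    const name = optional ? raw.slice(1) : raw
    if (available[name] !== undefined) {
      resolved[name] = available[name]
    } else if (!optional) {
      throw new Error(`Cannot resolve required dependency "${name}"`)
    }
  }
  return resolved
}

// A sqrt factory in the spirit of the check quoted above: it falls back to
// number-only output when config.predictable is set or complex is absent.
function createSqrt (available, config) {
  const { complex } = resolveDependencies(['?complex'], available)
  return function sqrt (x) {
    if (x >= 0 || config.predictable || !complex) return Math.sqrt(x)
    return complex(0, Math.sqrt(-x))
  }
}

const complex = (re, im) => ({ re, im })
const sqrtMixed = createSqrt({ complex }, { predictable: false })
const sqrtNumberOnly = createSqrt({}, { predictable: false })
```

With this split, a typo in a required dependency name fails loudly at creation time, while an absent optional dependency only changes which branch the function takes.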

Benchmarks
I think the only difference is that dependencies are now resolved on each individual signature instead of on the typed-function as a whole. I do not expect this has a performance impact, but would be good to verify. To test this, I think we can take a function with a reasonable number of dependencies, like distance, and inside createDistance create a temporary (second) dependency injection function to mimic the behavior of pocomath. And then verify the runtime performance.

With that, my feeling is that Pocomath is "feature-complete" as a proof-of-concept

Thanks a lot for working this concept out Glen. I really appreciate this. I think this concept addresses all the pain points of the current architecture, I really believe this is the way to go. There is one big open challenge: TypeScript support discussed in #2076. I want to find a satisfying solution for that first, before implementing this architecture for real in mathjs.

@gwhitney
Collaborator

Return type annotations are now supported, and in fact supplied for all operations, in the Pocomath proof of concept. This moves the POC closer, I suppose, to the sort of organization needed for a conceivable TypeScript switch/unification. The change took a while both because I physically relocated and because it was a very good test bench for strengthening the underlying infrastructure. Type checking and template instantiations are now significantly more robust.

As soon as I can I will get to the final piece of the proof of concept, namely some reasonable benchmark. I am virtually certain that building an instance with all of its dependencies satisfied will be slower in Pocomath; there's just a great deal to do as far as instantiating templates and resolving every individual implementation, which is clearly less efficient than resolving the dependencies for an entire source file all at once, which may have many implementations in it. What we're getting is a great deal more flexibility in organizing code. So in a benchmarking of bootstrapping time, Pocomath is bound to lose by a significant margin. But that's just a one-time initialization cost. The aspect I feel needs benchmarking is the performance of the resulting implementations of the operations, and there I think I can reasonably hope that Pocomath will noticeably outperform current mathjs because with the templating (or generics if you prefer that terminology), a nested Pocomath operation should perform many fewer typed-function dispatches.

I will report back here when I have results.

@gwhitney
Collaborator

gwhitney commented Sep 1, 2022

Hmm, I looked into this but currently the only benchmarks that do a significant amount of numerical computation are the matrix benchmarks (but reimplementing matrices in Pocomath doesn't make sense) and isPrime (but that test just calls a single typed-function that has no dependencies, so there really isn't any difference to test). So I will need to add a new benchmark on the mathjs side. I need something fairly realistic that calls a variety of basic arithmetic functions. Please does anyone following this issue have a suggestion for an algorithm or calculation I might use as a benchmark for this purpose? I could solve a quadratic and a cubic using the respective general formulas... That's reasonably attractive because the cubic goes back and forth into complex numbers. But it's fairly simplistic. Any better suggestions?

@josdejong
Owner

josdejong commented Sep 2, 2022

I see that Returns is used dynamically, inside the function body. I'm not sure, but I have the feeling that this will make it hard or impossible to statically analyse with TypeScript. What do you think?

About benchmarks:
It is indeed the runtime performance that counts; bootstrapping performance is less important (though still relevant of course). I did a small benchmark to verify whether there is a performance penalty on doing dependency injection on a per-signature basis, and that is not the case:

I thought I had reported on this, but I probably forgot or wasn't happy with it yet, sorry.

This benchmark that I parked in this separate branch is quite limited, it would be better indeed to have a real expression with multiple operations.

Repository owner locked and limited conversation to collaborators Sep 2, 2022
@josdejong josdejong converted this issue into discussion #2741 Sep 2, 2022

