Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute the number of dimensions N in build_solution #28

Closed

Conversation

olivierverdier
Copy link

@olivierverdier olivierverdier commented Jun 30, 2024

My code got broken with (probably) a recent version of SciML. Possibly related to commit SciML/SciMLBase.jl@55d81af
This is my quickfix to the problem, devoid of testing or understanding.

Copy link

codecov bot commented Jun 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.13%. Comparing base (4f283a0) to head (87d9053).

Current head 87d9053 differs from pull request most recent head e442535

Please upload reports for the commit e442535 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #28      +/-   ##
==========================================
+ Coverage   95.12%   95.13%   +0.01%     
==========================================
  Files           8        8              
  Lines         328      329       +1     
==========================================
+ Hits          312      313       +1     
  Misses         16       16              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kellertuer
Copy link
Member

Hm indeed SciML does (for me) a bit too many breaking changes in non-breaking releases. Should we maybe also bump the compat entry then? (raise the version on the [compat]) If that is possible that is (i.e. we have that entry).

@mateuszbaran
Copy link
Member

Hm, I will try to fix this soon. Apart from SciML sometimes doing breaking changes in non-breaking releases, ManifoldDiffEq.jl hooks into some of their internals so they can justifiably feel free to change them as they see fit.

@ChrisRackauckas
Copy link

Hm indeed SciML does (for me) a bit too many breaking changes in non-breaking releases.

This dimension was added in 2016. What's the right time interval for such changes?

@@ -224,7 +224,8 @@ function build_solution(
interp.cache,
prob.manifold,
)
return ODESolution{T,1}(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless your solutions were scalar valued this was never correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ndims is required to be defined to use ODESolution then we can't use it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it looks like if we can't use ODESolution, SciMLBase has almost nothing that can be used in ManifoldDiffEq.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, whether maybe the solution in this PR is enough, but I also could not find what N actually refers to by looking at either https://docs.sciml.ai/DiffEqDocs/stable/types/ode_types/#SciMLBase.ODESolution or (had to look for that manually since it was not linked) https://docs.sciml.ai/SciMLBase/stable/interfaces/Solutions/#SciMLBase.AbstractODESolution or (same manual search) https://docs.sciml.ai/SciMLBase/stable/interfaces/Solutions/#SciMLBase.AbstractTimeseriesSolution ... (and then I got a bit tired).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any solution that uses ndims is at best a workaround since ndims is not a valid concept for points on a manifold or tangent vectors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree.

@ChrisRackauckas
Copy link

Apart from SciML sometimes doing breaking changes in non-breaking release

What breaking change? Please help me understand what you are asking for here and I can help out. But this dimension parameter has not changed in 8 years so I do not understand what's being asked for.

@kellertuer
Copy link
Member

Hm indeed SciML does (for me) a bit too many breaking changes in non-breaking releases.

This dimension was added in 2016. What's the right time interval for such changes?

I am not sure, I did not write this code, for now my main experience and interaction with SciML was sudden failures on CI – maybe because we used internals, I am not sure; so this looked like the same area, and I am not sure how the got to the parameter being 1, maybe because we adapted some example code?
I personally did not get much into that, since I was usually missing documentation every now and then.

@kellertuer
Copy link
Member

Apart from SciML sometimes doing breaking changes in non-breaking release

What breaking change? Please help me understand what you are asking for here and I can help out. But this dimension parameter has not changed in 8 years so I do not understand what's being asked for.

No the parameter did not change, but how you handle it and how it is documented. I did not work with SciML for the last year or so, but when I did, e.g. implementing this function

function alg_cache(
alg::ManifoldEuler,
u,
rate_prototype,
uEltypeNoUnits,
uBottomEltypeNoUnits,
tTypeNoUnits,
uprev,
uprev2,
f,
t,
dt,
reltol,
p,
calck,
::Val{true},
)

I have no clue (today) what any of the parameter is, I just copied that from an example. back then I could maybe figure out 20% of what the parameters mean because none of them are documented anywhere (and I am not a differential equations guy much). So when we do such guesswork, we maybe also guessed, the parameter 1 is the right thing to do from an example (where that was the case). Now in 2.40, you surely did not change the parameter itself, but how you interpret and work with it. Before you maybe were more relaxed about what the value was?
Now you are not, and since we guessed what it might mean, we were wrong but never noticed, and then...
what this ends up doing is that the commit linked above breaks other peoples code but is marked as nonbreaking. Usually this hits us often (I remember up to 5 times for now?) which I feel is annoying.
But maybe the main issue is documentation (and not the breaking part which one could also consider bug fixing).

@mateuszbaran
Copy link
Member

What breaking change? Please help me understand what you are asking for here and I can help out.

Thank you for offering help but everything except this issue has already been resolved. SciML folks help resolve any issues quite quickly but still, our CI failures caused by upstream changes or bugs mostly originate in SciML. At the same time SciML most likely has more development than other dependencies, so per commit you may even be more reliable than other libraries but it's hard to tell.

@mateuszbaran mateuszbaran mentioned this pull request Jul 1, 2024
@ChrisRackauckas
Copy link

ChrisRackauckas commented Jul 1, 2024

I have no clue (today) what any of the parameter is, I just copied that from an example. back then I could maybe figure out 20% of what the parameters mean because none of them are documented anywhere (and I am not a differential equations guy much). So when we do such guesswork, we maybe also guessed, the parameter 1 is the right thing to do from an example (where that was the case). Now in 2.40, you surely did not change the parameter itself, but how you interpret and work with it. Before you maybe were more relaxed about what the value was?

That code is a little scary 😅.

alg_cache, perform_step!, initialize!, etc. it looks like you're making dispatches on internals of OrdinaryDiffEq? It is described in the devdocs, though I do agree that we need to update the internals docs since it has been awhile.

I would highly recommend sticking to public APIs if you want stability. There are no guarantees to the outside that internal functions will not change any arguments or behavior. If that was the case then every release would have to be a breaking release! For information on Julia's public interfaces, I would recommend reading:

https://discourse.julialang.org/t/what-does-the-new-public-keyword-actually-do/108975

As a matter of style, we always define our public APIs in the top level file. You can see the public APIs of OrdinaryDiffEq here:

https://github.com/SciML/OrdinaryDiffEq.jl/blob/v6.85.0/src/OrdinaryDiffEq.jl#L236-L469

The public APIs of a solver package are simply the algorithms. The functions used to define the stepping behavior of the algorithms are generally not meant to be documented public API.

I am not sure, whether maybe the solution in this PR is enough, but I also could not find what N actually refers to by looking at either https://docs.sciml.ai/DiffEqDocs/stable/types/ode_types/#SciMLBase.ODESolution or (had to look for that manually since it was not linked) https://docs.sciml.ai/SciMLBase/stable/interfaces/Solutions/#SciMLBase.AbstractODESolution or (same manual search) https://docs.sciml.ai/SciMLBase/stable/interfaces/Solutions/#SciMLBase.AbstractTimeseriesSolution ... (and then I got a bit tired).

It corresponds to the dimension of the solution object through its indexing definition. The indexing interface is defined here:

I am not sure, whether maybe the solution in this PR is enough, but I also could not find what N actually refers to by looking at either https://docs.sciml.ai/DiffEqDocs/stable/types/ode_types/#SciMLBase.ODESolution or (had to look for that manually since it was not linked) https://docs.sciml.ai/SciMLBase/stable/interfaces/Solutions/#SciMLBase.AbstractODESolution or (same manual search) https://docs.sciml.ai/SciMLBase/stable/interfaces/Solutions/#SciMLBase.AbstractTimeseriesSolution ... (and then I got a bit tired).

Yes, sorry about that. We have been solidifying our interfaces over the last year but it seems like you found a place that's not well-defined in the docs. The solution types boil down to acting as an AbstractVectorOfArray, and we are missing a full definition of that interface.

https://docs.sciml.ai/RecursiveArrayTools/stable/array_types/#RecursiveArrayTools.VectorOfArray

I'll see if I can get around to it in the JuliaCon time frame. But basically, AbstractVectorOfArray is almost Vector{AbstractArray{T,_N}}, and thus N = _N + 1, i.e. a vector of vectors has a matrix structure on it. However, it's only truly an AbstractArray if the internal vector is always constant size, which isn't always the case. If the structure is ragged, it's still definable in indexing, conversions, broadcast, etc. but operations that assume fully rectangular cannot exist (conversion to Array). Thus AbstractVectorOfArray is not AbstractArray, and this is a change we made last year as, again, we're cleaning all of our interfaces in order to make sure they are well-defined and strict in their behavior. And we have setup a lot of error checking setup to ensure interfaces aren't violated.

The issue of course is that strict enforcement of interfaces which were previously loosely enforced causes a few hiccups in the edge cases, and that is exactly what we're seeing here.

If ndims is required to be defined to use ODESolution then we can't use it here.

Is there a more general interface function that should be considered here? I don't know of one in ArrayInterface.jl but I'd be interested to hear what the correct generalization is if it helps. I would think that the tensor dimensionality of a manifold object would be well-defined as that's more a definition of what kind of operator/vector an object is. The tensor dimensionality "are you a vector or matrix" is not the dimensionality of the manifold, i.e. the length of a state representation of an object in the vector space. And this would allow for example different charts/representations, because it's not the length of the data, and this would then use the ragged AbstractVectorOfArray structure as above.

If I'm misunderstanding though, I'd be happy to learn how this needs to be generalized for manifold cases.

@kellertuer
Copy link
Member

kellertuer commented Jul 1, 2024

Thanks for this very detailed answer. Concerning

I would highly recommend sticking to public APIs if you want stability.

I tried my best to do that ;)

For the ndims argument. The problem is that our tangent vectors might be more general than (just) an army, see for example https://github.com/JuliaManifolds/Manifolds.jl/blob/676b0f5d0751f4899bec67a0a81b7f313e1f6db7/src/manifolds/FixedRankMatrices.jl#L88-L106 (or any other <: TVector type). They have to still follow “Vectorian ideas” (addition, scaling etc.). But ndims does not make sense for the type linked. That might have been the idea to set it to 1, though that is a bit of cheating.

@kellertuer
Copy link
Member

One additional comment on the tutorial you mentioned

https://docs.sciml.ai/DiffEqDevDocs/stable/contributing/adding_algorithms/

There I have exactly the problem with for example

function alg_cache(alg::RK_ALG,u,rate_prototype,uEltypeNoUnits,uBottomEltypeNoUnits,tTypeNoUnits,uprev,uprev2,f,t,dt,reltol,p,calck,::Val{true})
[...]

I am missing a bit what all the parameters of the function are. I think for the above-mentioned Manifold-Explicit-Eurer I (or Metusz) took exactly that tutorial and tried to make sense of the parameters and what they mean for the algorithm to use the DiffEq framework, but I probably understand far too less (still) about the rest of the framework.
Yes the public API is basically all algorithms, but one idea here, is to implement new algorithms (on Manifolds that is). So for that it seems not all we have to use (or we think we have to use) seems to be public then. How can we then best avoid this package to fail regularly? Because that happened a few times already.

@mateuszbaran
Copy link
Member

I would highly recommend sticking to public APIs if you want stability. There are no guarantees to the outside that internal functions will not change any arguments or behavior. If that was the case then every release would have to be a breaking release! For information on Julia's public interfaces, I would recommend reading:

IIRC I needed to overload some internals (specifically, build_solution) to get the interpolation method I wanted. Or, at least the one close to what I wanted because it should be on-manifold Hermite interpolation but it is fairly difficult to implement.

Is there a more general interface function that should be considered here? I don't know of one in ArrayInterface.jl but I'd be interested to hear what the correct generalization is if it helps. I would think that the tensor dimensionality of a manifold object would be well-defined as that's more a definition of what kind of operator/vector an object is. The tensor dimensionality "are you a vector or matrix" is not the dimensionality of the manifold, i.e. the length of a state representation of an object in the vector space. And this would allow for example different charts/representations, because it's not the length of the data, and this would then use the ragged AbstractVectorOfArray structure as above.

The thing is, there are (to my knowledge) no generic and useful operations for arbitrary manifolds that generalize ndims. We either work on a specific representation for a specific manifold, and then ndims is well-defined and used as necessary. Or, we linearize representation using get_coordinates or get_parameters and we get plain vectors of numbers that have ndims equal to 1. We just didn't see the need to linearize to something other than a vector but it could be added. The thing about manifold linearizations is that you have to be explicit about how you linearize and that is by design.

We have charts and representations figured out here and it works perfectly fine without ndims. "Give me the number of indices" requires an assumption about indexing, and indexing is meaningless without a linearization or an embedding. Linearization needs to be explicitly specified by a certain additional object in JuliaManifolds, while the embedding can be a tuple of vectors and matrices. ArrayPartition sort of gives up here and pretends to be a vector but if that is enough, why can't we pretend that everything is a vector? If an operation can meaningfully distinguish between vectors and matrices, it will ignore the inner structure of ArrayPartition unless it's explicitly special-cased, which is prone to errors when we have about 10 or more other structs like that.

@mateuszbaran
Copy link
Member

The issue was fixed by #33 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants