
Force constant chunk size when specified in ForwardDiff #539

Merged: 4 commits merged into main on Oct 3, 2024

Conversation

@gdalle (Member) commented Oct 3, 2024

Warning

This change is technically non-breaking due to ambiguous semantics in ADTypes, but it will make your code error if you specify an AutoForwardDiff chunk size larger than the input length. Such code did not error before.

DI extensions

  • ForwardDiff: for AutoForwardDiff{C}, always build a Chunk{C} without regard to the length of x. This triggers errors when C > length(x) (see the sketch below).
  • PolyesterForwardDiff: pick the same chunk size as ForwardDiff (this had been forgotten previously).

DIT source

  • Adapt the backend to the input length of each scenario by thresholding the chunk size when necessary. The alternative, filtering scenarios by input size, would be even worse.

DI tests

  • Test type stability of preparation for AutoForwardDiff with fixed chunk size
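As a rough illustration of the new mapping from chunksize to a chunk (a minimal sketch using ForwardDiff directly; the exact DI internals may differ):

```julia
using ForwardDiff
using ADTypes: AutoForwardDiff

f(x) = sum(abs2, x)
x = rand(5)

# AutoForwardDiff(chunksize=4) is now always translated to Chunk{4}(),
# regardless of length(x), so the chunk size is known from the backend type alone.
backend = AutoForwardDiff(chunksize=4)
cfg = ForwardDiff.GradientConfig(f, x, ForwardDiff.Chunk{4}())
ForwardDiff.gradient(f, x, cfg)  # fine: 4 <= length(x)

# With chunksize=10 and length(x)=5, the chunked gradient now errors
# instead of silently shrinking the chunk as before:
# ForwardDiff.gradient(f, x, ForwardDiff.GradientConfig(f, x, ForwardDiff.Chunk{10}()))
```

Here backend is only shown to connect the ADTypes spelling to the ForwardDiff objects it is now mapped to.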

@codecov-commenter commented Oct 3, 2024

Codecov Report

Attention: Patch coverage is 88.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 98.57%. Comparing base (1208b44) to head (f818238).
Report is 1 commit behind head on main.

Files with missing lines                                  Patch %   Lines
...DifferentiationInterfacePolyesterForwardDiffExt.jl      66.66%   2 Missing ⚠️
DifferentiationInterface/src/utils/batchsize.jl            83.33%   1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #539      +/-   ##
==========================================
- Coverage   98.63%   98.57%   -0.07%     
==========================================
  Files         106      107       +1     
  Lines        4606     4620      +14     
==========================================
+ Hits         4543     4554      +11     
- Misses         63       66       +3     
Flag   Coverage Δ
DI     98.67% <81.25%> (-0.10%) ⬇️
DIT    98.35% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.


@gdalle gdalle marked this pull request as ready for review October 3, 2024 19:22
@gdalle gdalle merged commit c47e26c into main Oct 3, 2024
74 of 75 checks passed
@gdalle gdalle deleted the gd/enforce_chunk branch October 3, 2024 20:00
@baggepinnen

@Vaibhavdixit02 this breaking change appears to make it difficult to specify the chunk size in Optimization.jl, since the same AD backend is used for all functions: gradient, constraint Jacobian, Lagrangian Hessian. I just had code broken by updating patch releases due to this.

Ideally, this stance

This change is technically non-breaking due to SciML/ADTypes.jl#90 in ADTypes, but it will make your code error if you specify an AutoForwardDiff chunk size to be larger than the input length. Such code did not error before.

would be avoided. Relying on poorly defined semantics to justify breaking changes in a patch release makes it difficult for users to guard themselves against breakage.

@gdalle (Member, Author) commented Oct 17, 2024

I completely agree, which is why I opened SciML/ADTypes.jl#90 to keep track and clarify the semantics. I decided that such a change wasn't worth the ecosystem-wide update, but I apologize because I know that some code was broken as a result.

Let me explain why this change occurred. Prior to the present PR, I did not understand what fixing the batch size meant for those who use AutoForwardDiff(; chunksize=10), mainly the SciML ecosystem. It is actually used to guarantee type stability when ForwardDiff.gradient is called without prior creation of a config object. This requires creating a Chunk{C} with a statically known C=chunksize, which is what this PR ensures. As a result, picking a chunk size larger than the input length will now cause errors which did not exist before.
In a way, you could think of the previous implementation as incorrect. You asked DI to use ForwardDiff with chunksize=10, and it ended up creating a Chunk{C} with C=min(chunksize,length(x)) (which was type-unstable).
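To make the contrast concrete, here is a hypothetical sketch (the helper names old_chunk and new_chunk are made up for illustration, not actual DI functions):

```julia
using ForwardDiff: Chunk

# Previous behavior (illustrative): C depends on the runtime length of x,
# so the concrete Chunk{C} type cannot be inferred from the backend alone.
old_chunk(chunksize::Int, x) = Chunk{min(chunksize, length(x))}()  # type-unstable

# New behavior (illustrative): C is taken directly from AutoForwardDiff{C},
# so Chunk{C}() is fully inferable, at the price of erroring when C > length(x).
new_chunk(::Val{C}, x) where {C} = Chunk{C}()  # type-stable
```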

In the case of Optimization.jl, the real issue is that the same backend can be used for all operators. But since Optimization.jl now uses the preparation mechanism of DI, specifying the chunk size has basically no efficiency benefits (I think). So I would recommend using the plain old AutoForwardDiff() for this package. Do you have a specific reason to specify chunk size in your application @baggepinnen?
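For reference, a minimal sketch of the preparation workflow with a plain AutoForwardDiff() (the positional order of the preparation object and the backend has changed across DI versions, so check the docs of your release):

```julia
using DifferentiationInterface
using ADTypes: AutoForwardDiff

f(x) = sum(abs2, x)
x = rand(100)
backend = AutoForwardDiff()  # no chunksize: a suitable chunk is picked from x

# Preparation is done once; the ForwardDiff config (and thus the chunk)
# lives inside prep, so specifying chunksize manually buys little here.
prep = prepare_gradient(f, backend, x)
grad = gradient(f, prep, backend, x)
```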

There are additional subtleties with sparse operators that appeared as a result of #575, where the chunk size is chosen after coloring so you cannot rely on the length of the input to guide you. Maybe you ran into one of those with a very sparse Jacobian or Hessian?

@baggepinnen

Do you have a specific reason to specify chunk size in your application @baggepinnen?

Yes, I want to keep the chunk size smaller than the automatically chosen one in order to minimize compile time, which may otherwise reach 10+ minutes for my problem.

@gdalle (Member, Author) commented Oct 17, 2024

I understand, and it is a consequence I didn't anticipate. As I said, this PR tries to fix chunk size semantics to be consistent with the rest of the ecosystem. I asked the question on Slack before merging it, and the consensus on the SciML side (to which Optimization.jl belongs) seemed indisputable.

@gdalle (Member, Author) commented Oct 17, 2024

Again, the fact that this happened in a patch release of DI is my fault, and arguably not optimal (unless you consider the previously wrong semantics to be a bug). But the semantics that are in place now seem to be the right ones.
The main issue that remains is sparse AD, because the compressed input dimension is only known after coloring, and usually much smaller than length(x). So there is no good way to set a chunk size at the moment, and I have opened #593 to keep track.

@baggepinnen

I asked the question on Slack before merging it, and the consensus on the SciML side (to which Optimization.jl belongs) seemed indisputable.

The SciML Slack is dominated by academics and students, to whom the stability and maturity of a software ecosystem matter relatively little. For industrial users, where a team may have few Julia experts and tight timelines, breakage like this is seen as a red flag and a reason not to adopt a technology.

In this particular case, I caught the breakage before our customers were exposed to it. But shipping code and then having customers experience breakage from an update that should not break anything looks bad, and it is a common complaint I hear about the Julia ecosystem in general.

I don't want to give you too much of a headache here; you're doing a fantastic job with DI.jl! Julia also makes it very hard to nail down interfaces, which makes unintended reliance on something poorly defined almost inevitable.

@gdalle (Member, Author) commented Oct 17, 2024

I understand, and I'm grateful for the feedback. To be fair we're also in a corner case of SemVer, where it could be considered either a bug fix or a breaking change. But next time I'll consider the lesson learned and tag a breaking release instead of relying on bad interface definitions.
For dense AD, the solution is to always pick a chunk size no larger than the input length, as sketched below. For sparse AD, as explained in #593, it is much trickier, and I could use a hand in figuring things out.
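For the dense case, a user-side workaround is to cap the requested chunk size by the input length before constructing the backend (capped_forwarddiff below is a hypothetical helper, not part of DI or ADTypes):

```julia
using ADTypes: AutoForwardDiff

# Hypothetical helper: never request more partials than there are inputs.
# Because the result depends on length(x), this construction is itself not
# inferable; it only avoids the C > length(x) error.
capped_forwarddiff(x; chunksize=10) = AutoForwardDiff(chunksize=min(chunksize, length(x)))

x0 = rand(4)
backend = capped_forwarddiff(x0)  # equivalent to AutoForwardDiff(chunksize=4) here
```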
