
Force constant chunk size when specified in ForwardDiff #539

Merged: 4 commits merged into main on Oct 3, 2024

Conversation

@gdalle (Member) commented Oct 3, 2024

Warning

This change is technically non-breaking due to ambiguous semantics in ADTypes, but it will make your code error if you specify an AutoForwardDiff chunk size larger than the input length. Such code did not error before.

DI extensions

  • ForwardDiff: for AutoForwardDiff{C}, always build a Chunk{C} without regard to the length of x. This triggers errors when C > length(x) (see the sketch below).
  • PolyesterForwardDiff: pick the same chunk size as ForwardDiff (this had been forgotten previously).

DIT source

  • Adapt the backend to the input length of each scenario by thresholding the chunk size when necessary. The alternative, filtering scenarios by input size, would be even worse.

DI tests

  • Test type stability of preparation for AutoForwardDiff with fixed chunk size
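As a rough illustration of the new mapping from chunksize to a chunk (a minimal sketch using ForwardDiff directly; the exact DI internals may differ):

```julia
using ForwardDiff
using ADTypes: AutoForwardDiff

f(x) = sum(abs2, x)
x = rand(5)

# AutoForwardDiff(chunksize=4) is now always translated to Chunk{4}(),
# regardless of length(x), so the chunk size is known from the backend type alone.
backend = AutoForwardDiff(chunksize=4)
cfg = ForwardDiff.GradientConfig(f, x, ForwardDiff.Chunk{4}())
ForwardDiff.gradient(f, x, cfg)  # fine: 4 <= length(x)

# With chunksize=10 and length(x)=5, the chunked gradient now errors
# instead of silently shrinking the chunk as before:
# ForwardDiff.gradient(f, x, ForwardDiff.GradientConfig(f, x, ForwardDiff.Chunk{10}()))
```

Here backend is only shown to connect the ADTypes spelling to the ForwardDiff objects it is now mapped to.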

@codecov-commenter commented Oct 3, 2024

Codecov Report

Attention: Patch coverage is 88.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 98.57%. Comparing base (1208b44) to head (f818238).
Report is 1 commit behind head on main.

Files with missing lines                                  Patch %   Lines
...DifferentiationInterfacePolyesterForwardDiffExt.jl      66.66%   2 Missing ⚠️
DifferentiationInterface/src/utils/batchsize.jl            83.33%   1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #539      +/-   ##
==========================================
- Coverage   98.63%   98.57%   -0.07%     
==========================================
  Files         106      107       +1     
  Lines        4606     4620      +14     
==========================================
+ Hits         4543     4554      +11     
- Misses         63       66       +3     
Flag   Coverage Δ
DI     98.67% <81.25%> (-0.10%) ⬇️
DIT    98.35% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.


@gdalle gdalle marked this pull request as ready for review October 3, 2024 19:22
@gdalle gdalle merged commit c47e26c into main Oct 3, 2024
74 of 75 checks passed
@gdalle gdalle deleted the gd/enforce_chunk branch October 3, 2024 20:00
@baggepinnen

@Vaibhavdixit02 this breaking change appears to make it difficult to specify the chunk size in Optimization.jl, since the same AD backend is used for all functions: gradient, constraint Jacobian, Lagrangian Hessian. I just had code broken by updating patch releases due to this.

Ideally, this stance

This change is technically non-breaking due to SciML/ADTypes.jl#90 in ADTypes, but it will make your code error if you specify an AutoForwardDiff chunk size to be larger than the input length. Such code did not error before.

would be avoided. Relying on poorly defined semantics to justify breaking changes in a patch release makes it difficult for users to guard themselves against breakage.

@gdalle (Member, Author) commented Oct 17, 2024

I completely agree, which is why I opened SciML/ADTypes.jl#90 to keep track and clarify the semantics. I decided that such a change wasn't worth the ecosystem-wide update, but I apologize because I know that some code was broken as a result.

Let me explain why this change occurred. Prior to the present PR, I did not understand what fixing the batch size meant for those who use AutoForwardDiff(; chunksize=10), mainly the SciML ecosystem. It is actually used to guarantee type stability when ForwardDiff.gradient is called without prior creation of a config object. This requires creating a Chunk{C} with a statically known C=chunksize, which is what this PR ensures. As a result, picking a chunk size larger than the input length will now cause errors which did not exist before.
In a way, you could think of the previous implementation as incorrect. You asked DI to use ForwardDiff with chunksize=10, and it ended up creating a Chunk{C} with C=min(chunksize,length(x)) (which was type-unstable).
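To make the contrast concrete, here is a hypothetical sketch (the helper names old_chunk and new_chunk are made up for illustration, not actual DI functions):

```julia
using ForwardDiff: Chunk

# Previous behavior (illustrative): C depends on the runtime length of x,
# so the concrete Chunk{C} type cannot be inferred from the backend alone.
old_chunk(chunksize::Int, x) = Chunk{min(chunksize, length(x))}()  # type-unstable

# New behavior (illustrative): C is taken directly from AutoForwardDiff{C},
# so Chunk{C}() is fully inferable, at the price of erroring when C > length(x).
new_chunk(::Val{C}, x) where {C} = Chunk{C}()  # type-stable
```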

In the case of Optimization.jl, the real issue is that the same backend can be used for all operators. But since Optimization.jl now uses the preparation mechanism of DI, specifying the chunk size has basically no efficiency benefits (I think). So I would recommend using the plain old AutoForwardDiff() for this package. Do you have a specific reason to specify chunk size in your application @baggepinnen?
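For reference, a minimal sketch of the preparation workflow with a plain AutoForwardDiff() (the positional order of the preparation object and the backend has changed across DI versions, so check the docs of your release):

```julia
using DifferentiationInterface
using ADTypes: AutoForwardDiff

f(x) = sum(abs2, x)
x = rand(100)
backend = AutoForwardDiff()  # no chunksize: a suitable chunk is picked from x

# Preparation is done once; the ForwardDiff config (and thus the chunk)
# lives inside prep, so specifying chunksize manually buys little here.
prep = prepare_gradient(f, backend, x)
grad = gradient(f, prep, backend, x)
```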

There are additional subtleties with sparse operators that appeared as a result of #575, where the chunk size is chosen after coloring so you cannot rely on the length of the input to guide you. Maybe you ran into one of those with a very sparse Jacobian or Hessian?

@baggepinnen

Do you have a specific reason to specify chunk size in your application @baggepinnen?

Yes, I want to keep the chunk size smaller than the automatically chosen one in order to minimize compile time, which may otherwise reach 10+ minutes for my problem.

@gdalle (Member, Author) commented Oct 17, 2024

I understand, and it is a consequence I didn't anticipate. As I said, this PR tries to fix chunk size semantics to be consistent with the rest of the ecosystem. I asked the question on Slack before merging it, and the consensus on the SciML side (to which Optimization.jl belongs) seemed indisputable.

@gdalle (Member, Author) commented Oct 17, 2024

Again, the fact that this happened in a patch release of DI is my fault, and arguably not optimal (unless you consider the previously wrong semantics to be a bug). But the semantics that are in place now seem to be the right ones.
The main issue that remains is sparse AD, because the compressed input dimension is only known after coloring, and usually much smaller than length(x). So there is no good way to set a chunk size at the moment, and I have opened #593 to keep track.

@baggepinnen

I asked the question on Slack before merging it, and the consensus on the SciML side (to which Optimization.jl belongs) seemed indisputable.

The SciML Slack is dominated by academics and students, to whom the stability and maturity of a software ecosystem matter relatively little. For industrial users, where a team may have few Julia experts and tight timelines, breakage like this is seen as a red flag and a reason not to adopt a technology.

In this particular case, I caught the breakage before our customers were exposed to it. But shipping code and then having customers experience breakage from an update that should not break anything looks bad, and it is a common complaint I hear about the Julia ecosystem in general.

I don't want to give you too much of a headache here; you're doing a fantastic job with DI.jl! Julia also makes it very hard to nail down interfaces, which makes unintended reliance on something poorly defined almost inevitable.

@gdalle (Member, Author) commented Oct 17, 2024

I understand, and I'm grateful for the feedback. To be fair we're also in a corner case of SemVer, where it could be considered either a bug fix or a breaking change. But next time I'll consider the lesson learned and tag a breaking release instead of relying on bad interface definitions.
For dense AD, the solution is to always pick a chunk size no larger than the input length, as sketched below. For sparse AD, as explained in #593, it is much trickier, and I could use a hand in figuring things out.
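For the dense case, a user-side workaround is to cap the requested chunk size by the input length before constructing the backend (capped_forwarddiff below is a hypothetical helper, not part of DI or ADTypes):

```julia
using ADTypes: AutoForwardDiff

# Hypothetical helper: never request more partials than there are inputs.
# Because the result depends on length(x), this construction is itself not
# inferable; it only avoids the C > length(x) error.
capped_forwarddiff(x; chunksize=10) = AutoForwardDiff(chunksize=min(chunksize, length(x)))

x0 = rand(4)
backend = capped_forwarddiff(x0)  # equivalent to AutoForwardDiff(chunksize=4) here
```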
