Force constant chunk size when specified in ForwardDiff #539
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #539 +/- ##
==========================================
- Coverage 98.63% 98.57% -0.07%
==========================================
Files 106 107 +1
Lines 4606 4620 +14
==========================================
+ Hits 4543 4554 +11
- Misses 63 66 +3
@Vaibhavdixit02 this breaking change appears to make it difficult to specify the chunk size in Optimization.jl, since the same AD backend is used for all functions: gradient, constraint Jacobian, Lagrangian Hessian. I just had code broken by updating patch releases due to this. Ideally, this stance would be avoided; relying on poorly defined semantics to justify making breaking changes in a patch release makes it difficult for users to guard themselves against breakage.
I completely agree, which is why I opened SciML/ADTypes.jl#90 to keep track and clarify the semantics. I decided that such a change wasn't worth the ecosystem-wide update, but I apologize, because I know that some code was broken as a result.

Let me explain why this change occurred. Prior to the present PR, I did not understand what fixing the batch size meant for those who use `AutoForwardDiff` with a fixed chunk size.

In the case of Optimization.jl, the real issue is that the same backend can be used for all operators. But since Optimization.jl now uses the preparation mechanism of DI, specifying the chunk size has basically no efficiency benefits (I think). So I would recommend using the plain old `AutoForwardDiff()` without a chunk size.

There are additional subtleties with sparse operators that appeared as a result of #575, where the chunk size is chosen after coloring, so you cannot rely on the length of the input to guide you. Maybe you ran into one of those with a very sparse Jacobian or Hessian?
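(A minimal sketch of the recommendation above, assuming a recent DifferentiationInterface version with the ForwardDiff extension loaded; the function `f` here is just a placeholder.)

```julia
using DifferentiationInterface  # exports AutoForwardDiff via ADTypes
import ForwardDiff              # loads the DI extension for ForwardDiff

f(x) = sum(abs2, x)
x = rand(10)

# Plain backend: no fixed chunk size, so ForwardDiff picks one
# from length(x) during preparation.
backend = AutoForwardDiff()
prep = prepare_gradient(f, backend, x)

# The prepared config is reused across calls, so there is little
# to gain from hand-tuning the chunk size here.
grad = gradient(f, prep, backend, x)
```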
Yes, I want to keep the chunk size smaller than the automatically chosen one in order to minimize compile time, which may otherwise reach 10+ minutes for my problem.
I understand, and it is a consequence I didn't have in mind. As I said, this PR tries to fix chunk size semantics to be consistent with the rest of the ecosystem. I asked the question on Slack before merging it, and the consensus on the SciML side (to which Optimization.jl belongs) seemed indisputable.
Again, the fact that this happened in a patch release of DI is my fault, and arguably not optimal (unless you consider the previously wrong semantics to be a bug). But the semantics that are in place now seem to be the right ones. |
This SciML Slack is dominated by academics and students, to whom the stability and maturity of a software ecosystem matters very little. For industrial users, where a team may have few Julia experts and tight timelines, breakage like this is seen as a red flag and a reason not to use a technology. In this particular case, I caught the break before our customers were exposed to it, but shipping code and then having customers experience breakage when they perform an update that should not break anything looks bad, and is a common complaint I hear about the Julia ecosystem in general. I don't want to give you too much of a headache here; you're doing a fantastic job with DI.jl! Julia also makes it very hard to nail down interfaces, which makes unintended reliance on something poorly defined almost inevitable.
I understand, and I'm grateful for the feedback. To be fair, we're also in a corner case of SemVer, where this could be considered either a bug fix or a breaking change. But next time I'll consider the lesson learned and tag a breaking release instead of relying on bad interface definitions.
Warning
This change is technically non-breaking due to ambiguous semantics in ADTypes, but it will make your code error if you specify an `AutoForwardDiff` chunk size larger than the input length. Such code did not error before.

The DI extensions, when given `AutoForwardDiff{C}`, always build a `Chunk{C}` without caring about the length of `x`. This will trigger errors when `C > length(x)`.

- DIT source
- DI tests: `AutoForwardDiff` with fixed chunk size
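To make the warning concrete, here is a small sketch using ForwardDiff's public `Chunk` and `GradientConfig` API (the function `f` is a placeholder, and the exact error raised for an oversized chunk may vary between versions):

```julia
import ForwardDiff

f(x) = sum(abs2, x)
x = rand(10)

# Valid: fixed chunk size 2 <= length(x) == 10.
cfg = ForwardDiff.GradientConfig(f, x, ForwardDiff.Chunk{2}())
grad = ForwardDiff.gradient(f, x, cfg)

# After this PR, AutoForwardDiff(chunksize=20) makes DI build
# ForwardDiff.Chunk{20}() even though length(x) == 10, which
# errors instead of being silently clamped as before.
```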