Diagnostics tool for ill-posed constraints #1454
base: main
Conversation
Codecov Report

@@ Coverage Diff @@
## main #1454 +/- ##
==========================================
+ Coverage 76.84% 76.86% +0.01%
==========================================
Files 376 376
Lines 61091 61258 +167
Branches 13505 13549 +44
==========================================
+ Hits 46948 47084 +136
- Misses 11762 11784 +22
- Partials 2381 2390 +9
@Robbybp, is there any possibility of combining a tool like this with block triangularization? My intuition is that a constraint in which catastrophic cancellation occurs (taking the difference between two large numbers to get a small number) might show up differently in the block structure than one in which it does not (adding a small number to a large number to get another large number), even though the individual constraints might be identical.
@dallan-keylogic My only initial thought is that a "catastrophic cancellation" has different implications depending on where it appears in the block triangular decomposition. If it is in a diagonal block, it could be a problem (as we're relying on that term for nonsingularity). Otherwise, it is fine, as the entry could be zero and the matrix's singularity would not change.
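A minimal numeric sketch of the distinction being discussed (the helper function and values below are illustrative, not from the PR): comparing the magnitude of a sum against its largest term separates the two cases.

```python
# Hypothetical helper, not part of the PR: ratio of |sum| to the largest
# |term| in a sum. A tiny ratio suggests catastrophic cancellation.
def cancellation_ratio(terms):
    largest = max(abs(t) for t in terms)
    return abs(sum(terms)) / largest

# Two large terms nearly cancelling: ratio is tiny (potentially a problem).
print(cancellation_ratio([1.0e6, -999999.9]))
# A small term added to a large one: ratio is near 1 (benign).
print(cancellation_ratio([1.0e6, 0.1]))
```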
@dallan-keylogic I would say that scaling
@Robbybp Whether or not we should scale by
Overall looks pretty reasonable. Some questions and comments (that should probably be addressed)
idaes/core/util/model_diagnostics.py
Outdated
```python
for i in mm:
    mismatch.append(f"{c.name}: {i}")
for i in cc:
    cancellation.append(f"{c.name}: {i}")
if k:
    constant.append(c.name)
```
I would be happier if the "collect" routine didn't do formatting (conversion to a string).
This one would take some more work and require changing other `collect` methods as well. I think that might be better as a separate issue/PR.
```python
if (
    hasattr(node, "is_named_expression_type")
    and node.is_named_expression_type()
):
```
I have stumbled across a syntax that is a bit more concise. Not sure if you want to use it, though:

```python
if getattr(node, "is_named_expression_type", bool)():
```
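For reference, a short demonstration of why the one-liner behaves like the `hasattr` check: when the attribute is missing, `getattr` returns the `bool` type as the default, and calling `bool()` with no arguments yields `False` (the class names below are illustrative).

```python
class Named:
    def is_named_expression_type(self):
        return True

class Plain:
    pass

# With the attribute present, the bound method is called as usual.
print(getattr(Named(), "is_named_expression_type", bool)())  # True
# Without it, bool() is called instead, which returns False.
print(getattr(Plain(), "is_named_expression_type", bool)())  # False
```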
I think I'll keep the current form as it is a little easier to understand what it is doing.
idaes/core/util/model_diagnostics.py
Outdated
```python
# We will check for cancelling terms here, rather than the sum itself,
# to handle special cases.
# We want to look for cases where a sum term results in a value much smaller
# than the terms of the sum.
sums = self._sum_combinations(d[0])
if any(i <= self._sum_tol * max(d[0]) for i in sums):
    cancelling.append(str(node))
```
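A self-contained sketch of this check, under the assumption that `_sum_combinations` enumerates sums over every subset of two or more terms and `_sum_tol` is a relative tolerance (the names, tolerance value, and use of absolute values here are stand-ins, not the PR's exact implementation):

```python
from itertools import chain, combinations

SUM_TOL = 1e-4  # stand-in for self._sum_tol

def sum_combinations(values):
    # Absolute sums over every subset of 2..n terms.
    return [
        abs(sum(combo))
        for combo in chain.from_iterable(
            combinations(values, r) for r in range(2, len(values) + 1)
        )
    ]

def has_cancelling_terms(values):
    largest = max(abs(v) for v in values)
    return any(s <= SUM_TOL * largest for s in sum_combinations(values))

print(has_cancelling_terms([1.0, -1.0000001, 5.0]))  # True: first two cancel
print(has_cancelling_terms([1.0, 2.0, 3.0]))         # False
```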
I think I understand what you are doing here, but wouldn't this complain loudly about the objective for every parameter estimation problem (MSE) if the problem actually solved?
One of my main concerns with this tool is its inability to distinguish between true cases of catastrophic cancellation and benign situations, where, for example, we have a heat of adsorption term in a column which will be close to except near the breakthrough point.
Is there an easy way to test for that however? Note that this tool is only used for Cautions in the main toolbox, with the implication that these might be issues (but not guaranteeing that).
I.e., this is intended to be a simple check to find potential issues, but the user will have to look into them all further to decide if they are critical or not.
I'm not sure there's an easy way to do it without assuming a well-scaled system. In a well-scaled system, it would show up in the SVD with extremely large or extremely small singular values.
However, once you get to a high enough noise/signal ratio, you start to question whether including the tool in `report_numerical_issues` is worthwhile or whether it is distracting the user from more fruitful avenues of diagnostics. I suppose we can just pull it from `report_numerical_issues` without a deprecation cycle if it proves not to be useful, so we can try it out and see how users find it.
@jsiirola This walker was written specifically with Constraints in mind, so I had not considered that. The check for mismatched terms would make sense for an Objective however, so this could be extended to handle them as well. However, how would the expression walker know if it was dealing with an Objective or a Constraint (the input argument is the expression, not the component)?
idaes/core/util/model_diagnostics.py
Outdated
```python
# (In)equality expressions are a special case of sum expressions.
# We can start by just calling the method to check the sum expression.
vals, mismatch, cancelling, const = self._check_sum_expression(node, child_data)
```
Don't you need to negate the values for the second argument in the relational expression before treating it like a sum?
I had forgotten to make that correction - thank you.
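The correction amounts to negating the right-hand-side term values so that `lhs == rhs` is analysed as the single sum `lhs - rhs == 0`; a minimal numeric sketch (the values are illustrative):

```python
# For a constraint lhs == rhs, negate the RHS term values before running
# the sum-cancellation check, so the whole expression is one sum.
lhs_terms = [100.0, 2.5]
rhs_terms = [102.5]
combined = lhs_terms + [-v for v in rhs_terms]
print(sum(combined))  # 0.0 at a point satisfying the constraint
```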
idaes/core/util/model_diagnostics.py
Outdated
```python
node_type_method_map = {
    EXPR.EqualityExpression: _check_equality_expression,
    EXPR.InequalityExpression: _check_equality_expression,
    EXPR.RangedExpression: _check_sum_expression,
```
Don't you need special handling for Ranged (like Equality / Inequality)? I would probably define a

```python
def _check_ranged(self, node, child_data):
    lhs_vals, lhs_mismatch, lhs_cancelling, lhs_const = self._check_equality(node, child_data[:2])
    rhs_vals, rhs_mismatch, rhs_cancelling, rhs_const = self._check_equality(node, child_data[1:])
    # Then merge the results and return
```
I think I've fixed this - however, it would be good if you could double-check my logic.
idaes/core/util/model_diagnostics.py
Outdated
```python
def _check_product(self, node, child_data):
    mismatch, cancelling, const = self._perform_checks(node, child_data)

    val = self._get_value_for_sum_subexpression(
        child_data[0]
    ) * self._get_value_for_sum_subexpression(child_data[1])

    return [val], mismatch, cancelling, const
```
Almost all of your nonlinear handlers could be replaced by a single callback:

```python
def _check_general_expr(self, node, child_data):
    mismatch, cancelling, const = self._perform_checks(node, child_data)
    val = node._apply_operation(list(map(self._get_value_for_sum_subexpression, child_data)))
    return [val], mismatch, cancelling, const
```
Thank you - I did not know I could do that.
I did not review the test cases yet because I do have some suggestions for changes in the main file that may require changes in the test cases.
idaes/core/util/model_diagnostics.py
Outdated
```python
def _sum_combinations(self, values_list):
    sums = []
    for i in chain.from_iterable(
        combinations(values_list, r) for r in range(2, len(values_list) + 1)
```
2 is a magic number! Why 2?
We are looking for any combination of terms which cancel, thus the minimum number of terms to consider is 2 (a single term cannot cancel with itself). I can add a comment.
What this block of code does is go through combinations of terms in an expression, first in groups of 2, then in groups of 3, all the way up to groups of `len(values_list)`. So if we have the sum expression `a + b + c + d`, we first iterate through `(a, b), (a, c), (a, d), (b, c), (b, d), (c, d)`, then through `(a, b, c), (a, b, d), (a, c, d), (b, c, d)`, then `(a, b, c, d)`.
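The enumeration order described above can be reproduced directly with `itertools` (the variable names here are illustrative); the count of groups for `m` terms is all subsets minus the `m` singletons and the empty set:

```python
from itertools import chain, combinations

values = ["a", "b", "c", "d"]
groups = list(
    chain.from_iterable(combinations(values, r) for r in range(2, len(values) + 1))
)
print(groups[0], groups[-1])  # ('a', 'b') ('a', 'b', 'c', 'd')
# All subsets of size >= 2: 2**m, minus m singletons, minus the empty set.
m = len(values)
print(len(groups), 2**m - m - 1)  # 11 11
```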
However, the number of combinations you're checking grows exponentially with expression size. In particular, if `len(values_list) == m`, you'll be checking `2 ** m - m - 1` combinations. I expect that this sort of check will take an extremely long time on any model with `Expression`s of any significant length, much less an extreme chonker like eNRTL.
We can probably make this more efficient by:
- Stripping any term with a value of 0
- Breaking out of the loop at the first failure; we do not count how many cancellations there are, just whether there is at least one.
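Both suggestions can be sketched together: drop zero-valued terms up front and return as soon as one cancelling combination is found (a sketch under the same assumptions as the check above; the function name and tolerance are stand-ins):

```python
from itertools import chain, combinations

def has_cancellation(values, tol=1e-4):
    # Strip zero terms first; they cannot change any subset sum.
    values = [v for v in values if v != 0]
    if len(values) < 2:
        return False
    cutoff = tol * max(abs(v) for v in values)
    for combo in chain.from_iterable(
        combinations(values, r) for r in range(2, len(values) + 1)
    ):
        if abs(sum(combo)) <= cutoff:
            return True  # stop at the first cancellation; we only need one
    return False

print(has_cancellation([0.0, 3.0, -3.0, 7.0]))  # True
print(has_cancellation([1.0, 2.0]))             # False
```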
@mrmundt I think I have addressed all of your comments. If you have time to do another review, it would be appreciated.
@mrmundt Thank you - I fixed the leftover print statements.
I only have issues with one thing, which is "magic numbers" - otherwise it looks good!
idaes/core/util/model_diagnostics.py
Outdated
```python
for j in child_data[0][0]:
    vals.append(-j)
mdata.append((vals, child_data[0][1]))
mdata.append(child_data[1])

# Next, call the method to check the sum expression
vals, const, _ = self._check_sum_expression(node, mdata)

# Next, we need to check for cancelling terms.
# In this case, we can safely ignore expressions of the form constant = sum()
# We can also ignore any constraint that is already flagged as mismatched
# We will also ignore any constraints where a == b and neither a nor b are sum expressions
if not child_data[0][2] and not child_data[1][2]:
```
I am... mildly irked about all of the magic numbers here. Perhaps what is inside `child_data` is better known to others than to me, but as someone who doesn't know, what is in `[0][0]` and why does it need to be negated?
Seeing as I had to remind myself what these meant, the point is well taken. I have added some comments that hopefully explain what is happening.
I am generally content with this now!
Waiting on Pyomo/pyomo#3376
Summary/Motivation:
As part of working on the new scaling tools, I realised there are some simple checks we can do for detecting poorly-posed constraints that could cause scaling issues. This PR adds a new expression walker that looks for the following signs of poor scaling in constraints:
- expressions of the form `constant == sum()`

Changes proposed in this PR:
Legal Acknowledgement
By contributing to this software project, I agree to the following terms and conditions for my contribution: