Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(SpdxExpression): Fix the DNF algorithm #6721

Closed
wants to merge 1 commit into from

Conversation

MarcelBochtler
Copy link
Member

Whenever a SpdxExpression contains an AND operator, the DNF needs to be calculated again.
For instance the correct DNF for a AND (b OR c OR d) is (a AND b) OR (a AND c) OR (a AND d). Before, the calculated DNF for this expression was (a AND (b OR c)) OR (a AND d) which is not a valid DNF.

@MarcelBochtler
Copy link
Member Author

MarcelBochtler commented Mar 20, 2023

I'm using this scanner result to check the performance impact of this change:
https://github.com/fviernau/orthw-demo/blob/main/scan-results/scan-result.json.xz

Results on my machine

Current main: 13 seconds
With fix: 1 min 14 seconds.

Therefore, the changes are currently not feasible :/

@MarcelBochtler MarcelBochtler force-pushed the dnf-fix branch 3 times, most recently from a5834b5 to 677c4d0 Compare March 20, 2023 12:42
@codecov
Copy link

codecov bot commented Mar 20, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (8a48a6d) 64.82% compared to head (58e0e6b) 64.82%.

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #6721   +/-   ##
=========================================
  Coverage     64.82%   64.82%           
  Complexity     1938     1938           
=========================================
  Files           320      320           
  Lines         16024    16024           
  Branches       2281     2281           
=========================================
  Hits          10388    10388           
  Misses         4657     4657           
  Partials        979      979           
Flag Coverage Δ
funTest-non-docker 49.53% <ø> (ø)
test 31.06% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@MarcelBochtler MarcelBochtler force-pushed the dnf-fix branch 2 times, most recently from 70d48cf to 7685fc2 Compare March 21, 2023 08:05
@MarcelBochtler
Copy link
Member Author

MarcelBochtler commented Mar 21, 2023

Extracting the distributeAndOverOr function and only calling this recursively results in ~45 seconds evaluation time on my machine.
I doubt that this problem can be solved in a non-exponential time complexity (in the worst case).

@mnonnenmacher
Copy link
Member

mnonnenmacher commented Mar 21, 2023

Extracting the distributeAndOverOr function and only calling this recursively results in ~45 seconds evaluation time on my machine. I doubt that this problem can be solved in a non-exponential time complexity (in the worst case).

I guess you are right because according to Wikipedia the problem is np-complete: https://en.wikipedia.org/wiki/Quine%E2%80%93McCluskey_algorithm#Complexity (Cline-McCluskey calculates the minimal DNF which is not strictly required for our use case, but calculating a non-minimal DNF is probably not much less complex)

I see basically two options how we can proceed:

  • Check if implementing another algorithm like Quine-McCluskey provides significantly better performance.
  • Check why the algorithm fix affects performance so much: Do we calculate the DNF too often? Could we maybe cache the result? Do we create too many objects during the calculation? Is there one specific very long expression that takes so long or does the problem also exist for many simple expressions?

Whenever a `SpdxExpression` contains an `AND` operator, the DNF needs to
be calculated again.
For instance, the correct DNF for `a AND (b OR c OR d)` is
`(a AND b) OR (a AND c) OR (a AND d)`. Before, the calculated DNF for
this expression was `(a AND (b OR c)) OR (a AND d)`, which is not a
valid DNF.

Signed-off-by: Marcel Bochtler <marcel.bochtler@bosch.com>
@sschuberth
Copy link
Member

sschuberth commented Jan 11, 2024

I wonder whether disjunctiveNormalForm() is still needed after all... in #8106 it seems we could simply remove it.

@sschuberth
Copy link
Member

I'm using this scanner result to check the performance impact of this change:
https://github.com/fviernau/orthw-demo/blob/main/scan-results/scan-result.json.xz

@MarcelBochtler what operation exactly did you perform on the result for benchmarking?

@MarcelBochtler
Copy link
Member Author

I'm using this scanner result to check the performance impact of this change:
https://github.com/fviernau/orthw-demo/blob/main/scan-results/scan-result.json.xz

@MarcelBochtler what operation exactly did you perform on the result for benchmarking?

I simply ran the evaluator using the linked scan-result.json:

ort evaluate -i scan-result.json -o ort/ --license-classifications-file license-classifications.yml --rules-file evaluator.rules.kts

I cannot share the license-classifications and the evaluator rules.
In the evaluator.rules.kts we do use expression.validChoices() on every expression that offers a choice. Other than that, I don't see anything that could be related to this issue.

I wonder whether disjunctiveNormalForm() is still needed after all... in #8106 it seems we could simplify remove it.

Yes looks like it. Should I close this PR?

@sschuberth
Copy link
Member

Should I close this PR?

For the sake of cleaning things up a little, I'd say yes. I believe in the context of the current code base there now nothing anymore in this PR that we could take / reuse.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants