Evaluator: Allow making arbitrary file checks on the project's source tree #5679

Closed
wants to merge 12 commits into from

Conversation

fviernau
Member

See individual commits.

Note: The evaluator clones the project's source tree only if the rules need it, i.e. only if the file checks are actually used.

This implements parts of #5621.
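
To illustrate the note about cloning on demand, here is a minimal Kotlin sketch using `by lazy`. The class name `LazySourceTree` and the `download` lambda are assumptions made for this example and are not the code from this PR; the lambda stands in for whatever actually clones the repository.

```kotlin
import java.io.File

/**
 * Hypothetical stand-in for the evaluator's source tree access. The [download] lambda represents
 * whatever clones the project's repository; it is not ORT's actual downloader API.
 */
class LazySourceTree(private val download: () -> File) {
    // `by lazy` defers the (expensive) clone until the first file check and caches the result.
    private val rootDir: File by lazy(download)

    /** Return whether [path], relative to the source tree root, is an existing regular file. */
    fun hasFile(path: String): Boolean = rootDir.resolve(path).isFile
}
```

With this shape, a rule set that never calls `hasFile()` never triggers the clone, and repeated checks share a single clone.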

@fviernau fviernau requested a review from a team as a code owner August 24, 2022 10:35
@fviernau fviernau force-pushed the evaluator-enable-arbitrary-file-checks branch from 2eeae8c to 6525ee2 Compare August 24, 2022 10:38
@codecov
codecov bot commented Aug 24, 2022

Codecov Report

Merging #5679 (48a0c9d) into main (5ce9aaf) will not change coverage.
The diff coverage is n/a.

❗ Current head 48a0c9d differs from pull request most recent head 47de9f5. Consider uploading reports for the commit 47de9f5 to get more accurate results

@@            Coverage Diff            @@
##               main    #5679   +/-   ##
=========================================
  Coverage     65.54%   65.54%           
  Complexity     2212     2212           
=========================================
  Files           271      271           
  Lines         16600    16600           
  Branches       3473     3473           
=========================================
  Hits          10881    10881           
  Misses         4575     4575           
  Partials       1144     1144           
Flag Coverage Δ
funTest-analyzer-docker 74.58% <0.00%> (ø)
test 32.01% <0.00%> (ø)

Member

@mnonnenmacher mnonnenmacher left a comment

You use different prefixes for the example commits: "example.rules", "examples.rules", "example.rules.kts", please align.

@fviernau fviernau force-pushed the evaluator-enable-arbitrary-file-checks branch 2 times, most recently from 3d921be to 09d7f45 Compare August 24, 2022 20:00
import org.ossreviewtoolkit.model.config.DownloaderConfiguration
import org.ossreviewtoolkit.utils.ort.createOrtTempDir

class SourceTree private constructor(
Member

Do we really need this new class? Can't we simply use the existing WorkingTree instead?

Member Author

The reason I added this class will be more obvious when you look at what gets added in the following commits.
That's all evaluator-specific logic IMO: basically helper functions to use for implementing RuleMatchers and / or policy rules.

Member

I saw the upcoming changes, but I'm still not convinced. It looks like the helper functions could operate on a WorkingTree just as well.

Member Author

> I saw the upcoming changes, but I'm still not convinced.

Let me just try one more time to convince you:

I believe that even if these helper functions could all be implemented in WorkingTree, encapsulation makes sense, because:

  1. An API change in the analyzer (working tree) cannot break a policy rule implementation. So, the rules API is
    independent of the working tree.
  2. The API can be designed to reflect exactly the requirements of the rules, which is what is needed to arrive at easy-to-read rules. In my experience, it's hard to foresee the exact API needs when implementing new policy rule use cases, and my gut feeling is that this encapsulation will make more and more sense the more functions are added.
  3. The functions can be implemented based on the working tree, but that is not required. The encapsulation allows
    changing the implementation. For example, a file existence check could be changed to work not on the cloned source but
    on the ScanResult, if that contained a full list of all files.
  4. The helper functions are basically logic factored out of the rule matchers added to OrtResult. Exposing that logic is
    important because the logic inside rule matchers is not reusable. For example, the rule matcher hasFile()
    contains logic to find the actual files. If you want to reuse that logic, it cannot live inside the hasFile()
    matcher but needs to be exposed somewhere. That somewhere is the SourceTree class I've added. I don't see why that somewhere should be WorkingTree.
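
Point 3 can be sketched as follows, assuming a hypothetical rules-facing interface. None of the names below (`SourceTree` as an interface, `ClonedSourceTree`, `FileListSourceTree`, `missingFiles`) are taken from the PR; the file-list variant merely stands in for a lookup based on scan results.

```kotlin
import java.io.File

/** Hypothetical rules-facing API; the names are illustrative and not taken from this PR. */
interface SourceTree {
    /** Return whether a regular file exists at [path], relative to the project root. */
    fun hasFile(path: String): Boolean
}

/** Backed by a directory on disk, e.g. a clone of the project's repository. */
class ClonedSourceTree(private val rootDir: File) : SourceTree {
    override fun hasFile(path: String): Boolean = rootDir.resolve(path).isFile
}

/** Backed by a precomputed list of relative file paths, e.g. one taken from a scan result. */
class FileListSourceTree(paths: Collection<String>) : SourceTree {
    private val knownPaths = paths.toSet()

    override fun hasFile(path: String): Boolean = path in knownPaths
}

/** A rule-side helper that depends on the interface only, so either backend works unchanged. */
fun missingFiles(tree: SourceTree, required: List<String>): List<String> =
    required.filterNot { tree.hasFile(it) }
```

Because `missingFiles()` only sees the interface, swapping the clone-based backend for the list-based one would not require touching any rule code.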

Are any of the above points somewhat more convincing?

BTW: I'm planning to add matchers for the commit history, and would therefore create (not expose) a working tree instance inside the SourceTree class.

Member Author

Anyhow, as I plan to make a couple of further PRs on top, can we just keep it for now and make that decision when that work is done (I guess we'll know more by then)? If it's really not needed, then I'll refactor it away.

Member

> can we just keep it for now and make that decision when that work is done

I'd actually prefer to get that sorted out now. IMO we had too much of "I need this urgently now so let's merge it"-style of changes recently, and ORT's code base is starting to suffer from suboptimal code design decisions (with only a single use-case in mind).

Member Author

@fviernau fviernau Aug 25, 2022

> I'd actually prefer to get that sorted out now.

OK, then let's sort it out ASAP.

> IMO we had too much of "I need this urgently now so let's merge it"-style of changes recently, and ORT's code base is starting to suffer from suboptimal code design decisions (with only a single use-case in mind).

Maybe I wasn't clear enough. I said that I would do the refactoring in a later change if by then we consider it reasonable. So, the code base wouldn't suffer from it in the long run. Did you get that?

I prefer a rather iterative approach, as I believe the proposed refactoring is premature and can be done a couple of days later, in a following iteration, with the knowledge gained by then.

Member

> So, the code base wouldn't suffer from it in the long run. Did you get that?

I did get that. But such promises were made in the past, and then that planned refactoring never happened, or only after a very long time.

> I prefer a rather iterative approach

Iterative is fine, but iterative should also mean that it's already going in the right direction, and not introducing stuff that gets removed later on (where foreseeable).

What I'd like to discuss first with you, and at least also with @mnonnenmacher, is whether providing access to the source / working tree should really become a "first class citizen" of the evaluator as currently sketched. When I first read about #5621, I was hoping it would be implemented mostly by helper functions in EPAM's rules.kts itself, and not so much in ORT upstream.

evaluator/src/main/kotlin/SourceTree.kt (resolved)
evaluator/src/test/kotlin/OrtResultRuleTest.kt (outdated, resolved)
@fviernau fviernau force-pushed the evaluator-enable-arbitrary-file-checks branch from 09d7f45 to 7c95528 Compare August 24, 2022 21:02
Move `getRepositoryPath()` to `OrtResultExtensions` to enable re-use in
an upcoming change.

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
… tree

Allow access to the project's source tree in order to enable doing
arbitrary checks on files like the ones in repolinter [1]. Adding
arbitrary file checks makes the rules API more powerful, as it allows
highly customizable checks which can automate parts of the checks
typically done prior to open sourcing a project.

Not using a third-party tool makes sense, as sticking to a single way
(rules.kts) of writing the policy rules is simpler to use.

Note that the implementation is intentionally limited to the project's
source tree, i.e. it does not work for dependency sources, because
doing such checks throughout the dependency tree has no obvious need
and is not feasible anyway in terms of execution time.

[1] https://github.com/todogroup/repolinter

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Some policy rules do not only require the result of hasFile(), but also
the actual matching files if any. The same applies to hasDirectory().
So, expose the logic for finding the files and directories.

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
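
A rough sketch of what exposing the lookup could look like, under the assumption that patterns are globs relative to the source tree root. The class `SourceTreeFinder`, the `findFiles()` / `findDirectories()` names and all signatures are hypothetical; the PR's actual API is not shown in this conversation.

```kotlin
import java.io.File
import java.nio.file.FileSystems
import java.nio.file.Paths

/** Hypothetical helper that exposes the lookup logic instead of hiding it inside the matchers. */
class SourceTreeFinder(private val rootDir: File) {
    // Shared lookup: walk the tree, keep entries accepted by [predicate] whose relative path
    // matches [glob]. Note that `*` in java.nio globs does not cross directory boundaries.
    private fun find(glob: String, predicate: (File) -> Boolean): List<String> {
        val matcher = FileSystems.getDefault().getPathMatcher("glob:$glob")

        return rootDir.walk()
            .filter { it != rootDir && predicate(it) }
            .map { it.relativeTo(rootDir).invariantSeparatorsPath }
            .filter { matcher.matches(Paths.get(it)) }
            .sorted()
            .toList()
    }

    /** Return the relative paths of all regular files matching [glob]. */
    fun findFiles(glob: String): List<String> = find(glob) { it.isFile }

    /** Return the relative paths of all directories matching [glob]. */
    fun findDirectories(glob: String): List<String> = find(glob) { it.isDirectory }

    // The boolean checks become thin wrappers around the reusable lookups.
    fun hasFile(glob: String): Boolean = findFiles(glob).isNotEmpty()

    fun hasDirectory(glob: String): Boolean = findDirectories(glob).isNotEmpty()
}
```

A rule that needs the matches themselves, for example to name the offending files in a violation message, could then call `findFiles()` directly instead of duplicating the logic behind `hasFile()`.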
@fviernau fviernau force-pushed the evaluator-enable-arbitrary-file-checks branch from 7c95528 to 47de9f5 Compare August 24, 2022 21:39
@fviernau fviernau changed the title Evaluator: Allow making arbitrary file checks on the project's source tree ON HOLD: Evaluator: Allow making arbitrary file checks on the project's source tree Sep 7, 2022
@fviernau
Member Author

fviernau commented Sep 7, 2022

I've created a new PR which is more minimal: #5754.
Let's continue with that PR to agree on the concept and get it merged.

@sschuberth sschuberth added the on hold Pull requests that cannot currently be merged label Sep 7, 2022
@sschuberth sschuberth changed the title ON HOLD: Evaluator: Allow making arbitrary file checks on the project's source tree Evaluator: Allow making arbitrary file checks on the project's source tree Sep 7, 2022
@fviernau fviernau closed this Sep 12, 2022
@fviernau fviernau deleted the evaluator-enable-arbitrary-file-checks branch September 12, 2022 14:37