Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check prefix before complete loading when checking yaml and cwl tools #12604

Merged
merged 6 commits into from
May 25, 2022

Conversation

bernt-matthias
Copy link
Contributor

I was wondering why planemo ci_find_repos tools-iuc/ (and ci_find_tools) need so much time: approx (3min). Turns out that all json files (some of them seem to be quite big .. which is a problem on its own) in there are parsed and checked if they are tools. Essentially almost the complete run time is caused by json files in tools/hamronization and tools/hyphy.

With this change the runtime is approx 10sec.

Support for json has been added here 0da66ee (before this json was explicitly forbidden) and likely is desired:

  • Semantically it makes sense since json is a subset of yaml
  • I know that tools can be yaml (or json .. unsure) .. but I never have seen one in the wild or documentation on how to write yaml/json tools.

I'm pretty sure that the PR is unacceptable as-is .. but should serve as a starting point for discussion:

  • would it be sufficient to parse the header of the json / yaml files for the quick check if a file might be a tool?
  • do we really want json?

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

@github-actions github-actions bot added this to the 22.01 milestone Sep 30, 2021
@jmchilton
Copy link
Member

Is this coming through planemo/runnable.py::for_path on the slow walk through the code? I think it would be better to restrict the types of runnables you load in Planemo.

@@ -21,7 +21,7 @@
TOOL_REGEX = re.compile(r"<tool\s")
DATA_MANAGER_REGEX = re.compile(r"\stool_type=\"manage_data\"")

YAML_EXTENSIONS = [".yaml", ".yml", ".json"]
YAML_EXTENSIONS = [".yaml", ".yml"]
CWL_EXTENSIONS = YAML_EXTENSIONS + [".cwl"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does adding .json to CWL_EXTENSIONS in the next line to make up for this help anything or does that revert the performance. I would be fine with that change but I'm not sure it helps. I've seen CWL json files I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not have any effect on the performance since: looks_like_a_tool also rund looks_like_a_tool_cwl (because it is in BETA_TOOL_CHECKERS) which considers the CWL_EXTENSIONS:

for tool_checker in BETA_TOOL_CHECKERS.values():

@bernt-matthias
Copy link
Contributor Author

Is this coming through planemo/runnable.py::for_path on the slow walk through the code? I think it would be better to restrict the types of runnables you load in Planemo.

No, this function seems not to be called at all during ci_find_repos or ci_find_tools

The trace is:

  • planemo/tools.py::yield_tool_sources_on_paths
  • planemo/tools.py::yield_tool_sources
  • planemo/tools.py::load_tool_sources_from_path
  • galaxy/tool_util/loader_directory.py::load_tool_sources_from_path
  • galaxy/tool_util/loader_directory.py::_load_tools_from_path
  • galaxy/tool_util/loader_directory.py::find_possible_tools_from_path
  • galaxy/tool_util/loader_directory.py::looks_like_a_tool
  • galaxy/tool_util/loader_directory.py::looks_like_a_tool_yaml
  • galaxy/tool_util/loader_directory.py::is_a_yaml_with_class

@nsoranzo nsoranzo changed the title only anncept yaml extensions as yaml extensions Only accept yaml extensions as yaml extensions Oct 1, 2021
@bernt-matthias bernt-matthias changed the title Only accept yaml extensions as yaml extensions Only accept yaml extensions as tool extensions Oct 1, 2021
@bernt-matthias bernt-matthias changed the title Only accept yaml extensions as tool extensions Only accept yaml as tool extensions Oct 1, 2021
@bernt-matthias bernt-matthias changed the title Only accept yaml as tool extensions Only accept yaml as additional tool extensions Oct 1, 2021
@bgruening bgruening requested a review from jmchilton October 23, 2021 22:08
@mvdbeek mvdbeek modified the milestones: 22.01, 22.05 Jan 14, 2022
@bernt-matthias
Copy link
Contributor Author

From the committers meeting: This should be implemented in a different way: e.g. by just checking header lines of potential tools.

@bernt-matthias
Copy link
Contributor Author

As a side node the tool parser factory only considers yml:

if config_file.endswith(".yml"):

@bernt-matthias
Copy link
Contributor Author

Also this logger output is pretty confusing:

Loading tool from YAML - this is experimental - tool will not function in future.

So, we are sure that YAML tools will never work?

@bernt-matthias bernt-matthias force-pushed the topic/yaml-extensions branch from cdf9253 to 4a257e8 Compare March 23, 2022 11:55
@bernt-matthias
Copy link
Contributor Author

Implemented a prefix check for yaml and cwl tools.

For the IUC repo

planemo ci_find_repos before 125.35s after 5.01s
planemo ci_find_tool before 182.35s after 6.55s

Also checked that the output is equal.

@bernt-matthias bernt-matthias changed the title Only accept yaml as additional tool extensions Check prefix before complete loading when checking yaml and cwl tools Mar 23, 2022
@bernt-matthias bernt-matthias force-pushed the topic/yaml-extensions branch 2 times, most recently from 3357036 to 86d7467 Compare March 23, 2022 12:17
lib/galaxy/tool_util/loader_directory.py Outdated Show resolved Hide resolved
lib/galaxy/tool_util/loader_directory.py Outdated Show resolved Hide resolved
Co-authored-by: Marius van den Beek <m.vandenbeek@gmail.com>
@mvdbeek mvdbeek merged commit ad87303 into galaxyproject:dev May 25, 2022
@bernt-matthias bernt-matthias deleted the topic/yaml-extensions branch June 16, 2022 09:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants