Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrates CRM to use yaml.CSafeLoader over yaml.safe_load() #115

Merged
merged 3 commits into from
Sep 9, 2024

Conversation

schuylermartin45
Copy link
Collaborator

@schuylermartin45 schuylermartin45 commented Sep 6, 2024

As @JeanChristopheMorinPerso has pointed out, we should be able to notice a performance improvement by using the PyYaml's CSafeLoader over the default Python implementation.

In our testing so far, there is a performance improvement, but it may not be obvious until you start bulk parsing a very, very large set of recipe files.

@schuylermartin45 schuylermartin45 requested a review from a team as a code owner September 6, 2024 20:36
@schuylermartin45
Copy link
Collaborator Author

Update: this makes a much more noticeable reduction in the time it takes to complete the GHA tests. It is harder to notice the difference locally when you have an Apple Silicon number of CPU cores.

@@ -97,10 +97,10 @@ def _v1_sub_jinja() -> None:
# then we fall back to performing JINJA substitutions.
try:
try:
output = cast(JsonType, yaml.safe_load(s))
output = yaml.load(s, Loader=yaml.CSafeLoader)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to do something like

try:
    from yaml import CSafeLoader as SafeLoader
except ImportError:
    from yaml import SafeLoader

to support cases where the C bindings are not available.

@dholth
Copy link
Contributor

dholth commented Sep 6, 2024

In conda we've switched over completely to ruamel.yaml... mainly because it is YAML 1.2 instead of 1.1, and has among other things "Only true and false strings are parsed as booleans (including True and TRUE); y, yes, on, and their negative counterparts are parsed as strings."

@schuylermartin45
Copy link
Collaborator Author

In conda we've switched over completely to ruamel.yaml... mainly because it is YAML 1.2 instead of 1.1, and has among other things "Only true and false strings are parsed as booleans (including True and TRUE); y, yes, on, and their negative counterparts are parsed as strings."

Last I looked, switching is not a fast drop-in replacement. I think we'll have to look into it further to decide if it is worth the change. Personally, I have not had the best developer experience with ruamel.

@schuylermartin45 schuylermartin45 merged commit dafada1 into main Sep 9, 2024
12 checks passed
@schuylermartin45 schuylermartin45 deleted the smartin_csafeloader_refactor branch September 9, 2024 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants