Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Force forward slashes when saving artifacts #2195

Merged
merged 7 commits into from
Jan 4, 2024

Conversation

christianversloot
Copy link
Contributor

@christianversloot christianversloot commented Dec 22, 2023

Describe changes

When saving visualizations into S3 artifact stores, Windows-based pipeline runs generate backward slash based URIs because that's how os.path.join works on Windows. An example would be s3://bucket\\visualize_principal_components\\output\\1a6d995d-1417-4c8c-89d0-99d0e5897d8e\\image_file.PNG. Following that, loading the artifact from the artifact store fails because of a malformed key.

This PR attempts fixing the issue by forcing all paths to have forward slashes when saving artifacts.

  • All URIs are forced to have forward slashes: those that are passed to save_artifact and when none are passed.
  • Windows and Linux based file systems seem to disallow creating files that have backslashes in their name, meaning that the risk of any failure modes being introduced there is very low to nonexistent (checked the approach with @strickvl).

Pre-requisites

Please ensure you have done the following:

  • I have read the CONTRIBUTING.md document.
  • If my change requires a change to docs, I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • I have based my new branch on develop and the open PR is targeting develop. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.
  • If my changes require changes to the dashboard, these changes are communicated/requested.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Other (add details above)

Summary by CodeRabbit

  • Bug Fixes
    • Improved compatibility of artifact saving across different operating systems by ensuring URIs use forward slashes.

Copy link
Contributor

coderabbitai bot commented Dec 22, 2023

Important

Auto Review Skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository.

To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

The changes involve updates to several materializer modules within ZenML. These updates primarily focus on ensuring cross-platform compatibility by replacing backslashes with forward slashes in URIs and file paths. Additionally, a new assertion has been added to the testing module to validate URIs.

Changes

File Path Change Summary
src/zenml/.../utils.py
.../deepchecks_results_materializer.py
.../facets_materializer.py
.../pillow_image_materializer.py
.../whylogs_materializer.py
.../numpy_materializer.py
.../pandas_materializer.py
.../structured_string_materializer.py
tests/unit/test_general.py
Replaced backslashes with forward slashes in URIs and file paths. Added validation check for backslashes in URIs.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • You can directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
  • You can tag CodeRabbit on specific lines of code or entire files in the PR by tagging @coderabbitai in a comment. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • You can tag @coderabbitai in a PR comment and ask questions about the PR and the codebase. Examples:
    • @coderabbitai generate interesting stats about this repository from git and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid.
    • @coderabbitai read the files in the src/scheduler package and generate README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

@strickvl strickvl added the bug Something isn't working label Dec 22, 2023
@strickvl
Copy link
Contributor

@coderabbitai review

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: .coderabbit.yaml

Commits Files that changed from the base of the PR and between 898e030 and 61ec65f.
Files selected for processing (1)
  • src/zenml/artifacts/utils.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • src/zenml/artifacts/utils.py

@christianversloot
Copy link
Contributor Author

christianversloot commented Dec 22, 2023

Unfortunately that proved to be too simple: replacing the URI at that location would only result in replacing the base path, not the eventual join between the base path and the file name.

This, unfortunately, required adding replaces at materializer level.

I tested with a locally running zenml-dashboard and 0.53.1 (with the diff), in a stack with a DigitalOcean based S3 bucket configured, and these changes result in saving files to correct paths and having them displayed in the Visualizations tab when running the pipeline from raw Windows.

@strickvl
Copy link
Contributor

Nice! This is potentially a breaking change, I think, specifically for anyone who already has an artifact saved that includes a backslash?

@christianversloot
Copy link
Contributor Author

Two more changes:

  • Added the backslash conversion to materializer_object.uri as well, to ensure that the UI displays the path correctly:

image

  • Added an assertion to the general unit tests for materializers writing visualizations, to assert that \\ is not present in any of the URIs in other words requiring any materializer that writes visualizations to add backslash conversion.

@christianversloot
Copy link
Contributor Author

Nice! This is potentially a breaking change, I think, specifically for anyone who already has an artifact saved that includes a backslash?

Not sure if breaking. Old pipeline runs with failing paths keep failing after applying this PR, only new ones are saved correctly. For example, this path was saved with the original PR contents (before adding backslash conversion to the materializers themselves); even though it is now working, the original path keeps failing, so strictly speaking no backwards incompatibility there:

File 's3://BUCKET/visualize_raw_data/output/9c1e5696-f324-41db-a3f0-23eb6cb6e590\image_file.PNG' does not exist in artifact store 'store'.

If you'd like to correct these as well (which I'm not sure of) we'll probably need to add a database migration which applies backslash conversion to the paths of these artifacts / visualizations as well.

@strickvl
Copy link
Contributor

strickvl commented Dec 22, 2023

@christianversloot no you're right. I think no need to retrospectively migrate / correct things that are still working. They're just saved in the DB using the 'incorrect' naming but they'd still be accessible etc as you say.

@strickvl
Copy link
Contributor

strickvl commented Jan 2, 2024

@coderabbitai review

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: .coderabbit.yaml

Commits Files that changed from the base of the PR and between 61ec65f and e05bac2.
Files selected for processing (9)
  • src/zenml/artifacts/utils.py (2 hunks)
  • src/zenml/integrations/deepchecks/materializers/deepchecks_results_materializer.py (1 hunks)
  • src/zenml/integrations/facets/materializers/facets_materializer.py (1 hunks)
  • src/zenml/integrations/pillow/materializers/pillow_image_materializer.py (1 hunks)
  • src/zenml/integrations/whylogs/materializers/whylogs_materializer.py (1 hunks)
  • src/zenml/materializers/numpy_materializer.py (1 hunks)
  • src/zenml/materializers/pandas_materializer.py (1 hunks)
  • src/zenml/materializers/structured_string_materializer.py (1 hunks)
  • tests/unit/test_general.py (1 hunks)
Files skipped from review due to trivial changes (6)
  • src/zenml/integrations/deepchecks/materializers/deepchecks_results_materializer.py
  • src/zenml/integrations/facets/materializers/facets_materializer.py
  • src/zenml/integrations/pillow/materializers/pillow_image_materializer.py
  • src/zenml/integrations/whylogs/materializers/whylogs_materializer.py
  • src/zenml/materializers/numpy_materializer.py
  • src/zenml/materializers/structured_string_materializer.py
Additional comments: 3
tests/unit/test_general.py (1)
  • 93-98: The addition of the assertion to ensure that the uri does not contain backslashes is a good practice for maintaining cross-platform compatibility of file paths. This change aligns with the PR's objective to handle URIs correctly on Windows systems.
src/zenml/materializers/pandas_materializer.py (1)
  • 144-150: The modification to replace backslashes with forward slashes in the describe_uri within the save_visualizations method is correct and ensures that the file paths are compatible across different operating systems. This change is in line with the PR's objective to fix path issues on Windows.
src/zenml/artifacts/utils.py (1)
  • 146-154: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [122-154]

The changes to the save_artifact function, which include forcing URIs to have forward slashes and updating the materializer object's URI, are correct and align with the PR's objective to ensure that URIs are handled properly on Windows systems.

Copy link
Contributor

@strickvl strickvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me and the test failures are known.

@strickvl strickvl requested a review from safoinme January 3, 2024 09:16
@strickvl strickvl merged commit 3b38c07 into zenml-io:develop Jan 4, 2024
28 of 33 checks passed
@strickvl
Copy link
Contributor

strickvl commented Jan 4, 2024

@christianversloot thanks for handling this!

@christianversloot christianversloot deleted the bugfix/forwardslashes branch January 5, 2024 11:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants