Improve manifest file writing performance #1184

Merged: 1 commit, Jul 19, 2022


@danlamanna (Contributor) commented Jul 12, 2022

This significantly speeds up and reduces the memory usage of the write_manifest_files task that takes place on publish.

The manifest files are now written in a streaming fashion, and the YAML output now uses the C implementation. Worst-case time and memory usage went from ~2m20s and ~1 GB of RAM to ~20s and ~25 MB of RAM, with memory usage now constant.
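As a rough illustration of the two ideas above (streaming writes and the libyaml-backed dumper), here is a minimal sketch. It is not the code from this PR: `stream_json_array` and `stream_yaml_list` are hypothetical helper names, and the only assumption about the environment is that PyYAML may or may not be installed, and may or may not have been built with its C extension.

```python
import io
import json
from typing import Any, Iterable, TextIO

try:
    import yaml  # PyYAML (optional for this sketch)

    # CSafeDumper only exists when PyYAML was built against libyaml;
    # fall back to the pure-Python dumper otherwise.
    YamlDumper = getattr(yaml, "CSafeDumper", yaml.SafeDumper)
except ImportError:
    yaml = None


def stream_json_array(items: Iterable[Any], out: TextIO) -> None:
    """Write items as one JSON array, serializing a single item at a time."""
    out.write("[")
    for index, item in enumerate(items):
        if index:
            out.write(",")
        out.write(json.dumps(item))
    out.write("]")


def stream_yaml_list(items: Iterable[Any], out: TextIO) -> None:
    """Write items as one YAML sequence, dumping a single item at a time."""
    if yaml is None:
        raise RuntimeError("PyYAML is not installed")
    wrote_any = False
    for item in items:
        # Dumping a one-element list in block style emits a single "- ..."
        # entry; consecutive entries concatenate into one YAML sequence.
        yaml.dump([item], out, Dumper=YamlDumper, default_flow_style=False)
        wrote_any = True
    if not wrote_any:
        out.write("[]\n")  # keep the output a sequence even when empty


if __name__ == "__main__":
    # Hypothetical stand-in for an iterator over asset metadata.
    assets = ({"path": f"sub-{i}/file.nwb", "size": i} for i in range(3))
    buffer = io.StringIO()
    stream_json_array(assets, buffer)
    print(buffer.getvalue())
```

Because each helper consumes an iterator and holds only one item in memory at a time, peak memory should stay roughly flat as the number of items grows, which is the behavior the description above reports.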

As a result, I'm removing the manifest-worker since this task doesn't need to be treated differently than any others on the main worker.

Fixes #985

@mvandenburgh added the patch (Increment the patch version when merged) and release (Create a release when this pr is merged) labels Jul 19, 2022
@mvandenburgh (Member) commented

@danlamanna you'll also need to remove the resource from Terraform (https://github.com/dandi/dandi-infrastructure/blob/master/terraform/api.tf#L56-L61) after this PR is released.

@danlamanna merged commit 8f69c81 into master Jul 19, 2022
@danlamanna deleted the fix-manifest-performance branch July 19, 2022 22:14
@dandibot (Member) commented

🚀 PR was released in v0.2.38 🚀

@dandibot added the released (This issue/pull request has been released.) label Jul 19, 2022
brianhelba added a commit that referenced this pull request Nov 6, 2023
This follows #1184, but reduces unnecessary re-encodings from bytes to
strings. It also cleans up the "manifest.py" module, by making more of
the API private and adding additional type annotations.
Development

Successfully merging this pull request may close these issues.

Improve memory performance of write_manifest_files task