Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

96 receiving potentially large uploads #577

Merged
merged 1 commit into from
Dec 5, 2023

Conversation

TobiasKampmann
Copy link
Contributor

@TobiasKampmann TobiasKampmann commented Nov 22, 2023

No description provided.

@TobiasKampmann TobiasKampmann linked an issue Nov 22, 2023 that may be closed by this pull request
@TobiasKampmann
Copy link
Contributor Author

Potentially relevant:
JetBrains/Exposed#871

@TobiasKampmann TobiasKampmann force-pushed the 96-receiving-potentially-large-uploads branch 3 times, most recently from d79b4c4 to d3608d4 Compare November 27, 2023 16:46
@TobiasKampmann TobiasKampmann marked this pull request as ready for review November 27, 2023 16:46
@TobiasKampmann TobiasKampmann marked this pull request as draft November 27, 2023 16:55
@TobiasKampmann TobiasKampmann force-pushed the 96-receiving-potentially-large-uploads branch from d3608d4 to bfcebde Compare November 27, 2023 16:59
@TobiasKampmann TobiasKampmann marked this pull request as ready for review November 27, 2023 17:40
Copy link
Contributor

@fengelniederhammer fengelniederhammer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good, but some details probably need improvement.

@TobiasKampmann TobiasKampmann force-pushed the 96-receiving-potentially-large-uploads branch 2 times, most recently from fd68633 to 816eaef Compare November 28, 2023 15:19
@TobiasKampmann TobiasKampmann force-pushed the 96-receiving-potentially-large-uploads branch from 816eaef to 35a8db1 Compare November 28, 2023 15:24
@TobiasKampmann
Copy link
Contributor Author

Follow-up tickets

loculus-project/security-issues#23
#604

@TobiasKampmann TobiasKampmann force-pushed the 96-receiving-potentially-large-uploads branch 2 times, most recently from 6dd476c to e69e83c Compare November 28, 2023 23:05
@TobiasKampmann
Copy link
Contributor Author

supported compressed file formats, tested manually:

@TobiasKampmann TobiasKampmann force-pushed the 96-receiving-potentially-large-uploads branch from e69e83c to 0f04ee4 Compare November 28, 2023 23:09
Copy link
Contributor

@fengelniederhammer fengelniederhammer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of minor things 👍

@TobiasKampmann TobiasKampmann force-pushed the 96-receiving-potentially-large-uploads branch 2 times, most recently from be974fc to e2b1de8 Compare November 29, 2023 13:47
@TobiasKampmann
Copy link
Contributor Author

TobiasKampmann commented Nov 29, 2023

Some performance measure:

  • 1 mio identical sequence entries zstd compressed
  • very small file size + local upload = negligible upload time
  • batch insert into aux tables with 1000 entries (100 seems also fine and can reduce RAM footprint even more, but is probably a bit slower)

Result:
Screenshot from 2023-11-29 15-10-24
(Plateau at the end is pretty much exactly 1GB)

Roughly 4 min (where validating and copying into prod table is ~ 10 sec each)

@TobiasKampmann TobiasKampmann force-pushed the 96-receiving-potentially-large-uploads branch from 8dd241b to d9e868c Compare November 29, 2023 16:19
@TobiasKampmann TobiasKampmann force-pushed the 96-receiving-potentially-large-uploads branch from d9e868c to 28dd626 Compare November 30, 2023 09:47
@TobiasKampmann TobiasKampmann force-pushed the 96-receiving-potentially-large-uploads branch 2 times, most recently from a31e910 to c336612 Compare December 1, 2023 21:50
@TobiasKampmann TobiasKampmann added the preview Triggers a deployment to argocd label Dec 1, 2023
@TobiasKampmann TobiasKampmann force-pushed the 96-receiving-potentially-large-uploads branch 2 times, most recently from e290fe3 to 4f18267 Compare December 4, 2023 17:34
@TobiasKampmann TobiasKampmann force-pushed the 96-receiving-potentially-large-uploads branch from 4f18267 to fe54e89 Compare December 4, 2023 17:58
@TobiasKampmann TobiasKampmann force-pushed the 96-receiving-potentially-large-uploads branch from 03388c0 to 483ac99 Compare December 5, 2023 12:22
 * introduce two auxiliary tables to efficiently validate and merge metadata and sequence data
 * remove singleton SequenceEntriesTable and replace it with provided and cached `Table`s to facilitate compression of sequence data
 * de-compress sequence strings with custom dictionary when de-serializing
 * support for zstd, gzip, xz, lzma, zip, bzip2
@TobiasKampmann TobiasKampmann force-pushed the 96-receiving-potentially-large-uploads branch from 483ac99 to 8c428b1 Compare December 5, 2023 12:43
@TobiasKampmann TobiasKampmann merged commit 73c35eb into main Dec 5, 2023
11 checks passed
@TobiasKampmann TobiasKampmann deleted the 96-receiving-potentially-large-uploads branch December 5, 2023 13:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preview Triggers a deployment to argocd
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Receiving (potentially large) uploads
3 participants