Compression of sequences in postgres DB #497

Closed
theosanderson opened this issue Nov 7, 2023 · 2 comments · Fixed by #639
Labels
backend related to the loculus backend component discussion Open questions

Comments

@theosanderson
Member

A SARS-CoV-2 genome is about 30 kb. We may end up storing 16 million of them, which would be ~480 GB uncompressed. The sequences are highly compressible, though. If I understood correctly, the plan was to do this by embedding the reference genome in the compression dictionary. We should figure out where this sits on our roadmap, assuming it isn't implemented yet.
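
To make the dictionary idea concrete, here is a minimal sketch of compressing a sequence against a reference used as a preset dictionary. It is an illustration only: the issue does not say which codec the backend uses, so Python's standard-library `zlib` stands in here, and the helper names (`make_demo_sequences`, `compress_with_reference`, `decompress_with_reference`) and the synthetic random "reference" are assumptions made up for the example, not Loculus code or real genome data.

```python
import random
import zlib

BASES = b"ACGT"


def make_demo_sequences(seed: int = 0, length: int = 29_903, n_mutations: int = 10):
    """Build a synthetic 'reference' and a sample that differs at a few sites.

    Hypothetical helper: real data would come from the database, not an RNG.
    """
    rng = random.Random(seed)
    reference = bytes(rng.choice(BASES) for _ in range(length))
    sample = bytearray(reference)
    for pos in rng.sample(range(length), n_mutations):
        # Pick a base that differs from the reference base at this position.
        sample[pos] = rng.choice([b for b in BASES if b != reference[pos]])
    return reference, bytes(sample)


def compress_with_reference(sequence: bytes, reference: bytes) -> bytes:
    """Compress `sequence` with `reference` as the preset dictionary."""
    compressor = zlib.compressobj(level=9, zdict=reference)
    return compressor.compress(sequence) + compressor.flush()


def decompress_with_reference(blob: bytes, reference: bytes) -> bytes:
    """Inverse of compress_with_reference; needs the same dictionary."""
    decompressor = zlib.decompressobj(zdict=reference)
    return decompressor.decompress(blob) + decompressor.flush()


if __name__ == "__main__":
    reference, sample = make_demo_sequences()
    with_dict = compress_with_reference(sample, reference)
    without_dict = zlib.compress(sample, 9)
    print(f"raw: {len(sample)} bytes")
    print(f"zlib without dictionary: {len(without_dict)} bytes")
    print(f"zlib with reference dictionary: {len(with_dict)} bytes")
```

Two caveats about this sketch. DEFLATE can only look back 32 KiB, so the approach works with `zlib` only because a ~30 kb genome fits in that window; a codec with larger dictionary support would be needed for longer references. The same dictionary must also be available at read time, so changing the reference later would require keeping track of which dictionary each stored row was compressed with.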

@theosanderson changed the title from "Compression of postgres DB" to "Compression of sequences in postgres DB" on Nov 7, 2023
@fengelniederhammer added the labels discussion (Open questions) and backend (related to the loculus backend component) on Nov 8, 2023
@TobiasKampmann
Contributor

TobiasKampmann commented Dec 4, 2023

This will be resolved for submitted unaligned nucleotide sequences by #577.

@TobiasKampmann
Contributor

compress processed data similar to original data in #577

@TobiasKampmann TobiasKampmann linked a pull request Dec 7, 2023 that will close this issue