Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds ISD Compression #606

Merged
merged 9 commits into from
Jun 4, 2024
Merged

Adds ISD Compression #606

merged 9 commits into from
Jun 4, 2024

Conversation

amystamile-usgs
Copy link
Contributor

closes #604

Licensing

This project is mostly composed of free and unencumbered software released into the public domain, and we are unlikely to accept contributions that are not also released into the public domain. Somewhere near the top of each file should have these words:

This work is free and unencumbered software released into the public domain. In jurisdictions that recognize copyright laws, the author or authors of this software dedicate any and all copyright interest in the software to the public domain.

  • I dedicate any and all copyright interest in this software to the public domain. I make this dedication for the benefit of the public at large and to the detriment of my heirs and successors. I intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law.

@amystamile-usgs amystamile-usgs requested a review from jlaura May 29, 2024 19:03
ale/isd_generate.py Outdated Show resolved Hide resolved
ale/isd_generate.py Outdated Show resolved Hide resolved
ale/isd_generate.py Outdated Show resolved Hide resolved
@jlaura
Copy link
Collaborator

jlaura commented May 30, 2024

LGTM! General question - does one have to read/write the intermediary uncompressed file or can one just work with the compressed ISD. It looks like the workflow right now would be write uncompressed, compress. Then read compressed, uncompress and write to disk, do something with the data. Ideally, could one just instantiate a sensor model using the compressed ISD without having to use all the disk space to uncompress? Maybe that is a knoten issue?

@Kelvinrr
Copy link
Collaborator

Kelvinrr commented May 30, 2024

@jlaura It kind of does seem like a knoten/SET issue, but if we are going to support compressed ISDs we might as well provide a basic API so people dont have to implement their own. Maybe a follow up task could be to add a CompressedJson object that subclasses dict and read/writes in the compressed format. Probably low priority since open compressed, work, write compressed kinda matches how one would have to work with JSON anyways.

Something else to consider: add a flag to generate_isd to export compressed ISDs?

ale/isd_generate.py Outdated Show resolved Hide resolved
jlaura
jlaura previously approved these changes May 30, 2024
Copy link
Collaborator

@jlaura jlaura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarifying questions for me. If they are pedantic, ignore them and feel free to merge!

help="Output a compressed isd json file with .br file extension. "
"Ale uses the brotli compression algorithm. "
"To decompress an isd file run: python -c \"import ale.isd_generate as isdg; isdg.decompress_json('/path/to/isd.br')\""
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! So using this, one could write directly to compressed. Then a SET could read the compressed ISD straight to memory and use it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think anything in ale takes in an isd file. I'm working on having knoten take in these compressed isds directly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! I didn't mean for this comment to spill into more work over there! Much appreciated.


os.rename(uncompressed_json_file, os.path.splitext(uncompressed_json_file)[0] + '.br')

return os.path.splitext(uncompressed_json_file)[0] + '.br'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have to write and then re-write? No way from brotli to compress into memory to skip the intermediary file? Perhaps using something like StringIO or cStringIO (both standard library afaik.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this function to accept json data directly instead of reading from a file, eliminating the need to write the json data to a file.

with open(compressed_json_file, 'rb') as f:
data = f.read()
with open(compressed_json_file, 'wb') as f:
f.write(brotli.decompress(data))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused by the decompression here. Is this read the compressed and then write the decompressed to disk? Would it make more sense to simply pipe the decompressed to stdout? That would be more bash-like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for situations where a user wants to revert to a readable JSON file saved locally. We could add a function to read the JSON from binary format, if you think that would be appropriate.

@Kelvinrr Kelvinrr merged commit f754b5f into DOI-USGS:main Jun 4, 2024
12 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enhancement: Support for ISD Compression
3 participants