Adds ISD Compression #606

amystamile-usgs · 2024-05-29T19:03:24Z

closes #604

Licensing

This project is mostly composed of free and unencumbered software released into the public domain, and we are unlikely to accept contributions that are not also released into the public domain. Each file should have these words somewhere near the top:

This work is free and unencumbered software released into the public domain. In jurisdictions that recognize copyright laws, the author or authors of this software dedicate any and all copyright interest in the software to the public domain.

I dedicate any and all copyright interest in this software to the public domain. I make this dedication for the benefit of the public at large and to the detriment of my heirs and successors. I intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law.

ale/isd_generate.py

jlaura · 2024-05-30T18:16:13Z

LGTM! General question: does one have to read/write the intermediary uncompressed file, or can one just work with the compressed ISD? It looks like the workflow right now would be: write uncompressed, then compress; later, read compressed, uncompress and write to disk, then do something with the data. Ideally, could one just instantiate a sensor model using the compressed ISD without having to use all the disk space to uncompress? Maybe that is a knoten issue?

Kelvinrr · 2024-05-30T18:22:52Z

@jlaura It kind of does seem like a knoten/SET issue, but if we are going to support compressed ISDs we might as well provide a basic API so people don't have to implement their own. Maybe a follow-up task could be to add a CompressedJson object that subclasses dict and reads/writes in the compressed format. Probably low priority, since "open compressed, work, write compressed" kind of matches how one would have to work with JSON anyway.

Something else to consider: add a flag to generate_isd to export compressed ISDs?
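As a rough illustration of the follow-up idea above, here is a minimal sketch of a dict subclass that loads from and saves to a compressed file. The class name comes from the comment, but the `load`/`dump` method names, the file name, and the ISD keys are made up for this example and are not part of ALE. It assumes the third-party `brotli` package; if that is not installed, the sketch falls back to `zlib` purely so it still runs, which does not produce a real `.br` file.

```python
import json
import os
import tempfile

try:
    import brotli  # third-party package: pip install brotli
    _compress, _decompress = brotli.compress, brotli.decompress
except ImportError:
    # Stand-in so the sketch runs without brotli; NOT the real .br format.
    import zlib
    _compress, _decompress = zlib.compress, zlib.decompress


class CompressedJson(dict):
    """Hypothetical dict that round-trips through a compressed JSON file."""

    @classmethod
    def load(cls, path):
        # Decompress in memory; no uncompressed file is written to disk.
        with open(path, "rb") as f:
            return cls(json.loads(_decompress(f.read()).decode("utf-8")))

    def dump(self, path):
        # Serialize and compress in memory, then write once.
        with open(path, "wb") as f:
            f.write(_compress(json.dumps(self).encode("utf-8")))


# Open compressed, work, write compressed.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_isd.br")
    CompressedJson({"example_model": "EXAMPLE", "example_focal_length": 549.1}).dump(path)

    isd = CompressedJson.load(path)
    isd["example_focal_length"] = 550.0
    isd.dump(path)

    reloaded = CompressedJson.load(path)
```

Because the object is still a plain dict, anything that already accepts a parsed ISD dictionary would accept it unchanged.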

ale/isd_generate.py

jlaura

Clarifying questions for me. If they are pedantic, ignore them and feel free to merge!

jlaura · 2024-05-30T22:08:03Z

ale/isd_generate.py

+        help="Output a compressed isd json file with .br file extension. "
+             "Ale uses the brotli compression algorithm. "
+             "To decompress an isd file run: python -c \"import ale.isd_generate as isdg; isdg.decompress_json('/path/to/isd.br')\""
+    )


Cool! So using this, one could write directly to compressed. Then a SET could read the compressed ISD straight to memory and use it?

amystamile-usgs · 2024-05-31

I don't think anything in ale takes in an ISD file. I'm working on having knoten take in these compressed ISDs directly.

jlaura · 2024-05-31

Cool! I didn't mean for this comment to spill into more work over there! Much appreciated.

jlaura · 2024-05-30T22:09:39Z

ale/isd_generate.py

+
+    os.rename(uncompressed_json_file, os.path.splitext(uncompressed_json_file)[0] + '.br')
+
+    return os.path.splitext(uncompressed_json_file)[0] + '.br'


Does this have to write and then re-write? Is there no way for brotli to compress in memory and skip the intermediary file? Perhaps using something like StringIO or cStringIO (both standard library, afaik).

amystamile-usgs · 2024-05-31

I updated this function to accept JSON data directly instead of reading from a file, eliminating the need to write the JSON data to a file.
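For readers following along, the single-write approach described above might look roughly like the sketch below. This is not the code from the PR: the function name `write_compressed_isd`, its signature, and the sample data are hypothetical. It assumes the third-party `brotli` package and falls back to `zlib` only so the sketch runs where brotli is missing (the fallback output is not a real `.br` file).

```python
import json
import os
import tempfile

try:
    import brotli  # third-party package: pip install brotli
    _compress, _decompress = brotli.compress, brotli.decompress
except ImportError:
    # Stand-in so the sketch runs without brotli; NOT the real .br format.
    import zlib
    _compress, _decompress = zlib.compress, zlib.decompress


def write_compressed_isd(isd, json_path):
    """Hypothetical helper: compress an in-memory ISD dict and write it once.

    `json_path` is the path the uncompressed file would have had; the
    compressed file is written next to it with a .br extension.
    """
    br_path = os.path.splitext(json_path)[0] + ".br"
    payload = json.dumps(isd).encode("utf-8")  # serialize in memory
    with open(br_path, "wb") as f:
        f.write(_compress(payload))  # only the compressed bytes hit disk
    return br_path


with tempfile.TemporaryDirectory() as tmp:
    br_path = write_compressed_isd(
        {"example_key": "example value"},
        os.path.join(tmp, "example_isd.json"),
    )
    written = sorted(os.listdir(tmp))  # no intermediate .json file expected
    with open(br_path, "rb") as f:
        round_trip = json.loads(_decompress(f.read()))
```

No in-memory file object is needed here, since the compress call takes and returns bytes directly.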

jlaura · 2024-05-30T22:12:40Z

ale/isd_generate.py

+    with open(compressed_json_file, 'rb') as f:
+        data = f.read()
+    with open(compressed_json_file, 'wb') as f:
+        f.write(brotli.decompress(data))


I am confused by the decompression here. Does this read the compressed file and then write the decompressed data back to disk? Would it make more sense to simply pipe the decompressed data to stdout? That would be more bash-like.

amystamile-usgs · 2024-05-31

This is for situations where a user wants to revert to a readable JSON file saved locally. We could add a function to read the JSON from binary format, if you think that would be appropriate.
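The two alternatives raised in this thread (reading the JSON straight from the compressed file, and piping the decompressed text to stdout) could be sketched as follows. Both function names and the sample data are hypothetical and not part of ALE. The sketch assumes the third-party `brotli` package, with a `zlib` fallback only so it runs where brotli is not installed.

```python
import io
import json
import os
import sys
import tempfile

try:
    import brotli  # third-party package: pip install brotli
    _compress, _decompress = brotli.compress, brotli.decompress
except ImportError:
    # Stand-in so the sketch runs without brotli; NOT the real .br format.
    import zlib
    _compress, _decompress = zlib.compress, zlib.decompress


def read_compressed_isd(path):
    """Hypothetical helper: return the ISD as a dict, decompressing in memory."""
    with open(path, "rb") as f:
        return json.loads(_decompress(f.read()))


def print_compressed_isd(path, stream=None):
    """Hypothetical helper: write readable JSON to a text stream (stdout by default)."""
    if stream is None:
        stream = sys.stdout
    json.dump(read_compressed_isd(path), stream, indent=2)
    stream.write("\n")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_isd.br")
    with open(path, "wb") as f:
        f.write(_compress(json.dumps({"example_key": [1, 2, 3]}).encode("utf-8")))

    isd = read_compressed_isd(path)

    # Capture the "stdout" variant in a buffer for demonstration purposes.
    buffer = io.StringIO()
    print_compressed_isd(path, stream=buffer)

    files_after = sorted(os.listdir(tmp))  # the compressed file is left untouched
```

Neither helper modifies the file on disk, so the compressed ISD remains the single copy.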

amystamile-usgs added 3 commits May 29, 2024 11:24

isd compression

bb9b9e8

added docs

c56638e

Added changelog

33d5d77

amystamile-usgs requested a review from jlaura May 29, 2024 19:03

Kelvinrr reviewed May 30, 2024

addressed PR feedback

ac5d28f

jlaura mentioned this pull request May 30, 2024

Enhancement: Compressed ISD API? #607

Open

amystamile-usgs added 3 commits May 30, 2024 12:13

Adds compress option to isd_generate

0674abc

updated changelog

cba9259

fixed typo

5d02d26

Kelvinrr reviewed May 30, 2024

add decompress instructions

4b3f9fc

jlaura previously approved these changes May 30, 2024

compress_json to take in json data instead of file

13e460e

amystamile-usgs dismissed jlaura’s stale review via 13e460e May 31, 2024 15:28

jlaura approved these changes May 31, 2024

Kelvinrr merged commit f754b5f into DOI-USGS:main Jun 4, 2024
12 of 13 checks passed
