
[Item] Asset size property (content length) #921

Closed
emmanuelmathot opened this issue Nov 23, 2020 · 12 comments · Fixed by #934

Comments

@emmanuelmathot
Collaborator

Summary

Add a property that indicates the size of an asset.

Context

It is sometimes useful to know the size of the assets referenced in a STAC Item: for instance, to provision disk space, or to avoid re-downloading an asset that is already present by doing a simple content-length comparison.
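A minimal sketch of the content-length comparison described above, in Python. It assumes the asset carries its size in bytes under a key such as size (one of the names suggested in this issue; nothing is standardized at this point), and the function name needs_download is hypothetical.

```python
import os


def needs_download(asset: dict, local_path: str, size_key: str = "size") -> bool:
    """Return True if the local copy is missing or its size differs from the
    size advertised in the asset.

    size_key is a hypothetical field name: the issue suggests something like
    "size" or "content_length".
    """
    if not os.path.exists(local_path):
        # Nothing on disk yet, so the asset has to be fetched.
        return True
    expected = asset.get(size_key)
    if expected is None:
        # No size advertised: we cannot safely skip, so download again.
        return True
    # Same byte count is treated as "already downloaded".
    return os.path.getsize(local_path) != expected
```

If no size is advertised, the sketch errs on the side of downloading again. A size match is only a cheap heuristic; a checksum is the stronger duplicate test.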

Proposal

Two options:

  1. An additional STAC Common Metadata field with a typical name such as size or content_length
  2. A dedicated or existing extension
@m-mohr
Collaborator

m-mohr commented Nov 23, 2020

This sounds related to the Checksum extension; I think it should start as an extension. I agree that this field can be useful for clients and users.

@matthewhanson
Collaborator

The checksum is more useful for determining whether something is a duplicate, but I think size can be useful too.

I don't think there's an existing extension that would fit here, unless we combine it with checksum and rename.

It's not quite Common Metadata either, because that implies it can be an Item property (or an asset field), but this would be exclusive to assets (and possibly links), like the checksum extension.

@cholmes
Contributor

cholmes commented Dec 1, 2020

I agree on starting as an extension. It does seem to group well with checksum, in a 'file info' extension or something? But if checksum is in use and we don't want to change its extension, then it could just be a small dedicated extension.

@matthewhanson
Collaborator

We could also create a new file-info extension that includes checksums and deprecate the existing checksum extension.

@m-mohr
Collaborator

m-mohr commented Dec 11, 2020

I've not seen it used very widely in public, maybe 1 or 2 collections max.

I should finally start crawling collections so that we have statistics for such things...

@m-mohr
Collaborator

m-mohr commented Dec 14, 2020

Writing up the CARD4L extension, I'm wondering whether there's more than just size and checksum for a fileinfo extension.
For example, I could think of

  • file:size (just in bytes?)
  • file:data_type (currently proposed as render:data_type in the Rendering Hints extension (WIP), #879)
  • file:checksum (currently checksum:multihash)
  • file:byte_order (one of big-endian or little-endian)
  • file:header_size (in bytes)
  • maybe: file:name (avoid parsing it from the path?)
  • maybe: file:compressed or file:archive (for assets in ZIP, TAR and other archives, as recently discussed on Gitter; this can probably also be determined from the media type)
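To illustrate how file:header_size, file:byte_order and file:data_type could work together, here is a Python sketch that decodes values from a raw file. The field names are the proposals listed above; the data type names and their mapping to struct format codes are assumptions for illustration, and read_values is a hypothetical helper.

```python
import struct

# Assumed mapping from data type names to struct format codes. These names
# are illustrative only, not a list defined by the proposal.
_STRUCT_CODES = {
    "int8": "b", "uint8": "B",
    "int16": "h", "uint16": "H",
    "int32": "i", "uint32": "I",
    "float32": "f", "float64": "d",
}


def read_values(raw: bytes, asset: dict, count: int) -> tuple:
    """Decode the first `count` values that follow the header of a raw file,
    using the proposed file:header_size, file:byte_order and file:data_type."""
    # Skip the header; default to no header if the field is absent.
    offset = asset.get("file:header_size", 0)
    byte_order = asset["file:byte_order"]
    if byte_order not in ("little-endian", "big-endian"):
        raise ValueError(f"unknown byte order: {byte_order}")
    # struct uses "<" for little-endian and ">" for big-endian (standard sizes).
    order = "<" if byte_order == "little-endian" else ">"
    fmt = f"{order}{count}{_STRUCT_CODES[asset['file:data_type']]}"
    return struct.unpack_from(fmt, raw, offset)
```

For example, a 4-byte header followed by three big-endian uint16 values decodes as read_values(b"HDR!" + struct.pack(">3H", 1, 2, 300), {"file:header_size": 4, "file:byte_order": "big-endian", "file:data_type": "uint16"}, 3), which yields (1, 2, 300).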

@m-mohr
Collaborator

m-mohr commented Dec 16, 2020

@emmanuelmathot (and all others reading this) Good that you like my comment ;-) Which fields would be useful for you? I'll probably write this up, and I'm wondering what is actually useful for others. For example, I could also define byte_order directly in CARD4L if it is too specific. Please name any of the fields mentioned, or even additional ones. :-) Thank you!

@vincentsarago
Copy link

file:size (just in bytes?)

bytes 👌

It might be overkill, but knowing the header size could be really useful 🤷‍♂️
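One reason header size is handy: with a file:header_size field, a client could request only the header bytes over HTTP. This Python sketch assumes the server honours Range requests (RFC 7233) and that the asset has an href; header_range and header_request are hypothetical helper names.

```python
import urllib.request


def header_range(asset: dict) -> dict:
    """Build an HTTP Range header that covers only the file header."""
    size = asset["file:header_size"]
    if size <= 0:
        raise ValueError("file:header_size must be positive")
    # Byte ranges are inclusive, so the last byte index is size - 1.
    return {"Range": f"bytes=0-{size - 1}"}


def header_request(asset: dict) -> urllib.request.Request:
    """Prepare (but do not send) a request for just the header bytes."""
    return urllib.request.Request(asset["href"], headers=header_range(asset))
```

Passing the request to urllib.request.urlopen would then transfer only the header, provided the server supports partial content.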

@m-mohr
Collaborator

m-mohr commented Dec 16, 2020

Great, I also need header size for CARD4L, so I'll just move it here. Two use cases seem sufficient.

@m-mohr
Collaborator

m-mohr commented Dec 16, 2020

PR #934 is up for review :-)

@m-mohr
Collaborator

m-mohr commented Dec 16, 2020

Some reasons for not including the "maybe" options:

  • I did not add file:name, as it seems that all languages have a basename() function or something similar.
  • A compressed/archive flag can basically be parsed from the media type, and it doesn't seem very useful. What I think would be useful instead is an "archive" extension that allows specifying individual files within archives as assets, e.g. linking to an XML and a TIFF inside a ZIP so that you can, for example, use eo:bands to describe the TIFF in the archive. Something like archive:path could give the path of the file within the archive (e.g. a ZIP) and allow direct mapping and extraction. That may also be useful for Zarr?
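Both points can be sketched in Python: the file name falls out of the href with standard library calls, and an archive:path field (hypothetical, not defined anywhere yet) would be enough to pull a single member out of a ZIP. The function names are illustrative.

```python
import posixpath
import zipfile
from urllib.parse import urlparse


def file_name(href: str) -> str:
    """Derive a file name from an asset href instead of storing file:name."""
    # Parse first so query strings and fragments don't end up in the name.
    return posixpath.basename(urlparse(href).path)


def read_from_archive(archive_path: str, asset: dict) -> bytes:
    """Read one member of a local ZIP using a hypothetical archive:path field."""
    with zipfile.ZipFile(archive_path) as zf:
        return zf.read(asset["archive:path"])
```

posixpath is used because URL paths always use forward slashes; Windows-style local paths would need ntpath or pathlib instead.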

m-mohr linked a pull request on Dec 16, 2020 that will close this issue
@emmanuelmathot
Collaborator Author

Wow. That was fast! That's great, thx! I have nothing more to add for now. We will implement this extension in DotNetStac with helpers to generate size or checksum for assets.
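As an illustration of such helpers (in Python, and not DotNetStac's actual API), the sketch below computes file:size and a hex-encoded SHA2-256 multihash for file:checksum. The prefix bytes 0x12 0x20 are the multihash code for sha2-256 followed by the 32-byte digest length; the function name file_info is hypothetical.

```python
import hashlib
import os

# Multihash prefix for SHA2-256: function code 0x12, digest length 0x20 (32 bytes).
_SHA2_256_PREFIX = bytes([0x12, 0x20])


def file_info(path: str, chunk_size: int = 1 << 20) -> dict:
    """Compute file:size and a hex-encoded SHA2-256 multihash for a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so large assets don't have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "file:size": os.path.getsize(path),
        "file:checksum": (_SHA2_256_PREFIX + digest.digest()).hex(),
    }
```

The returned dict can be merged into an asset's fields directly.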
