Add file info extension (remove checksum extension) #934

Merged
merged 7 commits into from
Dec 16, 2020
Changes from 5 commits
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- 'via' and 'canonical' rel types as options in items.
- Added clarification about how collection-level asset object properties do not remove the need for item-level asset object properties in the `item-assets` extension ([#880](https://github.com/radiantearth/stac-spec/pull/880))
- Added [processing extension](extensions/processing/README.md)
- Added [file info extension](extensions/file/README.md) ([#879](https://github.com/radiantearth/stac-spec/pull/879), [#921](https://github.com/radiantearth/stac-spec/issues/921))
- Added additional acquisition parameters in the `sat` extension: sat:platform_international_designator, sat:absolute_orbit, sat:anx_datetime* ([#894](https://github.com/radiantearth/stac-spec/pull/894))

### Changed
@@ -19,11 +20,13 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

### Removed

- Checksum extension (field `checksum:multihash`). Use File Info extension (field `file:checksum`) instead.

### Fixed

- Label extension: `label:classes` was flagged as required in JSON Schema, but is only required for categorical data.
- Fixed JSON Schema for `providers` (Collections and Items) to be an object and require a `name`.
- JSON Schema for `sar:polarizations` in `assets` fixed
- JSON Schema for `sar:polarizations` in `assets` fixed

## [v1.0.0-beta.2] - 2020-07-08

2 changes: 1 addition & 1 deletion extensions/README.md
@@ -46,10 +46,10 @@ An extension can add new fields to STAC entities (content extension), or can add

| Extension Title | Identifier | Field Name Prefix | Scope | Maturity | Description |
| ------------------------------------------------ | ----------------- | ------------------- | ------------------------- | ---------- | ----------- |
| [Checksum](checksum/README.md) | checksum | checksum | Item, Catalog, Collection | *Proposal* | Provides a way to specify file checksums for assets and links in Items, Catalogs and Collections. |
| [Collection Assets](collection-assets/README.md) | collection-assets | - | Collection | *Proposal* | Provides a way to specify assets available on the collection-level. |
| [Data Cube](datacube/README.md) | datacube | cube | Item, Collection | *Proposal* | Data Cube related metadata, especially to describe their dimensions. |
| [Electro-Optical](eo/README.md) | eo | eo | Item | *Proposal* | Covers electro-optical data that represents a snapshot of the earth for a single date and time. It could consist of multiple spectral bands, for example visible bands, infrared bands, red edge bands and panchromatic bands. The extension provides common fields like bands, cloud cover, gsd and more. |
| [File Info](file/README.md) | file | file | Item, Catalog, Collection | *Proposal* | Provides a way to specify file details such as size, data type and checksum for assets and links in Items, Catalogs and Collections. |
| [Item Asset Definition](item-assets/README.md) | item-assets | - | Collection | *Proposal* | Provides a way to specify details about what assets may be found in Items belonging to a collection. |
| [Label](label/README.md) | label | label | Item | *Proposal* | Items that relate labeled AOIs with source imagery |
| [Point Cloud](pointcloud/README.md) | pointcloud | pc | Item | *Proposal* | Provides a way to describe point cloud datasets. The point clouds can come from either active or passive sensors, and data is frequently acquired using tools such as LiDAR or coincidence-matched imagery. |
35 changes: 0 additions & 35 deletions extensions/checksum/README.md

This file was deleted.

76 changes: 76 additions & 0 deletions extensions/file/README.md
@@ -0,0 +1,76 @@
# File Info Extension Specification

- **Title: File Info**
- **Identifier: file**
- **Field Name Prefix: file**
- **Scope: Item, Catalog, Collection**
- **Extension [Maturity Classification](../README.md#extension-maturity): Proposal**

Provides a way to specify file-related details such as checksum, data type and size for assets and links in [STAC Items](../../item-spec/item-spec.md), [STAC Catalogs](../../catalog-spec/catalog-spec.md) and [STAC Collections](../../collection-spec/collection-spec.md).

- [Example](examples/sentinel1.json)
- [JSON Schema](json-schema/schema.json)

## *Link Object* and *Asset Object* fields

The following fields can be used for links (in the [`Link Object`](../../item-spec/item-spec.md#link-object)) and assets (in the [`Asset Object`](../../item-spec/item-spec.md#asset-object)).

| Field Name | Type | Description |
| ------------------ | ------ | ------------------------------------------------------------ |
| file:byte_order | string | The byte order of integer values in the file. One of `big-endian` or `little-endian`. |
| file:checksum | string | Provides a way to specify file [checksums](#checksums) (e.g. BLAKE2, MD5, SHA1, SHA2, SHA3). The hashes are self-identifying hashes as described in the [Multihash specification](https://github.com/multiformats/multihash) and must be encoded as a hexadecimal (base 16) string with lowercase letters. |
| file:data_type | string | The data type of the file. One of the [data types](#data-types) below. |
| file:header_size | integer | The header [size](#sizes) of the file, specified in bytes. |
| file:size | integer | The file [size](#sizes), specified in bytes. |

This extension can OPTIONALLY be used with the [Collection Assets Extension](../collection-assets/README.md).
File specific details should not be part of the [Item Assets Definition](../item-assets/README.md) in Collections.
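
As a rough illustration of the field rules in the table above, the following Python sketch checks the `file:*` fields of a single link or asset. The function name `check_file_fields` and its error messages are hypothetical and not part of the extension; the allowed values mirror the JSON Schema in this pull request. It is a minimal sketch, not a replacement for JSON Schema validation.

```python
import re

# Allowed values, mirroring the enums in the extension's JSON Schema.
BYTE_ORDERS = ("big-endian", "little-endian")
DATA_TYPES = (
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64",
    "cint16", "cint32", "cfloat32", "cfloat64",
    "other",
)
# Lowercase hexadecimal only, as required for the Multihash string.
CHECKSUM_RE = re.compile(r"[a-f0-9]+")


def check_file_fields(obj):
    """Return a list of problems found in the file:* fields of a link or asset.

    Hypothetical helper: an empty list means no problem was found.
    """
    problems = []

    if "file:byte_order" in obj and obj["file:byte_order"] not in BYTE_ORDERS:
        problems.append("file:byte_order must be big-endian or little-endian")

    if "file:checksum" in obj:
        checksum = obj["file:checksum"]
        if not isinstance(checksum, str) or not CHECKSUM_RE.fullmatch(checksum):
            problems.append("file:checksum must be a lowercase hexadecimal string")

    if "file:data_type" in obj and obj["file:data_type"] not in DATA_TYPES:
        problems.append("file:data_type is not one of the allowed values")

    for key in ("file:header_size", "file:size"):
        if key in obj:
            value = obj[key]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                problems.append(key + " must be a non-negative integer")

    return problems
```

Calling `check_file_fields` on the `measurement` asset from the example Item should return an empty list, while an asset with an uppercase checksum or a negative size yields one message per offending field.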

### Sizes

Please be aware that the integer values (always unsigned) given for the sizes (especially `file:size`) may exceed the maximum value of the default integer data type in your environment or programming language. In this specification, `integer` means an integer number without an upper limit, so you might need a different data type to store the values. For example, a file larger than around 2.14 GB exceeds the maximum value of int32; in JavaScript, `BigInt` could be used in that case.
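
The limit described above can be sketched in Python, whose built-in `int` has no upper bound, so the concern only arises when a size is written into a fixed-width field. The helper names `fits_int32` and `pack_file_size` are hypothetical, and storing sizes as unsigned 64-bit little-endian values is an assumption made for illustration only.

```python
import struct

INT32_MAX = 2**31 - 1   # 2,147,483,647: largest signed 32-bit value
UINT64_MAX = 2**64 - 1  # largest unsigned 64-bit value


def fits_int32(size):
    """Return True if a file size can be stored in a signed 32-bit integer."""
    return 0 <= size <= INT32_MAX


def pack_file_size(size):
    """Pack a file size into 8 bytes as an unsigned 64-bit little-endian integer."""
    if not 0 <= size <= UINT64_MAX:
        raise ValueError("size does not fit into an unsigned 64-bit integer")
    return struct.pack("<Q", size)
```

A size of `209715200` bytes (the `measurement` asset in the example) fits into int32, whereas a 3 GB file (`3_000_000_000` bytes) does not and needs a wider type such as the 64-bit field used in `pack_file_size`.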

### Data Types

The data type gives information about the values in the file.
This can be used to indicate the (maximum) range of numerical values expected.
For example, `uint8` indicates that the numbers are in a range between 0 and 255,
they can never be smaller or larger. This can help to pick the optimal numerical
data type when reading the files to keep memory consumption low.
Nevertheless, it doesn't necessarily mean that the expected values fill the whole range.
For example, there can be use cases for `uint8` that only use the numbers 0 to 10.
Through other extensions it might be possible to specify an exact value range so
that visualizations can be optimized.
The allowed values for `file:data_type` are:

- `int8`: 8-bit integer
- `int16`: 16-bit integer
- `int32`: 32-bit integer
- `int64`: 64-bit integer
- `uint8`: unsigned 8-bit integer (common for 8-bit RGB PNGs)
- `uint16`: unsigned 16-bit integer
- `uint32`: unsigned 32-bit integer
- `uint64`: unsigned 64-bit integer
- `float16`: 16-bit float
- `float32`: 32-bit float
- `float64`: 64-bit float
- `cint16`: 16-bit complex integer
- `cint32`: 32-bit complex integer
- `cfloat32`: 32-bit complex float
- `cfloat64`: 64-bit complex float
- `other`: Other data type than the ones listed above (e.g. boolean, string, higher precision numbers)
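
To make the "maximum range" idea concrete, here is a small Python sketch that derives the representable range for the integer values of `file:data_type`. The function name `integer_range` is hypothetical; float, complex and `other` types are deliberately not handled and return `None`.

```python
import re


def integer_range(data_type):
    """Return (minimum, maximum) for an integer file:data_type, else None.

    Hypothetical helper covering int8..int64 and uint8..uint64 only.
    """
    match = re.fullmatch(r"(u?)int(8|16|32|64)", data_type)
    if match is None:
        return None  # float*, cint*, cfloat* and "other" are not covered
    unsigned = match.group(1) == "u"
    bits = int(match.group(2))
    if unsigned:
        return (0, 2**bits - 1)
    return (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
```

For `uint8` this gives `(0, 255)`, matching the range stated above. A reader could use the result to choose the smallest in-memory type that still holds every possible value.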

### Checksums

`file:checksum` was previously defined in the [`checksum` extension](https://github.com/radiantearth/stac-spec/tree/v1.0.0-beta.2/extensions/checksum/README.md) and the field name was `checksum:multihash` before STAC v1.0.0-beta.3. The specification of the field has not changed.
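
Because only the field name and the extension identifier changed, migrating existing metadata is mostly a rename. The following Python sketch shows one possible approach; the function `migrate_checksum_fields` is hypothetical, and it only handles the short `checksum` identifier in `stac_extensions`, not schema URLs.

```python
OLD_FIELD = "checksum:multihash"
NEW_FIELD = "file:checksum"


def _rename(entry):
    """Return a copy of a link or asset with the old field name replaced."""
    return {(NEW_FIELD if key == OLD_FIELD else key): value
            for key, value in entry.items()}


def migrate_checksum_fields(stac_object):
    """Return a shallow copy of an Item, Catalog or Collection using file:checksum.

    Hypothetical helper: the input dictionary is not modified.
    """
    migrated = dict(stac_object)
    if "links" in migrated:
        migrated["links"] = [_rename(link) for link in migrated["links"]]
    if "assets" in migrated:
        migrated["assets"] = {name: _rename(asset)
                              for name, asset in migrated["assets"].items()}
    if "stac_extensions" in migrated:
        # Assumption: only the short identifier is rewritten here.
        migrated["stac_extensions"] = ["file" if ext == "checksum" else ext
                                       for ext in migrated["stac_extensions"]]
    return migrated
```

The checksum values themselves are left untouched, since the specification of the field has not changed.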

The following `file:checksum` examples use some of the algorithms supported by [Multihash](https://github.com/multiformats/multihash). Each value was computed for a text file whose content is `test`.

- Algorithm `sha1` (160 bits): `1114a94a8fe5ccb19ba61c4c0873d391e987982fbbd3`
- Algorithm `sha2` (256 bits): `12209f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08`
- Algorithm `sha2` (256 bits truncated to 160 bits): `12149f86d081884c7d659a2feaa0c55ad015a3bf4f1b`
- Algorithm `blake2b-128`: `90e4021044a8995dd50b6657a037a7839304535b`
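
A Multihash string is the function code, then the digest length in bytes, then the digest itself, with the first two encoded as unsigned varints. The Python sketch below builds such strings with the standard `hashlib` module. The helper names (`varint`, `multihash_hex`, `file_checksum`) are hypothetical, and the function codes are taken from the multiformats code table as an assumption for these three algorithms only.

```python
import hashlib

# Multihash function codes (assumed from the multiformats code table).
CODES = {"sha1": 0x11, "sha2-256": 0x12, "blake2b-128": 0xB210}


def varint(value):
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def multihash_hex(code, digest):
    """Return <code><length><digest> as a lowercase hexadecimal string."""
    return (varint(code) + varint(len(digest)) + digest).hex()


def file_checksum(data, algorithm="sha2-256", truncate=None):
    """Compute a file:checksum value for the given bytes.

    truncate optionally shortens the digest to that many bytes.
    """
    if algorithm == "sha1":
        digest = hashlib.sha1(data).digest()
    elif algorithm == "sha2-256":
        digest = hashlib.sha256(data).digest()
    elif algorithm == "blake2b-128":
        digest = hashlib.blake2b(data, digest_size=16).digest()
    else:
        raise ValueError("unsupported algorithm: " + algorithm)
    if truncate is not None:
        digest = digest[:truncate]
    return multihash_hex(CODES[algorithm], digest)
```

With the content `test`, `file_checksum(b"test", "sha1")` reproduces the `sha1` example above, and `file_checksum(b"test", "sha2-256", truncate=20)` produces a value starting with `1214`, where `14` is the hexadecimal digest length of 20 bytes (160 bits).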

## Implementations

None yet, still in proposal stage.
Original file line number Diff line number Diff line change
@@ -3,8 +3,8 @@
"type": "Feature",
"stac_version": "1.0.0-beta.2",
"stac_extensions": [
"checksum"
],
"file"
],
"bbox": [-70.275032,-64.72924,-65.087479,-51.105831],
"geometry": {
"type": "Polygon",
@@ -26,32 +26,39 @@
"href": "./annotation/calibration/noise-s1a-ew-grd-hh-20181103t235855-20181103t235955-024430-02ad5d-001.xml",
"title": "Calibration Schema",
"type": "text/xml",
"checksum:multihash": "90e40210a30d1711e81a4b11ef67b28744321659"
"file:checksum": "90e40210a30d1711e81a4b11ef67b28744321659"
},
"calibrations": {
"href": "./annotation/calibration/calibration-s1a-ew-grd-hh-20181103t235855-20181103t235955-024430-02ad5d-001.xml",
"title": "Noise Schema",
"type": "text/xml",
"checksum:multihash": "90e402104fc5351af67db0b8f1746efe421a05e4"
"file:checksum": "90e402104fc5351af67db0b8f1746efe421a05e4"
},
"products": {
"href": "./annotation/s1a-ew-grd-hh-20181103t235855-20181103t235955-024430-02ad5d-001.xml",
"title": "Product Schema",
"type": "text/xml",
"checksum:multihash": "90e402107a7f2588a85362b9beea2a12d4514d45"
"file:checksum": "90e402107a7f2588a85362b9beea2a12d4514d45"
},
"measurement": {
"href": "./measurement/s1a-ew-grd-hh-20181103t235855-20181103t235955-024430-02ad5d-001.tiff",
"title": "Measurements",
"type": "image/tiff",
"sar:polarizations": ["HH"],
"checksum:multihash": "90e40210163700a8a6501eccd00b6d3b44ddaed0"
"file:byte_order": "little-endian",
"file:data_type": "uint16",
"file:size": 209715200,
"file:header_size": 4096,
"file:checksum": "90e40210163700a8a6501eccd00b6d3b44ddaed0"
},
"thumbnail": {
"href": "./preview/quick-look.png",
"title": "Thumbnail",
"type": "image/png",
"checksum:multihash": "90e40210f52acd32b09769d3b1871b420789456c"
"file:byte_order": "big-endian",
"file:data_type": "uint8",
"file:size": 146484,
"file:checksum": "90e40210f52acd32b09769d3b1871b420789456c"
}
},
"links": [
@@ -62,12 +69,12 @@
{
"rel": "parent",
"href": "https://example.com/collections/sentinel-1",
"checksum:multihash": "11146d97123fd2c02dec9a1b6d3b13136dbe600cf966"
"file:checksum": "11146d97123fd2c02dec9a1b6d3b13136dbe600cf966"
},
{
"rel": "root",
"href": "https://example.com/collections",
"checksum:multihash": "1114fa4b9d69fdddc7c1be7bed9440621400b383b43f"
"file:checksum": "1114fa4b9d69fdddc7c1be7bed9440621400b383b43f"
}
]
}
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://schemas.stacspec.org/v1.0.0-beta.2/extensions/checksum/json-schema/schema.json#",
"title": "Checksum Extension Specification",
"description": "STAC Checksum Extension to a STAC Item",
"$id": "https://schemas.stacspec.org/v1.0.0-beta.2/extensions/file/json-schema/schema.json#",
"title": "File Info Extension Specification",
"description": "STAC File Info Extension to a STAC Item, Catalog or Collection",
"oneOf": [
{
"allOf": [
@@ -13,10 +13,10 @@
"$ref": "#/definitions/stac_extensions"
},
{
"$ref": "#/definitions/checksum_links"
"$ref": "#/definitions/file_links"
},
{
"$ref": "#/definitions/checksum_assets"
"$ref": "#/definitions/file_assets"
}
]
},
@@ -29,7 +29,7 @@
"$ref": "#/definitions/stac_extensions"
},
{
"$ref": "#/definitions/checksum_links"
"$ref": "#/definitions/file_links"
}
]
},
@@ -42,10 +42,10 @@
"$ref": "#/definitions/stac_extensions"
},
{
"$ref": "#/definitions/checksum_links"
"$ref": "#/definitions/file_links"
},
{
"$ref": "#/definitions/checksum_assets"
"$ref": "#/definitions/file_assets"
}
]
}
@@ -61,44 +61,84 @@
"type": "array",
"contains": {
"enum": [
"checksum",
"https://schemas.stacspec.org/v1.0.0-beta.2/extensions/checksum/json-schema/schema.json"
"file",
"https://schemas.stacspec.org/v1.0.0-beta.2/extensions/file/json-schema/schema.json"
]
}
}
}
},
"checksum_links": {
"file_links": {
"type": "object",
"properties": {
"links": {
"type": "array",
"items": {
"$ref": "#/definitions/checksums"
"$ref": "#/definitions/file"
}
}
}
},
"checksum_assets": {
"file_assets": {
"type": "object",
"properties": {
"assets": {
"type": "object",
"patternProperties": {
".+": {
"$ref": "#/definitions/checksums"
"$ref": "#/definitions/file"
}
}
}
}
},
"checksums": {
"file": {
"type": "object",
"properties": {
"checksum:multihash": {
"file:byte_order": {
"type": "string",
"enum": [
"big-endian",
"little-endian"
],
"title": "File Byte Order"
},
"file:checksum": {
"type": "string",
"pattern": "^[a-f0-9]+$",
"title": "Multihash"
"title": "File Checksum (Multihash)"
},
"file:data_type": {
"type": "string",
"enum": [
"int8",
"int16",
"int32",
"int64",
"uint8",
"uint16",
"uint32",
"uint64",
"float16",
"float32",
"float64",
"cint16",
"cint32",
"cfloat32",
"cfloat64",
"other"
],
"title": "File Data Type"
},
"file:header_size": {
"type": "integer",
"minimum": 0,
"title": "File Header Size"
},
"file:size": {
"type": "integer",
"minimum": 0,
"title": "File Size"
}
}
}