Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Point Cloud Extension #157

Closed
hgs-msmith opened this issue Aug 14, 2018 · 18 comments
Closed

Point Cloud Extension #157

hgs-msmith opened this issue Aug 14, 2018 · 18 comments
Labels
prio: should-have would be very good to have in the release
Milestone

Comments

@hgs-msmith
Copy link
Contributor

Define metadata extension for point cloud data.

@hgs-msmith
Copy link
Contributor Author

Consider adding the following as point-cloud specific metadata:

  • System Identifier | usually the name of the sensor used
  • Generating Software | (part of provenance?)
  • HasRGB | each point has an RGB value
  • Classification Codes | list of classes that the points are assigned to (ground, vegetation etc.) - does each class have additonal metadata (akin to bands)?

@cholmes cholmes changed the title Point Cloud Profile Point Cloud Extension Aug 23, 2018
@cholmes
Copy link
Contributor

cholmes commented Aug 23, 2018

Renamed issue title to 'extension' to be in line with #189 (OGC people define 'profile' as a subset, so it was even more confusing than I had thought).

@cholmes
Copy link
Contributor

cholmes commented Aug 23, 2018

@hobu - we'd like to get at least a placeholder for a STAC content extension for point clouds, to start to garner some expert feedback to evolve it. Any advice on additional point cloud metadata we should start with?

@cholmes
Copy link
Contributor

cholmes commented Aug 23, 2018

@hgs-msmith - would be great to make a PR for this. It does occur to me that for extensions we should maybe have some 'maturity' classification. Can perhaps be self selected, but would be good to make it clear that this extension isn't as fleshed out as the EO one in some obvious way...

@hobu
Copy link
Contributor

hobu commented Aug 23, 2018

We'll get some content for you on this in a few days. To be clear, this extension is about point cloud -specific content, not generics like SRS, BBOX, processing lineage, etc? Those are all taken care of by STAC, correct?

Here's a pdal info --all dump of a file for some metadata flavor of ASPRS LAS content.

autzen.json.txt

@cholmes
Copy link
Contributor

cholmes commented Aug 23, 2018

Awesome.

I wouldn't say all of those are taken care of by STAC yet. BBOX certainly, and we're putting in a very basic provenance thing in #179 For SRS we have an EPSG code in the EO profile, which we're going to make satellite specific. Idea is to get #156 #155 #149 and this and then pull up the common ones.

So do whatever is easier for you - put in everything and we can make recommendations for our 'common' ones. Or just assume some common stuff and let us know what they are.

I'll also try to check out the pdal dump.

@cholmes cholmes added prio: should-have would be very good to have in the release and removed new extension labels Aug 23, 2018
@cholmes cholmes added this to the 0.6.0-RC1 milestone Aug 24, 2018
@matthewhanson
Copy link
Collaborator

@hobu Also, if you haven't already it's good to take a look at the EO extension, https://github.com/radiantearth/stac-spec/blob/master/extensions/stac-eo-spec.md and use the smae terms here where they are applicable to point clouds.

@adamsteer
Copy link

heres an attempt I made. I looked at the eo: extension pattern - but more or less dumped most of the point cloud stuff into ‘properties’, without thinking too hard about what could be extracted as ‘point cloud specific’; or ‘what point cloud extension attributes should exist that I don’t use'

{'id': 'somerandomisedID',
 'bbox': {'X': {'max': 661999.99, 'min': 660000},
  'Y': {'max': 6055999.99, 'min': 6054000.01},
  'Z': {'max': 1751.554, 'min': 1314.028}},
 'geometry': {'type': 'MultiPolygon',
  'coordinates': [[[[660016.09887684, 6053987.12779143],
     [662008.76960815, 6053992.85151772],
     [662008.76960815, 6056007.60317026],
     [659999.57590229, 6056007.60317026],
     [659996.27130738, 6053998.575244],
     [660016.09887684, 6053987.12779143]]]]},
 'crs': 'somerandomisedID',
 'type': 'feature',
 'properties': {'srs': {'horizontalCRS': 'PROJCS["GDA94 / MGA zone 55",GEOGCS["GDA94",DATUM["Geocentric_Datum_of_Australia_1994",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],TOWGS84[0,0,0,0,0,0,0],AUTHORITY["EPSG","6283"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4283"]],UNIT["metre",1,AUTHORITY["EPSG","9001"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",147],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",10000000],AUTHORITY["EPSG","28355"],AXIS["Easting",EAST],AXIS["Northing",NORTH]]',
   'verticalCRS': '',
   'units': {'horizontal': 'metre', 'vertical': ''}},
  'ASPRSclasses': ['1.000000/25149',
   '2.000000/5014547',
   '3.000000/7792544',
   '4.000000/9477524',
   '5.000000/23191982',
   '7.000000/958',
   '18.000000/309'],
  'datetime': 'LASdateTime',
  'LASformat': {'majorVersion': 1, 'minorVersion': 4, 'dataformatID': 3},
  'numberOfPoints': 45503013,
  'area': 4054684.277,
  'density': 11.22233197},
 'assets': {'sourceFile': '/Volumes/Antares/ACT-lidar/all_rgb/ACT2015_4ppm-C3-AHD_6606054_55.laz',
  'waveformFile': 'pathToWaveformDataIfExisting',
  'octree': 'pathToOctree',
  'sourceMetadata': 'pathToSourceMetadata’}}

Looking at what might become a pc: attribute here, there are a bunch of candidates to consider -point density, number of points, vertical CRS,ASPRSclasses. I’m sure many more exist.

I made this with a catalogue of LAS/LAZ tiles in mind - and using the asset:octree attribute to point at where, say, an entwine index or potree containing this tile might exist. I haven’t considered the inverse case - I suppose an array of tile paths could be listed as assets if the entry here described an entwine index, as something like assets:tilepaths

This is mashed together from bunch of PDAL metadata calls and pretty ugly working parts ( https://gist.github.com/adamsteer/67feb3f4c34310be65154b5fcd7a24f7 ) - I haven’t looked at the case of ‘not using PDAL to extract metadata’ yet.

Hope that helps! I’ll be more or less ticking away at this over the next month and hoping to mint something we can commit to indexing lots of tiles (or a few octrees, or some mix of both) pretty soon!

@hobu
Copy link
Contributor

hobu commented Oct 4, 2018

I took another swing at this. Thanks for the inspiration @adamsteer.

It raises the following questions:

  • I see https://github.com/radiantearth/stac-spec/blob/master/extensions/stac-eo-spec.md has an eo:epsg entry. For point clouds we need both horizontal and vertical coordinate system information. I have expressed this as pc:compound_srs, but I wonder if we should have a higher-level SRS information. Also, not all EO has EPSG codes cough Mars cough

  • The document makes it clear that thumbnails are really useful. These are creatable for point clouds too, as long as we're talking about 2D rasterizations. Is there a standard thumbnail size/resolution?

  • Why is id and description under the c namespace but provider and license are not (or vice versa)?

      "c:id": "L1T",
      "c:description": "Landat 8 imagery that is ...",
      "provider": "USGS",
      "license": "PDDL-1.0",
    

Here's my python that takes a pdal info --all response and emits STAC:

#!/usr/bin/env python3

import sys
import json
import os


data = sys.stdin.read()
#with open('ak.json','rb') as d:
#    data = d.read()

j = json.loads(data)

def capture_date(pdalinfo):
    import datetime
    year = pdalinfo['metadata']['creation_year']
    day = pdalinfo['metadata']['creation_doy']
    date = datetime.datetime(int(year), 1, 1) + datetime.timedelta(int(day) - 1)
    return date.isoformat('T')

def convertGeometry(geom, srs):
    import ogr
    import osr
    in_ref = osr.SpatialReference()
    in_ref.SetFromUserInput(srs)
    out_ref = osr.SpatialReference()
    out_ref.SetFromUserInput('EPSG:4326')

    g = ogr.CreateGeometryFromJson(json.dumps(geom))
    g.AssignSpatialReference(in_ref)
    g.TransformTo(out_ref)
    return json.loads(g.ExportToJson())



output = {}

try:
    output['geometry'] = convertGeometry(j['boundary']['boundary_json'],j['metadata']['comp_spatialreference'])
except KeyError:
    output['geometry'] = j['stats']['bbox']['EPSG:4326']['boundary']

output['bbox'] = j['stats']['bbox']['EPSG:4326']['bbox']
output['id'] = os.path.basename(j['filename'])
output['type'] = 'Feature'

assets = {}
#assets['thumbnail'] =
properties = {}

properties['pc:schema'] = j['schema']['dimensions']
properties['pc:statistics'] = j['stats']['statistic']
properties['c:id'] = os.path.basename(j['filename'])
properties['c:description'] = "USGS 3DEP LiDAR"
properties['provider'] = "USGS"
properties['license'] = 'LICENSE'
properties['pc:type'] = 'lidar' # eopc, lidar, radar, sonar
properties['pc:density'] = j['boundary']['avg_pt_per_sq_unit']
properties['pc:count'] = j['metadata']['count']

properties['pc:encoding'] = 'LASzip' if bool(j['metadata']['compressed']) else 'None'

properties['datetime'] = capture_date(j)

output['properties'] = properties
output['assets'] = assets

sys.stdout.write(json.dumps(output,sort_keys=True,
                  indent=2, separators=(',', ': ')))


This ends up looking like:

{
  "assets": {},
  "bbox": {
    "maxx": -149.2527639,
    "maxy": 61.78936655,
    "maxz": 1981.200833,
    "minx": -149.3006585,
    "miny": 61.76668361,
    "minz": 11.48480563
  },
  "geometry": {
    "coordinates": [
      [
        [
          [
            -149.30045496703195,
            61.76682620765713
          ],
          [
            -149.25327277445712,
            61.76657763000873
          ],
          [
            -149.25272374552367,
            61.78921108684322
          ],
          [
            -149.30023535632952,
            61.7894132390075
          ],
          [
            -149.30080489041393,
            61.766924315502195
          ],
          [
            -149.30045496703195,
            61.76682620765713
          ]
        ]
      ]
    ],
    "type": "MultiPolygon"
  },
  "id": "AK_MatanuskaSusitna-Lot2_2011_000227.laz",
  "properties": {
    "c:description": "USGS 3DEP LiDAR",
    "c:id": "MODIFICATION                   ",
    "datetime": "2013-05-14T00:00:00",
    "license": "LICENSE",
    "pc:avg_pt_per_sq_unit": 3.163692583,
    "pc:count": 64473760,
    "pc:density": 0.9482590115,
    "pc:encoding": "LASzip",
    "pc:schema": [
      {
        "name": "X",
        "size": 8,
        "type": "floating"
      },
      {
        "name": "Y",
        "size": 8,
        "type": "floating"
      },
      {
        "name": "Z",
        "size": 8,
        "type": "floating"
      },
      {
        "name": "Intensity",
        "size": 2,
        "type": "unsigned"
      },
      {
        "name": "ReturnNumber",
        "size": 1,
        "type": "unsigned"
      },
      {
        "name": "NumberOfReturns",
        "size": 1,
        "type": "unsigned"
      },
      {
        "name": "ScanDirectionFlag",
        "size": 1,
        "type": "unsigned"
      },
      {
        "name": "EdgeOfFlightLine",
        "size": 1,
        "type": "unsigned"
      },
      {
        "name": "Classification",
        "size": 1,
        "type": "unsigned"
      },
      {
        "name": "ScanAngleRank",
        "size": 4,
        "type": "floating"
      },
      {
        "name": "UserData",
        "size": 1,
        "type": "unsigned"
      },
      {
        "name": "PointSourceId",
        "size": 2,
        "type": "unsigned"
      },
      {
        "name": "GpsTime",
        "size": 8,
        "type": "floating"
      }
    ],
    "pc:statistics": [
      {
        "average": 1765797.419,
        "count": 64473760,
        "maximum": 1769746,
        "minimum": 1761544.01,
        "name": "X",
        "position": 0,
        "stddev": 2340.346622,
        "variance": 5477222.31
      },
      {
        "average": 2842320.787,
        "count": 64473760,
        "maximum": 2846625,
        "minimum": 2838423.01,
        "name": "Y",
        "position": 1,
        "stddev": 2446.005498,
        "variance": 5982942.898
      },
      {
        "average": 3760.301515,
        "count": 64473760,
        "maximum": 6499.99,
        "minimum": 37.68,
        "name": "Z",
        "position": 2,
        "stddev": 953.5224496,
        "variance": 909205.062
      },
      {
        "average": 176.7857038,
        "count": 64473760,
        "maximum": 255,
        "minimum": 0,
        "name": "Intensity",
        "position": 3,
        "stddev": 92.00839536,
        "variance": 8465.544817
      },
      {
        "average": 1.214258343,
        "count": 64473760,
        "maximum": 5,
        "minimum": 1,
        "name": "ReturnNumber",
        "position": 4,
        "stddev": 0.7263398082,
        "variance": 0.527569517
      },
      {
        "average": 1.376616394,
        "count": 64473760,
        "maximum": 5,
        "minimum": 1,
        "name": "NumberOfReturns",
        "position": 5,
        "stddev": 1.070825579,
        "variance": 1.146667421
      },
      {
        "average": 0.5003097849,
        "count": 64473760,
        "maximum": 1,
        "minimum": 0,
        "name": "ScanDirectionFlag",
        "position": 6,
        "stddev": 0.4999999683,
        "variance": 0.2499999683
      },
      {
        "average": 0.000881211209,
        "count": 64473760,
        "maximum": 1,
        "minimum": 0,
        "name": "EdgeOfFlightLine",
        "position": 7,
        "stddev": 0.02967213003,
        "variance": 0.0008804353008
      },
      {
        "average": 17.21186159,
        "count": 64473760,
        "maximum": 129,
        "minimum": 2,
        "name": "Classification",
        "position": 8,
        "stddev": 31.34383656,
        "variance": 982.4360901
      },
      {
        "average": 0.3183486119,
        "count": 64473760,
        "maximum": 24,
        "minimum": -22,
        "name": "ScanAngleRank",
        "position": 9,
        "stddev": 9.934022398,
        "variance": 98.684801
      },
      {
        "average": 0,
        "count": 64473760,
        "maximum": 0,
        "minimum": 0,
        "name": "UserData",
        "position": 10,
        "stddev": 0,
        "variance": 0
      },
      {
        "average": 10733.9377,
        "count": 64473760,
        "maximum": 12042,
        "minimum": 1084,
        "name": "PointSourceId",
        "position": 11,
        "stddev": 3522.808499,
        "variance": 12410179.72
      },
      {
        "average": 26389458.29,
        "count": 64473760,
        "maximum": 30299322.58,
        "minimum": -2450342.801,
        "name": "GpsTime",
        "position": 12,
        "stddev": 10532949.21,
        "variance": 110943019000000.0
      }
    ],
    "pc:type": "lidar",
    "pc:units": {
      "horizontal": "US survey foot",
      "vertical": "US survey foot"
    },
    "provider": "USGS"
  },
  "type": "Feature"
}


@matthewhanson
Copy link
Collaborator

Hi @hobu, thanks for this.
It's not clear, but the latest STAC spec is on the dev branch, not the master, so there are some differences especially regarding collections, and your questions regarding c:id aren't really applicable anymore. Collections are now effectively Datasets (see the dataset extension) which are a little different, but I think we are renaming Datasets back to collections (clear as mud?).

EPSG codes are optional, and we figured if there wasn't a code that was fine. We really didn't want to include WKT as it just didn't seem a useful parameter to search on. At any rate the projection should be available in the datafiles or an original metadata file provided as an asset.

There's no recommendations currently on thumbnail size.

@adamsteer
Copy link

HI @hobu - thanks! I reconfigured my own code to use pdal info —all rather than repeated calls, and will likely borrow some of yours ;)

I need to think harder about how we need the collection/datasets thing to work (thanks @matthewhanson ) - ie what is the logical unit of which all things are children - for example a ‘survey’ (collection/dataset); which has ‘assets’ in STAC parlance of ‘LAS files’; ‘metadata records’; and ‘products’ (for example entwine or Potree derivative datasets, which represent a complete ’survey’ made of all the component files belonging to that survey)

clear as mud, right?

@hobu
Copy link
Contributor

hobu commented Oct 4, 2018

I started on the extension definition at https://github.com/hobu/stac-spec/tree/dev and pared down my fields for Item to the essentials.

@m-mohr m-mohr modified the milestones: 0.6.0-RC1, 0.7.0 Oct 16, 2018
@m-mohr
Copy link
Collaborator

m-mohr commented Nov 13, 2018

Hey @hobu,

thanks, that is a great start. Unfortunately, it seems nobody really noticed it so far.
I would suggest (updating your branch and) opening a Pull Request so that more people get aware of the extension and can review it. Once it's a PR, we can even ask our networks and Twitter folks to comment on it, similar to how we did it for the SAR extension.
cc @cholmes

@hobu
Copy link
Contributor

hobu commented Jan 1, 2019

PR made at #369

@adamsteer
Copy link

Thanks @hobu - I still think an optional field containing class/label information should be included in the pc: extension. it’s currently an extra pdal info query to get it (pdal info —filters.stats.dimensions=Classification —filters.stats.count=Classification), but IMO worth it. Could be conditional on more than one class appearing in stats.statistics[8], if that item appears at all...

@hobu
Copy link
Contributor

hobu commented Jan 3, 2019

@adamsteer but will you actually search on the classification label information/histogram? That information is slightly expensive to get and not every data source has it.

About your point in chat in regard to a full boundary, PDAL 1.9 pdal info --all will output --metadata, --boundary, and --stats output all at once, and when I make a pdal stac kernel, it will also output the filters.hexbin boundary of the data source.

@adamsteer
Copy link

@hobu yes - anytime I make a point cloud metadata set I will want to search for classifications.

For an example - if I want to make a map of all the tree heights in a region, I want to know whether the source points have appropriate labels and decide whether to use cheaper functions to do the task using existing labels; or develop more expensive processes to label all the points then make a tree height map. Sure, not all datasets have labelled/classified points - but where they do it’s really convenient (IMO) to be able to find out what classes exist and decide what to do next before pulling down data.

The point counts per class are less important (hopefully they’re non-zero) - maybe pc:statistics.labels (or pc:statistics.classes) could contain something like:

{ 
 “reference”: “ASPRS”
“labels”: [ 0,2,6,7,18,...]
}

...or

{
“reference”:”linktodocument”,
“labels”: [“road”, “cat”, “stopsign”,...]
}

@cholmes
Copy link
Contributor

cholmes commented Feb 14, 2019

Closing this, as we have the proposal extension merged in dev.

@cholmes cholmes closed this as completed Feb 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
prio: should-have would be very good to have in the release
Projects
None yet
Development

No branches or pull requests

6 participants