ML-Enabler V3.0 RoadMap #4

Open
martham93 opened this issue Sep 1, 2020 · 2 comments
Comments

@martham93
Contributor

martham93 commented Sep 1, 2020

General directions we would like to explore for the next major release of ml-enabler, per conversations with @geohacker + @ingalls:

Developer Support

  • Training data creation — support training data creation for classification models first (and then object detection). The user brings a tile endpoint, or an S3 bucket list of tiles, along with a vector file (GeoJSON, on S3). ML Enabler facilitates the creation of TFRecords via AWS Batch.
  • Visualize training data — visualize training data locations (similar to how we visualize inferences), but color-code based on how the training data is divided, i.e. train/test/val
  • Expand classification retraining workflow — expand, further generalize, and mature classification re-training workflows. Start to consider implementing Object Detection re-training workflows
  • Segmentation inference workflows — modify the download and predict steps to handle segmentation so that segmentation predictions are generated and displayed as individual geometries
  • Post processing hooks — run custom code to do post processing of the inferences. For example, to compare the model inference with existing data in OSM.
  • Support uploading/serving ONNX models for inference, as a more generalizable entry point for users who want to bring their own model to ml-enabler
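The train/test/val color-coding idea above could be implemented with a deterministic split assignment so tiles keep their split across re-runs. A minimal sketch (the function names, ratios, and colors here are hypothetical, not existing ML-Enabler code):

```python
import hashlib

def assign_split(tile_id: str, ratios=(0.8, 0.1, 0.1)) -> str:
    """Deterministically assign a tile to train/test/val by hashing its ID."""
    bucket = int(hashlib.md5(tile_id.encode()).hexdigest(), 16) % 100
    train_cut = ratios[0] * 100
    test_cut = train_cut + ratios[1] * 100
    if bucket < train_cut:
        return "train"
    if bucket < test_cut:
        return "test"
    return "val"

# hypothetical colors for the map UI
SPLIT_COLORS = {"train": "#2ecc71", "test": "#e67e22", "val": "#3498db"}

def tag_features(feature_collection: dict) -> dict:
    """Add split + color properties so the UI can color-code training data."""
    for feature in feature_collection["features"]:
        split = assign_split(str(feature["id"]))
        feature["properties"]["split"] = split
        feature["properties"]["color"] = SPLIT_COLORS[split]
    return feature_collection
```

Hashing the tile ID (rather than random assignment) keeps the split stable as new tiles are added, which matters for comparing retraining runs.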

Mapping Tool Integrations

  • MapRoulette — ML Enabler can push predictions to a MapRoulette task for easy mapping. This currently exists, but we want to setup a public instance.
  • Tasking Manager — ML Enabler can be immediately integrated into Tasking Manager during the project creation step to identify task complexity.
  • iD and JOSM — JOSM has limited support via a plugin. We should scope out what the integration looks like for iD.
  • Bring validated model predictions back into ML-Enabler from a mapping campaign like MapRoulette

Public Demo

  • Use a public RGB model, either from Dev Seed (HV grid classification) or from outside Dev Seed, to showcase the whole ML-Enabler workflow into a public MapRoulette
    • The demo will focus on how ml-enabler can help users
  • A key thing to highlight is that ML-Enabler can help flag false positives; this is a huge challenge for many existing OSM mapping workflows
@martham93 martham93 pinned this issue Sep 1, 2020
@geohacker geohacker transferred this issue from developmentseed/ml-enabler-deprecated Sep 2, 2020
@geohacker geohacker pinned this issue Sep 2, 2020
@ingalls ingalls unpinned this issue Oct 14, 2020
@rbavery

rbavery commented Oct 6, 2022

@geohacker @ingalls it looks like there is already a roadmap ticket so I'm adding my comments here. It'd be helpful if one or both of you could review Martha's roadmap in the demo and indicate what is implemented.

ML Enabler + STAC Suggested Roadmap

Nearer term, STAC features

  • Accept as input a STAC collection with the following extensions filled in as much as possible, or a UI to fill in the metadata for these extensions?

  • then, MLE automatically fills in what metadata it can without user input, runs inference, and assembles the labels, AOI, and model.

  • Return MLE inference results as STAC catalog, using the label extension with a time interval for labels: https://github.com/stac-extensions/label/tree/main/examples/spacenet-roads

    • rationale: this enables easy loading into anything that takes STAC for analysis or reannotation: QGIS, Python, JOSM
    • the label STAC item should refer back to the source imagery the label was annotated on.
    • a time interval and the annotation source imagery are needed for labels to be used in other contexts. DS-Annotate should also include these metadata items to interface with MLE
    • Side note: the ml-aoi extension should probably be officially renamed to annotation-aoi, since it covers where labels were annotated and these don't necessarily need to be used for ML
  • once the metadata is generated and the GeoJSON assets are on S3 or in a pgstac database, how do we publish? Radiant Earth? This would be after the final HITL loop (more on that below), and maybe this is or isn't handled by MLE.
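To make the label-extension output concrete, here is a minimal sketch of the STAC Item shape described above: a label item with a time interval (`start_datetime`/`end_datetime`), a `source` link back to the annotated imagery, and a GeoJSON asset. The bucket path and IDs are placeholders, and only a few of the label-extension fields are shown:

```python
def make_label_item(label_id, geometry, bbox, source_item_href, start, end):
    """Sketch a STAC Item using the label extension, with a time interval
    for the labels and a link back to the source imagery."""
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "stac_extensions": [
            "https://stac-extensions.github.io/label/v1.0.1/schema.json"
        ],
        "id": label_id,
        "geometry": geometry,
        "bbox": bbox,
        "properties": {
            # label extension allows a range instead of a single datetime
            "datetime": None,
            "start_datetime": start,
            "end_datetime": end,
            "label:description": "ML-Enabler inference results",
            "label:type": "vector",
            "label:properties": ["class"],
        },
        "links": [
            # label extension convention: link to the imagery the labels describe
            {"rel": "source", "href": source_item_href, "type": "application/json"}
        ],
        "assets": {
            "labels": {
                "href": f"s3://example-bucket/{label_id}.geojson",  # placeholder
                "type": "application/geo+json",
            }
        },
    }
```

The `rel: source` link is what lets downstream tools (QGIS, JOSM, DS-Annotate) pull up the imagery a label was annotated on, which is the reannotation workflow described above.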

Longer term stretch features to improve HITL

  • Integration with DS-Annotate. Support import/export of STAC collections with all three extensions from DS-Annotate to link the annotation tool to the inference tool.

    • I think we would typically want to spend offline time on model development from some initial dataset, then begin to use the HITL loop. Therefore, instead of only exporting the STAC Collection to MLE, it'd be helpful if DS-Annotate supported export of STAC collections to a STAC API (deployed by eoAPI?), which could serve the collection for easy loading in our current STAC > stackstac > xarray > torchdata workflow, or for richer visualization/dataset exploration in another tool. Maybe MLE would be a consumer of this same STAC API?
  • load STAC items from PC into MLE and run inference on assets

    • we will need to handle:
      • assets with different data types (float32, int16, etc.), since each model expects a particular dynamic range
      • assets with different band counts, since a model expects a fixed number of input dimensions. This might require supporting, at a minimum, a 3-band model and a 1-band model if we want the experience to be fully automated, at the cost of less customization
      • would we support only running inference on 3-band assets? Or assets with single bands stored as separate assets? Open question.
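The dtype/band-count handling above could start with a small normalization step between the STAC asset read and the model input. A sketch under assumed conventions (arrays are `(bands, H, W)`, models want float32 in [0, 1]; the function name and fallbacks are hypothetical):

```python
import numpy as np

def prepare_for_model(array: np.ndarray, expected_bands: int = 3) -> np.ndarray:
    """Coerce a (bands, H, W) asset array into the dtype/band layout a model
    expects: float32 scaled to [0, 1] with a fixed band count."""
    arr = array.astype(np.float32)
    # rescale integer dynamic ranges (uint8, int16, ...) to [0, 1]
    if np.issubdtype(array.dtype, np.integer):
        arr /= np.iinfo(array.dtype).max
    bands = arr.shape[0]
    if bands == expected_bands:
        return arr
    if bands == 1 and expected_bands == 3:
        # single-band asset fed to a 3-band model: replicate to pseudo-RGB
        return np.repeat(arr, 3, axis=0)
    if bands > expected_bands:
        # e.g. drop an alpha channel or extra spectral bands
        return arr[:expected_bands]
    raise ValueError(f"cannot map {bands} bands to {expected_bands}")
```

This only papers over the mismatch; models trained on a specific sensor's dynamic range will still degrade on replicated or truncated bands, which is why the open question above (restrict to 3-band assets vs. support arbitrary band layouts) matters.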

Miscellaneous things

Post processing hooks — run custom code to do post processing of the inferences. For example, to compare the model inference with existing data in OSM.

  • handle merge conflicts between t-1 predictions/labels and t predictions/labels using different merge strategies and confidence scores

    • we could just assume a model trained on the t labels is more accurate, but in practice this is not always the case because of label issues, noise in the model, etc.
  • STAC metadata format support and inference support for classification, object detection, semantic segmentation, and instance segmentation, plus the ability to adjust the post-processing hooks for all of these, since this is a common requirement, particularly for segmentation
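The t-1 vs. t merge-conflict idea could be sketched as a pluggable merge over predictions keyed by tile or feature ID, with confidence scores deciding conflicts. Everything here (function name, strategies, dict shape) is hypothetical, not existing MLE behavior:

```python
def merge_predictions(prev: dict, curr: dict, strategy: str = "max_confidence") -> dict:
    """Merge predictions from two runs (t-1 and t), keyed by tile/feature ID.

    prev/curr map IDs to prediction dicts carrying a 'confidence' score.
    """
    merged = {}
    for key in prev.keys() | curr.keys():
        a, b = prev.get(key), curr.get(key)
        if a is None or b is None:
            # no conflict: only one run predicted on this tile
            merged[key] = a or b
        elif strategy == "prefer_latest":
            # naive: assume the t model is always better
            merged[key] = b
        elif strategy == "max_confidence":
            # keep whichever run was more confident on this tile
            merged[key] = b if b["confidence"] >= a["confidence"] else a
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return merged
```

The `max_confidence` strategy reflects the caveat above: a model retrained on t labels is not automatically better per tile, so per-prediction confidence is a safer default than blindly preferring the latest run.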

@rbavery

rbavery commented Nov 1, 2022

cc @Martinatav13 see the readme for a synopsis of what MLE does: https://github.com/developmentseed/ml-enabler#features-of-mle
