ML-Enabler V3.0 RoadMap #4

Open
martham93 opened this issue Sep 1, 2020 · 2 comments
Comments

@martham93
Contributor

martham93 commented Sep 1, 2020

General directions we would like to explore for the next major release of ml-enabler, per conversations with @geohacker + @ingalls:

Developer Support

  • Training data creation — support training data creation for classification models first (and then object detection). The user brings a tile endpoint, or an S3 bucket list of tiles, along with a vector file (GeoJSON, on S3). ML Enabler facilitates the creation of TFRecords via AWS Batch.
  • Visualize training data — visualize training data locations (similar to how we visualize inferences), but color-code based on how the training data is divided, i.e. train/test/val
  • Expand classification retraining workflow — expand, further generalize, and mature classification re-training workflows. Start to consider implementing Object Detection re-training workflows
  • Segmentation inference workflows — modify the download and predict steps to handle segmentation so that segmentation predictions are generated and displayed as individual geometries
  • Post processing hooks — run custom code to do post processing of the inferences. For example, to compare the model inference with existing data in OSM.
  • Support uploading/serving ONNX models for inference, as a more generalizable entry point for users who want to bring their own model to ml-enabler
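The train/test/val color-coding idea above could be implemented with a deterministic split assignment so tiles keep their split across re-runs. A minimal sketch (the function names, ratios, and colors here are hypothetical, not existing ML-Enabler code):

```python
import hashlib

def assign_split(tile_id: str, ratios=(0.8, 0.1, 0.1)) -> str:
    """Deterministically assign a tile to train/test/val by hashing its ID."""
    bucket = int(hashlib.md5(tile_id.encode()).hexdigest(), 16) % 100
    train_cut = ratios[0] * 100
    test_cut = train_cut + ratios[1] * 100
    if bucket < train_cut:
        return "train"
    if bucket < test_cut:
        return "test"
    return "val"

# hypothetical colors for the map UI
SPLIT_COLORS = {"train": "#2ecc71", "test": "#e67e22", "val": "#3498db"}

def tag_features(feature_collection: dict) -> dict:
    """Add split + color properties so the UI can color-code training data."""
    for feature in feature_collection["features"]:
        split = assign_split(str(feature["id"]))
        feature["properties"]["split"] = split
        feature["properties"]["color"] = SPLIT_COLORS[split]
    return feature_collection
```

Hashing the tile ID (rather than random assignment) keeps the split stable as new tiles are added, which matters for comparing retraining runs.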

Mapping Tool Integrations

  • MapRoulette — ML Enabler can push predictions to a MapRoulette task for easy mapping. This currently exists, but we want to setup a public instance.
  • Tasking Manager — ML Enabler can be immediately integrated into Tasking Manager during the project creation step to identify task complexity.
  • iD and JOSM — JOSM has limited support via a plugin. We should scope out what the integration looks like for iD.
  • Bring validated model predictions back into ML-Enabler from a mapping campaign like MapRoulette

Public Demo

  • Use a public RGB model, either from Dev Seed (HV grid classification) or from outside Dev Seed, to showcase the whole ML-Enabler workflow into a public MapRoulette
    • The demo will focus on how ml-enabler can help users
  • A key thing to highlight is that ML-Enabler can help flag false positives; this is a huge challenge for many existing OSM mapping workflows
@martham93 martham93 pinned this issue Sep 1, 2020
@geohacker geohacker transferred this issue from developmentseed/ml-enabler-deprecated Sep 2, 2020
@geohacker geohacker pinned this issue Sep 2, 2020
@ingalls ingalls unpinned this issue Oct 14, 2020
@rbavery

rbavery commented Oct 6, 2022

@geohacker @ingalls it looks like there is already a roadmap ticket so I'm adding my comments here. It'd be helpful if one or both of you could review Martha's roadmap in the demo and indicate what is implemented.

ML Enabler + STAC Suggested Roadmap

Nearer term, STAC features

  • Accept as input a STAC collection with the following extensions filled in as much as possible, or a UI to fill in the metadata for these extensions?

  • then, MLE automatically fills in what metadata it can without user input, runs inference, and assembles the labels, AOI, and model.

  • Return MLE inference results as STAC catalog, using the label extension with a time interval for labels: https://github.com/stac-extensions/label/tree/main/examples/spacenet-roads

    • rationale: this enables easy loading into anything that takes STAC for analysis or reannotation: QGIS, Python, JOSM
    • the label STAC item should refer back to the source imagery the label was annotated on.
    • a time interval and the annotation source imagery are needed for labels to be used in other contexts. DS-Annotate should also include these metadata items to interface with MLE
    • Side note: the ml-aoi extension should probably be officially renamed to annotation-aoi, since it covers where labels were annotated and these don't necessarily need to be used for ML
  • once the metadata is generated and the GeoJSON assets are on S3 or in a pgstac database, how do we publish? Radiant Earth? This would be after the final HITL loop (more on that below), and maybe this is or isn't handled by MLE.
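To make the label-extension output concrete, here is a minimal sketch of the STAC Item shape described above: a label item with a time interval (`start_datetime`/`end_datetime`), a `source` link back to the annotated imagery, and a GeoJSON asset. The bucket path and IDs are placeholders, and only a few of the label-extension fields are shown:

```python
def make_label_item(label_id, geometry, bbox, source_item_href, start, end):
    """Sketch a STAC Item using the label extension, with a time interval
    for the labels and a link back to the source imagery."""
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "stac_extensions": [
            "https://stac-extensions.github.io/label/v1.0.1/schema.json"
        ],
        "id": label_id,
        "geometry": geometry,
        "bbox": bbox,
        "properties": {
            # label extension allows a range instead of a single datetime
            "datetime": None,
            "start_datetime": start,
            "end_datetime": end,
            "label:description": "ML-Enabler inference results",
            "label:type": "vector",
            "label:properties": ["class"],
        },
        "links": [
            # label extension convention: link to the imagery the labels describe
            {"rel": "source", "href": source_item_href, "type": "application/json"}
        ],
        "assets": {
            "labels": {
                "href": f"s3://example-bucket/{label_id}.geojson",  # placeholder
                "type": "application/geo+json",
            }
        },
    }
```

The `rel: source` link is what lets downstream tools (QGIS, JOSM, DS-Annotate) pull up the imagery a label was annotated on, which is the reannotation workflow described above.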

Longer term stretch features to improve HITL

  • Integration with DS-Annotate. Support import/export of STAC collections with all three extensions from DS-Annotate to link the annotation tool to the inference tool.

    • I think we would typically want to spend offline time on model development from some initial dataset, then begin to use the HITL loop. Therefore, instead of only exporting the STAC Collection to MLE, it'd be helpful if DS-Annotate supported export of STAC collections to a STAC API (deployed by eoAPI?), which could serve the collection for easy loading in our current STAC > stackstac > xarray > torchdata workflow, or for richer visualization/dataset exploration in another tool. Maybe MLE would be a consumer of this same STAC API?
  • load STAC items from PC into MLE and run inference on assets

    • we will need to handle:
      • assets with different data types (float32, int16, etc.), since each model expects a particular dynamic range
      • assets with different band counts, since a model expects a fixed number of input dimensions. This might require supporting, at a minimum, a 3-band model and a 1-band model if we want the experience to be fully automated, at the cost of less customization
      • would we support only running inference on 3-band assets? Or assets with single bands stored as separate assets? Open question.
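The dtype/band-count handling above could start with a small normalization step between the STAC asset read and the model input. A sketch under assumed conventions (arrays are `(bands, H, W)`, models want float32 in [0, 1]; the function name and fallbacks are hypothetical):

```python
import numpy as np

def prepare_for_model(array: np.ndarray, expected_bands: int = 3) -> np.ndarray:
    """Coerce a (bands, H, W) asset array into the dtype/band layout a model
    expects: float32 scaled to [0, 1] with a fixed band count."""
    arr = array.astype(np.float32)
    # rescale integer dynamic ranges (uint8, int16, ...) to [0, 1]
    if np.issubdtype(array.dtype, np.integer):
        arr /= np.iinfo(array.dtype).max
    bands = arr.shape[0]
    if bands == expected_bands:
        return arr
    if bands == 1 and expected_bands == 3:
        # single-band asset fed to a 3-band model: replicate to pseudo-RGB
        return np.repeat(arr, 3, axis=0)
    if bands > expected_bands:
        # e.g. drop an alpha channel or extra spectral bands
        return arr[:expected_bands]
    raise ValueError(f"cannot map {bands} bands to {expected_bands}")
```

This only papers over the mismatch; models trained on a specific sensor's dynamic range will still degrade on replicated or truncated bands, which is why the open question above (restrict to 3-band assets vs. support arbitrary band layouts) matters.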

Miscellaneous things

Post processing hooks — run custom code to do post processing of the inferences. For example, to compare the model inference with existing data in OSM.

  • handle merge conflicts between t-1 predictions/labels and t predictions/labels using different merge strategies and confidence scores

    • we could just assume a model trained on the t labels is more accurate, but in practice this is not always the case because of label issues, noise in the model, etc.
  • STAC metadata format support and inference support for classification, object detection, semantic segmentation, and instance segmentation, plus the ability to adjust the post-processing hooks for all of these, since this is a common requirement, particularly for segmentation
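The t-1 vs. t merge-conflict idea could be sketched as a pluggable merge over predictions keyed by tile or feature ID, with confidence scores deciding conflicts. Everything here (function name, strategies, dict shape) is hypothetical, not existing MLE behavior:

```python
def merge_predictions(prev: dict, curr: dict, strategy: str = "max_confidence") -> dict:
    """Merge predictions from two runs (t-1 and t), keyed by tile/feature ID.

    prev/curr map IDs to prediction dicts carrying a 'confidence' score.
    """
    merged = {}
    for key in prev.keys() | curr.keys():
        a, b = prev.get(key), curr.get(key)
        if a is None or b is None:
            # no conflict: only one run predicted on this tile
            merged[key] = a or b
        elif strategy == "prefer_latest":
            # naive: assume the t model is always better
            merged[key] = b
        elif strategy == "max_confidence":
            # keep whichever run was more confident on this tile
            merged[key] = b if b["confidence"] >= a["confidence"] else a
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return merged
```

The `max_confidence` strategy reflects the caveat above: a model retrained on t labels is not automatically better per tile, so per-prediction confidence is a safer default than blindly preferring the latest run.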

@rbavery

rbavery commented Nov 1, 2022

cc @Martinatav13 see the readme for a synopsis of what MLE does: https://github.com/developmentseed/ml-enabler#features-of-mle
