spatial dataset training functions #2141

Open
Jo316 opened this issue Jun 7, 2024 · 1 comment

Jo316 commented Jun 7, 2024

What you would like to be added?

I would like to request the addition of functions to the Training Operator for training models with spatial (geographical) datasets. These functions should enable seamless integration and processing of geographical data, leveraging state-of-the-art algorithms to enhance model accuracy and applicability in spatial contexts.

One potential reference is the R package CAST, which provides robust functions for training models with geographical data using random forest. The package offers a comprehensive approach to handling spatial data, including considerations for the Area of Applicability.

Functions Ranked By Importance/Need (https://hannameyer.github.io/CAST/reference/index.html):

Why is this needed?

The integration of spatial dataset training functions will significantly enhance the Training Operator's capabilities, particularly for users working with geographical data. It will allow for more accurate and relevant model training in fields such as environmental science, urban planning, and geospatial analysis.

By incorporating these functions, the Training Operator will support a wider range of use cases and applications, making it a more versatile and powerful tool for data scientists and researchers.

Love this feature?

Give it a 👍 We prioritize the features with most 👍

@andreyvelich
Member

Thank you for creating this @Jo316!

Could you explain what specific functionality you are looking for from the Training Operator to support training models with spatial datasets?
Do you require distributed capabilities, and do you want to use the Training Operator controller to orchestrate the appropriate resources on Kubernetes?

As long as you can build a container from the training script that uses your geographical datasets, you can run it with the Training Operator.
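
For illustration only, here is a minimal sketch of the kind of spatial logic such a training script could carry inside its container. It shows one common technique for geographical data: cross-validation folds that hold out whole spatial blocks, so nearby points never end up split between the training and test sets. The function names `spatial_block_id` and `leave_block_out_folds`, the grid-cell blocking, and the `block_deg` parameter are all hypothetical choices made for this example. They are not part of CAST or of the Training Operator, and the model-fitting step is left out.

```python
import math
from collections import defaultdict


def spatial_block_id(lon, lat, block_deg=1.0):
    """Return the grid cell (block) containing a lon/lat point.

    Hypothetical helper: blocks are square cells of `block_deg` degrees.
    """
    return (math.floor(lon / block_deg), math.floor(lat / block_deg))


def leave_block_out_folds(coords, n_folds=3, block_deg=1.0):
    """Yield (train_idx, test_idx) pairs that hold out whole spatial blocks.

    Points in the same block always land in the same fold, which limits
    leakage between spatially close training and test samples.
    """
    # Group sample indices by the block they fall into.
    blocks = defaultdict(list)
    for i, (lon, lat) in enumerate(coords):
        blocks[spatial_block_id(lon, lat, block_deg)].append(i)

    # Assign blocks to folds round-robin in sorted order (deterministic).
    fold_members = [[] for _ in range(n_folds)]
    for k, key in enumerate(sorted(blocks)):
        fold_members[k % n_folds].extend(blocks[key])

    all_idx = set(range(len(coords)))
    for members in fold_members:
        if not members:
            continue  # fewer blocks than folds: skip empty folds
        test = set(members)
        yield sorted(all_idx - test), sorted(test)


if __name__ == "__main__":
    # Made-up coordinates, purely to show the call pattern.
    coords = [(10.1, 50.2), (10.4, 50.7), (11.3, 50.1),
              (11.8, 50.9), (12.5, 51.5), (12.6, 51.1)]
    for train_idx, test_idx in leave_block_out_folds(coords, n_folds=3):
        # A real script would fit and evaluate a model on these indices.
        print("train:", train_idx, "test:", test_idx)
```

A script like this runs the same way locally and in a container, so the operator only needs the image and its entrypoint. The spatial handling stays entirely in user code.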
