Any way to do finetuning jobs with custom processing scripts? #1091

Open
gnerativ opened this issue Aug 11, 2024 · 2 comments

Comments

@gnerativ

Problem
I have a model fine-tuning job that takes around 80 minutes. It has two steps: preprocessing and training.
Both preprocessing and training require GPU models, but preprocessing has multiple steps, each loading its own model step by step, and the same is true of training. I've put all of this code inside the predict function and left the model load empty.
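To make the layout concrete, here is a minimal sketch of a Truss `model.py` structured as described: `load()` is a no-op and `predict()` runs both preprocessing and training. The pipeline steps below are hypothetical placeholders, not the actual fine-tuning code.

```python
# Hypothetical sketch of the setup described above, assuming the standard
# Truss Model class shape (load() / predict()). The preprocessing and
# training bodies are placeholders for illustration only.

class Model:
    def __init__(self, **kwargs):
        self._data_dir = kwargs.get("data_dir")

    def load(self):
        # Left empty: which models to load depends on the request payload,
        # so nothing can be initialized ahead of time.
        pass

    def predict(self, model_input: dict) -> dict:
        # Step 1: preprocessing, loading each model as its step runs.
        features = self._preprocess(model_input)
        # Step 2: training, again loading weights step by step
        # (~80 minutes in the real job).
        checkpoint = self._train(features)
        return {"checkpoint": checkpoint}

    def _preprocess(self, model_input: dict) -> list:
        # Placeholder: in the real job, multiple GPU models are loaded
        # one step at a time here.
        return model_input.get("samples", [])

    def _train(self, features: list) -> str:
        # Placeholder: stands in for the long-running training loop.
        return f"ckpt-{len(features)}-steps"
```

Because everything happens inside `predict()`, a single synchronous API call holds the pod for the full run, which is what triggers the timeout problem below.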

Problems seen

  1. I don't think what I'm doing is ideal for Truss, since the model load is what lets you optimize GPU utilization.
  2. What would an ideal solution look like? I've been checking Chains, but it seems like a lot of work on top of what I've already built with Truss.
  3. I deployed it this way, but the API call takes 80 minutes to execute while the pod's maximum deletion time limit is 1 hour. Even though the call is a synchronous API call and training is running on the pod, the platform doesn't realize this and brings the pod down before the API returns any result.

Reasons I can't use model load:

  1. My preprocessing step takes user input, so the input can only be passed at predict time. I'm not sure whether I can take inputs at predict, load a model based on them, reuse model load whenever preprocessing runs, and then swap the weight files in model load again for training.
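One possible workaround for input-dependent weights (an assumption, not a documented Truss pattern): keep `load()` empty and lazily load weights inside `predict()`, caching them per key so repeated requests don't reload. A minimal sketch of that caching idea, with a hypothetical `loader` callable standing in for the actual weight-loading code:

```python
# Hypothetical lazy-loading cache: weights are loaded on first use, keyed
# by something derived from the request input, then reused afterwards.
# `loader` is an assumed callable (weight_key -> loaded model object).

class LazyWeightCache:
    def __init__(self, loader):
        self._loader = loader
        self._cache = {}

    def get(self, weight_key: str):
        # Load only on the first request for this key; reuse afterwards.
        if weight_key not in self._cache:
            self._cache[weight_key] = self._loader(weight_key)
        return self._cache[weight_key]
```

Inside `predict()`, each preprocessing or training step would call `cache.get(...)` with a key derived from the user input, so the right weights are in memory only while they are needed. This does not solve the pod-timeout problem, only the input-dependent loading.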

Describe the solution you'd like

  1. If I stick with the current approach, I see a hacky way: increase the pod inactivity time to 2 hours so my fine-tuning task can finish.
  2. Make sure that if an API call is still in progress, the pod is not brought down.
  3. Ideally, a feasible way for me to use Truss as it's intended, maybe using model load.

Describe alternatives you've considered
As of now, I haven't found a solution other than a custom host and setup on AWS. I'm not able to find enough documentation on fine-tuning tasks, or on dynamically loading weight files based on API inputs.

@bolasim
Collaborator

bolasim commented Aug 12, 2024

Hi @gnerativ. We're currently working on more dedicated fine-tuning support. If you're up for it, we would love to chat in more detail about your use case and needs. Please book 30 minutes whenever it's convenient for you: https://meetings.hubspot.com/bola-malek?uuid=a8f3caa3-c771-4d16-9421-81927137fda5

@gnerativ
Author

Hi @bolasim, sure. Would it be possible to add more slots? :( I work from IST and haven't found any overlapping slots.
