Problem
I have a model finetuning job that takes around 80 minutes. It has two steps: preprocessing and training.
Both preprocessing and training need GPU models, but preprocessing has multiple steps with multiple models loaded one step at a time, and training works the same way. I've put all of this code inside the predict function and left model load empty.
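To make the current layout concrete, here is a minimal sketch of what I mean (the step names and loaders are hypothetical stand-ins, not my actual code): load() is empty and every model is loaded step by step inside predict().

```python
class Model:
    def __init__(self, **kwargs):
        self._steps_run = []  # track which step models have been loaded

    def load(self):
        # Intentionally empty: the weights to load depend on user input,
        # which is only known at predict time.
        pass

    def predict(self, request: dict) -> dict:
        # Preprocessing: several GPU models loaded one after another
        # (appending a name here stands in for an actual model load).
        for step in ("preprocess_a", "preprocess_b"):
            self._steps_run.append(step)
        # Training: same pattern, more models loaded step by step.
        for step in ("train_base", "train_adapter"):
            self._steps_run.append(step)
        return {"steps_run": list(self._steps_run)}
```

A single call to predict runs the whole ~80-minute pipeline, which is why the request stays open for so long.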
Problems seen
I personally don't think what I'm doing is ideal for Truss, since model load is what lets you optimise GPU utilization.
What should an ideal solution for this look like? I've been checking out Chains, but it seems like too much work on top of what I've already built with Truss over a long time.
I deployed it this way, but the API call takes 80 minutes to execute while the pod delete max time limit is 1 hour. Even though the call is a synchronous API call and training is actively running on the pod, the platform doesn't realise this and brings the pod down before the API returns any result.
Reasons I can't use model load:
My preprocessing step takes user input, so the input can only be passed at predict time. I'm not sure whether I can take inputs at predict, load a model based on them, use model load whenever my preprocessing is happening, and then change the weight files in model load again for training.
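One pattern I've been considering is lazy, input-keyed loading inside predict with a cache, so repeated requests for the same weights reuse the loaded model. This is only a sketch under that assumption; `weights_id` and the loader are hypothetical names, not an existing Truss API:

```python
class Model:
    def __init__(self, **kwargs):
        self._cache = {}  # weights_id -> loaded model

    def load(self):
        # Nothing input-independent to preload in this scenario.
        pass

    def _get_model(self, weights_id: str):
        # Hypothetical lazy loader: load weights the first time an input
        # references them, then reuse the cached copy on later requests.
        if weights_id not in self._cache:
            self._cache[weights_id] = f"model<{weights_id}>"  # stand-in for a real load
        return self._cache[weights_id]

    def predict(self, request: dict) -> dict:
        model = self._get_model(request["weights_id"])
        return {"used": model}
```

The trade-off is that the first request per weights set pays the full load cost inside predict, which is exactly the GPU-utilization problem model load is supposed to avoid.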
Describe the solution you'd like
For now, if I were to stick with the current approach, I see a hacky way: increase the pod inactivity timeout to 2 hours so my finetuning task can complete.
Make sure that if an API call is still in progress, the pod isn't brought down.
Ideally, a feasible way for me to use Truss correctly as it's intended, perhaps using model load.
Describe alternatives you've considered
As of now, I don't see a solution other than a custom host and setup on AWS. I'm not able to find enough documentation on finetuning tasks, or on dynamic weight-file loading based on API inputs.