Feasibility of multi-task training in lightning with dynamic model size #1502

Closed
Ge0rges opened this issue Apr 15, 2020 · 5 comments
Labels
question Further information is requested

Comments

Ge0rges commented Apr 15, 2020

Questions and Help

Hello all. I am interested in using Lightning for my research project. However, I'm having trouble assessing whether my architecture is feasible in Lightning, due to some of its particularities.

The typical training loop that Lightning abstracts looks like this:

for epoch in range(epochs):
    ...train code...

However, my structure looks more like this:

for task_number in range(number_of_tasks):
    dataloader = DataLoader(task=task_number)  # The dataloader is task dependent.

    if task_number == 0:
        for epoch in range(epochs):
            ...regular train code...

    else:
        for epoch in range(epochs):
            ...selective retraining...  # Uses PyTorch hooks to train only certain nodes by setting their grads to 0
            model = split(model)  # Logic that may add new nodes to the model (size change); also trains the newly added nodes
            if loss > loss_threshold:
                model = dynamic_expansion(model)  # More logic that changes the model size and trains

As you can see, there are some challenges that don't easily translate to Lightning: the concept of tasks; task-dependent loaders (for example, the first task is a subset of MNIST and the second task is a different subset); and more complex task-dependent logic that may change the model size and require the newly added nodes to be trained.

I'm interested in using Lightning, but I'm having trouble seeing how this architecture could fit.
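
For what it's worth, here is a minimal sketch of the closest mapping I can picture: drive the task loop outside Lightning and call trainer.fit once per task with a task-specific dataloader. Everything here (LitModel, make_loader, the toy data) is made up for illustration, and I'm not sure it's the intended Lightning pattern:

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    # Toy stand-in for my real model.
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(10, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


def make_loader(task_number):
    # Stand-in for my task-dependent dataset (e.g. a different MNIST subset per task).
    x = torch.randn(64, 10)
    y = torch.randint(0, 2, (64,))
    return DataLoader(TensorDataset(x, y), batch_size=16)


model = LitModel()
for task_number in range(3):
    trainer = pl.Trainer(max_epochs=5)
    trainer.fit(model, make_loader(task_number))
    # Model surgery between tasks (split / dynamic_expansion) would have to
    # happen here, before the next trainer.fit call.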

Thank you.

Ge0rges added the question label on Apr 15, 2020
@github-actions

Hi! Thanks for your contribution, great first issue!

@BartekRoszak

I solved a similar problem by creating a DataLoader that yields batches containing a sub-batch for every task, and writing the "task loop" inside the training step.
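
Roughly like this (a toy sketch with made-up names and shapes, just to show the idea):

import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset
import pytorch_lightning as pl


class MultiTaskDataset(Dataset):
    # Each item is a dict mapping task name -> (x, y), so after collation
    # every batch contains one sub-batch per task.
    def __len__(self):
        return 64

    def __getitem__(self, idx):
        return {
            "task_a": (torch.randn(10), torch.randint(0, 2, ())),
            "task_b": (torch.randn(10), torch.randint(0, 2, ())),
        }


class MultiTaskModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(10, 2)

    def training_step(self, batch, batch_idx):
        # The "task loop" lives inside the training step.
        loss = 0.0
        for task_name, (x, y) in batch.items():
            loss = loss + nn.functional.cross_entropy(self.backbone(x), y)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


trainer = pl.Trainer(max_epochs=1)
trainer.fit(MultiTaskModel(), DataLoader(MultiTaskDataset(), batch_size=16))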

Ge0rges commented Apr 21, 2020

Well, I want to train X epochs per task. Your way would train every task in each epoch, whereas I want to run all epochs for one task before moving on to the next.

Ge0rges commented Apr 30, 2020

One solution is to have a single data loader, but then parse the targets and outputs so that only the task at hand is taken into account (for example, one task per entry in a one-hot label). This could require some extra engineering. I will post here if I ever end up doing this.
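
For concreteness, the masking could look something like this (a rough, untested sketch; the names and shapes are made up):

import torch
from torch import nn


def task_masked_loss(outputs, targets, task_onehot, active_task):
    # outputs: (N, C) logits, targets: (N,) labels,
    # task_onehot: (N, T) one-hot task membership, active_task: int.
    # Only examples belonging to the task at hand contribute to the loss.
    mask = task_onehot[:, active_task].bool()
    if not mask.any():
        return outputs.new_zeros(())  # no examples for this task in the batch
    return nn.functional.cross_entropy(outputs[mask], targets[mask])

# e.g. inside training_step:
# loss = task_masked_loss(self(x), y, task_onehot, self.current_task)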

Ge0rges closed this as completed on Apr 30, 2020
turian commented Apr 27, 2021

@Ge0rges #1959
