Implement Test and Validation Set Loss #1897

Open
stepfunction83 opened this issue Jan 24, 2025 · 2 comments

@stepfunction83

I propose adding a tracker that captures a stable loss measurement, as described here: https://github.com/spacepxl/demystifying-sd-finetuning

At regular intervals during training, the loss is calculated on a preselected image (or, I imagine, a batch of images) using a preselected noise seed. Because the inputs and the noise stay fixed, changes in the recorded loss reflect changes in the model rather than sampling randomness, so the loss curve accurately shows the progress of the training run.
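A minimal sketch of the fixed-seed idea in plain Python. It assumes a toy noising scheme (a linear blend of latent and noise) in place of a real scheduler, and a hypothetical `predict_noise` callable standing in for the model. The point it illustrates is that a private `random.Random(seed)` is recreated on every evaluation, so each call draws identical noise and timesteps and leaves the training RNG untouched.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical model interface: (noisy latent, timestep) -> predicted noise.
PredictNoise = Callable[[List[float], float], List[float]]


def fixed_seed_loss(
    predict_noise: PredictNoise,
    latents: Sequence[Sequence[float]],
    seed: int = 1234,
) -> float:
    """Mean squared error of the noise prediction over preselected latents.

    A fresh RNG seeded with `seed` is created on every call, so the same
    noise and timesteps are drawn each time and the global `random` state
    used by training is not consumed.
    """
    rng = random.Random(seed)
    total = 0.0
    for x in latents:
        noise = [rng.gauss(0.0, 1.0) for _ in x]
        t = rng.random()  # toy timestep in [0, 1)
        # Toy noising scheme (assumption): linear blend of clean latent and noise.
        noisy = [(1.0 - t) * xi + t * ni for xi, ni in zip(x, noise)]
        pred = predict_noise(noisy, t)
        total += sum((p - n) ** 2 for p, n in zip(pred, noise)) / len(x)
    return total / len(latents)


if __name__ == "__main__":
    latents = [[0.1, -0.2, 0.3], [0.5, 0.0, -0.4]]
    zero_model = lambda noisy, t: [0.0] * len(noisy)
    first = fixed_seed_loss(zero_model, latents)
    random.random()  # training code consuming the global RNG in between
    second = fixed_seed_loss(zero_model, latents)
    print(first == second)  # True: the measurement is reproducible
```

In a real trainer the same pattern would apply to the framework's own generator object, seeded once per evaluation rather than shared with training.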

Incorporating a holdout set for calculating validation loss would also allow a proper evaluation of the point at which the model begins to overtrain.
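One common way to turn a validation-loss history into an "overtraining starts here" signal is the patience rule used in early stopping: report the step with the lowest validation loss once the loss has failed to improve for a set number of consecutive evaluations. This is an illustrative heuristic, not part of the proposal; the function name and the `patience` parameter are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def overtraining_point(
    history: Sequence[Tuple[int, float]], patience: int = 3
) -> Optional[int]:
    """Return the step of the lowest validation loss once `patience`
    consecutive evaluations have failed to beat it, else None.

    `history` is a sequence of (global_step, val_loss) pairs in training order.
    """
    best_loss = float("inf")
    best_step: Optional[int] = None
    evals_since_best = 0
    for step, loss in history:
        if loss < best_loss:
            best_loss, best_step, evals_since_best = loss, step, 0
        else:
            evals_since_best += 1
            if evals_since_best >= patience:
                return best_step
    return None  # still improving, or not stalled for long enough yet


history = [(50, 1.00), (100, 0.82), (150, 0.74), (200, 0.76), (250, 0.79), (300, 0.83)]
print(overtraining_point(history))  # 150
```

A larger `patience` tolerates noisier validation curves at the cost of reacting later.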

stepfunction83 commented Jan 24, 2025

This could be implemented as part of the standard training loop: select predetermined batches from the train_dataloader into a test_dataloader, and create a val_dataloader either by removing samples from the train_dataloader (much easier) or by loading them from a directory (much harder).

New arguments would then specify the number of items to include in each set, as well as the number of noise/timestep iterations to perform per image:

  • --test_set_count 10 (Automatically create a test set using a specified number of images from the training set. These are not set aside and continue to be used for training.)
  • --val_set_count 10 (Automatically create a holdout set for validation using a specified number of images from the training set. These are set aside and not included in training.)
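A sketch of how `--test_set_count` and `--val_set_count` could map onto dataset indices, assuming the dataset can be addressed by integer index. A seeded shuffle keeps the selection identical across runs; test indices stay in the training set, while validation indices are removed from it. The helper name `split_indices` is hypothetical.

```python
import random
from typing import List, Tuple


def split_indices(
    n_items: int, test_count: int, val_count: int, seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Return (train, test, val) index lists for a dataset of `n_items`.

    - val: held out, so these indices are removed from train.
    - test: drawn from the remaining items and kept in train.
    """
    if test_count < 0 or val_count < 0:
        raise ValueError("counts must be non-negative")
    if test_count + val_count > n_items:
        raise ValueError("test_count + val_count exceeds dataset size")
    order = list(range(n_items))
    random.Random(seed).shuffle(order)  # deterministic for a given seed
    val = sorted(order[:val_count])
    test = sorted(order[val_count:val_count + test_count])  # disjoint from val
    held_out = set(val)
    train = [i for i in range(n_items) if i not in held_out]
    return train, test, val


train, test, val = split_indices(n_items=100, test_count=10, val_count=10)
print(len(train), len(test), len(val))  # 90 10 10
```

Because the split works on indices, the already-cached items can be reused for all three sets without reloading anything.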

It would be substantially easier to automatically split a subset of the train_dataloader off for use as the val_dataloader than to let users specify the validation images. Selecting them automatically lets us leverage the existing cache generation and avoids the extra effort of loading and caching an additional directory of images and captions.

Finally, flags should specify how often to run loss calculations on these sets:

  • --test_val_loss_freq_steps 50 (calculate test/val loss every 50 steps)
  • --test_val_loss_freq_epochs 1 (calculate test/val loss every epoch)
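A sketch of the proposed flags using Python's `argparse`, plus the two checks a training loop would need. The flag names come from this proposal; the defaults (0 or None meaning "disabled") and the helper names `eval_due_at_step` and `eval_due_at_epoch_end` are assumptions.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    # Flag names follow this proposal; defaults are assumptions (0/None = disabled).
    parser = argparse.ArgumentParser()
    parser.add_argument("--test_set_count", type=int, default=0,
                        help="training images reused as a fixed test set")
    parser.add_argument("--val_set_count", type=int, default=0,
                        help="training images held out for validation")
    parser.add_argument("--test_val_loss_freq_steps", type=int, default=None,
                        help="compute test/val loss every N steps")
    parser.add_argument("--test_val_loss_freq_epochs", type=int, default=None,
                        help="compute test/val loss every N epochs")
    return parser


def eval_due_at_step(global_step: int, freq_steps: Optional[int]) -> bool:
    """True when the 1-based `global_step` is a multiple of `freq_steps`."""
    return (freq_steps is not None and freq_steps > 0
            and global_step % freq_steps == 0)


def eval_due_at_epoch_end(epochs_completed: int, freq_epochs: Optional[int]) -> bool:
    """True when `epochs_completed` is a multiple of `freq_epochs`."""
    return (freq_epochs is not None and freq_epochs > 0
            and epochs_completed % freq_epochs == 0)


args = build_parser().parse_args(
    ["--test_set_count", "10", "--test_val_loss_freq_steps", "50"])
due = [s for s in range(1, 201) if eval_due_at_step(s, args.test_val_loss_freq_steps)]
print(due)  # [50, 100, 150, 200]
```

Keeping the step and epoch checks independent lets either frequency be used alone or both together.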

The results of these calculations should be logged to the standard log and to wandb.
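A sketch of reporting both numbers to the standard Python logger and, when a Weights & Biases run is active, to wandb. It relies only on `run.log(dict, step=...)`, which wandb run objects provide; the metric keys `loss/test` and `loss/validation` and the helper name are hypothetical. `run` is typed loosely as any object with that method, so the function can be exercised without wandb installed.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)


def log_eval_losses(
    step: int,
    test_loss: Optional[float] = None,
    val_loss: Optional[float] = None,
    run: Any = None,
) -> Dict[str, float]:
    """Log whichever losses were computed and return the metrics dict."""
    metrics: Dict[str, float] = {}
    if test_loss is not None:
        metrics["loss/test"] = test_loss
    if val_loss is not None:
        metrics["loss/validation"] = val_loss
    if not metrics:
        return metrics  # nothing was evaluated at this step
    logger.info("step %d: %s", step,
                ", ".join(f"{k}={v:.4f}" for k, v in metrics.items()))
    if run is not None:
        run.log(metrics, step=step)  # e.g. the object returned by wandb.init()
    return metrics
```

Passing the same `step` used for the training loss keeps the test and validation curves aligned with it in the wandb dashboard.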

For an initial implementation, computing a test loss on predetermined entries from the train_dataloader would be simplest. This would involve adding:

  • --test_set_count 10
  • --test_val_loss_freq_steps 50 (calculate test/val loss every 50 steps)

Followed by:

  • val_dataloader: load samples held out from the dataset, specified separately
  • --val_set_count 10
  • --test_val_loss_freq_epochs 1 (calculate test/val loss every epoch)

Finally, given the additional complexity of ingesting a separate directory:

  • --val_dataloader_dir "~/val_dataset_dir/" (To override --val_set_count if provided, and load a directory of validation images)
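A sketch of the precedence rule for `--val_dataloader_dir`: when it is given, the directory (with `~` expanded) supplies the validation images and `--val_set_count` is ignored; otherwise the count-based split applies. The function name and the set of image extensions are assumptions, and pairing images with caption files is left out.

```python
from pathlib import Path
from typing import List, Optional, Tuple, Union

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}  # assumption


def resolve_val_source(
    val_dataloader_dir: Optional[str], val_set_count: int
) -> Tuple[str, Union[List[Path], int]]:
    """Return ("dir", image_paths) if a directory is given, else ("split", count)."""
    if val_dataloader_dir:
        root = Path(val_dataloader_dir).expanduser()  # handles "~/val_dataset_dir/"
        if not root.is_dir():
            raise FileNotFoundError(f"validation directory not found: {root}")
        images = sorted(
            p for p in root.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
        return "dir", images  # val_set_count is ignored in this branch
    return "split", val_set_count
```

Failing loudly on a missing directory avoids silently falling back to the count-based split when a path is mistyped.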

@stepfunction83 changed the title from "Implement Stable Loss Calculation for Run Tracking" to "Implement Test and Validation Set Loss" on Jan 25, 2025
@stepfunction83

Implemented in #1899. I am working on a number of enhancements to it in #1900.
