
API: Replace custom Trainer with Pytorch Lightning #22

Merged
12 commits, merged Jan 16, 2024

Conversation

carterbox
Collaborator

I have refactored the PtychoNN API to be functional and replaced the custom training-management scripts with PyTorch Lightning.

This has the following advantages:

  • reduces the amount of boilerplate in this project
  • automates device management
  • provides a more robust model checkpointing/saving/reloading method

Fixes #21

@carterbox carterbox requested a review from stevehenke January 2, 2024 22:09
@carterbox
Collaborator Author

@stevehenke, please take a look at the new API, in particular the following functions:

  • ptychonn.train
  • ptychonn.init_or_load_model
  • ptychonn.infer

The network model is now LitReconSmallModel instead of ReconSmallModel because some of the training parameters have been moved into the model's constructor parameters.

@stevehenke
Collaborator

These changes look like a significant improvement in PtychoNN training capabilities. Nice job!

@carterbox
Collaborator Author

carterbox commented Jan 3, 2024

  • Create a custom logger that doesn't touch disk
  • Create a checkpointing function so downstream doesn't need to import pytorch lightning
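A helper like the second item might look roughly as follows. This is a hypothetical sketch, not the actual ptychonn implementation: the names `create_model_checkpoint` and `load_model_checkpoint` are assumptions, while `Trainer.save_checkpoint` and `LightningModule.load_from_checkpoint` are real PyTorch Lightning calls.

```python
# Hypothetical sketch: wrap Lightning's checkpoint I/O so downstream code
# only ever handles path strings and never imports pytorch_lightning itself.

def create_model_checkpoint(trainer, path: str) -> None:
    """Persist model weights and training state through the wrapped trainer.

    `trainer` is assumed to be a pytorch_lightning.Trainer held inside
    ptychonn; the caller only supplies the destination path.
    """
    trainer.save_checkpoint(path)


def load_model_checkpoint(model_class, path: str):
    """Restore a LightningModule (e.g. LitReconSmallModel) from disk."""
    return model_class.load_from_checkpoint(path)
```

Keeping the Lightning import behind these two functions means downstream callers depend only on the ptychonn package surface, matching the goal stated in the checklist above.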

@carterbox
Collaborator Author

import numpy as np
import ptychonn
from threading import Event

if __name__ == '__main__':

    x = np.random.rand(1_000, 256, 256).astype(np.float32)
    y = np.random.rand(1_000, 1, 256, 256).astype(np.float32)

    print("Transferring memory to PtychoNN")
    # t = ptychonn.Trainer(
    #     model=ptychonn.ReconSmallModel,
    #     batch_size=32,
    # )
    # t.setTrainingData(
    #     x,
    #     y,
    # )
    dataloader0, dataloader1 = ptychonn.create_training_dataloader(
        x,
        y,
        batch_size=32,
        training_fraction=0.8,
    )

    print("Waiting forever for you to check memory consumption.")
    Event().wait()

I used the above script to test the difference in memory consumption between the old and new API. In my tests, memory usage decreased from 1.2 GiB to 0.76 GiB, roughly a 37% reduction.
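For reference, the reduction implied by those two measurements works out as:

```python
# Resident memory reported by the test above, in GiB.
old_api = 1.2
new_api = 0.76

reduction = (old_api - new_api) / old_api
print(f"{reduction:.0%}")  # prints "37%"
```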

@carterbox carterbox merged commit 78e4ec9 into mcherukara:package Jan 16, 2024