How to optimize Topaz training parameters and interpret training outputs #170

tonyl4 · 2023-06-08T22:46:16Z

tonyl4
Jun 8, 2023

Edit: This is a re-written post after I realized some of my problems came from an issue with moving particle stacks between Cryosparc and Relion. However, since fixing this issue I still have many questions about Topaz training and extraction, listed below.

Main Post:

Hi, newer Topaz user here. I have run into a few issues when using Topaz training / particle extraction with my datasets and this has led to several questions about how I can best train my models for Topaz particle picking. For completeness, let me give some background on my sample and workflow so far:

I have a dataset of ~11,000 micrographs which were imported and motion-corrected in Relion. I went through the conventional process of hand-picking particles, training an initial round of Topaz with these hand-picked particles, and picking/extracting on a subset of ~1000 micrographs (all done with Relion's implementation of Topaz). From there, I processed these extracted particles in Cryosparc to a ~5000 particle stack, re-trained Topaz, and picked on all ~11,000 micrographs. Finally, I processed this second-round extraction in Cryosparc down to a clean ~70,000 particle stack which gives a 3.5A reconstruction. I am now trying to train Topaz with these ~70,000 particles to do a final "big" extraction so I can squeeze out as many particles as I can.

Usually, I use the default Topaz settings in Relion for training. However, I noticed that with this dataset (and others for this protein), training often proceeds slowly (i.e. many epochs needed), and in the end, does not always yield that many good particles back. As an illustration, these are the plotted Topaz training test metrics after training with the "clean" ~70,000 particle stack on ~11,000 micrographs using default Relion Topaz training settings and 30 epochs ("Model 1"). The specific parameters for this training were: 50 expected particles/micrograph (estimated by eye, generally ranged from 20-100 / micrograph, a proportion of micrographs had few to no particles), resnet8_u32, number of epochs = 30, epoch size = 1000, mini-batch size = 256, 240A diameter particle with micrographs at 0.73A/pixel, Relion auto-downscaled 9x and set Topaz radius to 5 downscaled pixels.

Model 1: (defaults, 30 epochs, 50 particles/micrograph)

Looking at the metrics plotted here, it looks like the training is proceeding okay according to the AUPRC curve, but is going as long as 30 epochs without plateauing. I thought that maybe my model was training slowly because it is updating / training too little in each epoch, and so I tried to increase the amount of training per epoch. Specifically, I re-trained Topaz on the same ~70,000 particle stack, but increased epoch size from 1000 to 10000 and lowered mini-batch size from 256 to 64. I also experimented with the number of expected particles: "Model 2" = 50 particles / micrographs (same as Model 1), "Model 3" = 75 particles / micrographs, "Model 4" = 100 particles / micrographs.

Model 2: (30 epochs, 10000 epoch size, 64 mini-batch size, 50 particles/micrograph)

Model 3: (30 epochs, 10000 epoch size, 64 mini-batch size, 75 particles/micrograph)

Model 4: (30 epochs, 10000 epoch size, 64 mini-batch size, 100 particles/micrograph)

It looks like training may have improved with some of these parameter changes, but I am confused in how to properly interpret all of these metrics, and collectively have several questions about how to best understand and use Topaz:

Q1. How should I interpret the above training metrics? From the github walkthroughs, it seems that AUPRC is the recommended one to watch for, especially to watch for how high the curve goes and at what epoch it starts plateauing, but what about other metrics? Interestingly it looks like precision appears to stay flat across epochs for most training jobs, but does sometimes go up (see: Models 2 and 4). TPR also seems to stay somewhat flat, and looks to be influenced more so by the expected number of particles than training itself (see: Models 2 through 4). On the other hand, loss usually goes up with training for my datasets, but should it not instead go down if training is working properly?

Q2. (*also see related Q3) Why are my models training so slowly (i.e. many epochs needed, large epoch size needed, etc), even on clean particle subsets? It seems that training Topaz with as many as 30 epochs is uncommon, but I see consistent and continual increases for the AUPRC curve in my datasets. In fact, looking at the AUPRC curve for an earlier Topaz training with only ~5000 particles similarly reflects slow training and the need for >10 epochs, so I am not sure if its just about the number of particles I am training with.

I also noted above, that although TPR usually goes up with epochs, it rarely increases past epoch 10 and appears to be influenced more so by the number of expected particles (note the respective increases in TPR for Models 2, 3, and 4). Meanwhile, precision usually stays flat across epochs and only sometimes goes up with training (Models 2 and 4); so again, I am not sure if the AUPRC curve is really the best thing to watch for here and what the significance of these patterns are.

Considering this slow training, are there any other parameter changes you would recommend I make?

Q3. What is the relationship between epochs, epoch size, mini-batch size, and the number of training particles? I understand that an epoch is a training cycle that had one full pass through all of the data (i.e. went through all ~70,000 particles once) and the number of epochs determines how many times Topaz goes through all of this data. I also understand that epoch size is the number of model parameter updates per epoch.

I am not sure if I am thinking about this correctly, but would this mean that training on 1000 particles with an epoch size of 1000 gives you, abstractly, one parameter update per training particle per epoch? And if instead you trained on 70,000 particles with an epoch size of 1000, you would have now only one parameter update per ~70 training particles per epoch? Stemming from this, would the higher ratio of particles : parameter updates in the latter case mean that the model is "training less" on a per-particle basis?

If the above is true, to me, this would imply that using more particles would not always improve training per se when compared to training with less particles above some threshold. Instead, I would suspect that training with more particles would only improve training if you concomitantly increased epoch number and epoch size, and maybe decreased mini-batch size. Hence, should I always use a higher number of epochs / larger epoch size / smaller mini-batch size when training with more particles? If yes, what would recommended starting values be (is an epoch size of 10000 enough for 70000 particles, or is even higher better)?

I do understand that training with more particles is not always necessary, but assuming I did want to, and did so without changing the default settings, am I limiting the training process by keeping these default? I am not sure if an idea like this explains why I would see such different training behaviour when changing the above parameters. I think an interesting point to note is that for Model 1, the AUPRC does not plateau even by epoch 30, but for Models 2-4, where epoch size is higher and mini-batch size is smaller, epoch 30 appears to be the beginning of the AUPRC plateau.

Another idea to optimize training, is that maybe it is better to pick 10,000 particles from the 70,000 particle stack and train with many epochs / large epoch size / small mini-batch size, as opposed to keeping all 70,000 particles in for training?

Lastly, I am confused how the above (epoch size, mini-batch size, # of particles training on) relates to the more often talked about concepts of batches and iterations in machine learning?

Q4. Considering that I am now training for many epochs and with a large epoch size / small mini-batch size, at what point should I be concerned about "over-fitting"? Is there a general feel for what over-fitting looks like with ~50,000 - 100,000 training particles? Or, can we watch trends in the training output to know where a good cut off for training is? Is it the beginning of the plateau for an AUPRC curve?

Q5. In some training plot outputs it appears that the last epoch is not always "the best" one nor better than the ones preceding it. An example could be in the Model 1 output, where epoch 28 and 29 have higher AUPRC (as well as TPR and precision) values than epoch 30. Does this actually mean that epochs 28/29 are "better trained" than epoch 30, and if I had to choose one to extract with, I should choose epoch 28 or 29, and not epoch 30? Or, is there significant enough variability in the test procedures that these differences are maybe just noise and not necessarily meaningful?

Q6. Relatedly, I understand that because the particles are partially unlabeled, precision and AUPRC can never approach 1 in a real dataset and also vary for each dataset and training / test split. However, if I am always training and picking with the same set of particles, but changing parameters (e.g. epoch size = 1000 vs = 10000), can the training metrics be directly compared across different models, or will the random assortment of work and test particles during training prohibit this? Basically, can I ever cross-compare training statistics for models without using cross-validation?

If I do always need to use cross-validation for inter-model comparisons, is there an implementation for cross-validation in Relion, or can it only be done with the scripts provided in the cross-validation github walkthroughs and in Cryosparc?

Q7. I can see from Models 2 through 4, that increasing the number of expected particles directly increases recall (TPR goes up, in parallel, total number of picked particles in Topaz extraction goes up). However, the AUPRC curve looks to get a bit worse as we move from Models 2 to 4. I interpret this to mean that although we increase recall (i.e. TPR) with a higher number of expected particles, we also pick more junk (lower precision) as the model is trying to pick a higher set number of particles in each micrograph. From your experience, is this normal behaviour? Do Topaz models normally not reach TPRs of >=0.9 without setting a very high number of expected particles? Or is this actually routinely achievable and suggests that my model is just not very well trained and so other parameters need to be adjusted?

As well, in general, would you suggest that the number of expected particles be slightly on the higher side to increase recovery / TPR, or is it better for training for it to be on the lower side and to sacrifice TPR?

Q8. Relatedly, at the end of day, in your experience, what are metrics that you would accept as "good" for a final model that was trained to be the best it could be? I assume you would look for AUPRC to start to plateau, but would you look for certain metrics to hit specific value thresholds (e.g. TPR > 0.9, FPR < 0.01, loss < 0.1)?

Q9. Considering my dataset and training outputs, should I start routinely using different training settings to start with so I do not have to re-run training so much when working with this protein? As a side note, I looked to the default settings of different Topaz implementations to help decide what good starting settings would be, but I noticed that the Relion defaults (10 epochs, 1000 epoch size, 256 mini-batch size) differ considerably from the Topaz GUI defaults (10 epochs, 5000 epoch size, 256 mini-batch size) and the Cryosparc defaults (10 epochs, 5000 epoch size, 128 mini-batch size). I wonder which is generally most appropriate for the average dataset and workflow?

Q10. Lastly, moving away from training, I was confused by how Topaz extraction was implemented in Relion and was wondering if you could shed some light. I understand that in base Topaz there is a cut-off value for particle extraction that must be set ("--threshold / -t"), and that it is recommended to use the desired percent recall as a guideline for setting this extraction threshold. However, in Relion, particle extraction is split across 2 jobs: i) particle picking in "AutoPicker", and ii) particle extraction in "Extract". In (i) AutoPicker, there is a setting for a "picking threshold" (default I think is 0.8, I set mine to 0.05). And in (ii) Extract, there is different setting for optionally setting an "autopick FOM threshold". For this setting I usually look at the logfile.pdf output from the AutoPicker particle picking job and set it to a value that would include ~90% of micrographs by the micrograph average pick FOM (usually a cutoff of -2 to -4 works).

Is Relion's autopick FOM threshold setting in (ii) Extract the same as the base Topaz "--threshold / -t"? Conversely, what is the role of Relion's "picking threshold" in (i) AutoPicker, and is there an equivalent in base Topaz? Is there a recommended default to use (if not 0.8)? Additionally, how is each particle's FOM value calculated? Lastly, I also noticed that when I increase epoch size from 1000 to 10000 and decrease mini-batch size from 256 to 64, the range of FOM values for micrographs changes (from about -6 to 8 with defaults, to -10 to 60 with the adjusted settings) - why is that so?

I am sorry that this is so long and in the likely case that I have seriously misunderstood aspects of Topaz. Any clarifications on the above results and questions would be greatly appreciated. Thank you!

Guillawme · 2024-03-01T09:57:38Z

Guillawme
Mar 1, 2024

Hello,

I don't have time to read all your questions now, and am sure anyway I don't have answers to all of them. But some of this has also been discussed on the cryoSPARC forum, if this is helpful to you: https://discuss.cryosparc.com/t/understanding-topaz-output/12325

1 reply

tonyl4 Mar 1, 2024
Author

Hi,

Thanks for the response, I definitely wrote too many questions!
Thanks for link, I've read it before and other forums, definitely these have helped a lot.
In the end I realized that my particles simply cannot all be picked with one round of Topaz, no matter how you train the model.
Instead, training and picking particles in 2 - 3 rounds (and removing duplicates..) is what maximizes the number of picked particles.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to optimize Topaz training parameters and interpret training outputs #170

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment 1 reply

{{title}}

{{title}}

Select a reply

How to optimize Topaz training parameters and interpret training outputs #170

tonyl4 Jun 8, 2023

Replies: 1 comment · 1 reply

Guillawme Mar 1, 2024

tonyl4 Mar 1, 2024 Author

tonyl4
Jun 8, 2023

Replies: 1 comment 1 reply

Guillawme
Mar 1, 2024

tonyl4 Mar 1, 2024
Author