
Question about model selection on the ETH/UCY datasets #38

Closed
YuejiangLIU opened this issue Mar 27, 2021 · 4 comments

@YuejiangLIU

Thanks for sharing the code of your great work.

From the provided commands and the pre-trained models, it looks like all training runs on the ETH/UCY datasets were stopped at the 100th epoch.

When using your code, should we simply pick the last checkpoint for testing, or would it be better to go through all checkpoints saved during training? Was the validation set used for model selection?

Thanks

@BorisIvanovic
Contributor

Hi @YuejiangLIU, thanks for the great questions!

We specifically chose the 100th epoch because the losses had more or less flattened out by that point, and performance on the validation set didn't look as if it was going to improve further. (We looked at the validation set just to make sure we weren't leaving tons of performance on the table, but yeah, our hyperparameter tuning was pretty light overall.)

Sure, we likely could have stopped training earlier and obtained similar (maybe even better) performance, but:

  1. The performance of the models was already fine at that point, and
  2. We didn't want to optimize hyperparameters ad nauseam, so we just trained all of the models to the same point with similar hyperparameters.

It's a balancing act: one could probably get better performance with more hyperparameter tuning, earlier stopping, etc., but at the end of the day we figured the performance gains would be marginal and not really worth the added time investment.

For your personal use case, I'd say it's up to you. If you're looking to absolutely maximize performance, then feel free to pick the best training iteration per model; otherwise, I wouldn't worry about it and would say that 100 epochs is fine.
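If it helps, per-model checkpoint selection could look roughly like the sketch below. This is not our evaluation code; it assumes checkpoints follow the repo's `model_registrar-<epoch>.pt` naming, and `evaluate_val_ade` is a hypothetical helper that loads one checkpoint and returns its validation ADE.

```python
# A sketch, not the repo's actual evaluation code. Assumes checkpoints are
# saved as model_registrar-<epoch>.pt, and that `evaluate_val_ade` is a
# hypothetical helper that loads one checkpoint and returns validation ADE.
import glob

def pick_best_checkpoint(ckpt_dir, evaluate_val_ade):
    best_path, best_ade = None, float("inf")
    for path in glob.glob(f"{ckpt_dir}/model_registrar-*.pt"):
        ade = evaluate_val_ade(path)
        if ade < best_ade:
            best_path, best_ade = path, ade
    return best_path, best_ade
```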

@YuejiangLIU
Author

YuejiangLIU commented Mar 30, 2021

Thanks a lot for your detailed response!

Just wanted to ask a follow-up question regarding the experimental results. I've been trying to validate a contrastive method on top of your model. To get a clean and reproducible comparison, I'd like to make training as deterministic as possible.

However, the variance of experimental results across machines seems persistent and perceptible. With a fixed random seed, I consistently get identical results across multiple runs on the same machine, but the ADE/FDE values differ by ~10% across machines. Have you come across this issue before?

I've tried the tips in the official PyTorch Reproducibility docs.
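Concretely, the settings I have in place look roughly like this (a sketch of the flags from those docs; exact API names vary with the PyTorch version, e.g. older releases use `torch.set_deterministic`):

```python
# Roughly the determinism settings from the PyTorch Reproducibility docs.
# Exact APIs vary by version (e.g. torch.set_deterministic in older releases).
import os
import random

import numpy as np
import torch

def seed_everything(seed: int = 0):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU and all CUDA devices
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    # Required by some CUDA kernels (CUDA >= 10.2) in deterministic mode:
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Error out on ops that lack a deterministic implementation:
    torch.use_deterministic_algorithms(True)
```

My understanding from the docs, though, is that even with all of this, identical results are only guaranteed on the same platform and PyTorch build, which may explain part of the cross-machine gap.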

IIUC, the current code does not explicitly call the `index_add_()` method. Do you know of any part that may use `index_add_()` at a lower level?

@BorisIvanovic
Contributor

First of all, very glad to see that you're building on top of our work!

Unfortunately, I have only used one machine to develop Trajectron++, so consistently seeing identical results across multiple runs on the same machine matches my experience with my hardware setup. Sorry I don't have more advice to give in this direction :(

Regarding `index_add_()`, I'm certain we don't call it explicitly in the code. Unfortunately, I also don't know a lot about the internal algorithms used within PyTorch. My first thought would be to search the actual PyTorch codebase for mentions of `index_add_`; maybe that will shed some light on which component could be using it within our model?
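Another thought, just a sketch and version-dependent: with deterministic mode enabled, PyTorch raises a `RuntimeError` at the first nondeterministic kernel it hits, and in a full training run the traceback should point at whatever is invoking the op under the hood. Whether `index_add_` on CUDA raises here depends on the release; some versions raised, while newer ones fall back to a deterministic kernel instead.

```python
# A sketch of using deterministic mode as a tripwire. Whether a given op
# raises depends on the PyTorch version; e.g. CUDA index_add_ raised a
# RuntimeError in some releases and got a deterministic kernel in later ones.
import torch

torch.use_deterministic_algorithms(True)

if torch.cuda.is_available():
    x = torch.zeros(5, device="cuda")
    idx = torch.tensor([0, 0, 1], device="cuda")
    src = torch.ones(3, device="cuda")
    try:
        x.index_add_(0, idx, src)  # nondeterministic on CUDA in some versions
    except RuntimeError as e:
        print(e)  # the message names the nondeterministic operation
```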

@YuejiangLIU
Author

Thanks again @BorisIvanovic for your timely and helpful reply! I'll search the source code more deeply for this `index_add_` operation, then. Thanks!
