How do I train and test on a single GPU? #21

Open
sunyclj opened this issue Jun 1, 2023 · 4 comments

Comments

@sunyclj

sunyclj commented Jun 1, 2023

I changed the distributed Python launch command
python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT
$(dirname "$0")/train.py $CONFIG --launcher pytorch ${@:3}
to
python ./tools/train.py ./configs/FTVSR_reds4.py
Training then fails with: KeyError: 'TTVSR is not in the model registry'

What could be causing this?
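The KeyError comes from a name-to-class registry lookup: the config names the model type 'TTVSR', and building it fails if no class has been registered under that name in the running process. The toy registry below is a simplified sketch (the Registry class here is hypothetical, not mmcv's actual implementation) that reproduces the same message format and shows that registration only happens when the code defining the class is executed, i.e. when its module is imported. Why that import does not happen in the plain `python ./tools/train.py` run is not established in this thread; which copy of the package Python imports in each launch mode is one assumption worth checking.

```python
class Registry:
    """Toy name -> class registry (hypothetical sketch, not mmcv's code)."""

    def __init__(self, name):
        self.name = name
        self._module_dict = {}

    def register_module(self, cls):
        # Used as a decorator: it only runs when the class definition is
        # executed, i.e. when the module containing it is imported.
        self._module_dict[cls.__name__] = cls
        return cls

    def build(self, cfg):
        cfg = dict(cfg)  # copy so the caller's config is not mutated
        obj_type = cfg.pop("type")
        if obj_type not in self._module_dict:
            raise KeyError(f"{obj_type} is not in the {self.name} registry")
        return self._module_dict[obj_type](**cfg)


MODELS = Registry("model")

# 1) Building before the model class is registered reproduces the error.
err_msg = None
try:
    MODELS.build({"type": "TTVSR"})
except KeyError as e:
    err_msg = str(e)
print(err_msg)  # 'TTVSR is not in the model registry'

# 2) Registration is a side effect of executing the class definition
#    (in a real project: of importing the module that defines it).
@MODELS.register_module
class TTVSR:
    def __init__(self, num_blocks=60):
        self.num_blocks = num_blocks


model = MODELS.build({"type": "TTVSR", "num_blocks": 30})
print(type(model).__name__, model.num_blocks)  # TTVSR 30
```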

@ericzw
Member

ericzw commented Jun 2, 2023

Could you try the distributed launch with the number of GPUs set to 1?
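Concretely, this means keeping the distributed command from the original post and setting GPUS to 1, so the `--launcher pytorch` code path still runs, just with a single process. The sketch below only assembles that command; `build_launch_cmd` is a hypothetical helper, and port 29500 is an arbitrary free port chosen for illustration.

```python
import shlex
import subprocess


def build_launch_cmd(config, gpus=1, port=29500, extra_args=()):
    """Assemble the issue's distributed launch command with a configurable GPU count."""
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={gpus}",   # 1 process -> 1 GPU
        f"--master_port={port}",      # any free port works
        "./tools/train.py", config,
        "--launcher", "pytorch",      # keep the distributed init path
        *extra_args,
    ]


cmd = build_launch_cmd("./configs/FTVSR_reds4.py", gpus=1)
print(shlex.join(cmd))

# To actually start training from Python instead of the shell:
# subprocess.run(cmd, check=True)
```

With the repository's own launcher script, passing 1 as the GPU-count argument should produce the same command.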

@sunyclj
Author

sunyclj commented Jun 6, 2023

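Thanks, that solved it! I switched the model to x2 super-resolution and trained it to iter=48k with a sequence length of 10. The log reports loss: 0.0099, but the test results contain a lot of mosaic-like artifacts:
[Image: frame_0011]
Is this normal? I am not sure whether it is worth continuing training (total iters=400k).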
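A single training-loss value does not say much about visible artifacts on test frames. One way to get a number to track, instead of judging the mosaic effect by eye, is to compute PSNR between the output frames and the ground-truth HR frames at several checkpoints (for example 48k and later). The sketch below is a plain PSNR formula, not this repository's evaluation code; it assumes frames have already been loaded and flattened into equal-length lists of 8-bit pixel values.

```python
import math


def psnr(output, target, max_val=255.0):
    """PSNR in dB between two equal-length flat sequences of pixel values."""
    if len(output) != len(target) or not output:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((o - t) ** 2 for o, t in zip(output, target)) / len(output)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_val ** 2 / mse)


# Every pixel off by one gray level -> MSE = 1 -> about 48.13 dB.
print(round(psnr([10, 20, 30], [11, 21, 31]), 2))
```

If PSNR on the test frames keeps rising between checkpoints, continuing training is likely still worthwhile.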

@ericzw
Member

ericzw commented Jun 6, 2023

I haven't run into this. Maybe let it train longer and see?

@sunyclj
Author

sunyclj commented Jun 7, 2023

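Hi, I have another question about GPU memory usage. I tried the following settings:
(1) x2 SR, LR input size 128×128×3, FTVSR num_blocks=60;
(2) x2 SR, LR input size 64×64×3, FTVSR num_blocks=60;
(3) x4 SR, LR input size 64×64×3, FTVSR num_blocks=60;
(4) x4 SR, LR input size 64×64×3, FTVSR num_blocks=30.
Training on 4 A100 GPUs with 40 GB each, all four settings reach essentially the same peak memory, about 39.4 GB. Even after fix_ttvsr, memory keeps increasing until it reaches 39.4 GB. What could be the reason?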
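A back-of-the-envelope comparison helps frame the question: if the peak were dominated by activations, the four settings should differ noticeably. The sketch uses a crude proxy (one LR-resolution float32 feature map kept per block per frame) and assumes mid_channels=64, batch_size=1, and the sequence length of 10 mentioned earlier for all settings; these are illustrative assumptions, not values read from the FTVSR config.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setting:
    label: str
    scale: int       # upscaling factor (x2 / x4)
    lr_size: int     # LR patch height = width
    num_blocks: int  # FTVSR num_blocks from the settings above


SETTINGS = [
    Setting("(1)", 2, 128, 60),
    Setting("(2)", 2, 64, 60),
    Setting("(3)", 4, 64, 60),
    Setting("(4)", 4, 64, 30),
]


def feature_proxy_bytes(s, seq_len=10, mid_channels=64, batch_size=1, bytes_per_elem=4):
    """Crude proxy: one LR-resolution feature map per block per frame (ASSUMED model)."""
    return batch_size * seq_len * s.num_blocks * mid_channels * s.lr_size ** 2 * bytes_per_elem


def hr_output_bytes(s, seq_len=10, batch_size=1, bytes_per_elem=4):
    """Size of the RGB super-resolved output sequence."""
    hr = s.lr_size * s.scale
    return batch_size * seq_len * 3 * hr * hr * bytes_per_elem


for s in SETTINGS:
    print(f"{s.label} features ~{feature_proxy_bytes(s) / 2**30:.2f} GiB, "
          f"HR output ~{hr_output_bytes(s) / 2**20:.1f} MiB")
```

Under this proxy, setting (1) needs 8× the feature memory of setting (4), so an identical 39.4 GB peak suggests the reported number is not tracking activations alone. PyTorch's caching allocator keeps freed blocks reserved, so nvidia-smi shows reserved memory, which does not shrink unless torch.cuda.empty_cache() is called. Comparing torch.cuda.max_memory_allocated() with torch.cuda.memory_reserved() across the four settings would show which one is actually constant.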
