Of note: I've read #3972, but it didn't help much.
Instructions To Reproduce the Issue:

I tried to train the DeepLabV3+ architecture on the MoNuSeg 2020 dataset with a customized config that uses ResNet18 (converted to .pkl from https://download.pytorch.org/models/resnet18-f37072fd.pth) as the backbone:
_BASE_: "detectron2/projects/DeepLab/configs/Cityscapes-SemanticSegmentation/deeplab_v3_plus_R_103_os16_mg124_poly_90k_bs16.yaml" MODEL: WEIGHTS: "r18.pkl" BACKBONE: NAME: "build_resnet_backbone" RESNETS: DEPTH: 18 RES2_OUT_CHANNELS: 64 STEM_OUT_CHANNELS: 64 RES5_DILATION: 1 NUM_GROUPS: 1 ROI_HEADS: NUM_CLASSES: 1
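For reference, a torchvision .pth can be converted to a detectron2-readable .pkl with the key renaming performed by tools/convert-torchvision-to-d2.py in the detectron2 repo. The sketch below is illustrative only; the file names are placeholders, chosen to match the WEIGHTS entry in the config above:

```python
# Minimal sketch of a torchvision-to-detectron2 weight conversion
# (roughly what detectron2's tools/convert-torchvision-to-d2.py does).
# Assumes the torchvision resnet18 .pth has already been downloaded.
import pickle
import torch

obj = torch.load("resnet18-f37072fd.pth", map_location="cpu")

newmodel = {}
for k in list(obj.keys()):
    old_k = k
    # torchvision layout -> detectron2 layout (stem / res2..res5 / conv*.norm / shortcut)
    if "layer" not in k:
        k = "stem." + k
    for t in [1, 2, 3, 4]:
        k = k.replace(f"layer{t}", f"res{t + 1}")
    for t in [1, 2, 3]:
        k = k.replace(f"bn{t}", f"conv{t}.norm")
    k = k.replace("downsample.0", "shortcut")
    k = k.replace("downsample.1", "shortcut.norm")
    newmodel[k] = obj[old_k].detach().numpy()

res = {"model": newmodel, "__author__": "torchvision", "matching_heuristics": True}
with open("r18.pkl", "wb") as f:
    pickle.dump(res, f)
```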
This is the training code I ran:

```python
import os

from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer
from detectron2.projects.deeplab import add_deeplab_config

cfg = get_cfg()
add_deeplab_config(cfg)  # register DeepLab-specific config keys
cfg.merge_from_file('/kaggle/input/deeplab-v3-plus-models/deeplab_v3_plus_R_18_os16_mg124_poly_90k_bs16.yaml')
cfg.DATASETS.TRAIN = ('monuseg_train',)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.SOLVER.IMS_PER_BATCH = 16
cfg.SOLVER.BASE_LR = 0.01
cfg.SOLVER.MAX_ITER = 300
cfg.SOLVER.LR_SCHEDULER_NAME = 'WarmupMultiStepLR'
cfg.SOLVER.STEPS = []
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```
I also tried the Trainer class from the DeepLab project's train_net.py by changing the following lines:
```python
cfg.SOLVER.LR_SCHEDULER_NAME = 'WarmupPolyLR'
trainer = Trainer(cfg)
```
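For context, the Trainer in the DeepLab project's train_net.py is essentially a DefaultTrainer that swaps in the project's LR-scheduler factory, which is what makes 'WarmupPolyLR' available. A simplified sketch of the relevant override is below; the actual class in the repo does more (e.g. evaluator and train-loader setup):

```python
# Simplified sketch of the LR-scheduler override in projects/DeepLab/train_net.py;
# the project's build_lr_scheduler understands 'WarmupPolyLR'.
from detectron2.engine import DefaultTrainer
from detectron2.projects.deeplab import build_lr_scheduler

class Trainer(DefaultTrainer):
    @classmethod
    def build_lr_scheduler(cls, cfg, optimizer):
        return build_lr_scheduler(cfg, optimizer)
```

The full log of the failing run follows.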
[11/20 16:16:06 d2.engine.defaults]: Model: SemanticSegmentor( (backbone): ResNet( (stem): BasicStem( (conv1): Conv2d( 3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False (norm): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (res2): Sequential( (0): BasicBlock( (conv1): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): BasicBlock( (conv1): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) (res3): Sequential( (0): BasicBlock( (shortcut): Conv2d( 64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv1): Conv2d( 64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False (norm): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): BasicBlock( (conv1): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) (res4): Sequential( (0): BasicBlock( (shortcut): Conv2d( 128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv1): Conv2d( 128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): BasicBlock( (conv1): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) (res5): Sequential( (0): BasicBlock( (shortcut): Conv2d( 256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv1): Conv2d( 256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False (norm): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): 
BasicBlock( (conv1): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) ) (sem_seg_head): DeepLabV3PlusHead( (decoder): ModuleDict( (res2): ModuleDict( (project_conv): Conv2d( 64, 48, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): SyncBatchNorm(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (fuse_conv): Sequential( (0): Conv2d( 304, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (1): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) (res5): ModuleDict( (project_conv): ASPP( (convs): ModuleList( (0): Conv2d( 512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (1): Conv2d( 512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(6, 6), dilation=(6, 6), bias=False (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (2): Conv2d( 512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(12, 12), dilation=(12, 12), bias=False (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (3): Conv2d( 512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(18, 18), dilation=(18, 18), bias=False (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (4): Sequential( (0): AvgPool2d(kernel_size=(16, 32), stride=1, padding=0) (1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1)) ) ) (project): Conv2d( 1280, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (fuse_conv): None ) ) (predictor): Conv2d(256, 19, kernel_size=(1, 1), stride=(1, 1)) (loss): DeepLabCE( (criterion): CrossEntropyLoss() ) ) ) [11/20 16:16:07 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in training: [RandomCrop(crop_type='absolute', crop_size=[512, 1024]), ResizeShortestEdge(short_edge_length=(512, 768, 1024, 1280, 1536, 1792, 2048), max_size=4096, sample_style='choice'), RandomFlip()] [11/20 16:16:07 d2.data.build]: Using training sampler TrainingSampler [11/20 16:16:07 d2.data.common]: Serializing the dataset using: <class 'detectron2.data.common.NumpySerializedList'> [11/20 16:16:07 d2.data.common]: Serializing 37 elements to byte tensors and concatenating them all ... 
[11/20 16:16:07 d2.data.common]: Serialized dataset takes 0.01 MiB [11/20 16:16:12 d2.checkpoint.c2_model_loading]: Following weights matched with submodule backbone: | Names in Model | Names in Checkpoint | Shapes | |:------------------|:----------------------------------------------------------------------------------|:------------------------------------------| | res2.0.conv1.* | res2.0.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (64,) (64,) (64,) (64,) (64,64,3,3) | | res2.0.conv2.* | res2.0.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (64,) (64,) (64,) (64,) (64,64,3,3) | | res2.1.conv1.* | res2.1.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (64,) (64,) (64,) (64,) (64,64,3,3) | | res2.1.conv2.* | res2.1.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (64,) (64,) (64,) (64,) (64,64,3,3) | | res3.0.conv1.* | res3.0.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (128,) (128,) (128,) (128,) (128,64,3,3) | | res3.0.conv2.* | res3.0.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (128,) (128,) (128,) (128,) (128,128,3,3) | | res3.0.shortcut.* | res3.0.shortcut.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (128,) (128,) (128,) (128,) (128,64,1,1) | | res3.1.conv1.* | res3.1.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (128,) (128,) (128,) (128,) (128,128,3,3) | | res3.1.conv2.* | res3.1.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (128,) (128,) (128,) (128,) (128,128,3,3) | | res4.0.conv1.* | res4.0.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,128,3,3) | | res4.0.conv2.* | res4.0.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,256,3,3) | | res4.0.shortcut.* | res4.0.shortcut.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,128,1,1) | | res4.1.conv1.* | res4.1.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,256,3,3) | | res4.1.conv2.* | res4.1.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,256,3,3) | | res5.0.conv1.* | res5.0.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,256,3,3) | | res5.0.conv2.* | res5.0.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,512,3,3) | | res5.0.shortcut.* | res5.0.shortcut.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,256,1,1) | | res5.1.conv1.* | res5.1.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,512,3,3) | | res5.1.conv2.* | res5.1.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,512,3,3) | | stem.conv1.* | stem.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (64,) (64,) (64,) (64,) (64,3,7,7) | [11/20 16:16:14 d2.engine.train_loop]: Starting training from iteration 0 ERROR [11/20 16:16:22 d2.engine.train_loop]: Exception during training: Traceback (most recent call last): File "/opt/conda/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 149, in train self.run_step() File 
"/opt/conda/lib/python3.7/site-packages/detectron2/engine/defaults.py", line 494, in run_step self._trainer.run_step() File "/opt/conda/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 274, in run_step loss_dict = self.model(data) File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl return forward_call(*input, **kwargs) File "/opt/conda/lib/python3.7/site-packages/detectron2/modeling/meta_arch/semantic_seg.py", line 108, in forward features = self.backbone(images.tensor) File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl return forward_call(*input, **kwargs) File "/opt/conda/lib/python3.7/site-packages/detectron2/modeling/backbone/resnet.py", line 445, in forward x = self.stem(x) File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl return forward_call(*input, **kwargs) File "/opt/conda/lib/python3.7/site-packages/detectron2/modeling/backbone/resnet.py", line 356, in forward x = self.conv1(x) File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl return forward_call(*input, **kwargs) File "/opt/conda/lib/python3.7/site-packages/detectron2/layers/wrappers.py", line 117, in forward x = self.norm(x) File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl return forward_call(*input, **kwargs) File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 731, in forward world_size = torch.distributed.get_world_size(process_group) File "/opt/conda/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 867, in get_world_size return _get_group_size(group) File "/opt/conda/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 325, in _get_group_size default_pg = _get_default_group() File "/opt/conda/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 430, in _get_default_group "Default process group has not been initialized, " RuntimeError: Default process group has not been initialized, please make sure to call init_process_group. [11/20 16:16:22 d2.engine.hooks]: Total training time: 0:00:08 (0:00:00 on hooks) [11/20 16:16:22 d2.utils.events]: iter: 0 lr: N/A max_mem: 7348M
Environment:

GPU T4 x2 and GPU P100:
---------------------- ------------------------------------------------------------------------------- sys.platform linux Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53) [GCC 9.4.0] numpy 1.21.6 detectron2 0.6 @/opt/conda/lib/python3.7/site-packages/detectron2 Compiler GCC 9.4 CUDA compiler CUDA 11.0 detectron2 arch flags 7.5 DETECTRON2_ENV_MODULE <not set> PyTorch 1.11.0 @/opt/conda/lib/python3.7/site-packages/torch PyTorch debug build False GPU available Yes GPU 0,1 Tesla T4 (arch=7.5) Driver version 470.82.01 CUDA_HOME /usr/local/cuda Pillow 9.1.1 torchvision 0.12.0 @/opt/conda/lib/python3.7/site-packages/torchvision torchvision arch flags 3.7, 6.0, 7.0, 7.5 fvcore 0.1.5.post20220512 iopath 0.1.9 cv2 4.5.4 ---------------------- ------------------------------------------------------------------------------- PyTorch built with: - GCC 9.4 - C++ Version: 201402 - Intel(R) oneAPI Math Kernel Library Version 2022.1-Product Build 20220311 for Intel(R) 64 architecture applications - Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e) - OpenMP 201511 (a.k.a. OpenMP 4.5) - LAPACK is enabled (usually provided by MKL) - NNPACK is enabled - CPU capability usage: AVX512 - CUDA Runtime 11.0 - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_70,code=compute_70;-gencode;arch=compute_75,code=compute_75 - CuDNN 8.0.5 - Magma 2.5.2 - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.0, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, Testing NCCL connectivity ... this should not hang. NCCL succeeded.
---------------------- ---------------------------------------------------------------- sys.platform linux Python 3.7.15 (default, Oct 12 2022, 19:14:55) [GCC 7.5.0] numpy 1.21.6 detectron2 0.6 @/usr/local/lib/python3.7/dist-packages/detectron2 Compiler GCC 7.5 CUDA compiler CUDA 11.2 detectron2 arch flags 7.5 DETECTRON2_ENV_MODULE <not set> PyTorch 1.12.1+cu113 @/usr/local/lib/python3.7/dist-packages/torch PyTorch debug build False GPU available Yes GPU 0 Tesla T4 (arch=7.5) Driver version 460.32.03 CUDA_HOME /usr/local/cuda Pillow 7.1.2 torchvision 0.13.1+cu113 @/usr/local/lib/python3.7/dist-packages/torchvision torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6 fvcore 0.1.5.post20220512 iopath 0.1.9 cv2 4.6.0 ---------------------- ---------------------------------------------------------------- PyTorch built with: - GCC 9.3 - C++ Version: 201402 - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815) - OpenMP 201511 (a.k.a. OpenMP 4.5) - LAPACK is enabled (usually provided by MKL) - NNPACK is enabled - CPU capability usage: AVX2 - CUDA Runtime 11.3 - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86 - CuDNN 8.3.2 (built against CUDA 11.5) - Magma 2.5.2 - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
Closing as a duplicate of #3972. If the answer there is not clear to you, ask for clarification there instead.