Facing Error while Fine Tuning on my dataset #52

Open
ujjawalcse opened this issue Sep 9, 2021 · 0 comments

ujjawalcse commented Sep 9, 2021

Hi,

I prepared a custom dataset in COCO format with 3 classes, as shown below.
However, there are no samples for the 'Resumes' class; only the 'heading' and 'text' classes appear in my annotations.

"categories": [
        {
            "id": 0,
            "name": "Resumes",
            "supercategory": "none"
        },
        {
            "id": 1,
            "name": "heading",
            "supercategory": "Resumes"
        },
        {
            "id": 2,
            "name": "text",
            "supercategory": "Resumes"
        }
    ],
"images":[{
            "id": 159,
            "license": 1,
            "file_name": "outfile_Lavanya-4_10---2--docx-pdf_1.rf.bfcd3081e0b5399829ae9b1bdc1e67d4.jpg",
            "height": 842,
            "width": 596,
            "date_captured": "2021-09-09T13:55:03+00:00"
        },
    ],
    "annotations": [
        {
            "id": 0,
            "image_id": 0,
            "category_id": 2,
            "bbox": [
                7,
                6,
                266.66666666666663,
                101.28205128205127
            ],
            "area": 27008.547008547,
            "segmentation": [
                [
                    7,
                    6,
                    266.66666666666663,
                    6,
                    266.66666666666663,
                    101.28205128205127,
                    7,
                    101.28205128205127
                ]
            ],
            "iscrowd": 0
        },
    ]
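
For context, I register the dataset with Detectron2's register_coco_instances helper, roughly like this (a minimal sketch; the image-root paths are placeholders for my local layout):

from detectron2.data.datasets import register_coco_instances

# Register train/val splits under the names used in the config below.
# The image-root paths are placeholders for my local directory layout.
register_coco_instances(
    "dla_train", {},
    "./data/train/annotations.json", "./data/train/images",
)
register_coco_instances(
    "dla_val", {},
    "./data/val/annotations.json", "./data/val/images",
)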

I'm using this config file for fine-tuning:

_BASE_: "Base-RCNN-FPN.yaml"
MODEL:
  MASK_ON: True
#  WEIGHTS: "detectron2://ImageNetPretrained/FAIR/X-101-32x8d.pkl"
  #WEIGHTS: "detectron2://COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x/139653917/model_final_2d9806.pkl"
  PIXEL_STD: [57.375, 57.120, 58.395]
  ROI_HEADS:
    NUM_CLASSES: 5
  RESNETS:
    STRIDE_IN_1X1: False  # this is a C2 model
    NUM_GROUPS: 32
    WIDTH_PER_GROUP: 8
    DEPTH: 101
DATASETS:
  TRAIN: ("dla_train",)
  TEST: ("dla_val",)
SOLVER:
  STEPS: (210000, 250000)
  MAX_ITER: 125500
  IMS_PER_BATCH: 2
  #BASE_LR: 0.0009
  BASE_LR: 0.00005
DATALOADER:
  NUM_WORKERS: 1
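
Training is launched roughly like this (a minimal sketch; "configs/dla_finetune.yaml" is just a placeholder path for the config above, and I use Detectron2's DefaultTrainer):

from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

# Load the YAML above (placeholder path) and train with the default trainer.
cfg = get_cfg()
cfg.merge_from_file("configs/dla_finetune.yaml")

trainer = DefaultTrainer(cfg)
# resume=True picks up the last checkpoint in OUTPUT_DIR, which is why the
# log below shows "Starting training from iteration 75500".
trainer.resume_or_load(resume=True)
trainer.train()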

I got the following error:

WARNING [09/09 20:18:55 d2.data.datasets.coco]: 
Category ids in annotations are not in [1, #categories]! We'll apply a mapping for you.

[09/09 20:18:55 d2.data.datasets.coco]: Loaded 638 images in COCO format from ./data/train/annotations.json
[09/09 20:18:55 d2.data.build]: Removed 320 images with no usable annotations. 318 images left.
[09/09 20:18:55 d2.data.build]: Distribution of instances among all 3 categories:
|  category  | #instances   |  category  | #instances   |  category  | #instances   |
|:----------:|:-------------|:----------:|:-------------|:----------:|:-------------|
|  Resumes   | 0            |  heading   | 1028         |    text    | 1951         |
|            |              |            |              |            |              |
|   total    | 2979         |            |              |            |              |
[09/09 20:18:55 d2.data.detection_utils]: TransformGens used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip()]
[09/09 20:18:55 d2.data.build]: Using training sampler TrainingSampler
[09/09 20:18:56 d2.engine.train_loop]: Starting training from iteration 75500
ERROR [09/09 20:18:57 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 132, in train
    self.run_step()
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 216, in run_step
    self._detect_anomaly(losses, loss_dict)
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 239, in _detect_anomaly
    self.iter, loss_dict
FloatingPointError: Loss became infinite or NaN at iteration=75501!
loss_dict = {'loss_cls': tensor(nan, device='cuda:0', grad_fn=<NllLossBackward>), 'loss_box_reg': tensor(nan, device='cuda:0', grad_fn=<DivBackward0>), 'loss_mask': tensor(0.7118, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>), 'loss_rpn_cls': tensor(0.6949, device='cuda:0', grad_fn=<MulBackward0>), 'loss_rpn_loc': tensor(0.4812, device='cuda:0', grad_fn=<MulBackward0>)}

I tried changing NUM_CLASSES from 5 to 3, but no luck.
Some people suggested reducing the learning rate, but that didn't help either.
Can anyone please suggest a way to tackle this issue?
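
In case it helps with debugging, this is the kind of sanity check I can run over the annotations to rule out degenerate or out-of-image boxes (a quick ad-hoc script of mine, not part of Detectron2):

import json

# Ad-hoc check: flag zero/negative-size boxes and boxes that extend
# outside their image, which can destabilize the box-regression loss.
with open("./data/train/annotations.json") as f:
    coco = json.load(f)

sizes = {img["id"]: (img["width"], img["height"]) for img in coco["images"]}

for ann in coco["annotations"]:
    x, y, w, h = ann["bbox"]
    if w <= 0 or h <= 0:
        print("degenerate box:", ann["id"], ann["bbox"])
    iw, ih = sizes.get(ann["image_id"], (None, None))
    if iw is not None and (x < 0 or y < 0 or x + w > iw or y + h > ih):
        print("box outside image:", ann["id"], ann["bbox"])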
