Facing Error while Fine Tuning on my dataset #52

Open
ujjawalcse opened this issue Sep 9, 2021 · 0 comments

ujjawalcse commented Sep 9, 2021

Hi,

I prepared a custom dataset in COCO format with 3 classes, as shown below.
However, there are no samples for the 'Resumes' class; only the 'heading' and 'text' classes appear in my annotations.

"categories": [
        {
            "id": 0,
            "name": "Resumes",
            "supercategory": "none"
        },
        {
            "id": 1,
            "name": "heading",
            "supercategory": "Resumes"
        },
        {
            "id": 2,
            "name": "text",
            "supercategory": "Resumes"
        }
    ],
"images":[{
            "id": 159,
            "license": 1,
            "file_name": "outfile_Lavanya-4_10---2--docx-pdf_1.rf.bfcd3081e0b5399829ae9b1bdc1e67d4.jpg",
            "height": 842,
            "width": 596,
            "date_captured": "2021-09-09T13:55:03+00:00"
        },
    ],
    "annotations": [
        {
            "id": 0,
            "image_id": 0,
            "category_id": 2,
            "bbox": [
                7,
                6,
                266.66666666666663,
                101.28205128205127
            ],
            "area": 27008.547008547,
            "segmentation": [
                [
                    7,
                    6,
                    266.66666666666663,
                    6,
                    266.66666666666663,
                    101.28205128205127,
                    7,
                    101.28205128205127
                ]
            ],
            "iscrowd": 0
        },
    ]
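
For context, I register the dataset with Detectron2's register_coco_instances helper, roughly like this (a minimal sketch; the image-root paths are placeholders for my local layout):

from detectron2.data.datasets import register_coco_instances

# Register train/val splits under the names used in the config below.
# The image-root paths are placeholders for my local directory layout.
register_coco_instances(
    "dla_train", {},
    "./data/train/annotations.json", "./data/train/images",
)
register_coco_instances(
    "dla_val", {},
    "./data/val/annotations.json", "./data/val/images",
)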

I'm using this config file for fine-tuning:

_BASE_: "Base-RCNN-FPN.yaml"
MODEL:
  MASK_ON: True
#  WEIGHTS: "detectron2://ImageNetPretrained/FAIR/X-101-32x8d.pkl"
  #WEIGHTS: "detectron2://COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x/139653917/model_final_2d9806.pkl"
  PIXEL_STD: [57.375, 57.120, 58.395]
  ROI_HEADS:
    NUM_CLASSES: 5
  RESNETS:
    STRIDE_IN_1X1: False  # this is a C2 model
    NUM_GROUPS: 32
    WIDTH_PER_GROUP: 8
    DEPTH: 101
DATASETS:
  TRAIN: ("dla_train",)
  TEST: ("dla_val",)
SOLVER:
  STEPS: (210000, 250000)
  MAX_ITER: 125500
  IMS_PER_BATCH: 2
  #BASE_LR: 0.0009
  BASE_LR: 0.00005
DATALOADER:
  NUM_WORKERS: 1
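
Training is launched roughly like this (a minimal sketch; "configs/dla_finetune.yaml" is just a placeholder path for the config above, and I use Detectron2's DefaultTrainer):

from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

# Load the YAML above (placeholder path) and train with the default trainer.
cfg = get_cfg()
cfg.merge_from_file("configs/dla_finetune.yaml")

trainer = DefaultTrainer(cfg)
# resume=True picks up the last checkpoint in OUTPUT_DIR, which is why the
# log below shows "Starting training from iteration 75500".
trainer.resume_or_load(resume=True)
trainer.train()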

I got the following error:

WARNING [09/09 20:18:55 d2.data.datasets.coco]: 
Category ids in annotations are not in [1, #categories]! We'll apply a mapping for you.

[09/09 20:18:55 d2.data.datasets.coco]: Loaded 638 images in COCO format from ./data/train/annotations.json
[09/09 20:18:55 d2.data.build]: Removed 320 images with no usable annotations. 318 images left.
[09/09 20:18:55 d2.data.build]: Distribution of instances among all 3 categories:
|  category  | #instances   |  category  | #instances   |  category  | #instances   |
|:----------:|:-------------|:----------:|:-------------|:----------:|:-------------|
|  Resumes   | 0            |  heading   | 1028         |    text    | 1951         |
|            |              |            |              |            |              |
|   total    | 2979         |            |              |            |              |
[09/09 20:18:55 d2.data.detection_utils]: TransformGens used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip()]
[09/09 20:18:55 d2.data.build]: Using training sampler TrainingSampler
[09/09 20:18:56 d2.engine.train_loop]: Starting training from iteration 75500
ERROR [09/09 20:18:57 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 132, in train
    self.run_step()
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 216, in run_step
    self._detect_anomaly(losses, loss_dict)
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 239, in _detect_anomaly
    self.iter, loss_dict
FloatingPointError: Loss became infinite or NaN at iteration=75501!
loss_dict = {'loss_cls': tensor(nan, device='cuda:0', grad_fn=<NllLossBackward>), 'loss_box_reg': tensor(nan, device='cuda:0', grad_fn=<DivBackward0>), 'loss_mask': tensor(0.7118, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>), 'loss_rpn_cls': tensor(0.6949, device='cuda:0', grad_fn=<MulBackward0>), 'loss_rpn_loc': tensor(0.4812, device='cuda:0', grad_fn=<MulBackward0>)}

I tried changing NUM_CLASSES from 5 to 3, but no luck.
Some people suggested reducing the learning rate, but that didn't help either.
Can anyone please suggest a way to tackle this issue?
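
In case it helps with debugging, this is the kind of sanity check I can run over the annotations to rule out degenerate or out-of-image boxes (a quick ad-hoc script of mine, not part of Detectron2):

import json

# Ad-hoc check: flag zero/negative-size boxes and boxes that extend
# outside their image, which can destabilize the box-regression loss.
with open("./data/train/annotations.json") as f:
    coco = json.load(f)

sizes = {img["id"]: (img["width"], img["height"]) for img in coco["images"]}

for ann in coco["annotations"]:
    x, y, w, h = ann["bbox"]
    if w <= 0 or h <= 0:
        print("degenerate box:", ann["id"], ann["bbox"])
    iw, ih = sizes.get(ann["image_id"], (None, None))
    if iw is not None and (x < 0 or y < 0 or x + w > iw or y + h > ih):
        print("box outside image:", ann["id"], ann["bbox"])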
