Fork of Tensorpack to make breaking performance improvements to the Mask RCNN example. Training is approximately 2x faster than the original implementation on AWS.

aws-samples/mask-rcnn-tensorflow

Mask R-CNN

This is an optimized version of Mask R-CNN based on TensorFlow 2.x and the Tensorpack Faster R-CNN/Mask R-CNN on COCO implementation.

Overview

This implementation of Mask R-CNN focuses on increasing training throughput without sacrificing accuracy. We do this by training with a per-GPU batch size greater than 1. The Tensorpack Faster R-CNN/Mask R-CNN on COCO implementation supports only a per-GPU batch size of 1.

This implementation does not make use of any custom TensorFlow Ops.

This implementation supports Horovod for multi-node, multi-GPU distributed training.

NOTE

The deprecated Mask R-CNN implementation based on TensorFlow 1.15 with custom TensorFlow Ops is available on the branch tf-1.15-with-custom-ops.

Training convergence

Training on N GPUs with a per-GPU batch size of M gives a total batch size of N x M. We write such a configuration as NxM.

Training converges to the target accuracy for configurations from 8x1 up to 32x4. Training throughput is substantially improved over the original Tensorpack code.
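
As a quick check of the NxM notation, the total batch size is the product of the GPU count and the per-GPU batch size. The helper below is a minimal sketch; its name is hypothetical and it is not part of this repository.

```python
def total_batch_size(config: str) -> int:
    """Return the total batch size for an 'NxM' configuration string.

    N is the number of GPUs and M is the per-GPU batch size.
    """
    num_gpus, per_gpu_batch = (int(part) for part in config.lower().split("x"))
    return num_gpus * per_gpu_batch


# The smallest and largest configurations mentioned above.
print(total_batch_size("8x1"))   # 8
print(total_batch_size("32x4"))  # 128
```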

Training data

  • We use COCO 2017. You can download the data from COCO data.
  • The pre-trained ResNet backbone can be downloaded from ImageNet-R50-AlignPadding.npz
  • The data folder must have the following directory structure:
  data/
    annotations/
      instances_train2017.json
      instances_val2017.json
    pretrained-models/
      ImageNet-R50-AlignPadding.npz
    train2017/
      # image files that are mentioned in the corresponding json
    val2017/
      # image files that are mentioned in the corresponding json
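
Before launching training, it can help to confirm that the layout above is in place. The following is a minimal sketch that only checks for the files and folders listed above. The function name `missing_entries` is hypothetical and not part of this repository, and the check does not verify that every image referenced in the JSON annotations is present.

```python
from pathlib import Path

# Paths taken from the directory structure listed above.
REQUIRED_FILES = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "pretrained-models/ImageNet-R50-AlignPadding.npz",
]
REQUIRED_DIRS = ["train2017", "val2017"]


def missing_entries(data_root):
    """Return the required files and folders that are absent under data_root."""
    root = Path(data_root)
    missing = [f for f in REQUIRED_FILES if not (root / f).is_file()]
    missing += [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    return missing


if __name__ == "__main__":
    # "data" is the top-level folder name used in the layout above.
    for entry in missing_entries("data"):
        print(f"missing: {entry}")
```

An empty result means every expected file and folder was found.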

Login to Amazon Elastic Container Registry (ECR)

The Dockerfile used in this project is based on AWS Deep Learning Container Images. Before you can build the Dockerfile, you must log in to Amazon Elastic Container Registry (ECR) in us-west-2 using the following command in your Visual Studio Code Terminal:

aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-west-2.amazonaws.com

When your AWS session expires, you will need to log in again.
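
If you need to repeat the login from a script, the same two commands can be chained from Python. This is a minimal sketch, assuming the `aws` and `docker` CLIs are on your PATH and your AWS credentials are already configured. The function names are hypothetical and not part of this repository.

```python
import subprocess

ACCOUNT_ID = "763104351884"  # registry account from the command above
REGION = "us-west-2"


def ecr_registry(account_id: str, region: str) -> str:
    """Build the ECR registry host name for an account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_login(account_id: str = ACCOUNT_ID, region: str = REGION) -> None:
    """Run the same two commands as the shell pipeline above."""
    # Step 1: ask the AWS CLI for a temporary registry password.
    password = subprocess.run(
        ["aws", "ecr", "get-login-password", "--region", region],
        check=True, capture_output=True, text=True,
    ).stdout
    # Step 2: pass that password to docker login on standard input.
    subprocess.run(
        ["docker", "login", "--username", "AWS", "--password-stdin",
         ecr_registry(account_id, region)],
        input=password, check=True, text=True,
    )
```

Call `ecr_login()` whenever the session has expired. Either command failing raises `subprocess.CalledProcessError`.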

Launch training in Visual Studio Code

For training on a GPU-enabled desktop, we recommend using Visual Studio Code. Install the Python and Docker extensions for Visual Studio Code.

The docker-run: debug task defined in tasks.json runs training in a Docker container using the Docker extension. The Docker image for the task is built automatically from the Dockerfile when you run the task. This Visual Studio Code task also enables debugging.

Set the localPath of the Docker container volumes in tasks.json, and set the training script configuration in launch.json.

Launch notebooks in Visual Studio Code

We provide two Jupyter notebooks.

The docker-run: notebooks Visual Studio Code task defined in tasks.json runs Jupyter Lab in a Docker container using the Docker extension. The Docker image for the task is built automatically from the Dockerfile when you run the task.

Set the localPath of the Docker container volumes in tasks.json.

When you run the docker-run: notebooks Visual Studio Code task, the Jupyter Lab server runs in a detached container. To connect to the Jupyter Lab notebook in your browser, follow these steps:

  • When you run the docker-run: notebooks task in Visual Studio Code, the container-id is printed in the Visual Studio Code Terminal.
  • Use the container-id to connect to the container using the following command in a terminal: docker exec -it container-id /bin/bash
  • At the shell prompt inside the container, run the following command: cat nohup.out

This will print the instructions for connecting to the Jupyter Lab server in a browser. When you are done using the notebooks, close the browser window, and stop the Jupyter Lab container using the command: docker stop container-id.
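
If the nohup.out output is long, the connection URL can be picked out programmatically. The sketch below assumes the log contains a URL of the form `http://host:port/...?token=...`, which is how Jupyter servers typically print their access link. The exact log format in this container is an assumption, and both the function name and the sample log are hypothetical.

```python
import re

# Assumption: the log contains a URL of the form http://host:port/...?token=...
URL_PATTERN = re.compile(r"https?://\S+\?token=\w+")


def find_jupyter_urls(log_text: str) -> list:
    """Return every tokenized URL found in the log text, in order."""
    return URL_PATTERN.findall(log_text)


# Hypothetical log excerpt for illustration only.
sample_log = """\
[I ServerApp] Jupyter Server is running at:
[I ServerApp] http://localhost:8888/lab?token=abc123
"""
print(find_jupyter_urls(sample_log))
```

In practice you would pass in the text captured from `cat nohup.out` instead of the sample string.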

Tensorpack compatibility

This implementation was originally forked from the Tensorpack repo at commit a9dce5b220dca34b15122a9329ba9ff055e8edc6. Tensorpack code in this repo has been updated since the original fork to support TensorFlow 2.x, and is approximately equivalent to Tensorpack commit fac024f0f72fd593ea243f0b599a51b11fe4effd.

Codebase

See Codebase for details about the code.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the terms in the LICENSE file.
