TensorRT-OSS build image is over 844GB? #1373

Closed
mkserge opened this issue Jul 15, 2021 · 4 comments
Labels
OSS Build (Issues building open source code), triaged (Issue has been triaged by maintainers)

Comments

mkserge commented Jul 15, 2021

Description

Hello,

Since NGC does not yet provide TensorRT containers for 8.0.6.X versions, I decided to build one by following the README instructions for building the TensorRT-OSS build container.

I ended up with an 844GB (!) Docker image, as reported by the docker image ls command. Is this expected?

Thank you.

Environment

TensorRT Version: 8.0.6.1
NVIDIA GPU: N/A
NVIDIA Driver Version: N/A
CUDA Version: N/A
CUDNN Version: N/A
Operating System: macOS
Python Version (if applicable): N/A
Tensorflow Version (if applicable): N/A
PyTorch Version (if applicable): N/A
Baremetal or Container (if so, version): N/A

Steps To Reproduce

Follow the README.md on macOS (I also tried on a Linux machine with similar results, although I didn't let the job finish):

git clone -b master https://github.com/nvidia/TensorRT TensorRT
cd TensorRT
git submodule update --init --recursive
./docker/build.sh --file docker/ubuntu-18.04.Dockerfile --tag tensorrt-ubuntu18.04-cuda11.3 --cuda 11.3.1

mkserge commented Jul 15, 2021

Hi,

I did some more digging and believe I found the cause.

The issue is in the following line in the Dockerfile:

RUN groupadd -r -f -g ${gid} trtuser && useradd -o -r -u ${uid} -g ${gid} -ms /bin/bash trtuser

The useradd command above adds the new user to the lastlog and faillog databases, which are sparse files indexed by UID and can have a very large apparent size when the UID is large. I believe Docker does not handle sparse files well, so this creates a massive layer in the image. See similar issues reported here, for example.
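
To illustrate what "sparse" means here, the following is a minimal sketch, assuming a Linux machine with GNU coreutils (truncate and stat -c). It does not touch lastlog or faillog; it only creates a temporary file whose logical size is much larger than the space actually allocated for it. A tool that copies the file byte by byte, rather than preserving the holes, would have to write out the full logical size.

```shell
#!/bin/sh
# Illustration only: a temporary sparse file, not the real lastlog/faillog.
tmp=$(mktemp)

# Extend the file to 1 GiB without writing any data (GNU coreutils truncate).
truncate -s 1G "$tmp"

# Logical (apparent) size in bytes.
apparent=$(stat -c %s "$tmp")

# Bytes actually allocated: allocated block count times the block unit size.
allocated=$(( $(stat -c %b "$tmp") * $(stat -c %B "$tmp") ))

echo "apparent=$apparent allocated=$allocated"
rm -f "$tmp"
```

On a filesystem that supports sparse files, the allocated figure should be far smaller than the apparent one.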

Adding --no-log-init to the useradd command resolves the issue, dropping the image size from 844GB to 14GB!

RUN groupadd -r -f -g ${gid} trtuser && useradd -o -r --no-log-init -u ${uid} -g ${gid} -ms /bin/bash trtuser
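
For anyone checking their own Dockerfiles for the same problem, here is a rough sketch of a grep-based check. The check_useradd function name and the sample file are hypothetical and exist only for this example; the check is line-based, so it would miss a RUN instruction that is split across lines with backslashes.

```shell
#!/bin/sh
# Hypothetical helper: print useradd lines in a Dockerfile that lack --no-log-init.
check_useradd() {
    # $1: path to a Dockerfile. Output is "<line number>:<line>" per offending line.
    grep -n 'useradd' "$1" | grep -v -e '--no-log-init'
}

# Sample input for demonstration: the original line and the fixed line from this issue.
sample=$(mktemp)
cat > "$sample" <<'EOF'
RUN groupadd -r -f -g ${gid} trtuser && useradd -o -r -u ${uid} -g ${gid} -ms /bin/bash trtuser
RUN groupadd -r -f -g ${gid} trtuser && useradd -o -r --no-log-init -u ${uid} -g ${gid} -ms /bin/bash trtuser
EOF

# Only the first sample line (without the flag) should be reported.
flagged=$(check_useradd "$sample")
echo "$flagged"
rm -f "$sample"
```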

I can submit an MR with a fix if you are interested.

Thank you!

mkserge changed the title from "TensorRT-OSS build container is over 844GB?" to "TensorRT-OSS build image is over 844GB?" Jul 15, 2021
ttyio added the OSS Build (Issues building open source code) and triaged (Issue has been triaged by maintainers) labels Aug 2, 2021
@rajeevsrao
Collaborator

Thanks @mkserge for the tip! We will fix this in the upcoming 8.2 GA release.

Author

mkserge commented Nov 11, 2021

My pleasure! Looking forward to 8.2 :)

Collaborator

ttyio commented Dec 15, 2021

Closing, since 8.2 GA has been released. Thanks @mkserge!

ttyio closed this as completed Dec 15, 2021