
Training model on Quora dataset #8

Open · TheMnBN opened this issue Jul 28, 2020 · 4 comments

TheMnBN commented Jul 28, 2020

Hi all,

I ran into an OOM error while trying to train the model on a single-GPU workstation (GTX 1080 Ti). A snippet of the error log is shown below. Has anyone been successful in training this model with similar hardware?
Thanks in advance!

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[512,98,200] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[Node: block-1/align/sub_1/mul_5 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](block-1/align/sub_1/mul_2, block-1/align/sub_1/add_1)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

         [[Node: stack_8/_159 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6954_stack_8", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Edit: Decreasing the batch size to 64 (the default was 512) fixed this issue. I will try to find the maximum batch size the 1080 Ti can handle.
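
For reference, the hint in the log refers to TensorFlow 1.x RunOptions. A minimal, self-contained sketch of enabling that flag; the placeholder graph here is purely hypothetical and just stands in for the repo's actual model and training op:

    import tensorflow as tf

    # Toy graph standing in for the real model (hypothetical).
    x = tf.placeholder(tf.float32, shape=[None, 200])
    loss = tf.reduce_sum(tf.layers.dense(x, 64))

    # Ask TF to dump the list of live tensors if an allocation fails with OOM.
    run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Pass the options into the session.run call that triggers the OOM.
        sess.run(loss, feed_dict={x: [[0.0] * 200]}, options=run_options)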

@hitvoice (Collaborator)

A single V100 (32 GB) can run this experiment. For a 16 GB GPU, setting the batch size to 488 led to similar results in my previous runs.

Decreasing the batch size to 64 may result in worse performance.
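
If memory forces a small per-step batch, gradient accumulation is one common way to keep the effective batch size close to the default. This is not part of this repo; the following is a generic PyTorch-style sketch with a hypothetical toy model and random stand-in data:

    import torch

    # Hypothetical toy model, optimizer, and loss; the real ones would go here.
    model = torch.nn.Linear(200, 2)
    optimizer = torch.optim.Adam(model.parameters())
    loss_fn = torch.nn.CrossEntropyLoss()

    accumulation_steps = 8  # 8 x 64 = effective batch size of 512
    optimizer.zero_grad()
    for step in range(accumulation_steps):
        inputs = torch.randn(64, 200)           # stand-in mini-batch
        targets = torch.randint(0, 2, (64,))    # stand-in labels
        loss = loss_fn(model(inputs), targets) / accumulation_steps
        loss.backward()                         # gradients accumulate across steps
    optimizer.step()                            # one update for the "virtual" large batch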

@hitvoice (Collaborator)

Sorry for the late reply....

@Jch520

Jch520 commented Sep 10, 2021

Hello, sorry to bother you. Would it be convenient for you to provide the code for counting the network parameters? Thank you!

@hitvoice (Collaborator)

Something like sum(p.numel() for p in model.parameters() if p.requires_grad)
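
To make that one-liner concrete, here is a self-contained sketch with a hypothetical toy model standing in for the trained model:

    import torch

    # Hypothetical stand-in for the actual model.
    model = torch.nn.Sequential(
        torch.nn.Linear(200, 150),
        torch.nn.ReLU(),
        torch.nn.Linear(150, 2),
    )

    # Count only trainable parameters (those updated by the optimizer).
    num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"trainable parameters: {num_params}")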
