This repository has been archived by the owner on Nov 22, 2022. It is now read-only.

Remove vocab from cuda #955

Closed
wants to merge 1 commit into from


snisarg
Contributor

@snisarg commented Sep 6, 2019

Summary:
Some users cannot train models with extremely large embedding tables because we try to allocate space for the embeddings on the GPU.

This diff adds a training flag that users can set explicitly to keep the embedding layer on the CPU even when the model is trained on GPUs. It is not the default because users need to know that moving tensors on and off the GPU has a cost.

Note that this only applies during training.
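
To illustrate the idea described above, here is a minimal PyTorch sketch, not the code from this diff. The class name `CpuEmbeddingClassifier` and its arguments are hypothetical, and the flag added by the diff is not shown because its name does not appear in this conversation. The sketch keeps the embedding table on the CPU, performs the lookup there, and copies only the looked-up activations to the compute device.

```python
import torch
import torch.nn as nn


class CpuEmbeddingClassifier(nn.Module):
    """Hypothetical toy model: embedding table on CPU, the rest on `device`."""

    def __init__(self, vocab_size, embed_dim, num_classes, device):
        super().__init__()
        self.device = torch.device(device)
        # Never moved: the (potentially huge) table stays in host memory.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Only the small layers are placed on the compute device.
        self.classifier = nn.Linear(embed_dim, num_classes).to(self.device)

    def forward(self, token_ids):
        # Look up on CPU, then copy just the (batch, seq, dim) activations.
        embedded = self.embedding(token_ids.cpu()).to(self.device)
        return self.classifier(embedded.mean(dim=1))


# Falls back to CPU when no GPU is present, so the sketch runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = CpuEmbeddingClassifier(vocab_size=1000, embed_dim=16, num_classes=3, device=device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

token_ids = torch.randint(0, 1000, (4, 7))
labels = torch.randint(0, 3, (4,))

logits = model(token_ids)
loss = nn.functional.cross_entropy(logits, labels.to(logits.device))
optimizer.zero_grad()
loss.backward()  # gradients flow back through .to() into the CPU table
optimizer.step()
```

Calling `model.to(device)` or `model.cuda()` on the whole module would move the embedding as well, which is why the sketch places each submodule explicitly. The per-batch copies in `forward` and in the backward pass are the transfer cost the summary warns about.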

Differential Revision: D17114398

@facebook-github-bot added the CLA Signed label Sep 6, 2019
snisarg added a commit to snisarg/pytext that referenced this pull request Sep 21, 2019
Summary:
Pull Request resolved: facebookresearch#955

Some users cannot train models with extremely large embedding tables because we try to allocate space for the embeddings on the GPU.

This diff adds a training flag that users can set explicitly to keep the embedding layer on the CPU even when the model is trained on GPUs. It is not the default because users need to know that moving tensors on and off the GPU has a cost.

Note that this only applies during training.

Also note that this does not work in a multi-GPU environment because of the way the weights are synced via NCCL.
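
One way to surface the multi-GPU limitation early is a configuration check before training starts. The sketch below is an assumption for illustration only: the function name `check_cpu_embedding_allowed` and the parameters `keep_embedding_on_cpu` and `world_size` are hypothetical and do not come from this diff.

```python
def check_cpu_embedding_allowed(keep_embedding_on_cpu: bool, world_size: int) -> None:
    """Reject the CPU-embedding option when training spans more than one GPU.

    Both parameter names are hypothetical stand-ins for whatever the
    training configuration actually exposes.
    """
    if keep_embedding_on_cpu and world_size > 1:
        raise ValueError(
            "Keeping the embedding layer on CPU is not supported for "
            f"multi-GPU training (world_size={world_size})."
        )


# Single-GPU training with the option enabled passes silently.
check_cpu_embedding_allowed(keep_embedding_on_cpu=True, world_size=1)
```

Failing fast here gives the user a clear message instead of an error raised later, during weight synchronization.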

Differential Revision: D17114398

fbshipit-source-id: ba7b004c6e2e75af1ee9cff64eee563cf3e52435
snisarg added a commit to snisarg/pytext that referenced this pull request Sep 24, 2019
fbshipit-source-id: a9f1791d83d67f331094e64f1574cf1c149deabf
snisarg added a commit to snisarg/pytext that referenced this pull request Sep 24, 2019
fbshipit-source-id: e28b2981fbcbb248a6a704fd3c6e325fd45490e9
snisarg added a commit to snisarg/pytext that referenced this pull request Sep 24, 2019
fbshipit-source-id: 840f37f77c70089137f2cf23a262dc503e5e2080
@snisarg force-pushed the export-D17114398 branch 2 times, most recently from 8a15419 to a5e9775 on September 26, 2019 20:36
snisarg added a commit to snisarg/pytext that referenced this pull request Sep 26, 2019
fbshipit-source-id: 56343dd90a9e05d021650b9d765274a721dffa13
snisarg added a commit to snisarg/pytext that referenced this pull request Sep 26, 2019
fbshipit-source-id: 8da9f9628c64f23ba751d6ceb63ffe1ce9b05c17
Summary:
Pull Request resolved: facebookresearch#955

Reviewed By: chenyangyu1988

Differential Revision: D17114398

fbshipit-source-id: 1d4c41940af0d69415b8e606899afcecc843b064
@facebook-github-bot
Contributor

This pull request has been merged in 84adc39.

Labels
CLA Signed, Merged

2 participants