Replies: 2 comments
-
Please try:
1) Run nvidia-smi on your compute node to check whether a GPU is working;
2) Test it with the training example, using water_smth.json as the input
file. The training time should be around 2 s per 100 batches.
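As a rough illustration of step 1, the check can be scripted. This is only a sketch: the helper names parse_gpu_rows and list_gpus are made up for this example, and it assumes nvidia-smi is on the PATH of the compute node. It relies on the nvidia-smi options --query-gpu and --format=csv,noheader.

```python
import shutil
import subprocess

# Ask nvidia-smi for one CSV row per GPU: index, name, total memory.
QUERY = ["nvidia-smi", "--query-gpu=index,name,memory.total", "--format=csv,noheader"]


def parse_gpu_rows(text):
    """Turn nvidia-smi CSV output into a list of (index, name, memory) tuples."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        # Skip blank or unexpected lines instead of failing on them.
        if len(parts) == 3 and parts[0].isdigit():
            rows.append((int(parts[0]), parts[1], parts[2]))
    return rows


def list_gpus():
    """Return the visible GPUs, or an empty list if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        QUERY,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    gpus = list_gpus()
    if not gpus:
        print("No working GPU reported by nvidia-smi on this node.")
    for index, name, memory in gpus:
        print("GPU {}: {} ({})".format(index, name, memory))
```

Run it on the compute node itself (for example inside the batch job), since a login node may report different hardware.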
Best,
Linfeng
On Thu, May 30, 2019 at 11:20 PM fkxie ***@***.***> wrote:
Hi,
I want to train on my data with GPU acceleration.
When I use dp_test and dp_frz, they both abort:
2019-05-30 15:09:16.824616: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-05-30 15:09:22.765409: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla V100-PCIE-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
totalMemory: 31.72GiB freeMemory: 31.31GiB
2019-05-30 15:09:23.098334: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 1 with properties:
name: Tesla V100-PCIE-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
totalMemory: 31.72GiB freeMemory: 31.31GiB
2019-05-30 15:09:23.437009: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 2 with properties:
name: Tesla V100-PCIE-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
But when I try dp_train, no GPU information is printed. I think
it is still using the CPU for training, which may be a bug.
F.K.xie
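One way to check whether a run detected any GPU is to scan its captured output for the "Found device N with properties" lines shown in the log above. The sketch below is an assumption-laden helper, not part of DeePMD-kit: the function name gpus_in_log is invented here, and the log file passed on the command line is a hypothetical capture of the program's output (TensorFlow writes these messages to stderr).

```python
import re
import sys

# Patterns taken from the TensorFlow log format quoted above.
DEVICE_RE = re.compile(r"Found device (\d+) with properties")
NAME_RE = re.compile(r"^name: (.+?) major: \d+")


def gpus_in_log(log_text):
    """Return {device_index: device_name} for each GPU reported in the log."""
    found = {}
    pending = None  # index of a device whose name line has not been seen yet
    for line in log_text.splitlines():
        device = DEVICE_RE.search(line)
        if device:
            pending = int(device.group(1))
            found[pending] = None
            continue
        name = NAME_RE.match(line.strip())
        if name and pending is not None:
            found[pending] = name.group(1)
            pending = None
    return found


if __name__ == "__main__":
    # Usage: python check_log.py <captured_log_file>
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            devices = gpus_in_log(handle.read())
        if devices:
            for index in sorted(devices):
                print("device {}: {}".format(index, devices[index]))
        else:
            print("No GPU device lines found in the log.")
```

An empty result only shows that no such lines were logged. It does not prove the run used the CPU, so the timing test on the training example remains the more direct check.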
-
Hi,
The training time on my machine is about 2 s per 100 batches, so there seems to be no problem on my side. Anyway, thanks for your reply.
Best,
-
Hi,
I want to train on my data with GPU acceleration.
When I use dp_test and dp_frz, they both abort. But when I try dp_train, no GPU information is printed. I think it is still using the CPU for training, which may be a bug.
F.K.xie