batched-wav-nnet3-cuda core dump when set --gpu-feature-extract=true #3425

Closed
drawfish opened this issue Jun 27, 2019 · 11 comments
drawfish commented Jun 27, 2019

Below is the exec info:

batched-wav-nnet3-cuda --acoustic-scale=1.0 --add-pitch=true --beam=15.0 --feature-type=mfcc --frame-subsampling-factor=3 --frames-per-chunk=150 --gpu-feature-extract=true --lattice-beam=8.0 --max-active=7000
--mfcc-config=config/mfcc.conf --online-pitch-config=config/pitch.conf '--word-symbol-table=exp/nnet3_tdnn_online_game_transfer_final/graph/words.txt' exp/nnet3_tdnn_onl
ine_game_transfer_final/tdnn-lstm/final.mdl 'exp/nnet3_tdnn_online_game_transfer_final/graph/HCLG.fst' scp:data/dahua/mfcc/wav.scp ark:/dev/null
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:SelectGpuId():cu-device.cc:223) CUDA setup operating under Compute Exclusive Mode.
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:FinalizeActiveGpu():cu-device.cc:308) The active GPU is [6]: GeForce GTX 1080 Ti   free:10945M, used:227M, total:11172M, free/total:0.979682 version 6.1
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 5 orphan nodes.
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 11 orphan components.
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:Collapse():nnet-utils.cc:1463) Added 6 components, removed 11
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:Initialize():batched-threaded-nnet3-cuda-pipeline.cc:32) BatchedThreadedNnet3CudaPipeline Initialize with 2 control threads, 20 worker threads and batch size 200
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:ExecuteWorker():batched-threaded-nnet3-cuda-pipeline.cc:780) CudaDecoder batch_size=200 num_channels=260
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:ExecuteWorker():batched-threaded-nnet3-cuda-pipeline.cc:780) CudaDecoder batch_size=200 num_channels=260
ASSERTION_FAILED (batched-wav-nnet3-cuda[5.5.402~1-53346]:ComputeFeatures():online-cuda-feature-pipeline.cc:66) Assertion failed: (false)

[ Stack-Trace: ]
/storage01/gzchenduisheng/dev/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x82c) [0x7f3271b6b31a]
/storage01/gzchenduisheng/dev/kaldi/src/lib/libkaldi-base.so(kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)+0x6c) [0x7f3271b6bd88]
/storage01/gzchenduisheng/dev/kaldi/src/lib/libkaldi-cudafeat.so(kaldi::OnlineCudaFeaturePipeline::ComputeFeatures(kaldi::CuVectorBase<float> const&, float, kaldi::CuMatrix<float>*, kaldi::CuVector<float>*)+0x
ae) [0x7f326fdc633a]
/storage01/gzchenduisheng/dev/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ComputeBatchFeatures(int, std::vector<kaldi::cuda_decoder::BatchedThreadedNnet3CudaPip
eline::TaskState*, std::allocator<kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::TaskState*> >&, kaldi::OnlineCudaFeaturePipeline&)+0xc23) [0x7f32745b5529]
/storage01/gzchenduisheng/dev/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ExecuteWorker(int)+0x4fe) [0x7f32745b60fa]
/storage01/gzchenduisheng/dev/kaldi/src/lib/libkaldi-cudadecoder.so(std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int)> (kaldi::cuda_decoder:
:BatchedThreadedNnet3CudaPipeline*, int)> >::_M_run()+0x2a) [0x7f32745b7f28]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f3270a0fc80]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f32711566ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f327047e41d]

Aborted (core dumped)

gdb core info:

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `batched-wav-nnet3-cuda --acoustic-scale=1.0 --add-pitch=true --beam=15.0 --feat'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f32703ac428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7f30e106c700 (LWP 41345))]
(gdb) bt
#0  0x00007f32703ac428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f32703ae02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f3271b6bda1 in kaldi::KaldiAssertFailure_ (
    func=func@entry=0x7f326fdca460 <kaldi::OnlineCudaFeaturePipeline::ComputeFeatures(kaldi::CuVectorBase<float> const&, float, kaldi::CuMatrix<float>*, kaldi::CuVector<float>*)::__func__> "ComputeFeatures",
    file=file@entry=0x7f326fdc9c00 "online-cuda-feature-pipeline.cc", line=line@entry=66, cond_str=cond_str@entry=0x7f326fdc8d77 "false") at kaldi-error.cc:234
#3  0x00007f326fdc633a in kaldi::OnlineCudaFeaturePipeline::ComputeFeatures (this=this@entry=0x7f30e106b330, cu_wave=..., sample_freq=8000, input_features=0x7f323fcfcaa0, ivector_features=0x7f323fcfca90)
    at online-cuda-feature-pipeline.cc:66
#4  0x00007f32745b5529 in kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ComputeBatchFeatures (this=this@entry=0x7ffe36a937b0, first=first@entry=0,
    tasks=std::vector of length 1, capacity 200 = {...}, feature_pipeline=...) at batched-threaded-nnet3-cuda-pipeline.cc:616
#5  0x00007f32745b60fa in kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ExecuteWorker (this=0x7ffe36a937b0, threadId=<optimized out>) at batched-threaded-nnet3-cuda-pipeline.cc:841
#6  0x00007f32745b7f28 in std::_Mem_fn_base<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int), true>::operator()<int, void>(kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int&&) const (__object=<optimized out>, this=<optimized out>) at /usr/include/c++/5/functional:600
#7  std::_Bind_simple<std::_Mem_fn<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int)> (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int)>::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) (this=<optimized out>) at /usr/include/c++/5/functional:1531
#8  std::_Bind_simple<std::_Mem_fn<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int)> (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int)>::operator()() (this=<optimized out>)
    at /usr/include/c++/5/functional:1520
#9  std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int)> (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int)> >::_M_run() (
    this=<optimized out>) at /usr/include/c++/5/thread:115
#10 0x00007f3270a0fc80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007f32711566ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007f327047e41d in clone () from /lib/x86_64-linux-gnu/libc.so.6

@drawfish drawfish added the bug label Jun 27, 2019

danpovey commented Jun 27, 2019 via email


cloudhan commented Jul 1, 2019

I encountered another problem:

ASSERTION_FAILED (batched-wav-nnet3-cuda[5.5.413~1-c4490]:AddMatMat():cu-matrix.cc:1305) Assertion failed: (k == k1)

And the GDB stacktrace:

(gdb) 
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fffd5044231 in __GI_abort () at abort.c:79
#2  0x0000555555a9f2f4 in kaldi::KaldiAssertFailure_ (
    func=func@entry=0x555555ace7a0 <kaldi::CuMatrixBase<float>::AddMatMat(float, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, float)::__func__> "AddMatMat", file=file@entry=0x555555acc143 "cu-matrix.cc", line=line@entry=1305, 
    cond_str=cond_str@entry=0x555555acce1d "k == k1") at kaldi-error.cc:234
#3  0x000055555589ee97 in kaldi::CuMatrixBase<float>::AddMatMat (this=this@entry=0x7fff1dff2930, alpha=alpha@entry=1, A=..., 
    transA=transA@entry=kaldi::kNoTrans, B=..., transB=transB@entry=kaldi::kTrans, beta=beta@entry=0) at cu-matrix.cc:1305
#4  0x000055555573f293 in kaldi::IvectorExtractorFastCuda::GetIvector (this=0x7ffd159b0020, feats=..., ivector=ivector@entry=0x555574ff2ee0)
    at online-ivector-feature-cuda.cc:58
#5  0x0000555555739684 in kaldi::OnlineCudaFeaturePipeline::ComputeFeatures (this=this@entry=0x7fff1dff2fe0, cu_wave=..., sample_freq=16000, 
    input_features=0x555574ff2ef0, ivector_features=0x555574ff2ee0) at online-cuda-feature-pipeline.cc:64
#6  0x0000555555715152 in kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ComputeBatchFeatures (this=this@entry=0x7fffffffd760, first=first@entry=0, 
    tasks=std::vector of length 1, capacity 50 = {...}, feature_pipeline=...) at batched-threaded-nnet3-cuda-pipeline.cc:615
#7  0x0000555555715e2f in kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ExecuteWorker (this=0x7fffffffd760, threadId=<optimized out>)
    at batched-threaded-nnet3-cuda-pipeline.cc:841
#8  0x000055555571771b in std::__invoke_impl<void, void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int>(std::__invoke_memfun_deref, void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*&&)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*&&, int&&) (__t=<optimized out>, __f=<optimized out>) at /usr/include/c++/7/bits/invoke.h:73
#9  std::__invoke<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int>(void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*&&)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*&&, int&&) (__fn=<optimized out>)
    at /usr/include/c++/7/bits/invoke.h:95
#10 std::thread::_Invoker<std::tuple<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int> >::_M_invoke<0ul, 1ul, 2ul> (this=<optimized out>) at /usr/include/c++/7/thread:234
#11 std::thread::_Invoker<std::tuple<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int> >::operator() (this=<optimized out>) at /usr/include/c++/7/thread:243
#12 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int> > >::_M_run (this=<optimized out>) at /usr/include/c++/7/thread:186
#13 0x00007fffd569b96f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007ffff0e7e5aa in start_thread (arg=0x7fff1dffb700) at pthread_create.c:463
#15 0x00007fffd5104cbf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

And additional debug info:

(gdb) frame 4
#4  0x000055555573f293 in kaldi::IvectorExtractorFastCuda::GetIvector (this=0x7ffd159b0020, feats=..., ivector=ivector@entry=0x555574ff2ee0)
    at online-ivector-feature-cuda.cc:58
58	    lda_feats_normalized.AddMatMat(1.0, spliced_feats_normalized, kNoTrans,
(gdb) print spliced_feats_normalized
$1 = {<kaldi::CuMatrixBase<float>> = {data_ = 0x7ffcaa000000, num_cols_ = 280, num_rows_ = 1024, stride_ = 320}, <No data fields>}
(gdb) print cu_lda_
$2 = {<kaldi::CuMatrixBase<float>> = {data_ = 0x7fff5fbd8e00, num_cols_ = 281, num_rows_ = 40, stride_ = 320}, <No data fields>}

It runs fine when the CUDA feature pipeline is disabled.

@luitjens Would you mind taking a look at it?


luitjens commented Jul 1, 2019 via email


luitjens commented Jul 2, 2019

I think Dan's assertion about ivectors is probably right. We tried putting hooks in to support models without ivectors, but my guess is that we are not resizing the matrix to the right size. I'm heading on vacation in a day and can look at it when I return in two weeks. In the meantime I'd suggest turning GPU feature extraction off in the binary for this model, or, if you are really ambitious, trying to fix it while I'm gone. Maybe @LeviBarnes will have time to look into this.


dpny518 commented Aug 2, 2019

I had the same error:

batched-wav-nnet3-cuda --cuda-use-tensor-cores=true --iterations=5 --cuda-memory-proportion=.5 --max-batch-size=32 --cuda-control-threads=3 --batch-drain-size=8 --cuda-worker-threads=2 --cuda-use-tensor-cores=false --word-symbol-table=exp/tdnn/graph/words.txt --frame-subsampling-factor=3 --frames-per-chunk=51 --acoustic-scale=.8 --beam=12.0 --lattice-beam=4 --max-active=10000 --config=model/conf/online.conf --word-symbol-table=model/graph/words.txt --max-batch-size=1 --cuda-worker-threads=2 model/final.mdl model/graph/HCLG.fst scp:results/temp/wav.scp ark:/dev/null 
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:SelectGpuId():cu-device.cc:223) CUDA setup operating under Compute Exclusive Mode.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:FinalizeActiveGpu():cu-device.cc:308) The active GPU is [0]: GeForce GTX 1080 Ti	free:10453M, used:723M, total:11177M, free/total:0.935236 version 6.1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 25 orphan nodes.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 50 orphan components.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:Collapse():nnet-utils.cc:1463) Added 25 components, removed 50
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:Initialize():batched-threaded-nnet3-cuda-pipeline.cc:32) BatchedThreadedNnet3CudaPipeline Initialize with 3 control threads, 2 worker threads and batch size 1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ExecuteWorker():batched-threaded-nnet3-cuda-pipeline.cc:784) CudaDecoder batch_size=1 num_channels=1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ExecuteWorker():batched-threaded-nnet3-cuda-pipeline.cc:784) CudaDecoder batch_size=1 num_channels=1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ExecuteWorker():batched-threaded-nnet3-cuda-pipeline.cc:784) CudaDecoder batch_size=1 num_channels=1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
ASSERTION_FAILED (batched-wav-nnet3-cuda[5.5.434~1-e167b]:AddMatMat():cu-matrix.cc:1305) Assertion failed: (k == k1)

[ Stack-Trace: ]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x82c) [0x7fc0c6a1330a]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-base.so(kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)+0x6c) [0x7fc0c6a13d78]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudamatrix.so(kaldi::CuMatrixBase<float>::AddMatMat(float, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, float)+0xf7) [0x7fc0c7c7a62d]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudafeat.so(kaldi::IvectorExtractorFastCuda::GetIvector(kaldi::CuMatrixBase<float> const&, kaldi::CuVector<float>*)+0x1b2) [0x7fc0c4c6abc4]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudafeat.so(kaldi::OnlineCudaFeaturePipeline::ComputeFeatures(kaldi::CuVectorBase<float> const&, float, kaldi::CuMatrix<float>*, kaldi::CuVector<float>*)+0x8d) [0x7fc0c4c6c339]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ComputeBatchFeatures(int, std::vector<kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::TaskState*, std::allocator<kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::TaskState*> >&, kaldi::OnlineCudaFeaturePipeline&)+0xc23) [0x7fc0c9465303]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ExecuteWorker(int)+0x503) [0x7fc0c94662a1]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudadecoder.so(std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int)> (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int)> >::_M_run()+0x2a) [0x7fc0c9467e7c]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fc0c58b7c80]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fc0c5ffe6ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fc0c532641d]

Here is the script I use to decode:

cuda_flags="--cuda-use-tensor-cores=true 
--iterations=5 --cuda-memory-proportion=.5 
--max-batch-size=32 --cuda-control-threads=3 
--batch-drain-size=8 --cuda-worker-threads=2"

batched-wav-nnet3-cuda $cuda_flags \
  --word-symbol-table=exp/tdnn/graph/words.txt --frame-subsampling-factor=3 --frames-per-chunk=51 \
  --acoustic-scale=.8 --beam=12.0 --lattice-beam=4 --max-active=10000 \
  --config="$model"/conf/online.conf \
  --word-symbol-table="$model"/graph/words.txt \
  --max-batch-size=1 \
  --cuda-worker-threads=2 \
  "$model"/final.mdl \
  "$model"/graph/HCLG.fst \
  "scp:$results/temp/wav.scp" \
  "ark:/dev/null" 2>&1 | tee -a result.txt


luitjens commented Aug 2, 2019 via email


dpny518 commented Aug 5, 2019

ivector_extractor.conf

--splice-config=model/conf/splice.conf
--cmvn-config=model/ivector/online_cmvn.conf
--lda-matrix=model/ivector/final.mat
--global-cmvn-stats=model/ivector/global_cmvn.stats
--diag-ubm=model/ivector/final.dubm
--ivector-extractor=model/ivector/final.ie
--num-gselect=5
--min-post=0.025
--posterior-scale=0.1
--max-remembered-frames=1000
--max-count=100

online.conf

--feature-type=mfcc
--mfcc-config=model/conf/mfcc.conf
--ivector-extraction-config=model/conf/ivector_extractor.conf
--frame-subsampling-factor=3
--add-pitch=true
--acoustic-scale=1
--minimize=false
--max-active=10000
--beam=15
--lattice-beam=8

mfcc.conf

# config for high-resolution MFCC features, intended for neural network training.
# Note: we keep all cepstra, so it has the same info as filterbank features,
# but MFCC is more easily compressible (because less correlated) which is why
# we prefer this method.
--use-energy=false       # use average of log energy, not energy.
--sample-frequency=16000 # AISHELL-2 is sampled at 16kHz
--num-mel-bins=40        # similar to Google's setup.
--num-ceps=40            # there is no dimensionality reduction.
--low-freq=20            # low cutoff frequency for mel bins
--high-freq=-400         # high cutoff frequency, relative to Nyquist of 8000 (=7600)
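For reference, a non-positive --high-freq is interpreted relative to the Nyquist frequency, which is how -400 with a 16 kHz sample rate yields the 7600 Hz cutoff noted in the comment. A minimal sketch of that rule (illustrative helper, not Kaldi's actual function):

```cpp
// Illustrative only: how a non-positive --high-freq maps to an
// absolute mel-bin cutoff, relative to Nyquist (sample_freq / 2).
float EffectiveHighFreq(float sample_freq, float high_freq) {
  float nyquist = sample_freq / 2.0f;
  // high_freq <= 0 means "Nyquist plus this offset";
  // e.g. sample_freq=16000, high_freq=-400 -> 8000 - 400 = 7600.
  return (high_freq <= 0.0f) ? nyquist + high_freq : high_freq;
}
```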

Also, my WER is a lot worse with GPU feature extraction than with CPU: 88% WER vs. 9.95% when I set --gpu-feature-extract=false.


cloudhan commented Aug 5, 2019

The problem is due to --add-pitch=true in online.conf: pitch computation and processing have clearly not been implemented in CUDA. Given the complexity of the pitch computation, I don't think it can be converted easily. In the short term, the easiest way to work around this issue might be to write a data adapter for OnlineFeatureInterface and use the CPU code for pitch extraction.

Note that the complexity lies not in the algorithm itself but in the online feature processing. The historic version of pitch-functions.cc, which had only offline processing functionality, is pretty clear. The problem is that the later online refactoring, from commit 16fb114 to c34e68d, introduces a pitch feature difference, and I cannot backport it.
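The adapter idea described above could be sketched roughly as follows. This is purely illustrative: the types below are stand-ins, not the real Kaldi OnlineFeatureInterface API, and the pitch computation is stubbed out where real code would call Kaldi's CPU pitch extractor.

```cpp
#include <algorithm>
#include <vector>

// Stand-in types; a real adapter would use Kaldi's Matrix/CuMatrix.
using Matrix = std::vector<std::vector<float>>;

struct FeatureSource {  // stands in for a CPU-side feature pipeline
  virtual Matrix Compute(const std::vector<float>& wave) = 0;
  virtual ~FeatureSource() = default;
};

struct CpuPitchSource : FeatureSource {  // pitch stays on the CPU path
  Matrix Compute(const std::vector<float>& wave) override {
    // Placeholder: real code would call the CPU pitch extractor here
    // (one row per 10 ms frame, a few pitch/POV columns per row).
    return Matrix(wave.size() / 160, std::vector<float>(3, 0.0f));
  }
};

// Adapter: run the CPU pitch source, then append its output to the
// GPU-computed MFCCs so the rest of the pipeline sees one matrix.
Matrix AppendPitch(const Matrix& gpu_mfcc, FeatureSource* pitch_src,
                   const std::vector<float>& wave) {
  Matrix pitch = pitch_src->Compute(wave);
  Matrix out = gpu_mfcc;
  size_t rows = std::min(out.size(), pitch.size());
  for (size_t r = 0; r < rows; ++r)
    out[r].insert(out[r].end(), pitch[r].begin(), pitch[r].end());
  return out;
}
```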


luitjens commented Aug 5, 2019 via email

luitjens commented:

Can you please try with the latest master? We just pushed the FBANK code, which also includes a fix for running without ivectors.

danpovey commented:

Closing, as this has likely already been fixed.
