batched-wav-nnet3-cuda core dump when set --gpu-feature-extract=true #3425

Closed
drawfish opened this issue Jun 27, 2019 · 11 comments
drawfish commented Jun 27, 2019

Below is the exec info:

batched-wav-nnet3-cuda --acoustic-scale=1.0 --add-pitch=true --beam=15.0 --feature-type=mfcc --frame-subsampling-factor=3 --frames-per-chunk=150 --gpu-feature-extract=true --lattice-beam=8.0 --max-active=7000
--mfcc-config=config/mfcc.conf --online-pitch-config=config/pitch.conf '--word-symbol-table=exp/nnet3_tdnn_online_game_transfer_final/graph/words.txt' exp/nnet3_tdnn_onl
ine_game_transfer_final/tdnn-lstm/final.mdl 'exp/nnet3_tdnn_online_game_transfer_final/graph/HCLG.fst' scp:data/dahua/mfcc/wav.scp ark:/dev/null
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:SelectGpuId():cu-device.cc:223) CUDA setup operating under Compute Exclusive Mode.
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:FinalizeActiveGpu():cu-device.cc:308) The active GPU is [6]: GeForce GTX 1080 Ti   free:10945M, used:227M, total:11172M, free/total:0.979682 version 6.1
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 5 orphan nodes.
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 11 orphan components.
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:Collapse():nnet-utils.cc:1463) Added 6 components, removed 11
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:Initialize():batched-threaded-nnet3-cuda-pipeline.cc:32) BatchedThreadedNnet3CudaPipeline Initialize with 2 control threads, 20 worker threads and batch size 200
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:ExecuteWorker():batched-threaded-nnet3-cuda-pipeline.cc:780) CudaDecoder batch_size=200 num_channels=260
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:ExecuteWorker():batched-threaded-nnet3-cuda-pipeline.cc:780) CudaDecoder batch_size=200 num_channels=260
ASSERTION_FAILED (batched-wav-nnet3-cuda[5.5.402~1-53346]:ComputeFeatures():online-cuda-feature-pipeline.cc:66) Assertion failed: (false)

[ Stack-Trace: ]
/storage01/gzchenduisheng/dev/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x82c) [0x7f3271b6b31a]
/storage01/gzchenduisheng/dev/kaldi/src/lib/libkaldi-base.so(kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)+0x6c) [0x7f3271b6bd88]
/storage01/gzchenduisheng/dev/kaldi/src/lib/libkaldi-cudafeat.so(kaldi::OnlineCudaFeaturePipeline::ComputeFeatures(kaldi::CuVectorBase<float> const&, float, kaldi::CuMatrix<float>*, kaldi::CuVector<float>*)+0x
ae) [0x7f326fdc633a]
/storage01/gzchenduisheng/dev/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ComputeBatchFeatures(int, std::vector<kaldi::cuda_decoder::BatchedThreadedNnet3CudaPip
eline::TaskState*, std::allocator<kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::TaskState*> >&, kaldi::OnlineCudaFeaturePipeline&)+0xc23) [0x7f32745b5529]
/storage01/gzchenduisheng/dev/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ExecuteWorker(int)+0x4fe) [0x7f32745b60fa]
/storage01/gzchenduisheng/dev/kaldi/src/lib/libkaldi-cudadecoder.so(std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int)> (kaldi::cuda_decoder:
:BatchedThreadedNnet3CudaPipeline*, int)> >::_M_run()+0x2a) [0x7f32745b7f28]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f3270a0fc80]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f32711566ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f327047e41d]

Aborted (core dumped)

gdb core info:

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `batched-wav-nnet3-cuda --acoustic-scale=1.0 --add-pitch=true --beam=15.0 --feat'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f32703ac428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7f30e106c700 (LWP 41345))]
(gdb) bt
#0  0x00007f32703ac428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f32703ae02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f3271b6bda1 in kaldi::KaldiAssertFailure_ (
    func=func@entry=0x7f326fdca460 <kaldi::OnlineCudaFeaturePipeline::ComputeFeatures(kaldi::CuVectorBase<float> const&, float, kaldi::CuMatrix<float>*, kaldi::CuVector<float>*)::__func__> "ComputeFeatures",
    file=file@entry=0x7f326fdc9c00 "online-cuda-feature-pipeline.cc", line=line@entry=66, cond_str=cond_str@entry=0x7f326fdc8d77 "false") at kaldi-error.cc:234
#3  0x00007f326fdc633a in kaldi::OnlineCudaFeaturePipeline::ComputeFeatures (this=this@entry=0x7f30e106b330, cu_wave=..., sample_freq=8000, input_features=0x7f323fcfcaa0, ivector_features=0x7f323fcfca90)
    at online-cuda-feature-pipeline.cc:66
#4  0x00007f32745b5529 in kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ComputeBatchFeatures (this=this@entry=0x7ffe36a937b0, first=first@entry=0,
    tasks=std::vector of length 1, capacity 200 = {...}, feature_pipeline=...) at batched-threaded-nnet3-cuda-pipeline.cc:616
#5  0x00007f32745b60fa in kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ExecuteWorker (this=0x7ffe36a937b0, threadId=<optimized out>) at batched-threaded-nnet3-cuda-pipeline.cc:841
#6  0x00007f32745b7f28 in std::_Mem_fn_base<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int), true>::operator()<int, void>(kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int&&) const (__object=<optimized out>, this=<optimized out>) at /usr/include/c++/5/functional:600
#7  std::_Bind_simple<std::_Mem_fn<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int)> (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int)>::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) (this=<optimized out>) at /usr/include/c++/5/functional:1531
#8  std::_Bind_simple<std::_Mem_fn<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int)> (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int)>::operator()() (this=<optimized out>)
    at /usr/include/c++/5/functional:1520
#9  std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int)> (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int)> >::_M_run() (
    this=<optimized out>) at /usr/include/c++/5/thread:115
#10 0x00007f3270a0fc80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007f32711566ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007f327047e41d in clone () from /lib/x86_64-linux-gnu/libc.so.6

@drawfish drawfish added the bug label Jun 27, 2019

danpovey commented Jun 27, 2019 via email


cloudhan commented Jul 1, 2019

I encountered another problem:

ASSERTION_FAILED (batched-wav-nnet3-cuda[5.5.413~1-c4490]:AddMatMat():cu-matrix.cc:1305) Assertion failed: (k == k1)

And the GDB stacktrace:

(gdb) 
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fffd5044231 in __GI_abort () at abort.c:79
#2  0x0000555555a9f2f4 in kaldi::KaldiAssertFailure_ (
    func=func@entry=0x555555ace7a0 <kaldi::CuMatrixBase<float>::AddMatMat(float, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, float)::__func__> "AddMatMat", file=file@entry=0x555555acc143 "cu-matrix.cc", line=line@entry=1305, 
    cond_str=cond_str@entry=0x555555acce1d "k == k1") at kaldi-error.cc:234
#3  0x000055555589ee97 in kaldi::CuMatrixBase<float>::AddMatMat (this=this@entry=0x7fff1dff2930, alpha=alpha@entry=1, A=..., 
    transA=transA@entry=kaldi::kNoTrans, B=..., transB=transB@entry=kaldi::kTrans, beta=beta@entry=0) at cu-matrix.cc:1305
#4  0x000055555573f293 in kaldi::IvectorExtractorFastCuda::GetIvector (this=0x7ffd159b0020, feats=..., ivector=ivector@entry=0x555574ff2ee0)
    at online-ivector-feature-cuda.cc:58
#5  0x0000555555739684 in kaldi::OnlineCudaFeaturePipeline::ComputeFeatures (this=this@entry=0x7fff1dff2fe0, cu_wave=..., sample_freq=16000, 
    input_features=0x555574ff2ef0, ivector_features=0x555574ff2ee0) at online-cuda-feature-pipeline.cc:64
#6  0x0000555555715152 in kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ComputeBatchFeatures (this=this@entry=0x7fffffffd760, first=first@entry=0, 
    tasks=std::vector of length 1, capacity 50 = {...}, feature_pipeline=...) at batched-threaded-nnet3-cuda-pipeline.cc:615
#7  0x0000555555715e2f in kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ExecuteWorker (this=0x7fffffffd760, threadId=<optimized out>)
    at batched-threaded-nnet3-cuda-pipeline.cc:841
#8  0x000055555571771b in std::__invoke_impl<void, void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int>(std::__invoke_memfun_deref, void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*&&)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*&&, int&&) (__t=<optimized out>, __f=<optimized out>) at /usr/include/c++/7/bits/invoke.h:73
#9  std::__invoke<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int>(void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*&&)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*&&, int&&) (__fn=<optimized out>)
    at /usr/include/c++/7/bits/invoke.h:95
#10 std::thread::_Invoker<std::tuple<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int> >::_M_invoke<0ul, 1ul, 2ul> (this=<optimized out>) at /usr/include/c++/7/thread:234
#11 std::thread::_Invoker<std::tuple<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int> >::operator() (this=<optimized out>) at /usr/include/c++/7/thread:243
#12 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int> > >::_M_run (this=<optimized out>) at /usr/include/c++/7/thread:186
#13 0x00007fffd569b96f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007ffff0e7e5aa in start_thread (arg=0x7fff1dffb700) at pthread_create.c:463
#15 0x00007fffd5104cbf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

And additional debug info:

(gdb) frame 4
#4  0x000055555573f293 in kaldi::IvectorExtractorFastCuda::GetIvector (this=0x7ffd159b0020, feats=..., ivector=ivector@entry=0x555574ff2ee0)
    at online-ivector-feature-cuda.cc:58
58	    lda_feats_normalized.AddMatMat(1.0, spliced_feats_normalized, kNoTrans,
(gdb) print spliced_feats_normalized
$1 = {<kaldi::CuMatrixBase<float>> = {data_ = 0x7ffcaa000000, num_cols_ = 280, num_rows_ = 1024, stride_ = 320}, <No data fields>}
(gdb) print cu_lda_
$2 = {<kaldi::CuMatrixBase<float>> = {data_ = 0x7fff5fbd8e00, num_cols_ = 281, num_rows_ = 40, stride_ = 320}, <No data fields>}

It runs fine when the CUDA feature pipeline is disabled.

@luitjens Would you mind taking a look at it?


luitjens commented Jul 1, 2019 via email


luitjens commented Jul 2, 2019

I think Dan's assertion about ivectors is probably right. We tried putting hooks in to support models without ivectors, but my guess is that we are not resizing the matrix to the right size. I'm heading on vacation in a day and can look at it when I return in two weeks. In the meantime I'd suggest turning GPU feature extraction off in the binary for this model, or, if you are really ambitious, trying to fix it while I'm gone. Maybe @LeviBarnes will have time to look into this.


dpny518 commented Aug 2, 2019

I had the same error:

batched-wav-nnet3-cuda --cuda-use-tensor-cores=true --iterations=5 --cuda-memory-proportion=.5 --max-batch-size=32 --cuda-control-threads=3 --batch-drain-size=8 --cuda-worker-threads=2 --cuda-use-tensor-cores=false --word-symbol-table=exp/tdnn/graph/words.txt --frame-subsampling-factor=3 --frames-per-chunk=51 --acoustic-scale=.8 --beam=12.0 --lattice-beam=4 --max-active=10000 --config=model/conf/online.conf --word-symbol-table=model/graph/words.txt --max-batch-size=1 --cuda-worker-threads=2 model/final.mdl model/graph/HCLG.fst scp:results/temp/wav.scp ark:/dev/null 
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:SelectGpuId():cu-device.cc:223) CUDA setup operating under Compute Exclusive Mode.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:FinalizeActiveGpu():cu-device.cc:308) The active GPU is [0]: GeForce GTX 1080 Ti	free:10453M, used:723M, total:11177M, free/total:0.935236 version 6.1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 25 orphan nodes.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 50 orphan components.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:Collapse():nnet-utils.cc:1463) Added 25 components, removed 50
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:Initialize():batched-threaded-nnet3-cuda-pipeline.cc:32) BatchedThreadedNnet3CudaPipeline Initialize with 3 control threads, 2 worker threads and batch size 1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ExecuteWorker():batched-threaded-nnet3-cuda-pipeline.cc:784) CudaDecoder batch_size=1 num_channels=1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ExecuteWorker():batched-threaded-nnet3-cuda-pipeline.cc:784) CudaDecoder batch_size=1 num_channels=1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ExecuteWorker():batched-threaded-nnet3-cuda-pipeline.cc:784) CudaDecoder batch_size=1 num_channels=1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
ASSERTION_FAILED (batched-wav-nnet3-cuda[5.5.434~1-e167b]:AddMatMat():cu-matrix.cc:1305) Assertion failed: (k == k1)

[ Stack-Trace: ]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x82c) [0x7fc0c6a1330a]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-base.so(kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)+0x6c) [0x7fc0c6a13d78]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudamatrix.so(kaldi::CuMatrixBase<float>::AddMatMat(float, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, float)+0xf7) [0x7fc0c7c7a62d]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudafeat.so(kaldi::IvectorExtractorFastCuda::GetIvector(kaldi::CuMatrixBase<float> const&, kaldi::CuVector<float>*)+0x1b2) [0x7fc0c4c6abc4]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudafeat.so(kaldi::OnlineCudaFeaturePipeline::ComputeFeatures(kaldi::CuVectorBase<float> const&, float, kaldi::CuMatrix<float>*, kaldi::CuVector<float>*)+0x8d) [0x7fc0c4c6c339]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ComputeBatchFeatures(int, std::vector<kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::TaskState*, std::allocator<kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::TaskState*> >&, kaldi::OnlineCudaFeaturePipeline&)+0xc23) [0x7fc0c9465303]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ExecuteWorker(int)+0x503) [0x7fc0c94662a1]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudadecoder.so(std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int)> (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int)> >::_M_run()+0x2a) [0x7fc0c9467e7c]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fc0c58b7c80]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fc0c5ffe6ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fc0c532641d]

Here is the script I use to decode:

cuda_flags="--cuda-use-tensor-cores=true 
--iterations=5 --cuda-memory-proportion=.5 
--max-batch-size=32 --cuda-control-threads=3 
--batch-drain-size=8 --cuda-worker-threads=2"

batched-wav-nnet3-cuda $cuda_flags \
  --word-symbol-table=exp/tdnn/graph/words.txt --frame-subsampling-factor=3 --frames-per-chunk=51 \
  --acoustic-scale=.8 --beam=12.0 --lattice-beam=4 --max-active=10000 \
  --config="$model"/conf/online.conf \
  --word-symbol-table="$model"/graph/words.txt \
  --max-batch-size=1 \
  --cuda-worker-threads=2 \
  "$model"/final.mdl \
  "$model"/graph/HCLG.fst \
  "scp:$results/temp/wav.scp" \
  "ark:/dev/null" 2>&1 | tee -a result.txt


luitjens commented Aug 2, 2019 via email


dpny518 commented Aug 5, 2019

ivector_extractor.conf

--splice-config=model/conf/splice.conf
--cmvn-config=model/ivector/online_cmvn.conf
--lda-matrix=model/ivector/final.mat
--global-cmvn-stats=model/ivector/global_cmvn.stats
--diag-ubm=model/ivector/final.dubm
--ivector-extractor=model/ivector/final.ie
--num-gselect=5
--min-post=0.025
--posterior-scale=0.1
--max-remembered-frames=1000
--max-count=100

online.conf

--feature-type=mfcc
--mfcc-config=model/conf/mfcc.conf
--ivector-extraction-config=model/conf/ivector_extractor.conf
--frame-subsampling-factor=3
--add-pitch=true
--acoustic-scale=1
--minimize=false
--max-active=10000
--beam=15
--lattice-beam=8

mfcc.conf

# config for high-resolution MFCC features, intended for neural network training.
# Note: we keep all cepstra, so it has the same info as filterbank features,
# but MFCC is more easily compressible (because less correlated) which is why
# we prefer this method.
--use-energy=false       # use average of log energy, not energy.
--sample-frequency=16000 # AISHELL-2 is sampled at 16kHz
--num-mel-bins=40        # similar to Google's setup.
--num-ceps=40            # there is no dimensionality reduction.
--low-freq=20            # low cutoff frequency for mel bins
--high-freq=-400         # high cutoff frequency, relative to Nyquist of 8000 (=7600)
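For reference, a non-positive --high-freq is interpreted relative to the Nyquist frequency, which is how -400 with a 16 kHz sample rate yields the 7600 Hz cutoff noted in the comment. A minimal sketch of that rule (illustrative helper, not Kaldi's actual function):

```cpp
// Illustrative only: how a non-positive --high-freq maps to an
// absolute mel-bin cutoff, relative to Nyquist (sample_freq / 2).
float EffectiveHighFreq(float sample_freq, float high_freq) {
  float nyquist = sample_freq / 2.0f;
  // high_freq <= 0 means "Nyquist plus this offset";
  // e.g. sample_freq=16000, high_freq=-400 -> 8000 - 400 = 7600.
  return (high_freq <= 0.0f) ? nyquist + high_freq : high_freq;
}
```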

Also, my WER is a lot worse with GPU feature extraction than with CPU: 88% WER vs. 9.95% when I set --gpu-feature-extract=false.


cloudhan commented Aug 5, 2019

The problem is due to --add-pitch=true in online.conf: pitch computation and processing have clearly not been implemented in CUDA. Given the complexity of the pitch computation, I don't think it can be converted easily. In the short term, the easiest way to work around this issue might be to write a data adapter for OnlineFeatureInterface and use the CPU code for pitch extraction.

Note that the complexity lies not in the algorithm itself but in the online feature processing. The historic version of pitch-functions.cc, which had only offline processing functionality, is pretty clear. The problem is that the later online refactoring, from commit 16fb114 to c34e68d, introduces a pitch feature difference, and I cannot backport it.
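The adapter idea described above could be sketched roughly as follows. This is purely illustrative: the types below are stand-ins, not the real Kaldi OnlineFeatureInterface API, and the pitch computation is stubbed out where real code would call Kaldi's CPU pitch extractor.

```cpp
#include <algorithm>
#include <vector>

// Stand-in types; a real adapter would use Kaldi's Matrix/CuMatrix.
using Matrix = std::vector<std::vector<float>>;

struct FeatureSource {  // stands in for a CPU-side feature pipeline
  virtual Matrix Compute(const std::vector<float>& wave) = 0;
  virtual ~FeatureSource() = default;
};

struct CpuPitchSource : FeatureSource {  // pitch stays on the CPU path
  Matrix Compute(const std::vector<float>& wave) override {
    // Placeholder: real code would call the CPU pitch extractor here
    // (one row per 10 ms frame, a few pitch/POV columns per row).
    return Matrix(wave.size() / 160, std::vector<float>(3, 0.0f));
  }
};

// Adapter: run the CPU pitch source, then append its output to the
// GPU-computed MFCCs so the rest of the pipeline sees one matrix.
Matrix AppendPitch(const Matrix& gpu_mfcc, FeatureSource* pitch_src,
                   const std::vector<float>& wave) {
  Matrix pitch = pitch_src->Compute(wave);
  Matrix out = gpu_mfcc;
  size_t rows = std::min(out.size(), pitch.size());
  for (size_t r = 0; r < rows; ++r)
    out[r].insert(out[r].end(), pitch[r].begin(), pitch[r].end());
  return out;
}
```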


luitjens commented Aug 5, 2019 via email

luitjens commented:

Can you please try with the latest master? We just pushed the FBANK code, which also includes a fix for running without ivectors.

danpovey commented:

Closing, as this has likely already been fixed.
