Memory issue on Windows #1474

Closed
notanaverageman opened this issue Mar 6, 2017 · 15 comments

@notanaverageman

We have been using Nnet2 models for some time without issues. We recently updated our systems to use Nnet3 models, and after the update our servers started to show apparent memory leaks.

The problem is not an actual memory leak. I have checked the code with Valgrind on Linux and with DrMemory and VisualLeakDetector on Windows; none of them reports any leak. The problem appears only when there are multiple recognition threads. (I have tried an 8-core machine with 8 threads and a 32-core machine with 24 threads.)

To identify the issue, I compared multiple environments:

  • Platforms:

    • CentOS 6.6 - No leak.
    • Windows Server 2012 - Leak.
    • Windows 10 - Leak.
    • MinGW on Windows 10 - Leak.
    • Cygwin on Windows 10 - A strange divide by zero error.
    • Ubuntu 16.04 - A strange divide by zero error.
  • Architecture:

    • x64 - Leak.
    • x86 - Leak.
  • Compilers:

    • Visual Studio 2012 - Leak.
    • Visual Studio 2015 - Leak.
    • Intel Compiler included in Parallel Studio XE 2015 - Leak.
  • CBLAS Libraries:

    • Intel MKL 2015 - Leak.
    • Intel MKL 2017 - Leak.
    • CLAPACK - Leak.

After a month of struggling with this problem, I am starting to think it is a problem with Windows's memory management. I have spent too much time on this issue and wanted to ask whether the problem has occurred to anybody else.

I have reduced the problem to the following code; the nnet model I have been using is here:

#include <cstdlib>
#include <iostream>
#include <string>
#include <thread>
#include <vector>
#include "util/kaldi-io.h"
#include "nnet3/nnet-nnet.h"
#include "nnet3/nnet-computation.h"
#include "nnet3/nnet-computation-graph.h"

int threadCount = 8;

int main(int argc, char * argv[])
{
    std::cout << "Press enter to start." << std::endl;
    std::cin.get();

    std::vector<std::thread> threads;

    for (int i = 0; i < threadCount; i++)
    {
        threads.emplace_back([]()
        {
            bool isBinary;
            kaldi::Input kaldiInput("nnet.bin", &isBinary);

            kaldi::nnet3::Nnet nnet;
            nnet.Read(kaldiInput.Stream(), isBinary);

            for (int j = 0; j < 10000; j++)
            {
                for (int k = 0; k < 10000; k++)
                {
                    int input_end = 0 + 100;
                    kaldi::nnet3::IoSpecification input;
                    input.name = "input";
                    kaldi::nnet3::IoSpecification output;
                    output.name = "output";

                    int n = rand() % 10;
                    // In the IoSpecification, for now we will request all the same indexes at
                    // output that we requested at input.
                    for (int t = 0; t < input_end; t++) {
                        input.indexes.push_back(kaldi::nnet3::Index(n, t));
                        output.indexes.push_back(kaldi::nnet3::Index(n, t));
                    }

                    kaldi::nnet3::ComputationRequest request;
                    request.inputs.push_back(input);
                    request.outputs.push_back(output);

                    kaldi::nnet3::ComputationGraph graph;
                    kaldi::nnet3::ComputationGraphBuilder builder(nnet, request, &graph);
                    builder.Compute();
                }
            }
        });
    }

    for (auto & th : threads)
    {
        th.join();
    }

    std::cout << "Done." << std::endl;
    std::cin.get();

    return 0;
}

@danpovey
Contributor

danpovey commented Mar 6, 2017 via email

@notanaverageman
Author

Yes, it is the same team, but the problem is different. We solved that issue by disabling multithreading in ivector extraction.

You can see the divide by zero issue by compiling the code I sent in the first post on Ubuntu 16.04 with gcc 5.4.0. (gcc 4.8 and 4.9 give me the same issue on Ubuntu 14.04.)

I compiled the code with this command:

g++ -std=c++11 test.cpp -Ikaldi/src/ -Ikaldi/tools/openfst/include -DHAVE_OPENBLAS -Ikaldi/tools/OpenBLAS/install/include -Lkaldi/tools/OpenBLAS/install/lib -Lkaldi/tools/openfst/lib -Lkaldi/src/util/ -Lkaldi/src/nnet3/ -lopenblas -lfst -lpthread -lkaldi-util -lkaldi-nnet3

Here is the gdb output from my machine:

Thread 3 "a.out" received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7fffeaeff700 (LWP 30603)]
0x00007ffff74a1d03 in std::tr1::__detail::_Mod_range_hashing::operator() (
    this=0x7fffeaefec8b, __num=9714, __den=0)
    at /usr/include/c++/5/tr1/hashtable_policy.h:369
369	    { return __num % __den; }

With backtrace:

#0  0x00007ffff74a1d03 in std::tr1::__detail::_Mod_range_hashing::operator() (
    this=0x7fffeaefec8b, __num=9714, __den=0)
    at /usr/include/c++/5/tr1/hashtable_policy.h:369
#1  0x00007ffff75480b1 in std::tr1::__detail::_Hash_code_base<std::pair<int, kaldi::nnet3::Index>, std::pair<std::pair<int, kaldi::nnet3::Index> const, int>, std::_Select1st<std::pair<std::pair<int, kaldi::nnet3::Index> const, int> >, std::equal_to<std::pair<int, kaldi::nnet3::Index> >, kaldi::nnet3::CindexHasher, std::tr1::__detail::_Mod_range_hashing, std::tr1::__detail::_Default_ranged_hash, false>::_M_bucket_index (this=0x7fffeaefec88, __c=9714, __n=0)
    at /usr/include/c++/5/tr1/hashtable_policy.h:677
#2  0x00007ffff7547c78 in std::tr1::_Hashtable<std::pair<int, kaldi::nnet3::Index>, std::pair<std::pair<int, kaldi::nnet3::Index> const, int>, std::allocator<std::pair<std::pair<int, kaldi::nnet3::Index> const, int> >, std::_Select1st<std::pair<std::pair<int, kaldi::nnet3::Index> const, int> >, std::equal_to<std::pair<int, kaldi::nnet3::Index> >, kaldi::nnet3::CindexHasher, std::tr1::__detail::_Mod_range_hashing, std::tr1::__detail::_Default_ranged_hash, std::tr1::__detail::_Prime_rehash_policy, false, false, true>::_M_insert (this=0x7fffeaefec88, 
    __v=...) at /usr/include/c++/5/tr1/hashtable.h:893
#3  0x00007ffff7545f4a in std::tr1::_Hashtable<std::pair<int, kaldi::nnet3::Index>, std::pair<std::pair<int, kaldi::nnet3::Index> const, int>, std::allocator<std::pair<std::pair<int, kaldi::nnet3::Index> const, int> >, std::_Select1st<std::pair<std::pair<int, kaldi::nnet3::Index> const, int> >, std::equal_to<std::pair<int, kaldi::nnet3::Index> >, kaldi::nnet3::CindexHasher, std::tr1::__detail::_Mod_range_hashing, std::tr1::__detail::_Default_ranged_hash, std::tr1::__detail::_Prime_rehash_policy, false, false, true>::insert (this=0x7fffeaefec88, __v=...)
    at /usr/include/c++/5/tr1/hashtable.h:376
#4  0x00007ffff753bea1 in kaldi::nnet3::ComputationGraph::GetCindexId (
this=0x7fffeaefec30, cindex=..., input=true, is_new=0x7fffeaefe766)
at nnet-computation-graph.cc:33
#5  0x00007ffff753d5f7 in kaldi::nnet3::ComputationGraphBuilder::AddInputs (
this=0x7fffeaefecc0) at nnet-computation-graph.cc:244
#6  0x00007ffff753e902 in kaldi::nnet3::ComputationGraphBuilder::Compute (
this=0x7fffeaefecc0) at nnet-computation-graph.cc:434
#7  0x000000000040b003 in main::{lambda()#1}::operator()() const ()
#8  0x000000000040c650 in void std::_Bind_simple<main::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>) ()
#9  0x000000000040c5a6 in std::_Bind_simple<main::{lambda()#1} ()>::operator()() ()
#10 0x000000000040c536 in std::thread::_Impl<std::_Bind_simple<main::{lambda()#1} ()> >::_M_run() ()
#11 0x00007ffff6f37c80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#12 0x00007ffff7bc16ba in start_thread (arg=0x7fffeaeff700)
    at pthread_create.c:333
#13 0x00007ffff669d82d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
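
For readers following the backtrace: frame #0 computes a bucket index as `__num % __den`, and the frames show `__den=0` (the `__n=0` bucket count passed down from `_M_bucket_index`), which is what raises SIGFPE. The sketch below only illustrates that arithmetic. `SafeBucketIndex` is a hypothetical helper, not part of libstdc++ or Kaldi, and it says nothing about why the bucket count was zero in the first place.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not libstdc++ or Kaldi code): computes a bucket index
// the same way frame #0 does (hash code modulo bucket count), but reports
// failure instead of dividing by zero when the table has no buckets.
bool SafeBucketIndex(std::size_t hash_code, std::size_t bucket_count,
                     std::size_t *index) {
    assert(index != nullptr);
    if (bucket_count == 0) {
        // This is the case seen in the backtrace (__den=0).
        return false;
    }
    *index = hash_code % bucket_count;
    return true;
}
```

Calling it with the values from the trace, `SafeBucketIndex(9714, 0, &idx)`, returns false and leaves `idx` untouched, whereas any non-zero bucket count yields an ordinary index.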

@danpovey
Contributor

danpovey commented Mar 7, 2017 via email

@notanaverageman
Author

I have checked with Valgrind; the output is below. It does not seem to contain any new information.

==31982== Memcheck, a memory error detector
==31982== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==31982== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==31982== Command: ./a.out
==31982== 
Press enter to start.

==31982== 
==31982== Process terminating with default action of signal 8 (SIGFPE)
==31982==  Integer divide by zero at address 0x8034464EF
==31982==    at 0x557BD03: std::tr1::__detail::_Mod_range_hashing::operator()(unsigned long, unsigned long) const (hashtable_policy.h:369)
==31982==    by 0x56220B0: std::tr1::__detail::_Hash_code_base<std::pair<int, kaldi::nnet3::Index>, std::pair<std::pair<int, kaldi::nnet3::Index> const, int>, std::_Select1st<std::pair<std::pair<int, kaldi::nnet3::Index> const, int> >, std::equal_to<std::pair<int, kaldi::nnet3::Index> >, kaldi::nnet3::CindexHasher, std::tr1::__detail::_Mod_range_hashing, std::tr1::__detail::_Default_ranged_hash, false>::_M_bucket_index(std::pair<int, kaldi::nnet3::Index> const&, unsigned long, unsigned long) const (hashtable_policy.h:677)
==31982==    by 0x5621C77: std::tr1::_Hashtable<std::pair<int, kaldi::nnet3::Index>, std::pair<std::pair<int, kaldi::nnet3::Index> const, int>, std::allocator<std::pair<std::pair<int, kaldi::nnet3::Index> const, int> >, std::_Select1st<std::pair<std::pair<int, kaldi::nnet3::Index> const, int> >, std::equal_to<std::pair<int, kaldi::nnet3::Index> >, kaldi::nnet3::CindexHasher, std::tr1::__detail::_Mod_range_hashing, std::tr1::__detail::_Default_ranged_hash, std::tr1::__detail::_Prime_rehash_policy, false, false, true>::_M_insert(std::pair<std::pair<int, kaldi::nnet3::Index> const, int> const&, std::tr1::integral_constant<bool, true>) (hashtable.h:893)
==31982==    by 0x561FF49: std::tr1::_Hashtable<std::pair<int, kaldi::nnet3::Index>, std::pair<std::pair<int, kaldi::nnet3::Index> const, int>, std::allocator<std::pair<std::pair<int, kaldi::nnet3::Index> const, int> >, std::_Select1st<std::pair<std::pair<int, kaldi::nnet3::Index> const, int> >, std::equal_to<std::pair<int, kaldi::nnet3::Index> >, kaldi::nnet3::CindexHasher, std::tr1::__detail::_Mod_range_hashing, std::tr1::__detail::_Default_ranged_hash, std::tr1::__detail::_Prime_rehash_policy, false, false, true>::insert(std::pair<std::pair<int, kaldi::nnet3::Index> const, int> const&) (hashtable.h:376)
==31982==    by 0x5615EA0: kaldi::nnet3::ComputationGraph::GetCindexId(std::pair<int, kaldi::nnet3::Index> const&, bool, bool*) (nnet-computation-graph.cc:33)
==31982==    by 0x56175F6: kaldi::nnet3::ComputationGraphBuilder::AddInputs() (nnet-computation-graph.cc:244)
==31982==    by 0x5618901: kaldi::nnet3::ComputationGraphBuilder::Compute() (nnet-computation-graph.cc:434)
==31982==    by 0x40B002: main::{lambda()#1}::operator()() const (in Desktop/a.out)
==31982==    by 0x40C64F: void std::_Bind_simple<main::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>) (in Desktop/a.out)
==31982==    by 0x40C5A5: std::_Bind_simple<main::{lambda()#1} ()>::operator()() (in Desktop/a.out)
==31982==    by 0x40C535: std::thread::_Impl<std::_Bind_simple<main::{lambda()#1} ()> >::_M_run() (in Desktop/a.out)
==31982==    by 0x5AC8C7F: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==31982== 
==31982== HEAP SUMMARY:
==31982==     in use at exit: 13,218,072 bytes in 6,044 blocks
==31982==   total heap usage: 8,476 allocs, 2,432 frees, 13,457,566 bytes allocated
==31982== 
==31982== LEAK SUMMARY:
==31982==    definitely lost: 0 bytes in 0 blocks
==31982==    indirectly lost: 0 bytes in 0 blocks
==31982==      possibly lost: 6,912 bytes in 24 blocks
==31982==    still reachable: 13,211,160 bytes in 6,020 blocks
==31982==         suppressed: 0 bytes in 0 blocks
==31982== Rerun with --leak-check=full to see details of leaked memory
==31982== 
==31982== For counts of detected and suppressed errors, rerun with: -v
==31982== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Killed

@danpovey
Contributor

danpovey commented Mar 7, 2017 via email

@danpovey
Contributor

danpovey commented Mar 7, 2017 via email

@notanaverageman
Author

The crash occurs with a single thread, too. The exception is actually thrown the first time the program calls the function int32 ComputationGraph::GetCindexId(const Cindex &cindex, bool input, bool *is_new). The values at the point of the exception are:
j: 0
k: 0
new_index: 0
cindex: (0 (3 0 0))

The strange thing is that if I run the code below in a separate program, it does not throw:

typedef unordered_map<kaldi::nnet3::Cindex, int32, kaldi::nnet3::CindexHasher> map_type;
map_type cindex_to_cindex_id_;
int new_index = cindex_to_cindex_id_.size();
kaldi::nnet3::Cindex cindex(0, kaldi::nnet3::Index(3, 0, 0));
std::pair<map_type::iterator, bool> p = cindex_to_cindex_id_.insert(std::pair<kaldi::nnet3::Cindex, int32>(cindex, new_index));
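
For anyone who wants to try the isolated insert without linking against Kaldi, here is a minimal sketch under stated assumptions. `Index`, `Cindex`, `CindexHasher` and `InsertFirstCindex` are hypothetical stand-ins written for this example (the hash function in particular is illustrative, not Kaldi's). It only shows that the same sequence of calls succeeds when the map is created and used inside one program.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for kaldi::nnet3::Index, with fields (n, t, x).
struct Index {
    int n, t, x;
    Index(int n_in, int t_in, int x_in = 0) : n(n_in), t(t_in), x(x_in) {}
    bool operator==(const Index &other) const {
        return n == other.n && t == other.t && x == other.x;
    }
};

// Stand-in for Cindex: the backtrace shows it is std::pair<int, Index>.
typedef std::pair<int, Index> Cindex;

// Illustrative hasher for this example only; not Kaldi's CindexHasher.
struct CindexHasher {
    std::size_t operator()(const Cindex &c) const {
        std::size_t h = std::hash<int>()(c.first);
        h = h * 31 + std::hash<int>()(c.second.n);
        h = h * 31 + std::hash<int>()(c.second.t);
        h = h * 31 + std::hash<int>()(c.second.x);
        return h;
    }
};

typedef std::unordered_map<Cindex, int, CindexHasher> MapType;

// Repeats the insert from the snippet above on a freshly constructed map.
// Returns true if the element was inserted and stored with id 0.
bool InsertFirstCindex() {
    MapType cindex_to_cindex_id;
    int new_index = static_cast<int>(cindex_to_cindex_id.size());
    Cindex cindex(0, Index(3, 0, 0));
    std::pair<MapType::iterator, bool> p =
        cindex_to_cindex_id.insert(std::make_pair(cindex, new_index));
    return p.second && p.first->second == 0;
}
```

This matches the observation that the insert itself is harmless in a separate program; it does not reproduce the crash.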

@danpovey
Contributor

danpovey commented Mar 8, 2017 via email

@danpovey
Contributor

danpovey commented Mar 8, 2017 via email

@danpovey
Contributor

danpovey commented Mar 8, 2017 via email

@notanaverageman
Author

I don't think it is a flag issue. Kaldi is compiled with the default makefile, and the same compilation process on CentOS 6.6 does not give the error. It occurs on Ubuntu and Cygwin. Do all Linux distributions use the same standard library? It might be that Ubuntu's unordered_map implementation has a bug. Here is a similar bug that was solved in 2014.
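
One way to compare machines on the standard-library question is to have each build report what it was compiled against. The following is a rough sketch: `DescribeToolchain` is a hypothetical name, and it relies only on the predefined `__cplusplus` macro and the version macros that libstdc++, libc++ and MSVC define. Compiling it with the same flags as Kaldi and as the test program on each system would show whether the language standard or library differs between them.

```cpp
#include <sstream>
#include <string>

// Hypothetical diagnostic helper: returns a one-line description of the
// language standard and C++ standard library this translation unit was
// compiled with.
std::string DescribeToolchain() {
    std::ostringstream out;
    out << "__cplusplus=" << __cplusplus;
#if defined(__GLIBCXX__)
    // libstdc++ (GCC's standard library) defines __GLIBCXX__ as a date stamp.
    out << " libstdc++ __GLIBCXX__=" << __GLIBCXX__;
#elif defined(_LIBCPP_VERSION)
    // libc++ (LLVM's standard library).
    out << " libc++ _LIBCPP_VERSION=" << _LIBCPP_VERSION;
#elif defined(_MSC_VER)
    // Microsoft's compiler and its bundled STL.
    out << " MSVC _MSC_VER=" << _MSC_VER;
#else
    out << " unknown standard library";
#endif
    return out.str();
}
```

Printing the returned string from a small program on CentOS, Ubuntu and Cygwin gives a direct comparison instead of a guess.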

@notanaverageman
Author

notanaverageman commented Mar 8, 2017

I tried the code without the lambda and allocated all the objects with new, but the Valgrind output is still the same.

@danpovey
Contributor

danpovey commented Mar 8, 2017 via email

@danpovey
Contributor

danpovey commented Mar 8, 2017 via email

@notanaverageman
Author

Thanks for the tip. Adding the code to the makefile solves the issue. I will try compiling with Cygwin; the error should be gone there, too. This way I can check whether the leak is caused by the Windows OS or by Microsoft's STL.

@danpovey danpovey closed this as completed Apr 2, 2017