Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace MemoryManager implementation with rpmalloc #3873

Merged
merged 8 commits into from
Oct 18, 2017

Conversation

lukas-w
Copy link
Member

@lukas-w lukas-w commented Oct 10, 2017

Fixes #3865, by integrating rpmalloc.

Quick – probably not very meaningful – benchmark, using Skiessi-RandomProjectNumber14253:

time user time system elapsed cpu
stable-1.2 23.26 1.33 8.09 303%
malloc/free in MemoryManager 21.30 1.30 6.57 344%
rpmalloc override, replaces all memory allocations 21.66 1.45 6.47 357%
rpmalloc (this PR) 21.15 1.45 6.71 337%

@zonkmachine
Copy link
Member

zonkmachine commented Oct 10, 2017

Cool!

On Linuxmint 17.3, I get:

-- Configuring done
CMake Error at src/3rdparty/rpmalloc/CMakeLists.txt:3 (add_library):
  Cannot find source file:

.

Edit. New to submodules. The commands that got me going was:

git submodule init
git submodule update --recursive --remote

@tresf
Copy link
Member

tresf commented Oct 10, 2017

On Linuxmint 17.3, I get:

It's a submodule, you'll have to clone with the recurse option. git clone --recursive foo://bar

@Umcaruje
Copy link
Member

Error when compiling with carla:

In file included from /home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp:25:0:
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.h:81:5: error: ‘QMutex’ does not name a type
     QMutex fMutex;
     ^
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp: In member function ‘intptr_t CarlaInstrument::handleDispatcher(NativeHostDispatcherOpcode, int32_t, intptr_t, void*, float)’:
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp:253:12: warning: enumeration value ‘NATIVE_HOST_OPCODE_INTERNAL_PLUGIN’ not handled in switch [-Wswitch]
     switch (opcode)
            ^
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp: In member function ‘virtual void CarlaInstrument::play(sample_t (*)[2])’:
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp:355:32: error: ‘fMutex’ was not declared in this scope
         const QMutexLocker ml(&fMutex);
                                ^
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp: In member function ‘virtual bool CarlaInstrument::handleMidiEvent(const MidiEvent&, const MidiTime&, f_cnt_t)’:
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp:371:28: error: ‘fMutex’ was not declared in this scope
     const QMutexLocker ml(&fMutex);
                            ^
plugins/carlabase/CMakeFiles/carlabase.dir/build.make:72: recipe for target 'plugins/carlabase/CMakeFiles/carlabase.dir/carla.cpp.o' failed
make[2]: *** [plugins/carlabase/CMakeFiles/carlabase.dir/carla.cpp.o] Error 1

@tresf
Copy link
Member

tresf commented Oct 10, 2017

@Umcaruje that appears to be a bug with Carla, it should have imported QMutex without relying on it from another header. Can you add #include <QMutex> to the Carla plugin, it sounds like an existing (benign with MemoryManager, but still legit) bug with Carla on stable although @lukas-w can just add it to the PR too. :)

This is easy to miss when another common header accidentally provides the include.

@Umcaruje
Copy link
Member

Built without carla, I can confirm I don't get stutters while playing on elementary os loki under both qt4 and qt5. Very good and comparable performance with 1.1. I'll keep using it for further testing.

@PhysSong
Copy link
Member

Tested. It showed very fast allocation speed and I got no crashes during tests.

MemoryHelper::alignedFree( ( *it ).m_pool );
MemoryHelper::alignedFree( ( *it ).m_free );
if (--thread_guard_depth == 0) {
rpmalloc_thread_initialize();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably meant to do rpmalloc_thread_finalize here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely. Thanks!

@gi0e5b06
Copy link

May someone try this project?
It runs at 60% CPU on my old computer.
memory20171011.zip

@qnebra
Copy link

qnebra commented Oct 11, 2017

@gi0e5b06

On my computer with 2.2 Ghz CPU LMMS with your project opened use 14% of CPU.

@qnebra
Copy link

qnebra commented Oct 11, 2017

I am merged this PR into my local lmms and got this on cmake generate:

CMake Error at src/3rdparty/rpmalloc/CMakeLists.txt:3 (add_library):
Cannot find source file:

rpmalloc/rpmalloc/rpmalloc.c

Tried extensions .c .C .c++ .cc .cpp .cxx .m .M .mm .h .hh .h++ .hm .hpp
.hxx .in .txx

CMake Error: Cannot determine link language for target "rpmalloc".
CMake Error: CMake can not determine linker language for target: rpmalloc

@tresf
Copy link
Member

tresf commented Oct 11, 2017

@qnebra please see #3873 (comment). This PR switches the project to using submodules, which will take some adjustment, but will greatly reduce the amount of 3rd party source code we bundle.

@qnebra
Copy link

qnebra commented Oct 11, 2017

Thank you.
Maybe add information about submodules to a developer wiki of lmms? To avoid situations like this.

@zonkmachine
Copy link
Member

Maybe add information about submodules to a developer wiki of lmms?

Let's merge it first.

@qnebra
Copy link

qnebra commented Oct 11, 2017

Test & merge, of course.

@gi0e5b06
Copy link

gi0e5b06 commented Oct 12, 2017

@qnebra Thanks for testing. I'm making some tests about memory use. Actually half of the instruments are good (don't allocate extra memory) and the other half is not, mainly because of Oscillator. All oscillators are duplicated for each played note (with no new information). There must be a way to avoid that.

PS1: I also rewrote BaseDetuning to avoid allocation. @softrabbit was right, a single float works fine.

PS2: This memory manager doesn't seem to improve performance a lot. I wouldn't merge it as it is. The memory manager is a sensible part very depending on the OS, the OS version, the compiler, etc. It should be configurable at runtime. Right now, 5 are available (legacy, standard, rpmalloc, mine and PsySong's one). An abstract class would make wonders.

@softrabbit
Copy link
Member

All oscillators are duplicated for each played note (with no new information). There must be a way to avoid that.

What, no new information? Frequency and phase come to mind... that being said, I'd like to see some kind of "oscillator bank" concept, where all 6 (TripleO) or 18 (Organic) Oscillators are merged into one object, preferably organizing the data and math in a way that makes it easy for compilers to optimize. (SIMD, anyone?) That might even up for some new features...

@gi0e5b06
Copy link

@softrabbit Frequency is the same than for the NotePlayHandle, same for ext_phaseOffset. PhaseOffset seems to be more or less equal to ext_phaseOffset. So only phase stays (one float), which is probably computable (I think but I may be wrong). Anyway, there is a big room for improvment here. Especially as Oscillator by many many instruments.

I wouldn't group the oscillators into a single object. Instead I would pre-allocate 2048 oscillators (for example) and reuse them when needed because all the instruments use the same Oscillator type (for now). More important, if it goes over 2048, the note should be skipped.

As a result: no memory allocation and CPU under controlled for all the instruments.

@softrabbit
Copy link
Member

I wouldn't group the oscillators into a single object. Instead I would pre-allocate 2048 oscillators (for example) and reuse them when needed because all the instruments use the same Oscillator type (for now). More important, if it goes over 2048, the note should be skipped.

I hate to dig up my own code as an example, but: #2089

That's a speedup of maybe ~2x from reorganizing arrays of structs into structs of arrays. Probably even more if compiling for wider SIMD models than SSE2. That's not to say the same applies to the Oscillator situation, but it might be worth considering.

@gi0e5b06
Copy link

@softrabbit Actually I don't disagree. But that was my understanding that you talked about grouping, not restructurating. As a first step requiring little code changes, I'm suggesting to group them all in a big array (with eventually informal subgrouping in 6/18).

As for switching from "arrays of structs into structs of arrays", there are certainly possible gains. But it is a complete rewrite of the Oscillator class and of the depending instruments. And IMHO, that's something that should be only tried later. Eventually in parallel (new classes: Oscillators, Kicker2, TripleOscillator2, Organic2, etc).

@qnebra
Copy link

qnebra commented Oct 12, 2017

@gi0e5b06

I forgot about something, on my PC your project run "as is" consume 65% of CPU. My previous tests are incorrect, because I modified your project. Interesing behaviour with this loop points.

What tools can I use to test lmms perfomance? Because in my project program has low usage of system resources according to system monitoring tools, but sounds are extremely stutter and glitch.

@PhysSong
Copy link
Member

Are there some benchmarks for cache miss rate? It is also important for performance.

@lukas-w
Copy link
Member Author

lukas-w commented Oct 18, 2017

@PhysSong I did a couple more benchmarks, again using Skiessi-RandomProjectNumber14253.

Startup and load project:

time cpu Ir incl. L1 Miss Sum Cycle Estimation Called
stable-1.2 2.06 45% 1.6×10⁹
(27.9%)
3.0×10⁶
(3.79%)
1.7×10⁹
(23.74%)
6.6×10³
this PR 1.97 44% 4.7×10⁵
(0.01%)
5.8×10³
(0.01%)
1.7×10⁴
(0.00%)
6.6×10³

Render project using CLI:

time cpu Ir incl. L1 Miss Sum Cycle Estimation Called
stable-1.2 07.49 324% 2.0×1010
(15.3%)
3.9×10⁴
(7.27%)
2.0×1010
(14.9%)
2.8×10⁴
this PR 06.30 359% 2.2×10⁶
(0.00%)
7.2×10³
(0.00%)
5.5×10⁵
(0.00%)
2.8×10⁴

If KCachegrind's cycle estimation is accurate, the old MemoryManager is something between 36,000 (rendering) and 100,000 times (startup and loading) slower than rpmalloc.

Merging.

@lukas-w lukas-w merged commit 8d6cb12 into stable-1.2 Oct 18, 2017
@lukas-w lukas-w deleted the fix/memory-manager-performance branch October 18, 2017 15:02
@zonkmachine
Copy link
Member

zonkmachine commented Oct 20, 2017

Maybe add information about submodules to a developer wiki of lmms?

@qnebra You are of course absolutely right here. The now updated wiki page is:
https://github.com/LMMS/lmms/wiki/Accessing-git-repository

@gi0e5b06
Copy link

My memory manager is about 35% faster than rpmalloc.
And it's not even optimized yet.
Available here:
https://github.com/gi0e5b06/lmms/blob/master/src/core/MemoryManagerArray.cpp

Drawbacks: is dedicated to LMMS, uses on average more memory

To do: auta-adaptable to the style of the user

Suggestion: make the memory manager an option configurable by the user

PhysSong added a commit to PhysSong/lmms that referenced this pull request Oct 23, 2017
PhysSong pushed a commit to PhysSong/lmms that referenced this pull request Oct 29, 2017
* Replace MemoryManager implementation with rpmalloc
    Fixes LMMS#3865
* Travis: Specify OSX image for Qt5 build
PhysSong pushed a commit to PhysSong/lmms that referenced this pull request Oct 29, 2017
* Replace MemoryManager implementation with rpmalloc
    Fixes LMMS#3865
* Travis: Specify OSX image for Qt5 build
@tresf tresf mentioned this pull request Nov 10, 2017
sdasda7777 pushed a commit to sdasda7777/lmms that referenced this pull request Jun 28, 2022
* Replace MemoryManager implementation with rpmalloc
    Fixes LMMS#3865
* Travis: Specify OSX image for Qt5 build
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants