This repository has been archived by the owner on Aug 8, 2023. It is now read-only.

Pass thread pool impl to Map constructor #6687

Merged
merged 3 commits into from
Oct 20, 2016


@mikemorris mikemorris commented Oct 12, 2016

Fixes #3982

Put together an initial draft of subclassing Scheduler to supply a custom thread pool implementation to Map. Currently, only the NodeMap implementation has been updated to pass in a thread pool based on Nan::AsyncWorker; all other platforms will need to be updated to pass in the default thread pool implementation.

I think the NodeThread->NodeThreadPool reference is quite unsafe currently. Should I be using a std::weak_ptr or something here?

I'm hitting a segfault when running npm test, backtrace follows:

* thread #11: tid = 0x14b212, 0x000000010014888d node`v8::HandleScope::Initialize(v8::Isolate*) + 29, stop reason = EXC_BAD_ACCESS (code=1, address=0x6c08)
    frame #0: 0x000000010014888d node`v8::HandleScope::Initialize(v8::Isolate*) + 29
node`v8::HandleScope::Initialize:
->  0x10014888d <+29>: movq   0x6c08(%rbx), %r15
    0x100148894 <+36>: callq  0x100421c70               ; v8::internal::ThreadId::GetCurrentThreadId()
    0x100148899 <+41>: movl   %eax, -0x1c(%rbp)
    0x10014889c <+44>: movl   %eax, -0x20(%rbp)
(lldb) bt
error: libmbgl-core.a(geometry_tile_worker.o) DWARF DIE at 0x00010b33 (class Actor<mbgl::GeometryTileWorker>) has a member variable 0x00010b52 (object) whose type is a forward declaration, not a complete definition.
Try compiling the source file with -fno-limit-debug-info
* thread #11: tid = 0x14b212, 0x000000010014888d node`v8::HandleScope::Initialize(v8::Isolate*) + 29, stop reason = EXC_BAD_ACCESS (code=1, address=0x6c08)
  * frame #0: 0x000000010014888d node`v8::HandleScope::Initialize(v8::Isolate*) + 29
    frame #1: 0x000000010401fef5 mapbox-gl-native.node`Nan::HandleScope::HandleScope(this=0x0000700000c30358) + 37 at nan.h:337
    frame #2: 0x000000010401f645 mapbox-gl-native.node`Nan::HandleScope::HandleScope(this=0x0000700000c30358) + 21 at nan.h:337
    frame #3: 0x000000010410ef30 mapbox-gl-native.node`Nan::AsyncWorker::AsyncWorker(this=0x0000000102306ba0, callback_=0x0000000000000000) + 96 at nan.h:1479
    frame #4: 0x0000000104114d77 mapbox-gl-native.node`node_mbgl::NodeThreadPool::NodeThread::NodeThread(this=0x0000000102306ba0, parent_=0x0000000102306060) + 39 at node_thread_pool.cpp:22
    frame #5: 0x0000000104114ced mapbox-gl-native.node`node_mbgl::NodeThreadPool::NodeThread::NodeThread(this=0x0000000102306ba0, parent_=0x0000000102306060) + 29 at node_thread_pool.cpp:23
    frame #6: 0x0000000104114c63 mapbox-gl-native.node`node_mbgl::NodeThreadPool::schedule(this=0x0000000102306060, mailbox=std::__1::weak_ptr<mbgl::Mailbox>::element_type @ 0x0000000102405ae8 strong=4 weak=6) + 195 at node_thread_pool.cpp:17
    frame #7: 0x000000010417c65d mapbox-gl-native.node`mbgl::Mailbox::push(this=0x0000000102405ae8, message=unique_ptr<mbgl::Message, std::__1::default_delete<mbgl::Message> > @ 0x0000700000c30638) + 589 at mailbox.cpp:20
    frame #8: 0x000000010458d82d mapbox-gl-native.node`void mbgl::ActorRef<mbgl::GeometryTileWorker>::invoke<void (this=0x0000000102405828, fn=b0 d5 58 04 01 00 00 00 00 00 00 00 00 00 00 00)()>(void (mbgl::GeometryTileWorker::*)()) + 253 at actor_ref.hpp:34
    frame #9: 0x000000010458b103 mapbox-gl-native.node`mbgl::GeometryTileWorker::coalesce(this=0x0000000102405828) + 51 at geometry_tile_worker.cpp:171
    frame #10: 0x000000010458ba2d mapbox-gl-native.node`mbgl::GeometryTileWorker::setPlacementConfig(this=0x0000000102405828, placementConfig_=(angle = 0, pitch = 0, debug = false), correlationID_=1) + 461 at geometry_tile_worker.cpp:127
    frame #11: 0x0000000104576615 mapbox-gl-native.node`void mbgl::MessageImpl<mbgl::GeometryTileWorker, void (mbgl::GeometryTileWorker::*)(mbgl::PlacementConfig, unsigned long long), std::__1::tuple<mbgl::PlacementConfig, unsigned long long> >::invoke<0ul, 1ul>(this=0x000000010172e280, (null)=std::__1::index_sequence<0UL, 1UL> @ 0x0000700000c308c8) + 197 at message.hpp:31
    frame #12: 0x0000000104576525 mapbox-gl-native.node`mbgl::MessageImpl<mbgl::GeometryTileWorker, void (mbgl::GeometryTileWorker::*)(mbgl::PlacementConfig, unsigned long long), std::__1::tuple<mbgl::PlacementConfig, unsigned long long> >::operator(this=0x000000010172e280)() + 21 at message.hpp:26
    frame #13: 0x000000010417ccfa mapbox-gl-native.node`mbgl::Mailbox::receive(this=0x0000000102405ae8) + 1370 at mailbox.cpp:48
    frame #14: 0x0000000104114f52 mapbox-gl-native.node`node_mbgl::NodeThreadPool::NodeThread::Execute(this=0x000000010172e2c0) + 434 at node_thread_pool.cpp:33
    frame #15: 0x0000000104115a41 mapbox-gl-native.node`Nan::AsyncExecute(req=0x000000010172e2c8) + 33 at nan.h:1690
    frame #16: 0x00000001007814ce node`worker + 90
    frame #17: 0x000000010078d6dc node`uv__thread_start + 25
    frame #18: 0x00007fff8dcb299d libsystem_pthread.dylib`_pthread_body + 131
    frame #19: 0x00007fff8dcb291a libsystem_pthread.dylib`_pthread_start + 168
    frame #20: 0x00007fff8dcb0351 libsystem_pthread.dylib`thread_start + 13

/cc @jfirebaugh @miccolis

@mention-bot

@mikemorris, thanks for your PR! By analyzing the history of the files in this pull request, we identified @jfirebaugh, @brunoabinader and @kkaefer to be potential reviewers.

@jfirebaugh
Contributor

There's no requirement that scheduled mailboxes be processed in order -- the necessary ordering is enforced by the mailbox itself. So you can dispense with the queue and mutex and have NodeThread hold the weak_ptr<Mailbox> directly.
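A minimal sketch of that suggestion, assuming simplified stand-ins rather than the real mbgl classes: `SketchMailbox`, `SketchWorker` and `runMailboxDemo` are hypothetical names, and the real `Mailbox` and `Scheduler` interfaces may differ. The point it illustrates is that the mailbox serializes its own messages, so each unit of scheduled work only needs a `weak_ptr` to the mailbox and can run in any order.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for mbgl::Mailbox: it owns its own queue and locks,
// so per-mailbox ordering never depends on the scheduler.
class SketchMailbox {
public:
    void push(std::function<void()> message) {
        std::lock_guard<std::mutex> lock(queueMutex);
        messages.push(std::move(message));
    }

    // Processes at most one message. receivingMutex serializes receivers, so
    // messages of one mailbox never run concurrently or out of order.
    void receive() {
        std::lock_guard<std::mutex> receiving(receivingMutex);
        std::function<void()> message;
        {
            std::lock_guard<std::mutex> lock(queueMutex);
            if (messages.empty()) return;
            message = std::move(messages.front());
            messages.pop();
        }
        message();
    }

    // Safe to call even after the owning actor has been destroyed.
    static void maybeReceive(std::weak_ptr<SketchMailbox> weak) {
        if (auto mailbox = weak.lock()) mailbox->receive();
    }

private:
    std::mutex receivingMutex;
    std::mutex queueMutex;
    std::queue<std::function<void()>> messages;
};

// A unit of scheduled work: just the weak_ptr, no shared queue or pool mutex.
struct SketchWorker {
    std::weak_ptr<SketchMailbox> mailbox;
    void execute() { SketchMailbox::maybeReceive(mailbox); }
};

std::vector<int> runMailboxDemo() {
    std::vector<int> seen;
    auto mailbox = std::make_shared<SketchMailbox>();
    mailbox->push([&] { seen.push_back(1); });
    mailbox->push([&] { seen.push_back(2); });

    SketchWorker first{mailbox};
    SketchWorker second{mailbox};
    SketchWorker orphan{mailbox};

    // Workers may run in any order; each pulls the next message in FIFO order.
    second.execute();
    first.execute();

    mailbox.reset();   // the actor goes away
    orphan.execute();  // expired weak_ptr: a no-op, not a crash
    return seen;
}
```

Because a worker that outlives its mailbox simply finds an expired `weak_ptr`, this shape also sidesteps the unsafe back-reference from the worker to the pool raised in the PR description.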

@mikemorris
Contributor Author

Thanks @jfirebaugh, got that cleaned up and implemented Mailbox::maybeReceive. Looking more closely at the backtrace, it looks like the issue has to do with creating a new v8::HandleScope. Will try to get this figured out tomorrow.

@mikemorris
Contributor Author

It appears that in some (but not all) instances, v8::Isolate::GetCurrent() is a null pointer in NodeThreadPool::schedule, with a backtrace always passing through the message handling in GeometryTileWorker. Could this be due to the mbgl::util::RunLoop implementation invoking libuv manually and passing this loop into an mbgl::Mailbox in geometry_tile.cpp?

@jfirebaugh do you think implementing this Node.js thread pool will require a custom mbgl::util::RunLoop implementation as well?

@jfirebaugh
Contributor

Schedulers need to be able to schedule from any thread, and it looks like Nan::AsyncWorker is only usable from the main thread. Indeed, uv_queue_work, on which it's based, says "Note that even though a global thread pool which is shared across all events loops is used, the functions are not thread safe." I read that as implying you can call uv_queue_work(uv_default_loop(), ...) only from the thread running the default loop.

So NodeThreadPool::schedule will need to dispatch back to the main thread -- with e.g. AsyncQueue<std::weak_ptr<Mailbox>> -- and then do the Nan::AsyncQueueWorker from there.
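The dispatch-back-to-main-thread idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not the actual `AsyncQueue`: `MainThreadQueue`, `drain` and `runDispatchDemo` are hypothetical names, and the explicit `drain()` call stands in for the event loop wake-up that a libuv-based queue would get from an async handle.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for AsyncQueue<T>: send() is safe from any thread,
// while the callback only ever runs on the thread that calls drain().
template <typename T>
class MainThreadQueue {
public:
    explicit MainThreadQueue(std::function<void(T)> callback_)
        : callback(std::move(callback_)) {}

    // May be called from any thread.
    void send(T item) {
        std::lock_guard<std::mutex> lock(mutex);
        items.push(std::move(item));
    }

    // Call from the owning ("main") thread only.
    void drain() {
        for (;;) {
            T item;
            {
                std::lock_guard<std::mutex> lock(mutex);
                if (items.empty()) return;
                item = std::move(items.front());
                items.pop();
            }
            callback(std::move(item));  // runs without the lock held
        }
    }

private:
    std::function<void(T)> callback;
    std::mutex mutex;
    std::queue<T> items;
};

// Returns true if all four items were handled on the thread that drained.
bool runDispatchDemo() {
    const auto mainThread = std::this_thread::get_id();
    int handled = 0;
    bool allOnMain = true;

    MainThreadQueue<int> queue([&](int) {
        ++handled;
        allOnMain = allOnMain && (std::this_thread::get_id() == mainThread);
    });

    std::vector<std::thread> producers;
    for (int i = 0; i < 4; ++i) {
        producers.emplace_back([&queue, i] { queue.send(i); });
    }
    for (auto& t : producers) t.join();

    queue.drain();
    return allOnMain && handled == 4;
}
```

In the PR's setting the callback would be the place to call Nan::AsyncQueueWorker, since it is guaranteed to run on the main thread regardless of which thread scheduled the mailbox.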

@mikemorris
Contributor Author

@jfirebaugh Got this ALMOST working, but tests are hanging on not holding references because the AsyncQueue in NodeThreadPool is holding a reference to the Node.js main loop. Is there some way I can tweak this to make it a weak reference or something?

@jfirebaugh
Contributor

Try AsyncQueue::unref.

namespace node_mbgl {

NodeThreadPool::NodeThreadPool()
: queue(new util::AsyncQueue<std::weak_ptr<mbgl::Mailbox>>(uv_default_loop(), [this](std::weak_ptr<mbgl::Mailbox> mailbox) {
Contributor

This leaks the queue. NodeThreadPool::queue should be a value, not a pointer.

Contributor Author

Ack, needed to fix this in mapbox-gl-native/platform/node/src/node_log.cpp too.

@mikemorris
Contributor Author

I'm not quite sure what the issue I'm hitting now is, but it may be related to an implicit instantiation or something? Only thinking that because I had to move

#include "util/async_queue.hpp"

from node_thread_pool.cpp into node_thread_pool.hpp to work around an "implicit instantiation of undefined template" error when switching from a pointer to a value for the AsyncQueue. The new failure:

# .load
node(64939,0x7fff72e56000) malloc: *** error for object 0x102321760: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Process 64939 stopped
* thread #1: tid = 0x18dee4, 0x00007fff95784f06 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x00007fff95784f06 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff95784f06 <+10>: jae    0x7fff95784f10            ; <+20>
    0x7fff95784f08 <+12>: movq   %rax, %rdi
    0x7fff95784f0b <+15>: jmp    0x7fff9577f7cd            ; cerror_nocancel
    0x7fff95784f10 <+20>: retq
(lldb) bt
* thread #1: tid = 0x18dee4, 0x00007fff95784f06 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff95784f06 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff8dcb54ec libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fff897486df libsystem_c.dylib`abort + 129
    frame #3: 0x00007fff916af041 libsystem_malloc.dylib`free + 425
    frame #4: 0x00000001041177d7 mapbox-gl-native.node`node_mbgl::util::AsyncQueue<std::__1::weak_ptr<mbgl::Mailbox> >::stop(this=0x0000000104117780, handle=0x00000001023217a0)::'lambda'(uv_handle_s*)::operator()(uv_handle_s*) const + 55 at async_queue.hpp:51
    frame #5: 0x0000000104117798 mapbox-gl-native.node`node_mbgl::util::AsyncQueue<std::__1::weak_ptr<mbgl::Mailbox> >::stop(handle=0x00000001023217a0)::'lambda'(uv_handle_s*)::__invoke(uv_handle_s*) + 24 at async_queue.hpp:50
    frame #6: 0x000000010078357f node`uv_run + 517
    frame #7: 0x000000010065fd30 node`node::Start(int, char**) + 625
    frame #8: 0x0000000100000d34 node`start + 52

@mikemorris
Contributor Author

mikemorris commented Oct 13, 2016

@jfirebaugh Pushed up 755623baa863f12ed3f967fef0daae0d4b279558 which should likely be reverted, but "fixes" the issue I described in #6687 (comment) and npm test and npm run test-suite complete successfully!!

@jfirebaugh
Contributor

Ugh, AsyncQueue wants to delete itself asynchronously via a uv_close callback. This is confusing and dangerous (leaks if you forget to call stop()), but to get things working, revert back to creating the AsyncQueues with new.
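To illustrate why such a class has to be heap-allocated, here is a small sketch of the self-deleting pattern. It assumes a toy event loop instead of libuv, and `SketchLoop`, `SelfDeletingQueue` and `runSelfDeleteDemo` are hypothetical names, not the real `AsyncQueue`. If an object like this were a value member, the deferred `delete this` would free memory that was never obtained from `new`, which is consistent with the "pointer being freed was not allocated" abort in the backtrace above.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical miniature event loop: close callbacks run on a later turn,
// mimicking how a uv_close callback is invoked asynchronously.
struct SketchLoop {
    std::vector<std::function<void()>> pending;

    void runOnce() {
        auto callbacks = std::move(pending);
        pending.clear();
        for (auto& cb : callbacks) cb();
    }
};

class SelfDeletingQueue {
public:
    SelfDeletingQueue(SketchLoop& loop_, int& liveCount_)
        : loop(loop_), liveCount(liveCount_) {
        ++liveCount;
    }

    // Schedules destruction for a later loop turn. After this call the owner
    // must not touch or delete the object again.
    void stop() {
        loop.pending.push_back([this] { delete this; });
    }

private:
    // Private destructor: the object can only be created with new and only
    // destroyed through stop(), so it cannot be a value member or a local.
    ~SelfDeletingQueue() { --liveCount; }

    SketchLoop& loop;
    int& liveCount;
};

int runSelfDeleteDemo() {
    SketchLoop loop;
    int live = 0;
    auto* queue = new SelfDeletingQueue(loop, live);
    assert(live == 1);
    queue->stop();   // destruction is only scheduled here
    assert(live == 1);
    loop.runOnce();  // the deferred callback deletes the queue
    return live;     // forgetting stop() would leave this at 1 (a leak)
}
```

Making the destructor private is one way to turn the "must be created with new" rule into a compile-time error; whether the real class does this is not stated in the thread.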

@mikemorris mikemorris force-pushed the scheduler branch 5 times, most recently from 88f0eee to 76de29c, October 14, 2016 19:53
@mikemorris
Contributor Author

mikemorris commented Oct 14, 2016

Android appears to be failing a single device test - any idea what's going on here @tobrun?

+---------+----------------------+---------------------+
| OUTCOME |   TEST_AXIS_VALUE    |     TEST_DETAILS    |
+---------+----------------------+---------------------+
| Failed  | shamu-22-en-portrait | 1 test cases failed |
+---------+----------------------+---------------------+

Not seeing ^ anymore, hitting #6705 instead.

@mikemorris
Contributor Author

CI tests are all green now - what needs to happen before we can merge this @jfirebaugh?

The public API of the Node.js bindings (and other platforms) didn't change, but is the ABI break with the change to the mbgl::Map constructor something we need to consider in versioning?

@jfirebaugh jfirebaugh left a comment

Getting close!

@@ -37,7 +38,8 @@ class QueryBenchmark {
std::shared_ptr<HeadlessDisplay> display{ std::make_shared<HeadlessDisplay>() };
HeadlessView view{ display, 1 };
DefaultFileSource fileSource{ "benchmark/fixtures/api/cache.db", "." };
Map map{ view, fileSource, MapMode::Still };
ThreadPool workerThreadPool{ 4 };
Contributor

Rename workerThreadPool → threadPool in each of the places in this PR that are gaining a ThreadPool member. The fact that it's used for "workers" is an implementation detail.

@@ -397,7 +398,8 @@ - (void)commonInit

// setup mbgl map
mbgl::DefaultFileSource *mbglFileSource = [MGLOfflineStorage sharedOfflineStorage].mbglFileSource;
_mbglMap = new mbgl::Map(*_mbglView, *mbglFileSource, mbgl::MapMode::Continuous, mbgl::GLContextMode::Unique, mbgl::ConstrainMode::None, mbgl::ViewportMode::Default);
mbgl::ThreadPool workerThreadPool(4);
Contributor

This does not live long enough. It's on the stack here, but it needs to outlive _mbglMap.

Contributor Author

Fixed this - is the stack allocation okay in bin/render.cpp, bin/glfw.cpp and the tests?

@@ -261,7 +262,8 @@ - (void)commonInit {
[[NSFileManager defaultManager] removeItemAtURL:legacyCacheURL error:NULL];

mbgl::DefaultFileSource *mbglFileSource = [MGLOfflineStorage sharedOfflineStorage].mbglFileSource;
_mbglMap = new mbgl::Map(*_mbglView, *mbglFileSource, mbgl::MapMode::Continuous, mbgl::GLContextMode::Unique, mbgl::ConstrainMode::None, mbgl::ViewportMode::Default);
mbgl::ThreadPool workerThreadPool(4);
Contributor

Ditto.

private:
util::AsyncQueue<std::weak_ptr<mbgl::Mailbox>>* queue;

class NodeThread : public Nan::AsyncWorker {
Contributor

Rename NodeThread → Worker. It's not a thread.

@@ -51,6 +51,7 @@ class Map::Impl : public style::Observer {

View& view;
FileSource& fileSource;
Scheduler& workerThreadPool;
Contributor

Rename workerThreadPool → scheduler.

@jfirebaugh
Contributor

> The public API of the Node.js bindings (and other platforms) didn't change, but is the ABI break with the change to the mbgl::Map constructor something we need to consider in versioning?

So long as all the SDKs are updated in this PR, which they are, we're good.

@jfirebaugh jfirebaugh left a comment

In those cases, the Map and ThreadPool are in the same stack frame, so it's ok.
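The lifetime rule discussed in this review can be shown with a short sketch. The types below (`SketchScheduler`, `SketchMap`, `SketchView`) are hypothetical stand-ins for the thread pool, `mbgl::Map` and a platform view class; the only assumption carried over from the PR is that the map stores a reference to the scheduler. C++ destroys members and locals in reverse declaration order, so declaring the thread pool before the map guarantees it outlives the map.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a Scheduler/ThreadPool that records its destruction.
struct SketchScheduler {
    explicit SketchScheduler(std::vector<std::string>& log_) : log(log_) {}
    ~SketchScheduler() { log.push_back("~scheduler"); }
    std::vector<std::string>& log;
};

// Hypothetical stand-in for the map: it stores a reference only, so the
// scheduler must outlive it.
struct SketchMap {
    explicit SketchMap(SketchScheduler& scheduler_) : scheduler(scheduler_) {}
    ~SketchMap() { scheduler.log.push_back("~map"); }
    SketchScheduler& scheduler;
};

// Members are destroyed in reverse declaration order: map first, then threadPool.
struct SketchView {
    explicit SketchView(std::vector<std::string>& log)
        : threadPool(log), map(threadPool) {}
    SketchScheduler threadPool;
    SketchMap map;
};

std::vector<std::string> runLifetimeDemo() {
    std::vector<std::string> log;
    {
        SketchView view(log);  // thread pool held as a member next to the map
    }
    {
        // Same stack frame: locals are also destroyed in reverse order.
        SketchScheduler threadPool(log);
        SketchMap map(threadPool);
    }
    return log;
}
```

In both scopes the map is destroyed before the scheduler it references. The problematic case flagged in the review is different: a thread pool local to an init function goes out of scope while a heap-allocated map lives on.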

Updates mbgl::Map constructor usage everywhere

Adds NodeThreadPool implementation using AsyncQueue to call
Nan::AsyncQueueWorker from main thread
Labels
Node.js node-mapbox-gl-native