This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

[Error:38]Function not implemented. multiprocessing in graphene #2689

Open
Yujindawang opened this issue Sep 24, 2021 · 7 comments

Comments

@Yujindawang

Yujindawang commented Sep 24, 2021

I am running a PyTorch program in Graphene, but it throws an error when it reaches multi-process communication.

Traceback (most recent call last):
  File "/workplace/app/predict.py", line 137, in <module>
    main(input_dir)
  File "/workplace/app/predict.py", line 113, in main
    for i, (images) in enumerate(tqdm(val_loader)):
  File "/usr/local/lib/python3.6/dist-packages/tqdm/std.py", line 1185, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 355, in __iter__
    return self._get_iterator()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 301, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 887, in __init__
    self._worker_result_queue = multiprocessing_context.Queue()  # type: ignore
  File "/usr/lib/python3.6/multiprocessing/context.py", line 102, in Queue
    return Queue(maxsize, ctx=self.get_context())
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 42, in __init__
    self._rlock = ctx.Lock()
  File "/usr/lib/python3.6/multiprocessing/context.py", line 67, in Lock
    return Lock(ctx=self.get_context())
  File "/usr/lib/python3.6/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/usr/lib/python3.6/multiprocessing/synchronize.py", line 59, in __init__
    unlink_now)

Sorry, I can't share the original code, so I wrote a simple example that reproduces the situation.

import os
import threading
import multiprocessing

# Main
print('Main:', os.getpid())

# worker function
def worker(sign, lock):
    lock.acquire()
    print(sign, os.getpid())
    lock.release()


# Multi-thread
record = []
lock = threading.Lock()

# Multi-process
record = []
lock = multiprocessing.Lock()

if __name__ == '__main__':
    for i in range(5):
        thread = threading.Thread(target=worker, args=('thread', lock))
        thread.start()
        record.append(thread)

    for thread in record:
        thread.join()
    
    for i in range(5):
        process = multiprocessing.Process(target=worker, args=('process', lock))
        process.start()
        record.append(process)
    
    for process in record:
        process.join()

When I run this script in Graphene, the same error is thrown:

Traceback (most recent call last):
  File "/workplace/app/test.py", line 21, in <module>
    lock = multiprocessing.Lock()
  File "/usr/lib/python3.6/multiprocessing/context.py", line 67, in Lock
    return Lock(ctx=self.get_context())
  File "/usr/lib/python3.6/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/usr/lib/python3.6/multiprocessing/synchronize.py", line 59, in __init__
    unlink_now)

This seems to be related to SemLock.

So I would like to understand the limitations of multiprocessing in Graphene, and how to solve this bug.

Thank you for your attention, and I look forward to your reply. T_T

@dimakuv

dimakuv commented Sep 24, 2021

From what I understand, Python's multiprocessing package uses POSIX semaphores (or maybe even older Sys-V semaphores). Check https://linux.die.net/man/7/sem_overview.

Gramine currently doesn't support semaphores at all. So this example unfortunately cannot run in Gramine.
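To confirm the diagnosis without PyTorch in the picture, the failing call can be isolated. The sketch below (the name `probe_semlock` is made up for illustration) creates a single `multiprocessing.Lock()`, which is the call where both tracebacks above end up in `SemLock.__init__`, and maps the errno to its symbolic name. On Linux, errno 38 is `ENOSYS`, i.e. "Function not implemented".

```python
import errno
import multiprocessing


def probe_semlock():
    """Try to create a multiprocessing.Lock (backed by a SemLock).

    Returns None on success, or a short description of the failure
    (the symbolic errno name, e.g. "ENOSYS", for an OSError).
    """
    try:
        lock = multiprocessing.Lock()
    except OSError as exc:
        # [Errno 38] "Function not implemented" is ENOSYS on Linux.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    except ImportError:
        # multiprocessing.synchronize raises ImportError when the platform
        # lacks a working sem_open implementation.
        return "ImportError"
    # Exercise the lock once so a failure in acquire/release also surfaces.
    with lock:
        pass
    return None


if __name__ == "__main__":
    result = probe_semlock()
    if result is None:
        print("multiprocessing.Lock() works here")
    else:
        print("multiprocessing.Lock() failed:", result)
```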

@dimakuv

dimakuv commented Sep 24, 2021

You can actually add loader.log_level = "trace" in your manifest file, and re-run your simple example. Gramine will output a lot of additional info.

If you see system calls such as sem_wait(), sem_open() or semget() returning -38 ("not implemented"), then that is definitely the problem: Gramine doesn't support these system calls (Gramine doesn't support semaphores).
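With `loader.log_level = "trace"` the output gets long, so a small filter can help spot calls that returned -38. This is only a sketch: the exact shape of Gramine's trace lines is an assumption here (a call name, its arguments in parentheses, then `= -38`), so the regular expression may need adjusting to the real log, and `find_enosys_calls` is a hypothetical helper name.

```python
import re
import sys
from collections import Counter

# Assumed shape of a trace line: "... <call>(<args>) = -38".
# The real Gramine log format may differ; adjust the pattern to your output.
ENOSYS_LINE = re.compile(r"(\w+)\(.*\)\s*=\s*-38\b")


def find_enosys_calls(lines):
    """Count, per call name, the log lines whose return value is -38 (ENOSYS)."""
    hits = Counter()
    for line in lines:
        match = ENOSYS_LINE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


if __name__ == "__main__":
    # Hypothetical usage, capturing both stdout and stderr of the app:
    #   gramine-sgx <your app> 2>&1 | python3 find_enosys.py
    for name, count in find_enosys_calls(sys.stdin).most_common():
        print(f"{name}: {count}")
```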

@Yujindawang
Author

I see. Is there any workaround for this problem?

@dimakuv

dimakuv commented Sep 24, 2021

The only workaround I can think of is to avoid using the multiprocessing package in Python.
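If the parallel part of a program is I/O-bound, or spends its time in native code that releases the GIL, threads can stand in for processes. Below is a minimal sketch that reworks the worker from the example above with `concurrent.futures.ThreadPoolExecutor`, assuming in-process threads are acceptable for the workload; `run_with_threads` is a hypothetical name. It only uses `threading` primitives, so it never creates a multiprocessing `SemLock`.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor

# threading.Lock is an in-process lock; unlike multiprocessing.Lock,
# it is not backed by a SemLock.
print_lock = threading.Lock()


def worker(sign, lock):
    # Same signature as the worker in the example above; also returns a value.
    with lock:
        print(sign, os.getpid(), threading.current_thread().name)
    return sign


def run_with_threads(n=5):
    # A thread pool replaces multiprocessing.Process for the parallel part.
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(worker, "thread", print_lock) for _ in range(n)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    print("Main:", os.getpid())
    run_with_threads()
```

Note that CPython threads share one GIL, so CPU-bound pure-Python work will not run faster this way; the trade-off is giving up parallel speedup in exchange for avoiding semaphores.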

@Yujindawang
Author

Yujindawang commented Sep 28, 2021

I tried setting num_workers=0 to disable multiprocessing:

data.DataLoader(val_dst, batch_size=opts.val_batch_size, shuffle=False, num_workers=0)
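The same fallback can be chosen automatically, so the script still uses worker processes where they work. With `num_workers=0`, a PyTorch `DataLoader` loads data in the main process and does not build the multiprocessing queue seen in the first traceback. This is a sketch assuming the rest of the script is unchanged; `pick_num_workers` is a hypothetical helper, not part of PyTorch.

```python
import multiprocessing


def pick_num_workers(requested):
    """Return `requested` if multiprocessing locks can be created, else 0."""
    if requested <= 0:
        return 0
    try:
        # The DataLoader's worker queue needs this kind of lock (see traceback).
        multiprocessing.Lock()
    except (OSError, ImportError):
        # e.g. [Errno 38] Function not implemented inside Gramine
        return 0
    return requested
```

Hypothetical usage, with 4 as an arbitrary example value: `data.DataLoader(val_dst, batch_size=opts.val_batch_size, shuffle=False, num_workers=pick_num_workers(4))`.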

The original error 38 disappeared, but another problem appeared: my program was killed by signal 8, which is SIGFPE.
Without Graphene, it runs successfully. I then tried increasing enclave_size, stack.size and pal_internal_mem_size, but none of that helped. I have no idea what is happening.
[Screenshot: WeChat image_20210928092345]

for i, (images) in enumerate(tqdm(val_loader)):
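For reference, a "killed by signal N" status can be decoded programmatically. This sketch assumes the application is launched either through Python's `subprocess` (which reports death by signal N as returncode `-N`) or from a POSIX shell (which reports it as exit status `128 + N`); the `> 128` branch is a heuristic, since a program may also exit with such a status on purpose. `describe_exit` is a hypothetical name.

```python
import signal


def describe_exit(returncode):
    """Turn a process exit status into a readable description."""
    if returncode < 0:
        # subprocess convention: -N means "terminated by signal N"
        signum = -returncode
    elif returncode > 128:
        # shell convention (heuristic): 128 + N means "terminated by signal N"
        signum = returncode - 128
    else:
        return f"exited with status {returncode}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown signal"
    return f"killed by signal {signum} ({name})"
```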

@dimakuv

dimakuv commented Sep 28, 2021

An arithmetic fault is interesting!

Can you debug your program using GDB? Simply build Gramine in debug mode and then run GDB=1 gramine-sgx <your app>. For more info, check https://gramine.readthedocs.io/en/latest/devel/debugging.html

The debugger should point you to the place in the code (the assembly instruction) where SIGFPE (arithmetic fault) happens. This may immediately give you an idea of what goes wrong. Or at least you can copy-paste the assembly snippet where SIGFPE happens; maybe we'll be able to help just by looking at the failing assembly.

@Yujindawang
Author

Sorry for not replying; I couldn't check messages for the past two days. I will try GDB right away.
