Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROPOSAL] Oscar: Mars actors 2.0 #1935

Open
qinxuye opened this issue Jan 26, 2021 · 0 comments
Open

[PROPOSAL] Oscar: Mars actors 2.0 #1935

qinxuye opened this issue Jan 26, 2021 · 0 comments

Comments

@qinxuye
Copy link
Collaborator

qinxuye commented Jan 26, 2021

Oscar: Mars Actors 2.0

Background

Mars Actors is the key component of entire distributed scheduling. Some enhancements need to be done in summary.

  1. Support stateful and stateless actors, statefull actors can only be created on main process, stateless ones can be created on main and sub processes. This is to ensure that all subprocess can be killed without leading to inconsistency status. This is important to reach goal of cancel-free. When user wants to cancel a job, the task which performed on a subprocess of worker can be cancelled by killing the subprocess.
  2. More sophisticated error handling. For older Mars Actors, if a subprocess is crashed due to reason OOM, the actors who sent messages to the actors on the subprocess will finally get timeout or broken pipe. We need to raise a ActorDead instead to indicate that the actor is dead due to death of the subprocess.
  3. Deadlock detection. If actor A sent a message to actor B, in the on_message of B, it send another message to actor A, the deadlock happens, we need to be able to detect the potential deadlock. The solution is to embed the calling chain into the message, and if a cycle call is detected, raise an error.
  4. API tuning. Previously, we provide the basic API like send and tell, calling actor's method remotely is implemented with an inherited actor based on the basic one. We can support this internally. For more details, refer to API examples shown below.
  5. Promise support internally. Promise is usefull when an actor send a message, and expect callback to another actor. The key point is that when the message sent, the first actor must be able to process other messages due to the reason that it's reentrancy now. However, for now, the promise is supported via another module, and it's quite complicated, and the usage is not very natrual as well.
  6. Multiple backends support, firstly should be Ray. Actors can be created on Ray instead of Mars Actors itself.

APIs

Oscar will change to

Basic APIs

  • Actor class.
import mars.oscar as ma

# stateful actor
class MyActor(ma.Actor):
    def __init__(self, *args, **kwargs):
        pass
  
    async def __post_create__(self):
        pass
      
    async def __pre_destroy__(self):
        pass
      
    def method_a(self, arg_1, arg_2, **kw_1):  # user-defined function
        pass

    async def method_b(self, arg_1, arg_2, **kw_1)  # user-defined async function
        pass
  • Creating actors.
import mars.oscar as ma

actor_ref = await ma.create_actor(
    MyActor, args=(1, 2), kwargs=dict(a=1, b=2),
    address=None)
  • Destroying actors.
import mars.oscar as ma

await ma.destroy_actor(actor_ref)
# or
await actor_ref.destroy()
  • Checking existence of actors.
import mars.oscar as ma

await ma.has_actor(worker_addr, actor_id)
  • Getting reference.
import mars.oscar as ma

actor_ref = await ma.actor_ref(worker_addr, actor_id)
  • Calling actor method.
# send
await actor_ref.method_a.send(1, 2, a=1, b=2)
# equivalent to actor_ref.method_a.send
await actor_ref.method_a(1, 2, a=1, b=2)
# tell
await actor_ref.method_a.tell(1, 2, a=1, b=2)
  • Promise integration
import mars.oscar as ma

class MyActor3(ma.Actor):
    def method_3():
        # some process
        do_some_operations
        # send message to other Actor, and 
        # quit the function to process other messages,
        # when callback comes, resume
        yield actor_ref.method_1.async_wait(1, 2, a=1, b=2), \
              actor_ref2.method_2.async_wait(1, 2)
        # resume to process
        do_other_operations
  • Long running annotation
import mars.oscar as ma

class MyActor4(ma.Actor):
    @ma.long_running
    def method_4():
        # CPU intensive operation, 
        # if not annotate with long running,
        # this function may block other coroutines,
        # `long_running` will let the method run in a thread,
        # and other coroutines could proceed
        pass

Actor Worker-level API

User-defined actor pool.

import mars.oscar as ma

class MyActorPool(ma.ActorPool):
    def on_new_process(self):
        pass
      
ma.register_actor_pool(MyActorPool)

Creating actor pool.

import mars.oscar as ma

ma.create_actor_pool(address, n_process, distributor, 
                     actor_pool_class=None, label=None, **kw)

Actor driver API

import mars.oscar as ma

ma.setup_cluster({'supervisor': {'CPU': 4, 'MEMORY': 16},
                  'worker1': {'cpu': 8, 'memory': 32},
                  'worker2': {'cpu': 8, 'memory': 32}})

Other backends like Ray could implement this method in order to create a Mars cluster.

@qinxuye qinxuye changed the title [PROPOSAL] Oscar: Mars actors 2.0 with multiple backends support [PROPOSAL] Oscar: Mars actors 2.0 Jan 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant