Use atexit hook to implicitly shutdown Runtime #1595

Merged on Oct 7, 2024 (1 commit)

ByronHsu (Collaborator) commented on Oct 7, 2024

Motivation

Runtime currently has two problems at termination:

  1. runtime.shutdown() must be called explicitly; otherwise the program hangs.
  2. If the client runs into an error, the program hangs instead of crashing.

Currently, we rely on __del__ to shut down subprocesses. However, the timing of the __del__ call is nondeterministic (it depends on the garbage collector), and it is usually not called at interpreter termination.

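atexit provides a more reliable mechanism. Registered handlers run at normal interpreter termination, that is, when:

  1. The last line of the main script has been executed.
  2. An unhandled exception occurs.
  3. sys.exit() is called.
  4. The process receives a signal that Python handles (e.g., SIGINT, which raises KeyboardInterrupt).

Note that atexit handlers do not run when the process is killed by a signal that Python does not handle (SIGTERM by default, unless a handler is installed), or when os._exit() is called.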
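
One way to check which exit paths actually run atexit handlers is to try each in a fresh child interpreter. The sketch below is illustrative only and is not part of this PR; the scenario names and the `handler_ran` / `run_all` helpers are made up for the example. Per the Python documentation, handlers are skipped when the process is killed by a signal Python does not handle and when `os._exit()` is called, so SIGTERM only triggers them if a handler turns the signal into a normal exit (the SIGTERM scenarios assume a Unix-like system).

```python
import subprocess
import sys

# Every child registers one atexit handler, then exits in a different way.
PRELUDE = (
    "import atexit, os, signal, sys, time\n"
    "atexit.register(lambda: print('cleanup ran'))\n"
)

SCENARIOS = {
    "normal exit": "pass",
    "sys.exit()": "sys.exit(3)",
    "unhandled exception": "raise RuntimeError('boom')",
    # No Python-level handler installed: the OS kills the process directly.
    "SIGTERM (default)": "os.kill(os.getpid(), signal.SIGTERM); time.sleep(5)",
    # A handler that converts SIGTERM into SystemExit lets atexit run.
    "SIGTERM (handled)": (
        "signal.signal(signal.SIGTERM, lambda s, f: sys.exit(0)); "
        "os.kill(os.getpid(), signal.SIGTERM); time.sleep(5)"
    ),
    # os._exit() bypasses interpreter cleanup entirely.
    "os._exit()": "os._exit(0)",
}


def handler_ran(body: str) -> bool:
    """Run the snippet in a child interpreter; report whether the handler printed."""
    proc = subprocess.run(
        [sys.executable, "-c", PRELUDE + body],
        capture_output=True,
        text=True,
    )
    return "cleanup ran" in proc.stdout


def run_all() -> dict:
    """Map each scenario name to True if the atexit handler ran."""
    return {name: handler_ran(body) for name, body in SCENARIOS.items()}
```

Calling `run_all()` and printing the resulting dictionary gives a quick per-scenario summary. If SIGTERM-triggered cleanup matters for a deployment, installing a handler like the one in the "handled" scenario is one option.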

Modifications

Add an atexit hook to Runtime so that it is shut down implicitly at interpreter exit.
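
As a rough illustration of the pattern (not the actual sglang code), the sketch below uses a hypothetical `DemoRuntime` class whose sleeping child process stands in for the real server subprocess. It registers its own `shutdown` with atexit and keeps `shutdown` idempotent, so an explicit call followed by the exit-time call is harmless.

```python
import atexit
import subprocess
import sys


class DemoRuntime:
    """Hypothetical stand-in for a runtime that owns a server subprocess."""

    def __init__(self):
        # A sleeping child process stands in for the real server process.
        self.proc = subprocess.Popen(
            [sys.executable, "-c", "import time; time.sleep(3600)"]
        )
        # Register cleanup so it runs at interpreter exit even if the caller
        # never calls shutdown() explicitly.
        atexit.register(self.shutdown)

    def shutdown(self):
        # Idempotent: safe to call explicitly and again from the atexit hook.
        if self.proc is not None:
            self.proc.terminate()
            self.proc.wait()
            self.proc = None
```

One design note: `atexit.register(self.shutdown)` keeps a strong reference to the instance until exit, so in this sketch the object is never garbage-collected early. That is acceptable for a long-lived runtime but worth knowing if many instances are created.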

Testing

  1. The program shuts down gracefully when no exception is raised:
"""
Usage:
python3 local_example_chat.py
"""

import sglang as sgl


@sgl.function
def multi_turn_question(s, question_1, question_2):
    s += sgl.user(question_1)
    s += sgl.assistant(sgl.gen("answer_1", max_tokens=256))
    s += sgl.user(question_2)
    s += sgl.assistant(sgl.gen("answer_2", max_tokens=256))


def single():
    state = multi_turn_question.run(
        question_1="What is the capital of the United States?",
        question_2="List two local attractions.",
    )

    for m in state.messages():
        print(m["role"], ":", m["content"])

    print("\n-- answer_1 --\n", state["answer_1"])


def stream():
    state = multi_turn_question.run(
        question_1="What is the capital of the United States?",
        question_2="List two local attractions.",
        stream=True,
    )

    for out in state.text_iter():
        print(out, end="", flush=True)
    print()


def batch():
    states = multi_turn_question.run_batch(
        [
            {
                "question_1": "What is the capital of the United States?",
                "question_2": "List two local attractions.",
            },
            {
                "question_1": "What is the capital of France?",
                "question_2": "What is the population of this city?",
            },
        ]
    )

    for s in states:
        print(s.messages())


if __name__ == "__main__":
    runtime = sgl.Runtime(model_path="/shared/public/models/Qwen/Qwen2.5-1.5B-Instruct/")
    sgl.set_default_backend(runtime)

    # Run a single request
    print("\n========== single ==========\n")
    single()

    # Stream output
    print("\n========== stream ==========\n")
    stream()

    # Run a batch of requests
    print("\n========== batch ==========\n")
    batch()

  2. If the client raises an exception (here triggered by passing an unsupported dummy=True argument to sgl.gen), the program exits with that exception instead of hanging:
@sgl.function
def multi_turn_question(s, question_1, question_2):
    s += sgl.user(question_1)
    s += sgl.assistant(sgl.gen("answer_1", max_tokens=256, dummy=True))
    s += sgl.user(question_2)
    s += sgl.assistant(sgl.gen("answer_2", max_tokens=256))
Traceback (most recent call last):
  File "/home/jobuser/sglang/examples/frontend_language/quick_start/local_example_chat.py", line 65, in <module>
    single()
  File "/home/jobuser/sglang/examples/frontend_language/quick_start/local_example_chat.py", line 18, in single
    state = multi_turn_question.run(
  File "/home/jobuser/sglang/python/sglang/lang/ir.py", line 198, in run
    return run_program(self, backend, args, kwargs, default_sampling_para, stream)
  File "/home/jobuser/sglang/python/sglang/lang/interpreter.py", line 80, in run_program
    run_internal(state, program, func_args, func_kwargs, sync)
  File "/home/jobuser/sglang/python/sglang/lang/interpreter.py", line 45, in run_internal
    raise e
  File "/home/jobuser/sglang/python/sglang/lang/interpreter.py", line 43, in run_internal
    state.ret_value = program.func(state, *func_args, **func_kwargs)
  File "/home/jobuser/sglang/examples/frontend_language/quick_start/local_example_chat.py", line 12, in multi_turn_question
    s += sgl.assistant(sgl.gen("answer_1", max_tokens=256, dummy=True))
TypeError: gen() got an unexpected keyword argument 'dummy'

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@ByronHsu changed the title from "Add atexit hook to Runtime to implicitly shutdown python program" to "Use atexit hook to implicitly shutdown Runtime" on Oct 7, 2024
@merrymercy merrymercy enabled auto-merge (squash) October 7, 2024 05:02
@merrymercy merrymercy merged commit 565b05f into sgl-project:main Oct 7, 2024
10 of 11 checks passed