Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workflow sandbox #164

Merged
merged 18 commits into from
Oct 28, 2022
Merged

Workflow sandbox #164

merged 18 commits into from
Oct 28, 2022

Conversation

cretz
Copy link
Member

@cretz cretz commented Oct 25, 2022

What was changed

This is better described in the README, but high-level notes:

  • Created entirely new temporalio.worker.workflow_sandbox package
  • Changed default runner to this new sandbox
  • Uses exec() for global state isolation
  • Uses custom importer
  • Lots of Python libraries do lots of side-effecting, non-deterministic things on import so we have lots of knobs for affecting what is/isn't restricted
  • Users can skip restricting code using with workflow.unsafe.sandbox_unrestricted():
  • Users can skip sandbox per workflow via @workflow.defn(sandboxed=False)
  • Users can skip sandbox per worker via runner_class=UnsandboxedWorkflowRunner (what the default is before this PR)
  • Has caveats such as improperly affecting global state which we need to fix
  • TODO: Need to try subinterpreter via Rust
  • TODO: Test for performance

It is quite possible people's code is gonna break. The sandbox is a bit fickle with how it checks restrictions and such. But if you have a simple workflow trying to call random or open a file, this will stop you.

Closes #80

@cretz cretz requested a review from a team October 25, 2022 15:29
@cretz cretz marked this pull request as ready for review October 25, 2022 17:30
README.md Outdated Show resolved Hide resolved
temporalio/worker/workflow_sandbox/importer.py Outdated Show resolved Hide resolved
Comment on lines +621 to +622
To remove restrictions around a particular block of code, use `with temporalio.workflow.unsafe.sandbox_unrestricted():`.
The workflow will still be running in the sandbox, but no restrictions for invalid library calls will be applied.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's nice

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
return ret


# TODO(cretz): Should I make this more declarative in an external file?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine honestly, especially if you could change it slightly to allow passing just a string list for all the all_uses cases, which would reduce the verbosity a lot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I may make it a bit easier to write. I am not happy with my SandboxMatcher and I have learned a lot since, but I marked this entire API as unstable, so I can clean later.

Comment on lines +794 to +798
def __getattribute__(self, __name: str) -> Any:
state = _RestrictionState.from_proxy(self)
_trace("__getattribute__ %s on %s", __name, state.name)
state.assert_child_not_restricted(__name)
ret = object.__getattribute__(self, "__getattr__")(__name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be the highest density of underscores I've ever seen in a code block lol

Copy link
Member Author

@cretz cretz Oct 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely confusing. It's how you prevent accidental recursion when accessing self after you have overridden __getattribute__ (ref https://docs.python.org/3/reference/datamodel.html#object.__getattribute__). In this case I'm accessing __getattr__ the higher level of __getattribute__, and __name is named that way to prevent you from using it as a kwarg (signature taken from typeshed basically IIRC)

Comment on lines +95 to +96
sandboxed: Whether the workflow should run in a sandbox. Default is
true.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would make sense to refer to more detailed docs about what the sandbox is here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to, but I am afraid of linking to the README considering the goal would be to have it in real docs (but the README is so much more comprehensive). I have this same problem in workflow_sandbox/__init__.py where I just say:

See the Python SDK documentation for how to use, customize, and work around sandbox issues.

Copy link
Member

@Sushisource Sushisource left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me. Definitely... tedious, though. Would be curious to see if we could end up with less code via the multiple interpreters route, but I suppose all the blacklisting stuff can only ever get so short.

@cretz
Copy link
Member Author

cretz commented Oct 25, 2022

This makes sense to me. Definitely... tedious, though. Would be curious to see if we could end up with less code via the multiple interpreters route

I think I can simplify in a couple of ways:

I think it also will be safer. But I didn't have enough time to continue my pursuits here. My other concern is perf and mem utilization. So I need to test this sandbox first.

I suppose all the blacklisting stuff can only ever get so short.

Yeah, I don't mind maintaining a blacklist of stdlib. I think it's also gonna have use being accessed programmatically when I make a static analyzer.

README.md Outdated Show resolved Hide resolved
Copy link
Member

@bergundy bergundy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got about half way through the review.
I'll pick it back up tomorrow.
So far LGTM, didn't have any critical blocking comments.

@@ -49,7 +50,8 @@ def __init__(
*,
workflows: Sequence[Type],
workflow_task_executor: Optional[concurrent.futures.ThreadPoolExecutor] = None,
workflow_runner: WorkflowRunner = UnsandboxedWorkflowRunner(),
workflow_runner: WorkflowRunner = SandboxedWorkflowRunner(),
unsandboxed_workflow_runner: WorkflowRunner = UnsandboxedWorkflowRunner(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a stable option? Should it be marked experimental?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a stable option. It is what is run when you do @workflow.defn(sandboxed=False).

Comment on lines +101 to +102
unsandboxed_workflow_runner: Runner for workflows that opt-out of
sandboxing.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, is this a stable API?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so, yes

temporalio/worker/workflow_sandbox/importer.py Outdated Show resolved Hide resolved
temporalio/worker/workflow_sandbox/importer.py Outdated Show resolved Hide resolved
temporalio/worker/workflow_sandbox/importer.py Outdated Show resolved Hide resolved
temporalio/worker/workflow_sandbox/importer.py Outdated Show resolved Hide resolved
logger = logging.getLogger(__name__)

# Set to true to log lots of sandbox details
LOG_TRACE = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: merge this with the other LOG_TRACE var?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which one? I think these should remain file-specific

@cretz cretz requested a review from bergundy October 27, 2022 16:35
for k, v in extra_globals.items():
self.globals_and_locals[k] = v
try:
exec(code, self.globals_and_locals, self.globals_and_locals)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we didn't test for performance, but do you think compiling and caching the workflow code could be beneficial?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not much. This runs only a few lines of code. Two of those are static yes (another is dynamic), but the actual user's workflow code is already cached by Python in the loader and import just re-executes it. The compile() step here would only compile those few lines. But yes, upon benchmarking, this may be worth it. I can make that code static too. I may come back to this after profiling.

@cretz cretz merged commit 0126e6e into temporalio:main Oct 28, 2022
@cretz cretz deleted the workflow-sandbox branch October 28, 2022 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create workflow sandbox
4 participants