{"parents": [], "prev": null, "next": null, "title": "Architecture Overview", "meta": {}, "body": "<section id=\"architecture-overview\">\n<h1>Architecture Overview<a class=\"headerlink\" href=\"#architecture-overview\" title=\"Permalink to this heading\">\u00b6</a></h1>\n<p>Runhouse is a Python library that allows any application to flexibly and powerfully utilize remote compute\ninfrastructure by deploying and calling remote services on the fly. It is principally designed for Machine\nLearning-style workloads (online, offline, training, and inference), where the need for heterogeneous\nremote compute is frequent and flexibility is paramount to minimize costs.</p>\n<p>Incorporating heterogeneous compute into the runtime of an application, like workflow\norchestrators (Airflow, Prefect) or distributed libraries (Ray, Spark) do, is far more disruptive and less flexible at\nevery level (development workflow, debugging, DevOps, infra) than calling the heterogeneous portions as remote services.\nCompare converting your Python application into an Airflow DAG to run the training portion on a GPU, vs.\nmaking an HTTP call within the application to the training function running as a service on a GPU.\nCalling a function or class as a remote service is a common pattern (e.g. microservices, Temporal)\nbut divides the code into multiple applications. This multiplies the DevOps overhead, each having their own\nconfiguration, automation, scaling, etc. Runhouse achieves the best of both approaches: limitless compute dynamism and\nflexibility in Python without disrupting the runtime or cleaving the application, by offloading\nfunctions and classes to remote compute as services on the fly.</p>\n<section id=\"why\">\n<h2>Why?<a class=\"headerlink\" href=\"#why\" title=\"Permalink to this heading\">\u00b6</a></h2>\n<p>This solves a few major problems for AI teams:</p>\n<ol class=\"arabic simple\">\n<li><p><strong>Cost</strong>: Runhouse introduces the flexibility to allocate compute only while needed, right-size instances based on\nthe size of the workload, work across multiple regions or clouds for lower costs, and share compute and services\nacross tasks. Users typically see cost savings on the order of 50-75%, depending on the workload.</p></li>\n<li><p><strong>Development at scale</strong>: Powerful hardware such as GPUs or distributed clusters (Spark, Ray) can be hugely\ndisruptive, requiring all development, debugging, automation, and deployment to occur on their runtime. Ray, Spark,\nor PyTorch distributed users for example must be tunneled into the head node at all times for development, leading\nto a proliferation of hosted notebook services as a stop-gap. Runhouse allows Python to orchestrate to these\nsystems remotely, returning the development workflow and operations to standard Python. Teams using Runhouse\ncan abandon hosted development notebooks and sandboxes entirely, again saving considerable cost and\nresearch-to-production time.</p></li>\n<li><p><strong>Infrastructure overhead</strong>: Runhouse thoughtfully captures infrastructure concerns in code, providing a clear\ncontract between the application and infrastructure, and saving ML teams from learning all the infra, networking,\nsecurity, and DevOps underneath.</p></li>\n</ol>\n</section>\n<section id=\"high-level-flow\">\n<h2>High-level Flow<a class=\"headerlink\" href=\"#high-level-flow\" title=\"Permalink to this heading\">\u00b6</a></h2>\n<p>The basic flow of how Runhouse offloads function and classes as services is as follows. 
```python
import runhouse as rh

# [1] and [2]
gpu = rh.cluster(name="rh-a10x", instance_type="A10G:1", provider="aws").up_if_not()

# [3]
sd_worker = rh.env(reqs=["torch", "transformers", "diffusers"], name="sd_generate")
remote_sd_generate = rh.function(sd_generate).to(gpu, env=sd_worker)

# [4]
imgs = remote_sd_generate("A hot dog made out of matcha.")
imgs[0].show()

# [5]
remote_sd_generate.save()
sd_upsampler = rh.function(name="/my_username/sd_upsampler")
high_res_imgs = sd_upsampler(imgs)

# [6]
gpu.teardown()
```

### 1. Specify and/or Allocate Compute

```python
gpu = rh.cluster(name="rh-a10x", instance_type="A10G:1", provider="aws").up_if_not()
```

Runhouse can allocate compute to the application on the fly, either by utilizing an existing VM or Ray cluster, or by allocating a new one using local cloud or K8s credentials. The `rh.cluster` constructor is generally used to specify and interact with remote compute, including bringing it up if necessary (`cluster.up_if_not()`).
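Because `rh.cluster` only describes compute, the same constructor can also target machines you already have rather than launching new ones. Here is a minimal sketch of a static cluster, assuming the `ips` and `ssh_creds` arguments (exact parameter names can vary across Runhouse versions):

```python
import runhouse as rh

# On-demand cluster: allocated from local cloud credentials, as above.
gpu = rh.cluster(name="rh-a10x", instance_type="A10G:1", provider="aws").up_if_not()

# Static cluster: point Runhouse at an existing VM instead of launching one.
# `ips` and `ssh_creds` are assumed here; check your version's cluster docs.
vm = rh.cluster(
    name="my-static-vm",
    ips=["<vm_ip_address>"],  # placeholder address
    ssh_creds={"ssh_user": "ubuntu", "ssh_private_key": "~/.ssh/id_rsa"},
)
```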
### 2. Starting the Runhouse Server Daemon

If not already running, the client will start the Runhouse API server daemon on the compute and form a secure network connection (either over SSH or HTTP/S). Dependencies can be specified to be installed before starting the daemon.

1. The daemon can be thought of as a "Python object server," holding key-value pairs of names and Python objects in memory, and exposing an HTTP API to call methods on those objects by name.
2. The objects are held in a single default worker process by default but can be sent to other worker processes, including on other nodes in the cluster, to achieve powerful parallelism out of the box.
3. If you call `GET http://myserver:32300/my_object/my_method`, the daemon will look up the object named "my_object", issue an instruction for its worker to call the method "my_method" on it, and return the result.
4. The HTTP server and workers can handle thousands of concurrent calls per second, with latency comparable to Flask under simple conditions.
5. New workers can be constructed with `rh.env`, which specifies the details of the Python environment (packages, environment variables) in which the process will be constructed. By default, workers live in the same Python environment as the daemon but can also be started in a conda environment or on a separate node. To configure the environment of the daemon itself, such as setting environment variables or installing dependencies that will apply across all workers by default, you can pass an `rh.env` to the `default_env` argument of the `rh.cluster` constructor, as sketched below.
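To make items 3 and 5 concrete, here is a small sketch: a `default_env` passed to the cluster constructor to configure the daemon itself, and the daemon's HTTP surface as item 3 describes it. The `env_vars` argument to `rh.env` is an assumption here; the rest follows the constructors shown on this page.

```python
import runhouse as rh

# [5] Configure the daemon's own environment; these requirements and
# variables apply to all workers by default. (`env_vars` is assumed.)
base_env = rh.env(reqs=["numpy"], env_vars={"OMP_NUM_THREADS": "4"}, name="base_env")
gpu = rh.cluster(
    name="rh-a10x",
    instance_type="A10G:1",
    provider="aws",
    default_env=base_env,
).up_if_not()

# [3] Conceptually, every method call is an HTTP request against the
# daemon's object store, e.g.:
#   GET http://myserver:32300/my_object/my_method
# The client library wraps this, handling auth and serialization.
```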
### 3. Deploying Functions or Classes

```python
sd_worker = rh.env(reqs=["torch", "transformers", "diffusers"], name="sd_generate")
remote_sd_generate = rh.function(sd_generate).to(gpu, env=sd_worker)
```

The user specifies a function or class to be deployed to the remote compute using the `rh.function` or `rh.module` constructors (or by subclassing `rh.Module`), and calling `remote_obj = my_obj.to(my_cluster, env=my_env)`. The Runhouse client library extracts the path, module name, and importable name from the function or class. If the function or class is defined in local code, the repo or package is rsynced onto the cluster. An instruction with the import path is then sent to the cluster to construct the function or class in a particular worker and upsert it into the key-value store.
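Classes deploy the same way as functions. A sketch, using a hypothetical locally defined class and the `rh.module` constructor named above:

```python
import runhouse as rh

class SDGenerator:
    """Hypothetical local class; its package is rsynced to the cluster on deploy."""

    def __init__(self, model_id: str):
        self.model_id = model_id  # state lives on the cluster once instantiated there

    def generate(self, prompt: str):
        ...  # load the model lazily and run inference

# Send the class itself to the cluster, just like a function.
RemoteSDGenerator = rh.module(SDGenerator).to(gpu, env=sd_worker)
```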
### 4. Calling the Function or Class

```python
imgs = remote_sd_generate("A hot dog made out of matcha.")
imgs[0].show()
```

After deploying the function, class, or object into the server, the Runhouse Python client returns a local callable stub which behaves like the original object but forwards method calls over HTTP to the remote object on the cluster.

1. If a stateful instance of a class is desired, an `__init__` method can be called on the remote class to instantiate a new remote object from the class and assign it a name (see the sketch after this list).
2. If arguments are passed to the method, they're serialized with cloudpickle and sent with the HTTP request. Serializing code, such as functions, classes, or dataclasses, is strongly discouraged, as it can lead to versioning mismatch errors between local and remote package versions.
3. From here on, you can think of Runhouse as facilitating regular object-oriented programming, but with the objects living remotely, perhaps in a different cluster, region, or cloud than the local code.
4. Python behaviors like async, exceptions, printing, and logging are all preserved across remote calls but can be disabled or controlled if desired.
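Building on item 1 above, a stateful remote instance can be created by calling the deployed class. A sketch; the `name` keyword for the remote instance and the model id are assumptions:

```python
# Calling __init__ on the remote class creates a named remote object whose
# state (e.g. loaded weights) stays on the cluster between calls.
sd = RemoteSDGenerator(model_id="stabilityai/stable-diffusion-2-base", name="sd")

# Subsequent method calls are forwarded over HTTP to that same instance.
imgs = sd.generate("A hot dog made out of matcha.")
```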
### 5. Saving and Loading

```python
remote_sd_generate.save()
sd_upsampler = rh.function(name="/my_username/sd_upsampler")
high_res_imgs = sd_upsampler(imgs)
```

The Runhouse client can save and load objects to and from the local filesystem, or to a remote metadata store. This allows for easy sharing of clusters and services across users and environments, and for versioning and rollback of objects. The metadata store can be accessed from any Python interpreter, and is backed by UIs and APIs to view, monitor, and manage all resources.

### 6. Terminating Modules, Workers, or Clusters

```python
gpu.teardown()
```

When a remote object is no longer needed, it can be deallocated from the remote compute by calling `cluster.delete(obj_name)`. This removes the object from the key-value store and frees up the memory on the worker. A worker process can similarly be terminated with `cluster.delete(worker_name)`, terminating its activities and freeing its memory. An on-demand cluster can be terminated with `cluster.teardown()`, or by setting its `autostop_mins`, which will auto-terminate it after that period of inactivity. The default autostop is 60 minutes unless otherwise specified.
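The same cluster handle covers finer-grained cleanup and auto-termination. A sketch (whether `autostop_mins` is set at construction or afterwards may vary by version):

```python
# Deallocate a single object from the daemon's key-value store.
gpu.delete("sd_generate")

# Instead of tearing down immediately, let the cluster auto-terminate
# after 30 idle minutes (the default is 60).
gpu = rh.cluster(
    name="rh-a10x", instance_type="A10G:1", provider="aws", autostop_mins=30
)
```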
## Comparing to other systems

Runhouse's APIs bear similarity to other systems, so it's helpful to compare and contrast. In many cases, Runhouse is not a replacement for these systems but rather a complement or extension. In others, you may be able to replace your usage of the other system entirely with Runhouse.

### Distributed frameworks (e.g. Ray, Spark, Elixir)

Distributed frameworks make it possible to offload execution onto separate compute, like a different process or node within their cluster runtime. Runhouse can be seen as similar, but with the crucial distinction of dispatching execution to compute *outside* of its own runtime (which is just Python) or orchestrating *between* clusters (even of different types). For this reason, it has no runtime to set up other than Python itself, and it can orchestrate your distributed code so you can use your Ray or Spark clusters less disruptively within your stack (e.g. sending a function which uses Ray over to the head node of the Ray cluster, where the Ray code will execute as usual).

This also fixes certain sharp edges with these systems to significantly reduce costs, such as the inability to use more than one cluster in an application or to share a cluster between multiple callers. It also means the local and remote compute are largely decoupled, with no shared runtime that will break if one side disconnects or goes down.

### Workflow orchestrators (e.g. Airflow, Prefect, Dagster, Flyte, Metaflow, Argo)

Workflow orchestrators can allocate heterogeneous compute on the fly, but they act as the runtime itself for the program and only support certain pre-defined and highly constrained DAGs. By allocating services, Runhouse allows for arbitrary control flow and utilization of remote hardware, making Python itself the orchestrator. For example, with Runhouse it's easy to allocate a small instance to start a training run and, if the training fails due to OOM, restart it on a slightly larger box (see the sketch below). Other kinds of compute flexibility that orchestrators struggle with, like multi-region or multi-cloud, are trivial for Runhouse.

Generally, workflow orchestrators are built to be good at monitoring, telemetry, fault-tolerance, and scheduling, so we recommend using one strictly for those features and using Runhouse within your pipeline nodes for the heterogeneous compute and remote execution. You can also save a lot of money by reusing compute across multiple nodes or reusing services across multiple pipelines with Runhouse, which is generally not possible with workflow orchestrators.
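To make the OOM-retry example concrete, here is a minimal sketch; `train` and the instance types are hypothetical, and it relies on exceptions propagating across remote calls as described in step 4 above:

```python
import runhouse as rh

def train(batch_size: int):
    ...  # hypothetical training function

# Start on a small box; if training OOMs, retry on a larger one.
for instance_type in ["A10G:1", "A100:1"]:
    cluster = rh.cluster(
        name="train-" + instance_type.replace(":", "x"),
        instance_type=instance_type,
        provider="aws",
    ).up_if_not()
    remote_train = rh.function(train).to(cluster)
    try:
        remote_train(batch_size=64)
        break  # training succeeded; keep this cluster's results
    except Exception:  # e.g. a CUDA out-of-memory error raised remotely
        cluster.teardown()  # free the too-small box and try the next size
```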
### Serverless frameworks (e.g. Modal, AWS Lambda)

Serverless frameworks allow for the allocation of services on the fly, but within a well-defined sandbox and not strictly from within regular Python: they require specific pre-packaging or CLI launch commands outside Python. Runhouse runs fully in a Python interpreter, so it can extend the compute power of practically any existing Python application, and it allocates services inside your own compute, wherever that may be. We may even support serverless systems as compute backends in the future.

### Infrastructure in code (e.g. SkyPilot, Pulumi)

Infrastructure-in-code tools allocate compute on the fly but can't utilize it instantly to offload execution within the application (though you could call a predefined script entrypoint or API endpoint). Runhouse uses SkyPilot to allocate compute but is vertically integrated to perform allocation, (re)deployment, and management of a new service all in Python, so the new compute can be used instantly within the existing application. It also doesn't need to perform allocation to create new services; it can use existing compute or static VMs.

### GPU/Accelerator dispatch (e.g. PyTorch, Jax, Mojo)

GPU/accelerator dispatch systems give the ability to offload computation to a local GPU or TPU. Runhouse does not have this capability, but it can offload a function or class to a remote instance with an accelerator, which can then itself use libraries like PyTorch or Jax (and maybe one day Mojo) to use the accelerator.

## Saving, Loading, and Sharing

Runhouse resources (clusters, functions, modules, environments) can be saved, shared, and reused based on a compact JSON metadata signature. This allows for easy sharing of clusters and services across users and environments, which can often lead to massive cost savings. Runhouse comes with a built-in metadata store / service registry called [Den](https://www.run.house/dashboard) to facilitate convenient saving, loading, sharing, and management of these resources. Den can be accessed via an HTTP API or from any Python interpreter with a Runhouse token (either in `~/.rh/config.yaml` or an `RH_TOKEN` environment variable) like so:

```python
import runhouse as rh

remote_func = rh.function(fn=my_func).to(my_cluster, env=my_env, name="my_function")

# Save to Den
remote_func.save()

# Reload the function and invoke it remotely on the cluster
remote_func = rh.function(name="/my_username/my_function")
res = remote_func(*args, **kwargs)

# Share the function with another user, giving them access to call or modify the resource
remote_func.share("user_a@gmail.com", access_level="write")
```
notranslate\"><span class=\"pre\">resource.config()</span></code> and reconstruct the resource with\n<code class=\"docutils literal notranslate\"><span class=\"pre\"><Resource</span> <span class=\"pre\">Type>.from_config(config)</span></code>.</p>\n</section>\n</section>\n\n <script type=\"text/x-thebe-config\">\n {\n requestKernel: true,\n binderOptions: {\n repo: \"binder-examples/jupyter-stacks-datascience\",\n ref: \"master\",\n },\n codeMirrorConfig: {\n theme: \"abcdef\",\n mode: \"python\"\n },\n kernelOptions: {\n name: \"python3\",\n path: \"./.\"\n },\n predefinedOutput: true\n }\n </script>\n <script>kernelName = 'python3'</script>", "metatags": "<meta name=\"generator\" content=\"Docutils 0.19: https://docutils.sourceforge.io/\" />\n", "rellinks": [["genindex", "General Index", "I", "index"], ["py-modindex", "Python Module Index", "", "modules"]], "sourcename": "architecture.rst.txt", "toc": "<ul>\n<li><a class=\"reference internal\" href=\"#\">Architecture Overview</a><ul>\n<li><a class=\"reference internal\" href=\"#why\">Why?</a></li>\n<li><a class=\"reference internal\" href=\"#high-level-flow\">High-level Flow</a><ul>\n<li><a class=\"reference internal\" href=\"#specify-and-or-allocate-compute\">1. Specify and/or Allocate Compute</a></li>\n<li><a class=\"reference internal\" href=\"#starting-the-runhouse-server-daemon\">2. Starting the Runhouse Server Daemon</a></li>\n<li><a class=\"reference internal\" href=\"#deploying-functions-or-classes\">3. Deploying Functions or Classes</a></li>\n<li><a class=\"reference internal\" href=\"#calling-the-function-or-class\">4. Calling the Function or Class</a></li>\n<li><a class=\"reference internal\" href=\"#saving-and-loading\">5. Saving and Loading</a></li>\n<li><a class=\"reference internal\" href=\"#terminating-modules-workers-or-clusters\">6. Terminating Modules, Workers, or Clusters</a></li>\n</ul>\n</li>\n<li><a class=\"reference internal\" href=\"#comparing-to-other-systems\">Comparing to other systems</a><ul>\n<li><a class=\"reference internal\" href=\"#distributed-frameworks-e-g-ray-spark-elixr\">Distributed frameworks (e.g. Ray, Spark, Elixr)</a></li>\n<li><a class=\"reference internal\" href=\"#workflow-orchestrators-e-g-airflow-prefect-dagster-flyte-metaflow-argo\">Workflow orchestrators (e.g. Airflow, Prefect, Dagster, Flyte, Metaflow, Argo)</a></li>\n<li><a class=\"reference internal\" href=\"#serverless-frameworks-e-g-modal-aws-lambda\">Serverless frameworks (e.g. Modal, AWS Lambda)</a></li>\n<li><a class=\"reference internal\" href=\"#infrastructure-in-code-e-g-skypilot-pulumi\">Infrastructure in code (e.g. SkyPilot, Pulumi)</a></li>\n<li><a class=\"reference internal\" href=\"#gpu-accelerator-dispatch-e-g-pytorch-jax-mojo\">GPU/Accelerator dispatch (e.g. 