In memory blob object #78

Merged
merged 35 commits into from
Aug 7, 2023

Conversation

dongreenberg
Contributor

@dongreenberg dongreenberg commented Jun 23, 2023

Servlets / Object Store RPCs

  • Introduce Env Servlets
  • Add call_module_method to server and servlet.
    • Many new tests
    • Set up saving / streaming logs for calls
    • Merge in Josh's branch which streams to file without using ray logs
    • Fix capturing stdout and capturing logs from generators
    • Support capturing logs for call_module_method calls, not just functions
    • Add fast return path which skips result queue and streaming logs
      • Add basic performance tests
  • Add put_resource RPC
    • Fix autonaming logic (especially envs, runs, and modules)
  • Support streaming results from generator function or property through call_module_method
    • Test generator with small LM
  • Change env.to to use cluster.put_resource(env), then cluster.call_module_method(env.name, "install")
    • Update conda_env to do this too
    • Refactor package to always accept env in _install and use env.run instead of many subprocess.calls
    • Introduce simple env install caching
    • Support specifying resources in new env
    • Clarify / update how to specify that you don't want workdir synced over
    • More clearly indicate what is being rsynced, and update non-git repo logic
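
The put_resource / call_module_method flow listed above can be pictured with a minimal in-process sketch. This is not Runhouse's implementation: `ToyServlet` and `ToyEnv` are hypothetical names, there is no networking or Ray involved, and the only point is the "store by name, then resolve and call a method by name" pattern.

```python
from typing import Any, Dict, List


class ToyServlet:
    """Hypothetical in-process stand-in for a servlet's object store."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def put_resource(self, name: str, resource: Any) -> str:
        # Store the object under its name so later calls can look it up.
        self._store[name] = resource
        return name

    def call_module_method(self, name: str, method: str, *args: Any, **kwargs: Any) -> Any:
        # Look up the stored object, then resolve and invoke the method by name.
        obj = self._store[name]
        return getattr(obj, method)(*args, **kwargs)


class ToyEnv:
    """Hypothetical env whose install step only records the requirements."""

    def __init__(self, name: str, reqs: List[str]) -> None:
        self.name = name
        self.reqs = reqs
        self.installed: List[str] = []

    def install(self) -> List[str]:
        self.installed = list(self.reqs)
        return self.installed


servlet = ToyServlet()
env = ToyEnv("my_env", ["numpy"])
servlet.put_resource(env.name, env)
result = servlet.call_module_method(env.name, "install")
```

The same two-step shape would cover the env.to change described above: put the env on the cluster, then call its "install" method by name.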

Modules

  • Add rh.Module parent for all classes which can hold a system and have their methods execute remotely on the system, including user-defined classes
    • Support .run and .remote for any remote method call, but with new behavior
    • Support setting attributes remotely by default, and getting properties via obj.fetch.property
    • Introduce subclass and factory methods for creating modules
      • Change factory to avoid running .__init__ locally
      • Change factory and getattribute to support running class methods remotely
    • Many Tests
    • Introduce get_or_to
    • Return an awaitable from method if we detect that method is a coroutine (via @isaacrob)
    • Fix get_or_call and get_or_run
  • Make Function a Module
    • Change fn.to to use cluster.put_resource(fn), then cluster.call_module_method(env.name, "call")
    • Calling fn.save(new_name) needs to rename the resource in the cluster's kvstore.
    • Make .remote return an actual remote Module, and .run return the runkey async
    • Allow .remote for stream to return a Queue while results are still being generated so user can .get each new result
    • Fix cancelling
    • Fix or deprecate fn map and starmap, and clean up supporting fn methods
  • Introduce rh.Queue to support streaming across Ray boundary
    • Support .get without popping value (subsequent PR)
    • Support persisting / paging queue (subsequent PR)
    • Support basic integer-based prioritization
    • Make sure results are handled FIFO (or more custom integer-based priority)
  • Introduce rh.KVStore to support Actor or non-actor KVs
  • Make Blob a Module
    • Clean up APIs and blob tests further
    • Change blob.to to handle to&from same cluster case (no bouncing off laptop)
  • Make Folder a Module (make KVStore subclass and see if we can remove fsspec)
  • Make Table a Module (subsequent PR)
  • Provide examples in funhouse
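
As a rough illustration of the "methods execute remotely on the system" idea, here is a minimal sketch that reroutes public method calls through a system object via `__getattribute__`. Everything here (`ToySystem`, `ToyModule`, `Counter`) is hypothetical and runs in one process; it is not rh.Module, and the real dispatch, serialization, and fetch behavior are not shown.

```python
import copy
from typing import Any, Dict, List, Optional, Tuple


class ToySystem:
    """Hypothetical stand-in for a cluster: holds objects and runs methods on them."""

    def __init__(self) -> None:
        self.objects: Dict[str, Any] = {}
        self.calls: List[Tuple[str, str]] = []

    def put(self, name: str, obj: Any) -> None:
        self.objects[name] = obj

    def call(self, name: str, method: str, *args: Any, **kwargs: Any) -> Any:
        self.calls.append((name, method))
        return getattr(self.objects[name], method)(*args, **kwargs)


class ToyModule:
    """Hypothetical base class: public methods run on `system` once one is set."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.system: Optional[ToySystem] = None

    def to(self, system: ToySystem) -> "ToyModule":
        remote_copy = copy.copy(self)
        remote_copy.system = None  # the copy held by the system executes locally
        system.put(self.name, remote_copy)
        proxy = copy.copy(self)
        proxy.system = system  # the returned proxy forwards its method calls
        return proxy

    def __getattribute__(self, item: str) -> Any:
        attr = object.__getattribute__(self, item)
        # Private names, `to`, and plain data attributes are never rerouted.
        if item.startswith("_") or item == "to" or not callable(attr):
            return attr
        system = object.__getattribute__(self, "system")
        if system is None:
            return attr
        name = object.__getattribute__(self, "name")

        def forwarded(*args: Any, **kwargs: Any) -> Any:
            return system.call(name, item, *args, **kwargs)

        return forwarded


class Counter(ToyModule):
    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.count = 0

    def increment(self, by: int = 1) -> int:
        self.count += by
        return self.count


system = ToySystem()
remote = Counter("counter").to(system)
first = remote.increment()
second = remote.increment(by=2)
```

In this sketch the state changes only on the copy held by the system, so `remote.count` on the proxy stays at 0. That gap is one reason an explicit fetch path for properties, as listed above, is useful.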

Provenance

  • Change Run from being a resource to a provenance property inside Resources
  • Add .provenance to Resource and various constructors
  • Update Run's config_from_rns and from_config to contain provenance data directly rather than relying on filesystem config
  • Make log saving optional
  • Add option to save logs within RNS config
  • Clean up supporting run methods
  • Clean up supporting run_module_utils

Cluster

  • Change system to handle _current_cluster case for system.run and RPCs
    • Handle _current_cluster case for is_up, up_if_not, etc. Make sure we never launch a cluster from itself.
    • Handle call_module_method locally too
  • Add rh.here to return Cluster as primary way for users to interact with Object Store
    • Initialize obj store properly so rh.here is available in a python interpreter on the cluster
    • Add cluster.call to call module methods in a python interpreter (e.g. for debugging)
    • Add cluster.contents to list keys and obj types
  • Make like-clusters share clients so we don't need to recheck the cluster when working with objects returned by .remote (which will contain a new cluster object)
  • Update cluster docs

Cleanup

  • Update compute getting started tutorial
  • Deprecate obsolete RPCs
  • Update pipeline example in funhouse with new APIs
  • Document more clearly
  • Retest and update funhouse

dongreenberg and others added 7 commits June 22, 2023 17:27
…ronments). ObjectStore now holds a Ray actor which wraps a kv dict, which allows any ray process to access the kv dict.

2) Fix bug in cluster factory
3) Allow HTTP server to be started inside a conda env (needs more testing)
4) Remove deprecated pkg_resources usage
…bclass. .write() is no longer needed to save down, and we now handle serialization during file blob's .fetch and .write.

2) Use in-memory blob in object store tests instead of pinning. Obj store tests pass.
3) Start refactoring blob_tests. First few pass.
4) Only start ray in obj store if it's installed successfully (need to do this elsewhere too)
5)
2) Move default name generation into new util _generate_default_name function. Need to migrate folder and table to use this.
3) Organize utils a bit more.
…e tests pass. Next is making obj_store a dict again and unfucking run_module_utils.
if self.system.on_this_cluster():
    obj_store.delete(self.name)
else:
    self.system.delete(self.name)
Collaborator

@jlewitt1 jlewitt1 Jun 26, 2023

Is this supposed to delete the blob from the cluster's local file system? We don't have a delete method for the cluster which does this. (I think previously this went through the blob's folder.) How can we do this now that a blob doesn't live in a folder object?

jlewitt1 and others added 22 commits June 27, 2023 01:30
# Conflicts:
#	runhouse/rns/run_module_utils.py
#	runhouse/servers/http/http_server.py
…ethod on a blob in the blob store. Obj_store tests all pass.
…tall" method. All non-conda env tests pass and basic function test passes.
…ts all pass, but streaming logs doesn't work because logs are written to stdout after function completes.
…y object store, as well as a ray-based dict actor for x-env key lookup.

- Modify function `.to` to put function on the cluster, and modify __call__ path to call function on cluster via call_module_method.
- Fix dryrun bugs with putting function on cluster.
- Add simple numpy-based test to evaluate whether pinning keeps object in python memory. Test passes.
- Add simple env install caching
- Update Runs to use new blob behavior.
- Add test for stateful_generator to emulate LLM, not yet working.
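
For readers unfamiliar with the idea, "simple env install caching" could look roughly like the sketch below: fingerprint the requirements and skip the install when that fingerprint has been seen. This is an assumption about the general technique, not the code in this PR; `env_fingerprint`, `install_once`, and the in-memory `_installed` set are hypothetical.

```python
import hashlib
import json
from typing import Callable, List, Set

# Hypothetical in-memory record of env fingerprints already installed.
_installed: Set[str] = set()


def env_fingerprint(reqs: List[str]) -> str:
    """Hash the sorted requirements so ordering does not change the key."""
    payload = json.dumps(sorted(reqs)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def install_once(reqs: List[str], installer: Callable[[List[str]], None]) -> bool:
    """Run `installer` only if this set of requirements has not been installed yet."""
    key = env_fingerprint(reqs)
    if key in _installed:
        return False
    installer(reqs)
    _installed.add(key)
    return True


install_log: List[List[str]] = []
ran_first = install_once(["numpy", "torch"], install_log.append)
ran_second = install_once(["torch", "numpy"], install_log.append)
```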
# Conflicts:
#	runhouse/rns/envs/env.py
#	runhouse/rns/function.py
#	runhouse/rns/hardware/cluster_factory.py
- Add streaming results for generator function.
- Introduce Queue and KVStore resources
- Add `load` option for blob not to check RNS for key

Test stream_logs, test_pinning_in_memory, test_put_resource, test_stateful_generator and test_function.test_generator all pass, but streaming logs doesn't work properly for generators.
# Conflicts:
#	runhouse/rns/defaults.py
#	runhouse/rns/top_level_rns_fns.py
…t rh.Modules. KVStore and Queue need more testing. Also, right now we're not sending state over when we send the resources to the cluster.

- Move logic for local vs. remote execution mostly within Module and Cluster.
- Changed function not to rely on run_module_utils for __call__, remote, and run, and so far so good. Most obj_store tests work (ones that rely on .run do not). Logging is broken though.
- remote looks like it works but needs more testing.
- Added `provenance` field to Resource for holding Run info.
- Allow obj_store to support `put` across servlets
…ass) and MyClass(rh.Module). All module tests pass with streaming, both local=True and local=False, and property fetching.

- Support passing state when putting resources on a cluster.
- Support fetching properties (private and public) and complete Module through .fetch method. Fix support for private methods in __getattribute__.
- Make sure working_dir is synced for locally defined Modules.
- Introduce `remote_init` for easier specification of remote setup and saving a hop.
…arios, and add tests to ensure streaming logs works. All module tests pass, and most function and obj_store tests pass (other than .remote and .map related tests).
- Change .run behavior to be async but return a run string.
- Update Blob to be a Module.
- Allow cluster.get to return a remote.
- Stop calling mkdir within Folder constructor, call within put instead.
- Introduce `rh.here` as a way to get current cluster.

All obj_store tests pass except cancelling. All module tests pass. Most function tests pass except other function types (map, queue, etc.), cancelling, and http url. All cluster tests pass.
# Conflicts:
#	runhouse/rns/envs/env.py
#	tests/conftest.py
#	tests/test_blob.py
#	tests/test_env.py
#	tests/test_function.py
#	tests/test_obj_store.py
- Get rid of `install` dedicated rpc, and make cluster.install_packages flow through `env.to`
- Add support for returning a queue when calling a generator with .remote to stream back results.

Module and cluster tests pass.
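
Returning a queue for a generator call, so the caller can .get each result while later ones are still being produced, can be sketched with the standard library alone. This uses `queue.Queue` and a thread rather than rh.Queue or Ray; `stream_to_queue`, `drain`, and the `_DONE` sentinel are hypothetical helpers for illustration.

```python
import queue
import threading
from typing import Any, Callable, Iterator

# Sentinel marking the end of the stream.
_DONE = object()


def stream_to_queue(gen_fn: Callable[..., Iterator[Any]], *args: Any) -> "queue.Queue":
    """Run a generator in a background thread, pushing each item to a FIFO queue."""
    results: "queue.Queue" = queue.Queue()

    def worker() -> None:
        try:
            for item in gen_fn(*args):
                results.put(item)
        finally:
            # Always signal completion so consumers do not block forever.
            results.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    return results


def drain(results: "queue.Queue") -> Iterator[Any]:
    """Yield items as they arrive until the sentinel is seen."""
    while True:
        item = results.get()
        if item is _DONE:
            return
        yield item


def squares(n: int) -> Iterator[int]:
    for i in range(n):
        yield i * i


streamed = list(drain(stream_to_queue(squares, 4)))
```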
# Conflicts:
#	runhouse/__init__.py
@dongreenberg
Contributor Author

Merging this into main for now, but still more cleanup to do.

@dongreenberg dongreenberg merged commit 89dce37 into main Aug 7, 2023
@jlewitt1 jlewitt1 deleted the in-mem branch September 5, 2023 18:42
3 participants