Update docs for branches (#91)
* Update docs for branches

* Use yield from
Zhou Fang authored Mar 3, 2024
1 parent f911cc7 commit a97f091
Showing 4 changed files with 19 additions and 11 deletions.
20 changes: 15 additions & 5 deletions README.md
@@ -114,7 +114,7 @@ print(catalog.datasets())

### Write and Read

-Append, delete some data. Each mutation generates a new version of data, represented by an increasing integer ID. We expect to support the [Iceberg](https://iceberg.apache.org/docs/latest/branching/) style tags and branches for better version management.
+Append, delete some data. Each mutation generates a new version of data, represented by an increasing integer ID. Users can add tags to version IDs as alias.
```py
import pyarrow.compute as pc
from space import RayOptions
@@ -170,12 +170,23 @@ runner.read_all(
)

# Read the changes between version 0 and 2.
-for change_type, data in runner.diff(0, "after_delete"):
-  print(change_type)
-  print(data)
+for change in runner.diff(0, "after_delete"):
+  print(change.change_type)
+  print(change.data)
   print("===============")
```
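
Note on the hunk above: `runner.diff` now yields one object per change, with `change_type` and `data` attributes, instead of a `(change_type, data)` tuple. The sketch below illustrates that pattern only. `Change` and `fake_diff` are hypothetical stand-ins, not Space APIs, and the string change types are assumed for illustration.

```python
from dataclasses import dataclass
from typing import Any, Iterator, List, Tuple


@dataclass(frozen=True)
class Change:
  """Hypothetical stand-in for the object yielded by runner.diff()."""
  change_type: str
  data: Any


def fake_diff(rows: List[Tuple[str, Any]]) -> Iterator[Change]:
  """Illustrative only: wraps (change_type, data) pairs into objects."""
  for change_type, data in rows:
    yield Change(change_type=change_type, data=data)


changes = list(fake_diff([("ADD", [1, 2]), ("DELETE", [2])]))
for change in changes:
  # Callers read named attributes instead of unpacking a tuple.
  print(change.change_type, change.data)
```

Returning an object rather than a tuple lets fields be added later without breaking callers that unpack exactly two values.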

+Create a new branch and make changes in the new branch:
+
+```py
+# The default branch is "main"
+ds.add_branch("dev")
+ds.set_current_branch("dev")
+# Make changes in the new branch, the main branch is not updated.
+# Switch back to the main branch.
+ds.set_current_branch("main")
+```
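
One way to picture the branch behavior documented above is to treat each branch as a named pointer to a version ID, so that a mutation advances only the current branch. The sketch below is a toy model under that assumption. `ToyDataset`, `commit`, and `version_of` are hypothetical and do not reflect how Space stores branches.

```python
from typing import Dict


class ToyDataset:
  """Hypothetical model: each branch name points to a version ID."""

  def __init__(self) -> None:
    self._branches: Dict[str, int] = {"main": 0}
    self._current = "main"
    self._next_version = 1

  def add_branch(self, name: str) -> None:
    if name in self._branches:
      raise ValueError(f"Branch already exists: {name}")
    # A new branch starts at the current branch's version.
    self._branches[name] = self._branches[self._current]

  def set_current_branch(self, name: str) -> None:
    if name not in self._branches:
      raise KeyError(f"Unknown branch: {name}")
    self._current = name

  def commit(self) -> int:
    """Records a mutation; only the current branch advances."""
    version = self._next_version
    self._next_version += 1
    self._branches[self._current] = version
    return version

  def version_of(self, name: str) -> int:
    return self._branches[name]


toy = ToyDataset()
toy.commit()                     # main -> version 1
toy.add_branch("dev")            # dev starts at version 1
toy.set_current_branch("dev")
toy.commit()                     # dev -> version 2, main stays at 1
toy.set_current_branch("main")
print(toy.version_of("main"), toy.version_of("dev"))
```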

### Transform and Materialized Views

Space supports transforming a dataset to a view, and materializing the view to files. The transforms include:
@@ -285,7 +296,6 @@ ds.storage.record_manifest() # Accept filter and snapshot_id
Space is a new project under active development.

:construction: Ongoing tasks:
-- Iceberg style version branches.
 - Performance benchmark and improvement.

## Disclaimer
2 changes: 1 addition & 1 deletion python/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "space-datasets"
-version = "0.0.10"
+version = "0.0.11"
authors = [{ name = "Space team", email = "no-reply@google.com" }]
description = "Unified storage framework for machine learning datasets"
readme = "README.md"
3 changes: 1 addition & 2 deletions python/src/space/core/ops/change_data.py
@@ -93,8 +93,7 @@ def read_change_data(storage: Storage, start_snapshot_id: int,
"""
   for snapshot_id in ordered_snapshot_ids(storage, start_snapshot_id,
                                           end_snapshot_id):
-    for change in LocalChangeDataReadOp(storage, snapshot_id, read_options):
-      yield change
+    yield from LocalChangeDataReadOp(storage, snapshot_id, read_options)


class LocalChangeDataReadOp(StoragePathsMixin):
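
Note on the hunk above: for plain iteration, `yield from iterable` produces the same items as looping over the iterable and yielding each one. The self-contained sketch below shows the equivalence with hypothetical helper names and list inputs; it does not use any Space classes.

```python
from typing import Iterable, Iterator, List


def read_with_loop(batches: Iterable[List[int]]) -> Iterator[int]:
  """Old style: explicit inner loop that re-yields each item."""
  for batch in batches:
    for item in batch:
      yield item


def read_with_yield_from(batches: Iterable[List[int]]) -> Iterator[int]:
  """New style: delegate to the inner iterable with yield from."""
  for batch in batches:
    yield from batch


batches = [[1, 2], [], [3]]
loop_result = list(read_with_loop(batches))
yield_from_result = list(read_with_yield_from(batches))
print(loop_result == yield_from_result)
```

`yield from` additionally forwards `send()` and `throw()` to a delegated generator, which the explicit loop does not. That difference does not matter when the caller only iterates.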
5 changes: 2 additions & 3 deletions python/src/space/ray/ops/change_data.py
@@ -38,9 +38,8 @@ def read_change_data(storage: Storage, start_snapshot_id: int,
"""
   for snapshot_id in ordered_snapshot_ids(storage, start_snapshot_id,
                                           end_snapshot_id):
-    for change in _RayChangeDataReadOp(storage, snapshot_id, ray_options,
-                                       read_options):
-      yield change
+    yield from _RayChangeDataReadOp(storage, snapshot_id, ray_options,
+                                    read_options)


class _RayChangeDataReadOp(LocalChangeDataReadOp):