Skip to content

Commit

Permalink
Add new API design doc
Browse files Browse the repository at this point in the history
  • Loading branch information
jwodder committed Mar 3, 2021
1 parent 2df18e4 commit c4c5a57
Showing 1 changed file with 185 additions and 0 deletions.
185 changes: 185 additions & 0 deletions doc/design/python-api-1.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,185 @@
Designs for an improved Python API
==================================

* New base classes:
* `Dandiset`
* `identifier: str`
* `created: datetime`
* `modified: datetime`
* `version: Optional[Version]`
* `get_all_assets() -> Iterator[Asset]`
* `get_assets_under_path(path: str) -> Iterator[Asset]` — Returns all assets whose paths are either under the given folder or equal the given filepath
* `get_asset_by_path(path: str) -> Optional[Asset]` — Finds the unique asset with the given exact path; returns `None` if no such asset
* `get_metadata() -> DandiMeta`
* `Asset`:
* `path: str`
* `sha256: str` — Should this only be on `RemoteAsset`?
* `size: int`
* `created: datetime` — Should this only be on `RemoteAsset`?
* `modified: datetime` — Should this only be on `RemoteAsset`?
* `get_metadata() -> BareAssetMeta`

* `dandi.dandiapi`:
* Upon creating a new `RESTFullAPIClient` instance, a new session object should automatically be created and used for all of the client's requests; the `session()` method should thus be eliminated
* If we need to support user-supplied session objects, those should be passed in to the constructor
* It may or may not be a good idea for the client to be usable as a context manager that closes its session on exit.
* Con: Closing the session would affect `RemoteDandiset` etc. instances for the client (see below), making them unusable.
* Rename the `send_request()` method to `request()` to parallel `requests.Session.request()`
* Rename the `parameters` argument to the HTTP methods to `params` to match `requests`
* The `api_url` argument to `DandiAPIClient`'s constructor should default to the URL for the official Dandiarchive instance (which should also be available as a constant)
* Give `DandiAPIClient` a classmethod `for_dandi_instance(cls, instance: Union[str, dandi_instance], token=None, authenticate=True)` (Rethink name) that takes either the name of a Dandi instance or an actual `dandi_instance`, fetches a token from the environment, keychain, and/or user if one is not given and `authenticate` is true, and returns a `DandiAPIClient` instance
* Most interaction with API resources should be via new `RemoteDandiset`, `Version`, and `RemoteAsset` classes and their methods
* These classes (except `Version`?) will all contain a private `_client` attribute referring back to the associated `DandiAPIClient` instance.
* Changes to `DandiAPIClient` methods:
* Add a `get_dandiset(dandiset_id: Union[str, RemoteDandiset], version_id=None) -> RemoteDandiset` method
* **To discuss:** Should the default `version` be "draft" or the most recent version? Either way, there should be another method that explicitly gets the other option.
* **To discuss:** Should this take a `lazy=False` parameter that, if true, constructs a `RemoteDandiset` containing only an identifier, without making any requests at the time of creation? This would allow calling a `RemoteDandiset`'s methods without having to make a request for the Dandiset's data beforehand. If & when any non-identifiers attributes of a lazy `RemoteDandiset` are requested, a request is made at that point and the results cached.
* Add a `get_all_dandisets() -> Iterator[RemoteDandiset]` method
* **To discuss:** What should these objects' `Version`s be?
* Retype `create_dandiset` to `create_dandiset(name: str, metadata: DandiMeta) -> RemoteDandiset`
* Add a `register_dandiset(dirpath: Union[str, Path], name: str, description: str) -> LocalDandiset` method that, in addition to calling `create_dandiset()`, also creates a `dandiset.yaml` file in `dirpath`?
* Add a `get_current_user() -> User` method
* Add a `search_users(username: str) -> Iterator[User]` method
* Add an `upload(dandiset: LocalDandiset, paths: Optional[List[str]] = None, show_progress=True, existing="refesh", validation="require") -> List[RemoteAsset]` method
* Use this to replace the `upload()` function
* The elements of `paths` are interpreted the same way as the argument to `get_assets_under_path()`
* The remaining methods should be either deleted or made private.
* **To bikeshed:** Should the Dandiset and version arguments to these methods be named `dandiset` & `version` or `dandiset_id` & `version_id`?
* Where a method can take an ID as either a string or an object, it is permitted for the object to belong to a different client.

* Attributes & methods of the `RemoteDandiset` class, a subclass of the new `Dandiset` base class, representing a combination of the API's dandiset and version resources:
* `identifier: str`
* `created: datetime`
* `modified: datetime`
* `version: Version`
* `most_recent_version: Version` ?
* `get_version(version_id: Union[str, Version]) -> Version` ?
* `get_all_versions() -> Iterator[Version]`
* `for_version(version_id: Union[str, Version]) -> RemoteDandiset`
* When `version_id == "draft"`, this is the same as `for_draft_version()`
* `for_draft_version() -> DraftDandiset`
* `delete() -> None`
* `get_users() -> Iterator[User]`
* `set_users(users: List[User]) -> None`
* Should this method also accept strings giving the usernames directly?
* `get_metadata() -> DandiMeta`
* `publish() -> Version` (Should this be moved to `DraftDandiset`?)
* `get_all_assets() -> Iterator[RemoteAsset]`
* `get_assets_under_path(path: str) -> Iterator[RemoteAsset]`
* `get_asset_by_path(path: str) -> Optional[RemoteAsset]`
* `get_asset_by_uuid(uuid: str) -> Optional[RemoteAsset]` — Returns `None` if the given asset does not exist
* `download(target_dir: Union[str, Path], paths: Optional[List[str]] = None, show_progress=True, existing="error", get_metadata=True) -> ???`
* Use this to replace the `download()` function?
* The elements of `paths` are interpreted the same way as the argument to `get_assets_under_path()`
* `get_metadata` controls whether to create a `dandiset.yaml` file; should it default to `False` when `paths` is not `None`?
* TO DO: Add an argument for controlling whether to create a directory under `target_dir` with the same name as the Dandiset ID?

* Methods of the `DraftDandiset` class (a subclass of `RemoteDandiset` used only for mutable draft versions):
* `set_metadata(metadata: DandiMeta) -> DraftDandiset` — returns the new, modified object
* Methods for uploading an individual asset:
* `upload_file(filepath: Union[str, Path], metadata: BareAssetMeta, show_progress=True, existing="refesh", validation="require") -> RemoteAsset`
* `iter_upload_file(filepath: Union[str, Path], metadata: BareAssetMeta, existing="refesh", validation="require") -> Iterator[UploadProgressDict]`
* `upload_asset(asset: LocalAsset, show_progress=True, existing="refesh", validation="require") -> RemoteAsset` ?
* `iter_upload_asset(asset: LocalAsset, existing="refesh", validation="require") -> Iterator[UploadProgressDict]` ?
* Methods for uploading a directory/collection of assets:
* `upload_assets(assets: Iterable[LocalAsset], show_progress=True, existing="refesh", validation="require") -> List[RemoteAsset]`

* Attributes & methods of the `Version` class:
* `identifier: str`
* `created: datetime`
* `modified: datetime`
* `name: str`
* `asset_count: int`
* `size: int`

* Attributes & methods of the `RemoteAsset` class (a subclass of `Asset`):
* `uuid: str`
* `path: str`
* `sha256: str`
* `size: int`
* `created: datetime`
* `modified: datetime`
* `dandiset_id: str`
* `version_id: str`
* `get_metadata() -> RemoteAssetMeta`
* `set_metadata(metadata: BareAssetMeta) -> RemoteAsset` — returns the new, modified object
* `delete() -> None`
* `download(filepath: str, show_progress=True, chunk_size=...) -> ???`
* `iter_download(filepath: str, chunk_size=...) -> Iterator[DownloadProgressDict]`

* Attributes of the `User` class:
* `username: str`
* `first_name: str`
* `last_name: str`
* `admin: bool`
* **Note:** We should ask the dandi-api developers to make the `GET/PUT /dandisets/{dandiset__pk}/users/` endpoints return full user structures, not just ones with only `username` fields, so that instances of this class can be properly populated

* The upload & download methods (or at least the ones for uploading/downloading single assets) come in pairs:
* The basic methods simply upload/download everything, blocking until completion, and return either nothing or a summary of everything that was uploaded/downloaded
* These methods have `show_progress=True` options for whether to display progress output using pyout or to remain silent
* The upload methods return an `Asset` or collection of `Asset`s. This can be implemented by having the final value yielded by the `iter_` upload function contain an `"asset"` field.
* Each method also has an iterator variant (named with an "`iter_`" preffix) that can be used to iterate over values containing progress information compatible with pyout
* These methods do not output anything (aside perhaps from logging)

* An `UploadProgressDict` is a `dict` containing some number of the following keys:
* `dandiset_id: str` — always present
* `version_id: str` — always present
* `path: str` — always present
* `status: str`
* `upload: float` — percentage of file uploaded (rename to "pct"?)
* `current: int` — number of bytes uploaded
* `message: str`
* `size: int` — the total size of the file
* `errors: int`
* `asset: RemoteAsset` — yielded in the last `dict` of a successful upload
* A `DownloadProgressDict` is a `dict` containing some number of the following keys:
* `dandiset_id: str` — always present
* `version_id: str` — always present
* `path: str` — always present
* `status: str`
* `message: str`
* `size: int` — the total size of the file
* `done: int` — number of bytes downloaded (Rename to "current"?)
* `done%: float` — percentage of bytes downloaded (Rename to "download"? or "pct"?)
* `checksum: str``"differs"`, `"ok"`, or `"-"`

* **To discuss:** Should methods that accept `DandiMeta` and `(Bare|Remote)AssetMeta` instances also accept raw `dict`s? If so, should any validation be done on these `dict`s?

* `dandi.dandiset`: Expand the ability for `Dandiset` (renamed to `LocalDandiset` and made a subclass of the new `Dandiset` base class) to represent a Dandiset on disk:
* `path: Path` (or `dandiset_path`?)
* `set_metadata(metadata: DandiMeta) -> None` ?
* `validate_metadata() -> None`
* `validate_assets() -> ???` ?
* Make `organize()` a method of this class?
* Add methods for marking local files as belonging/not belonging to the Dandiset (without changing anything on disk)
* `add_path(path: Union[str, Path], ext: Union[str, Iterable[str]] = ".nwb") -> None`
* If `path` is a directory, all files with an extension in `ext` under it (without recursing into VCS directories) are added.
* `remove_path(path: Union[str, Path]) -> None`
* If `path` is a directory, all files under it are removed.
* `add_path_glob(pattern: str) -> None`
* Recursive globs (with `**`) are supported
* Should this filter out VCS directories?
* `remove_path_glob(pattern: str) -> None`
* Recursive globs (with `**`) are supported
* Give the constructor arguments for controlling the initial set of files to include in the Dandiset (default: all `*.nwb` files)
- `ext: Union[str, Iterable[str]] = ".nwb"` ?
- [something for disabling auto-registration of files on construction; `register_files=True`?]

* `LocalDandiset` uses `LocalAsset`s, a subclass of the new `Asset` base class:
* `local_path: Path` — the path of the asset on disk

* Miscellaneous changes:
* Add a `DRAFT = "draft"` constant so that misspellings can be caught by static analysis
* Rename all constants (or at least the public ones) to be all-uppercase
* Rename the `dandi_instance` class to `DandiInstance`
* Add a `get_resource_by_url(url: str) -> Union[Dandiset, Version, Asset]` function that takes anything `parse_dandi_url()` accepts and returns the corresponding resource object ?
* Also add specializations like `get_dandiset_by_url(url: str) -> Dandiset` that error if the URL points to a resource of the wrong type?
* Rename `nwb2asset()` to `extract_asset_metadata()`
* If the function is not passed any file digest data, have it calculate the digest itself?
* Add a `VERSION_ID_RGX` constant
* Give `CommonModel` a `validate()` method that just does `type(self)(**self.dict())`
* Add a `UserInputError` exception class subclassing `ValueError`, use it for all errors caused by bad user input, and make `map_to_click_exceptions` only remap it and nothing else to `click.UsageError`
* Make the return value of `parse_dandi_url()` a structured class
* Rename the `download()` function to `download_url()`?
* Add a `process_uploads(Iterable[Iterator[UploadProgressDict]], show_progress=True) -> List[RemoteAsset]` function that takes a collection of upload iterators and consumes them in parallel, using pyout if `show_progress` is true
* Add a `process_downloads(Iterable[Iterator[DownloadProgressDict]], show_progress=True) -> ???` function that takes a collection of download iterators and consumes them in parallel, using pyout if `show_progress` is true

0 comments on commit c4c5a57

Please sign in to comment.