[PROPOSAL] Mars storage lib #1905

Open
hekaisheng opened this issue Jan 20, 2021 · 0 comments

Currently, Mars storage needs to handle both memory and disk. We can define an abstract base class, StorageBackend, and implement backends such as plasma, filesystem, vineyard, etc.

API

StorageBackend is defined as follows:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ObjectInfo:
    # Placeholder (not defined in this proposal); fields follow the put()
    # docstring: size, raw_size and device.
    size: Optional[int] = None
    raw_size: Optional[int] = None
    device: Any = None


class StorageFileObject:
    # Placeholder (not defined in this proposal) for the file-like object
    # returned by open_reader() and open_writer().
    pass


class StorageBackend(ABC):
    @classmethod
    @abstractmethod
    async def setup(cls, **kwargs) -> Tuple[Dict, Dict]:
        """
        Set up the environment, for example, start the plasma store for the
        plasma backend.

        Parameters
        ----------
        kwargs : kwargs
            Kwargs for setup.

        Returns
        -------
        Tuple of two dicts
            Dicts for initialization and teardown.
        """

    @staticmethod
    async def teardown(**kwargs):
        """
        Clean up the environment.

        Parameters
        ----------
        kwargs : kwargs
            Parameters for cleanup.
        """

    @property
    @abstractmethod
    def level(self):
        """
        Storage level of the current backend.

        Returns
        -------
        level : StorageLevel
            Storage level.
        """

    @abstractmethod
    async def get(self, object_id, **kwargs) -> object:
        """
        Get an object by its id. For some backends, `columns` or `slice` can
        be passed to get part of the data.

        Parameters
        ----------
        object_id : object id
            Object id to get.
        kwargs
            Additional keyword arguments.

        Returns
        -------
        obj : Python object
            The stored object.
        """

    @abstractmethod
    async def put(self, obj, importance=0) -> ObjectInfo:
        """
        Put an object into storage.

        Parameters
        ----------
        obj : Python object
            Object to put.
        importance : int
            Priority used when deciding what to spill once the storage is full.

        Returns
        -------
        ObjectInfo
            Object information, including size, raw_size and device.
        """

    @abstractmethod
    async def delete(self, object_id):
        """
        Delete an object from storage by its id.

        Parameters
        ----------
        object_id
            Object id.
        """

    @abstractmethod
    async def object_info(self, object_id) -> ObjectInfo:
        """
        Get information about a stored object.

        Parameters
        ----------
        object_id
            Object id.

        Returns
        -------
        ObjectInfo
            Object info, including size, device, etc.
        """

    @abstractmethod
    async def open_writer(self, size=None) -> StorageFileObject:
        """
        Return a file-like object for writing.

        Parameters
        ----------
        size : int, optional
            Maximum size in bytes.

        Returns
        -------
        fileobj : StorageFileObject
        """

    @abstractmethod
    async def open_reader(self, object_id) -> StorageFileObject:
        """
        Return a file-like object for reading.

        Parameters
        ----------
        object_id
            Object id.

        Returns
        -------
        fileobj : StorageFileObject
        """

    async def list(self) -> List:
        """
        List all objects in the storage.

        Returns
        -------
        list
            Stored objects.
        """

    async def prefetch(self, object_id):
        """
        Fetch an object to the current worker.

        Parameters
        ----------
        object_id
            Object id.
        """

    async def pin(self, object_id):
        """
        Pin the data to prevent it from being released or spilled.

        Parameters
        ----------
        object_id
            Object id.
        """

    async def unpin(self, object_id):
        """
        Unpin the data, allowing the storage to release it.

        Parameters
        ----------
        object_id
            Object id.
        """
```
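To illustrate how a concrete backend might satisfy this interface, here is a minimal sketch of a hypothetical dict-backed backend, `DictStorageBackend`. It is standalone rather than a subclass of StorageBackend so that it runs on its own. It also makes assumptions the proposal does not state: put() generates the object id and returns it in a hypothetical `object_id` field of ObjectInfo, pins are reference-counted, and objects with lower importance are spilled first (via a hypothetical `spill_candidates()` helper). `level`, `prefetch`, `open_reader` and `open_writer` are omitted.

```python
import asyncio
import sys
import uuid
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class ObjectInfo:
    # Hypothetical: object_id is an assumption; size/raw_size/device follow the proposal.
    object_id: str
    size: int
    raw_size: int
    device: str = "memory"


class DictStorageBackend:
    """Hypothetical in-memory backend mirroring the StorageBackend methods."""

    def __init__(self, **init_params):
        # init_params would come from setup(); this sketch needs none.
        self._objects: Dict[str, Any] = {}
        self._infos: Dict[str, ObjectInfo] = {}
        self._importance: Dict[str, int] = {}
        self._pins: Dict[str, int] = {}  # object_id -> pin count

    @classmethod
    async def setup(cls, **kwargs) -> Tuple[Dict, Dict]:
        # Nothing to start for a dict: pass kwargs to __init__, nothing for teardown.
        return dict(kwargs), {}

    @staticmethod
    async def teardown(**kwargs):
        pass

    async def get(self, object_id, **kwargs) -> object:
        return self._objects[object_id]

    async def put(self, obj, importance=0) -> ObjectInfo:
        object_id = uuid.uuid4().hex  # assumption: the backend assigns ids
        size = sys.getsizeof(obj)  # shallow size estimate only
        info = ObjectInfo(object_id=object_id, size=size, raw_size=size)
        self._objects[object_id] = obj
        self._infos[object_id] = info
        self._importance[object_id] = importance
        return info

    async def delete(self, object_id):
        del self._objects[object_id]
        del self._infos[object_id]
        del self._importance[object_id]
        self._pins.pop(object_id, None)

    async def object_info(self, object_id) -> ObjectInfo:
        return self._infos[object_id]

    async def pin(self, object_id):
        self._pins[object_id] = self._pins.get(object_id, 0) + 1

    async def unpin(self, object_id):
        count = self._pins[object_id] - 1
        if count:
            self._pins[object_id] = count
        else:
            del self._pins[object_id]

    def spill_candidates(self) -> List[str]:
        # Hypothetical helper: unpinned objects, lowest importance first.
        ids = [oid for oid in self._objects if oid not in self._pins]
        return sorted(ids, key=lambda oid: self._importance[oid])

    async def list(self) -> List:
        # Defined last so the method name does not shadow `list` in the class body.
        return list(self._infos.values())


async def main():
    init_params, teardown_params = await DictStorageBackend.setup()
    backend = DictStorageBackend(**init_params)
    low = await backend.put([1, 2, 3], importance=1)
    high = await backend.put({"a": 1}, importance=5)
    await backend.pin(low.object_id)
    # `low` is pinned, so only `high` is eligible for spilling.
    print(backend.spill_candidates() == [high.object_id])  # True
    await backend.unpin(low.object_id)
    await DictStorageBackend.teardown(**teardown_params)


if __name__ == "__main__":
    asyncio.run(main())
```

Keeping setup() and teardown() as class/static methods means the environment, such as a plasma store process, can be started once and shared before any backend instance exists.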

The definition of StorageLevel:

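```python
from enum import Enum


class StorageLevel(Enum):
    GPU = 1 << 0
    MEMORY = 1 << 1
    DISK = 1 << 2
    REMOTE = 1 << 3

    def __and__(self, other: "StorageLevel"):
        # Combines two levels with bitwise OR; the result is a plain int bitmask.
        return self.value | other.value
```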

If a storage backend can handle both memory and disk, its level can be expressed as StorageLevel.MEMORY & StorageLevel.DISK.
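With the `__and__` override above, `StorageLevel.MEMORY & StorageLevel.DISK` evaluates to the plain int `6` rather than a StorageLevel. A three-way combination such as `GPU & MEMORY & DISK` then raises `TypeError`, because the intermediate int cannot be `&`-ed with an enum member. One alternative, sketched here as a suggestion and not part of the original proposal, is the standard library's `enum.Flag`. With `Flag`, levels combine with `|`, the result stays a StorageLevel, and membership can be tested with `in`:

```python
from enum import Flag


class StorageLevel(Flag):
    GPU = 1 << 0
    MEMORY = 1 << 1
    DISK = 1 << 2
    REMOTE = 1 << 3


# A backend that handles both memory and disk.
level = StorageLevel.MEMORY | StorageLevel.DISK

print(StorageLevel.MEMORY in level)  # True
print(StorageLevel.GPU in level)  # False
print(level.value)  # 6
```

The bit values are unchanged, so the combined value (`6`) matches what the `__and__` version returns. The differences are that the result keeps its type and that `|` reads as a union.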
