Commit 3faf253 (1 parent: 26ed4e5)

- add lib development guide
- Supplementary quickstart documentation
- fix known issue

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>

Showing 8 changed files with 647 additions and 17 deletions.

@@ -1,4 +1,3 @@
kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
kind: LifelongLearningJob
metadata:

@@ -0,0 +1,131 @@
# Dataset Development Guide

## Introduction
Sedna provides interfaces and public methods for data conversion and sampling in the Dataset class. A user-defined data-processing class can inherit from the Dataset class and reuse these public capabilities.

### 1. Example
The following describes how to use the Dataset class, taking a txt-format file that contains a list of images as an example. The procedure is as follows:

- 1.1. All Sedna dataset classes inherit from the base class `sedna.datasources.BaseDataSource`. BaseDataSource defines the interfaces required by a dataset and provides attributes and methods such as data_parse_func, save, and concat with default implementations; derived classes can override these defaults as required.

```python
from abc import ABC

# Note: ClassFactory, ClassType and FileOps used below are Sedna helpers
# (from sedna.common in the lib source); they appear here as part of the excerpt.


class BaseDataSource:
    """
    An abstract class representing a :class:`BaseDataSource`.

    All datasets that represent a map from keys to data samples should
    subclass it. All subclasses should overwrite `parse`, which supports
    getting train/eval/infer data by a function. Subclasses could also
    optionally overwrite `__len__`, which is expected to return the size of
    the dataset, as well as `x` for the feature embedding and `y` for the
    target label.

    Parameters
    ----------
    data_type : str
        defines whether the datasource is train/eval/test
    func : function
        function used to parse an iterable object batch by batch
    """

    def __init__(self, data_type="train", func=None):
        self.data_type = data_type  # sample type: train/eval/test
        self.process_func = None
        if callable(func):
            self.process_func = func
        elif func:
            self.process_func = ClassFactory.get_cls(
                ClassType.CALLBACK, func)()
        self.x = None  # sample feature
        self.y = None  # sample label
        self.meta_attr = None  # special in lifelong learning

    def num_examples(self) -> int:
        return len(self.x)

    def __len__(self):
        return self.num_examples()

    def parse(self, *args, **kwargs):
        raise NotImplementedError

    @property
    def is_test_data(self):
        return self.data_type == "test"

    def save(self, output=""):
        return FileOps.dump(self, output)


class TxtDataParse(BaseDataSource, ABC):
    """
    Parser for a txt file which contains an image list.
    """

    def __init__(self, data_type, func=None):
        super(TxtDataParse, self).__init__(data_type=data_type, func=func)

    def parse(self, *args, **kwargs):
        pass
```

- 1.2. Defining the dataset parse function.

```python
def parse(self, *args, **kwargs):
    x_data = []
    y_data = []
    use_raw = kwargs.get("use_raw")
    for f in args:
        with open(f) as fin:
            # each non-empty line of the txt file is one sample:
            # "<image_path> [<label>]"
            if self.process_func:
                res = list(map(self.process_func, [
                    line.strip() for line in fin.readlines()]))
            else:
                res = [line.strip().split() for line in fin.readlines()]
            for tup in res:
                if not len(tup):
                    continue
                if use_raw:
                    x_data.append(tup)
                else:
                    x_data.append(tup[0])
                    if not self.is_test_data:
                        if len(tup) > 1:
                            y_data.append(tup[1])
                        else:
                            y_data.append(0)
    self.x = np.array(x_data)
    self.y = np.array(y_data)
```
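
Putting 1.1 and 1.2 together, the sketch below illustrates the index-file format this parse function consumes and the x/y arrays it produces. The file paths and the index file itself are hypothetical, and it assumes a TxtDataParse class that carries the parse method shown above and is importable from `sedna.datasources`:

```python
from sedna.datasources import TxtDataParse  # assumed import path

# hypothetical index file; each line is "<image_path> <label>",
# which matches the whitespace split in parse()
index = "index.txt"
with open(index, "w") as f:
    f.write("./images/0001.png 0\n")
    f.write("./images/0002.png 1\n")

train_data = TxtDataParse(data_type="train")
train_data.parse(index)
print(train_data.x)  # ['./images/0001.png' './images/0002.png']
print(train_data.y)  # ['0' '1']
```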

### 2. Debugging

The preceding implementation can be used directly in a Sedna PipeStep or invoked independently. The code for independent invocation is as follows:

```python
import os
import unittest

from sedna.datasources import TxtDataParse  # TxtDataParse as implemented above

# hypothetical path to the txt index file; replace it with your own dataset index
train_dataset_url = "./train_data/index.txt"


def _load_txt_dataset(dataset_url):
    # use original dataset url,
    # see https://github.com/kubeedge/sedna/issues/35
    return os.path.abspath(dataset_url)


class TestDataset(unittest.TestCase):

    def test_txtdata(self):
        train_data = TxtDataParse(data_type="train", func=_load_txt_dataset)
        train_data.parse(train_dataset_url, use_raw=True)
        self.assertEqual(len(train_data), 1)


if __name__ == "__main__":
    unittest.main()
```

@@ -0,0 +1,11 @@
# Development Guide

This document is intended to provide contributors with an introduction to developing a runnable algorithm module for the Sedna project.

The Sedna framework components are decoupled, and a registration mechanism is used to combine functional components, which makes it easy to extend functions and algorithms. For details about the Sedna architecture and its main mechanisms, see the [Lib README](/lib/sedna/README.md).

During Sedna application development, the first problem encountered is how to import service datasets into Sedna. For details, see the [Datasets Guide](./datasets.md).

For different algorithms, see the [Algorithm Development Guide](./new_algorithm.md). You can add new algorithms to Sedna step by step by following the examples provided in that guide.

Before developing a module, read the [lib API Reference](https://sedna.readthedocs.io/en/latest/autoapi/lib/sedna/index.html) to learn about the interface design of Sedna.

@@ -0,0 +1,48 @@
# Algorithm Development Guide

New algorithms, such as `hard example mining` in `incremental_learning` and `joint_inference`, `aggregation` in `federated_learning`, and `multiple task learning` and `unseen task detection` in `lifelong learning`, need to be extended based on the base classes provided by Sedna.

## 1. Add a hard example mining algorithm

The algorithm named `Threshold` is used as an example to describe how to add an HEM algorithm to the Sedna hard example mining algorithm library.

### 1.1 Starting from `class_factory.py`

First, let's start from `class_factory.py`. Two classes are defined in it, namely `ClassType` and `ClassFactory`.

`ClassFactory` can register the modules you want to reuse through decorators. For the new `ClassType.HEM` algorithm, the code is as follows:

```python
import abc

from sedna.common.class_factory import ClassFactory, ClassType
# BaseFilter is the base class of Sedna's built-in hard example mining
# filters; it is assumed to be importable from the module in which this
# new filter is added.


@ClassFactory.register(ClassType.HEM, alias="Threshold")
class ThresholdFilter(BaseFilter, abc.ABC):
    def __init__(self, threshold=0.5, **kwargs):
        self.threshold = float(threshold)

    def __call__(self, infer_result=None):
        # if invalid input, return False
        if not (infer_result
                and all(map(lambda x: len(x) > 4, infer_result))):
            return False

        image_score = 0

        for bbox in infer_result:
            image_score += bbox[4]

        average_score = image_score / (len(infer_result) or 1)
        return average_score < self.threshold
```
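
As a quick sanity check of the decision rule, the filter can be called directly on a fake inference result. The `[x1, y1, x2, y2, score]` box layout below is an assumption; the filter only requires the confidence score at index 4:

```python
hem = ThresholdFilter(threshold=0.5)

print(hem([[0, 0, 10, 10, 0.3]]))  # True: average score 0.3 < 0.5, a hard example
print(hem([[0, 0, 10, 10, 0.9]]))  # False: average score 0.9 >= 0.5
print(hem([]))                     # False: invalid/empty input
```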

## 2. Configuring it in the CRD yaml

After registration, you only need to change the name of the HEM algorithm and its parameters in the yaml file; the corresponding class is then looked up and called automatically according to that name.

```yaml
deploySpec:
  hardExampleMining:
    name: "Threshold"
    parameters:
      - key: "threshold"
        value: "0.9"
```
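
For reference, the `name`/`parameters` pair above reaches the registered class roughly as sketched below. This is an illustrative sketch assuming `ClassFactory.get_cls` as the lookup entry point, not the exact worker code that Sedna runs:

```python
from sedna.common.class_factory import ClassFactory, ClassType

# values as they would arrive from the CRD yaml above
hem_name = "Threshold"
hem_parameters = {"threshold": "0.9"}

# resolve the class registered under the "Threshold" alias and
# instantiate it with the yaml parameters (extra keys go to **kwargs)
hem_cls = ClassFactory.get_cls(ClassType.HEM, hem_name)
hard_example_filter = hem_cls(**hem_parameters)

# every inference result can now be checked by calling the filter
print(hard_example_filter([[0, 0, 10, 10, 0.8]]))  # True: 0.8 < 0.9
```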