[Enhancement Request] Integrate Plato into Sedna as a backend for supporting federated learning - Phase one #116

Merged 2 commits on Sep 8, 2021
87 changes: 87 additions & 0 deletions build/crd-samples/sedna/federatedlearningjob_yolo_v1alpha1.yaml
@@ -0,0 +1,87 @@
apiVersion: sedna.io/v1alpha1
kind: FederatedLearningJob
metadata:
  name: yolo-v5
spec:
  pretrainedModel: # option
    name: "yolo-v5-pretrained-model"
  transmitter: # option
    ws: { } # option, by default
    s3: # option, but at least one
      aggDataPath: "s3://sedna/fl/aggregation_data"
      credentialName: mysecret
  aggregationWorker:
    model:
      name: "yolo-v5-model"
    template:
      spec:
        nodeName: "sedna-control-plane"
        containers:
          - image: kubeedge/sedna-example-federated-learning-mistnet-yolo-aggregator:v0.4.0
            name: agg-worker
            imagePullPolicy: IfNotPresent
            env: # user defined environments
              - name: "cut_layer"
                value: "4"
              - name: "epsilon"
                value: "100"
              - name: "aggregation_algorithm"
                value: "mistnet"
              - name: "batch_size"
                value: "32"
            resources: # user defined resources
              limits:
                memory: 8Gi
  trainingWorkers:
    - dataset:
        name: "coco-dataset-1"
      template:
        spec:
          nodeName: "edge-node"
          containers:
            - image: kubeedge/sedna-example-federated-learning-mistnet-yolo-client:v0.4.0
              name: train-worker
              imagePullPolicy: IfNotPresent
              args: [ "-i", "1" ]
              env: # user defined environments
                - name: "cut_layer"
                  value: "4"
                - name: "epsilon"
                  value: "100"
                - name: "aggregation_algorithm"
                  value: "mistnet"
                - name: "batch_size"
                  value: "32"
                - name: "learning_rate"
                  value: "0.001"
                - name: "epochs"
                  value: "1"
              resources: # user defined resources
                limits:
                  memory: 2Gi
    - dataset:
        name: "coco-dataset-2"
      template:
        spec:
          nodeName: "edge-node"
          containers:
            - image: kubeedge/sedna-example-federated-learning-mistnet-yolo-client:v0.4.0
              name: train-worker
              imagePullPolicy: IfNotPresent
              args: [ "-i", "2" ]
              env: # user defined environments
                - name: "cut_layer"
                  value: "4"
                - name: "epsilon"
                  value: "100"
                - name: "aggregation_algorithm"
                  value: "mistnet"
                - name: "batch_size"
                  value: "32"
                - name: "learning_rate"
                  value: "0.001"
                - name: "epochs"
                  value: "1"
              resources: # user defined resources
                limits:
                  memory: 2Gi
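
The `env` entries in the sample manifest are user defined, and Kubernetes passes them to each worker container as ordinary environment variables, always as strings. As a minimal sketch, assuming a worker entry script reads them straight from the environment (the function name `print_hyperparameters` is hypothetical and not part of Sedna or Plato), the fallbacks below simply mirror the sample values:

```shell
# Hypothetical helper, not part of Sedna or Plato: print the hyperparameters
# a training worker would see, falling back to the sample manifest's values
# whenever a variable is unset or empty.
print_hyperparameters() {
  echo "cut_layer=${cut_layer:-4}"
  echo "epsilon=${epsilon:-100}"
  echo "aggregation_algorithm=${aggregation_algorithm:-mistnet}"
  echo "batch_size=${batch_size:-32}"
  echo "learning_rate=${learning_rate:-0.001}"
  echo "epochs=${epochs:-1}"
}

print_hyperparameters
```

Overriding a value for one run, for example `cut_layer=8 print_hyperparameters`, changes only that line of the output, which matches how editing a single `env` entry in the manifest would behave.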
4 changes: 3 additions & 1 deletion examples/build_image.sh
@@ -17,11 +17,13 @@
 cd "$(dirname "${BASH_SOURCE[0]}")"

 IMAGE_REPO=${IMAGE_REPO:-kubeedge}
-IMAGE_TAG=${IMAGE_TAG:-v0.3.0}
+IMAGE_TAG=${IMAGE_TAG:-v0.4.0}

 EXAMPLE_REPO_PREFIX=${IMAGE_REPO}/sedna-example-

 dockerfiles=(
+federated-learning-mistnet-yolo-aggregator.Dockerfile
+federated-learning-mistnet-yolo-client.Dockerfile
 federated-learning-surface-defect-detection-aggregation.Dockerfile
 federated-learning-surface-defect-detection-train.Dockerfile
 incremental-learning-helmet-detection.Dockerfile
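
The rest of `build_image.sh` is collapsed in this view, so the following is only a sketch of how the two new Dockerfile names could map to the image references used in the sample manifest. It assumes the image name is the Dockerfile name with the `.Dockerfile` suffix stripped; the helper `image_for` is hypothetical and does not come from the script.

```shell
IMAGE_REPO=${IMAGE_REPO:-kubeedge}
IMAGE_TAG=${IMAGE_TAG:-v0.4.0}
EXAMPLE_REPO_PREFIX=${IMAGE_REPO}/sedna-example-

# Hypothetical helper: turn "<name>.Dockerfile" into "<prefix><name>:<tag>".
image_for() {
  echo "${EXAMPLE_REPO_PREFIX}${1%.Dockerfile}:${IMAGE_TAG}"
}

# Print (rather than run) the build command for each new Dockerfile.
for dockerfile in \
  federated-learning-mistnet-yolo-aggregator.Dockerfile \
  federated-learning-mistnet-yolo-client.Dockerfile
do
  echo "docker build -f ${dockerfile} -t $(image_for "${dockerfile}") ."
done
```

With the default repository and tag, the derived names line up with the `image:` fields in `federatedlearningjob_yolo_v1alpha1.yaml`.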
23 changes: 23 additions & 0 deletions examples/federated-learning-mistnet-yolo-aggregator.Dockerfile
@@ -0,0 +1,23 @@
FROM tensorflow/tensorflow:1.15.4

Contributor

A PyTorch or python-slim base image seems more appropriate here.

Contributor Author

Starting from python-slim requires installing extra libraries, which may lead to a larger image size.

RUN apt update \
&& apt install -y libgl1-mesa-glx git

COPY ./lib/requirements.txt /home

RUN python -m pip install --upgrade pip

RUN pip install -r /home/requirements.txt

ENV PYTHONPATH "/home/lib:/home/plato:/home/plato/packages/yolov5"

Contributor

It would be better to use Plato as an installable package.

> It would be better to use Plato as an installable package.

+1. Plato should be a Python package rather than a submodule.

Contributor Author

Because of the package-size issue, we currently add Plato as a third-party library.

COPY ./lib /home/lib
RUN git clone https://github.com/TL-System/plato.git /home/plato
Nit: `git clone --depth 1` can reduce the amount of code downloaded; see https://stackoverflow.com/a/1210012

Contributor Author

Updated.


RUN pip install -r /home/plato/requirements.txt
RUN pip install -r /home/plato/packages/yolov5/requirements.txt

WORKDIR /home/work
COPY examples/federated_learning/yolov5_coco128_mistnet /home/work/

CMD ["/bin/sh", "-c", "ulimit -n 50000; python aggregate.py"]
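
The review thread above suggests `git clone --depth 1`, which fetches only the most recent commit instead of the full history. A minimal sketch of that idea follows; the wrapper name `shallow_clone` is hypothetical, and the Dockerfile equivalent is shown only as a comment.

```shell
# Hypothetical wrapper: clone only the latest commit of a repository.
#   $1 = repository URL, $2 = target directory
shallow_clone() {
  git clone --depth 1 "$1" "$2"
}

# In the Dockerfile, the corresponding instruction would look like:
#   RUN git clone --depth 1 https://github.com/TL-System/plato.git /home/plato
```

Running `git rev-list --count HEAD` inside the resulting directory reports how many commits were fetched, which is a quick way to confirm that the clone is shallow. Note that `--depth` is ignored for plain local paths, so a local test needs a `file://` URL.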
23 changes: 23 additions & 0 deletions examples/federated-learning-mistnet-yolo-client.Dockerfile
@@ -0,0 +1,23 @@
FROM tensorflow/tensorflow:1.15.4

RUN apt update \
&& apt install -y libgl1-mesa-glx git

COPY ./lib/requirements.txt /home

RUN python -m pip install --upgrade pip

RUN pip install -r /home/requirements.txt

ENV PYTHONPATH "/home/lib:/home/plato:/home/plato/packages/yolov5"

COPY ./lib /home/lib
RUN git clone https://github.com/TL-System/plato.git /home/plato

RUN pip install -r /home/plato/requirements.txt
RUN pip install -r /home/plato/packages/yolov5/requirements.txt

WORKDIR /home/work
COPY examples/federated_learning/yolov5_coco128_mistnet /home/work/

ENTRYPOINT ["python", "train.py"]
@@ -11,7 +11,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

import numpy as np
@@ -74,6 +73,7 @@ def main():
        learning_rate=learning_rate,
        validation_split=validation_split
    )

    return train_jobs

