lifelong learning: issue-driven interface-adjustment and bug fix #142

Merged
kubeedge-bot merged 2 commits into kubeedge:main from JoeyHwong-gk:lls3 on Aug 12, 2021


JoeyHwong-gk
Contributor

- fix file_ops method
- fix kb save bug

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
@kubeedge-bot kubeedge-bot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) on Jul 31, 2021
@JoeyHwong-gk
Contributor Author

/assign @llhuii @jaypume

count = 0

num = len(objects)
Contributor

@luosiqi luosiqi Aug 6, 2021

The variable names "num" and "count" do not seem explicit enough. I suggest replacing "num" with "obj_num". It is also unclear what "count" means; its purpose is hard to understand at first sight.

Contributor Author

file_ops is not an interface that developers need to be aware of; this function is used to upload files to OBS.
num refers to the number of files/directories that need to be uploaded in the current directory.
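
To illustrate the naming point in this thread, here is a minimal sketch of an upload loop with more explicit names. Only `num = len(objects)` and `count = 0` appear in the diff; the function `upload_objects`, the `upload_one` callback, and the reading of `count` as "uploads that succeeded" are assumptions, and the real `file_ops` logic may differ.

```python
from typing import Callable, Iterable


def upload_objects(objects: Iterable[str],
                   upload_one: Callable[[str], bool]) -> int:
    """Upload every entry and return how many uploads succeeded.

    Hypothetical stand-in for the loop discussed above; `upload_one`
    represents whatever call actually sends one file/directory to OBS.
    """
    objects = list(objects)
    obj_num = len(objects)   # total files/directories to upload ("num")
    uploaded_count = 0       # uploads that succeeded so far (assumed meaning of "count")

    for obj in objects:
        if upload_one(obj):
            uploaded_count += 1

    if uploaded_count != obj_num:
        print(f"uploaded {uploaded_count}/{obj_num} objects")
    return uploaded_count
```

With names like these, a reader can tell the total from the running tally without consulting the author.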


@MooreZheng MooreZheng left a comment

In this version, some modules and code comments are missing. It is highly recommended to address these review comments before a formal merge.

@@ -36,130 +36,126 @@ class MulTaskLearning:

def __init__(self,

The task allocation module, which serves as the predictor for task definition, is missing. Besides, task mining is meant to reveal task relations during inference, e.g., as the predictor for task relationship discovery.

Contributor Author

In the previous version we merged the two parts together; they will be split apart in a follow-up version.

Good to know. I have created issue #151 to follow up.

Contributor Author

Thx

        try:
            raw_dict = json.loads(param_str, encoding="utf-8")
        except json.JSONDecodeError:
            raw_dict = {}
        return raw_dict

    def task_definition(self, samples):
    def _task_definition(self, samples):
        """
        Task attribute extractor and multi-task definition

Explanations of the input and output are also needed in the docstring, so that method selection or parameter selection can be done properly.

Contributor Author

Yep, but not reflected in this PR

Will that be tackled in the future? It would be great if we have related issues in the community.
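
As an illustration of the input/output documentation requested in this thread, the sketch below documents the JSON-parsing helper shown in the diff context. The function name `parse_param` and the docstring wording are assumptions, not the PR's code. The sketch also omits the `encoding` argument, which `json.loads` has ignored since Python 3.1 and no longer accepts as of Python 3.9.

```python
import json


def parse_param(param_str: str) -> dict:
    """Parse a JSON-encoded parameter string.

    Parameters
    ----------
    param_str : str
        JSON text, expected to encode an object, e.g. '{"n_clusters": 2}'.

    Returns
    -------
    dict
        The decoded parameters, or an empty dict if `param_str`
        is not valid JSON.
    """
    try:
        raw_dict = json.loads(param_str)
    except json.JSONDecodeError:
        # Fall back to "no parameters" rather than failing, as in the diff.
        raw_dict = {}
    return raw_dict
```

A docstring in this shape tells a caller what to pass and what comes back, which is what method or parameter selection depends on.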

self.task_index_url = Context.get_parameters(
"MODEL_URLS", '/tmp/index.pkl'
)
self.task_index_url = KBResourceConstant.KB_INDEX_NAME.value
self.min_train_sample = int(Context.get_parameters(

It is difficult to understand the hyper-parameters without any comments here: what do these parameters mean, which components do they affect, and so on?

Same, comments are needed

Contributor Author

Currently we do not regard the knowledge base as a public interface. Of course, the comments should still be added in the documentation PR.

Well, this comment is talking about the "Context.get_parameters", not the knowledge base.

Good to learn that it will be followed up. Also, it would be great if we have related issues in the community.
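
The kind of inline documentation asked for in this thread might look like the sketch below. `get_parameter` is a hypothetical stand-in for `Context.get_parameters` that reads environment variables. The parameter name `MIN_TRAIN_SAMPLE`, its default, and the description of its effect are invented for illustration, since the diff truncates that call.

```python
import os


def get_parameter(name: str, default: str = "") -> str:
    """Hypothetical stand-in for Context.get_parameters: read an
    environment variable, falling back to `default` when it is unset."""
    return os.environ.get(name, default)


class TaskConfig:
    """Illustrative container; not a class from the PR."""

    def __init__(self):
        # Minimum number of samples a task needs before a model is
        # trained for it (assumed meaning). Affects: task training only.
        # Name "MIN_TRAIN_SAMPLE" and default "10" are assumptions.
        self.min_train_sample = int(get_parameter("MIN_TRAIN_SAMPLE", "10"))
```

A short comment stating the meaning, the default, and the affected component answers the reviewer's three questions at the point where the parameter is read.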

# See the License for the specific language governing permissions and
# limitations under the License.

"""Unseen Task detect Algorithms for Lifelong Learning"""

It would be better to name the module with a noun, e.g., "unseen task detection algorithms".

Contributor Author

fix

Thx.

@@ -34,40 +33,39 @@ class LifelongLearning(JobBase):

def __init__(self,

There is quite a lot of content in the init function (100+ LOC) without any comments, which makes it hard to read. It would be nice to add comments at intervals throughout.

Contributor Author

Yep

So will that be tackled in the future? It would be great if we have related issues in the community.

return res
_index_path = FileOps.join_path(self.save_dir, self.kb_index)
FileOps.dump(task_info, _index_path)
return f"/file/download?files={self.kb_index}&name={self.kb_index}"

def update(self, task: UploadFile = File(...)):

The update function is important in the knowledge base. Comments are needed to state what this function does and what its input and output are.

Contributor Author

As above, we will complete this in the documentation supplement PR.

Glad to hear that the comment will be followed up. It would be great if we have related issues in the community.

Contributor Author

Please follow #85 and #150 for the docs supplement for lifelong learning and the Sedna SDK.
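
The diff context quoted at the top of this thread saves the task index and returns a download path. A commented sketch of that step, using only the standard library, is given below. `save_index` is a hypothetical name, and using `pickle` in place of `FileOps.dump` is an assumption suggested by the `index.pkl` file name, not something the source confirms.

```python
import os
import pickle


def save_index(task_info: dict, save_dir: str, kb_index: str) -> str:
    """Write `task_info` to `save_dir/kb_index` and return a download path.

    Input:  the task index to persist, the target directory, the file name.
    Output: a relative URL of the form used in the diff context.
    """
    os.makedirs(save_dir, exist_ok=True)
    index_path = os.path.join(save_dir, kb_index)
    # Assumed serialization format; the real FileOps.dump may differ.
    with open(index_path, "wb") as fout:
        pickle.dump(task_info, fout)
    return f"/file/download?files={kb_index}&name={kb_index}"
```

Stating the input and output in the docstring, as here, is the level of detail the reviewer asks for on `update`.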

Contributor

@luosiqi luosiqi left a comment

See my individual comments in the code.

@kubeedge-bot
Collaborator

@luosiqi: changing LGTM is restricted to collaborators


- Reduce parameters for initial
- show all interfaces of lifelong learning in example
- fix bugs from deep learning framework

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
@ugvddm

ugvddm commented Aug 10, 2021

/lgtm

@kubeedge-bot kubeedge-bot added the lgtm label (indicates that a PR is ready to be merged) on Aug 10, 2021
@llhuii

llhuii commented Aug 12, 2021

/lgtm

I'm going to merge this.

@llhuii

llhuii commented Aug 12, 2021

/approve

@kubeedge-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: llhuii, luosiqi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot kubeedge-bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Aug 12, 2021
@kubeedge-bot kubeedge-bot merged commit 626a892 into kubeedge:main Aug 12, 2021
@JoeyHwong-gk JoeyHwong-gk deleted the lls3 branch August 13, 2021 03:09
vcozzolino added a commit to vcozzolino/sedna that referenced this pull request Dec 3, 2021
Upgrade to v0.4.0

Created-by: Vittorio Cozzolino 00609018
Author-id: 553076
MR-id: 12361350
Commit-by: Vittorio Cozzolino;KubeEdge Bot;JoeyHwong;JimmyYang20;ShiXiaohou;DanLiu;HenryChou;Yutong Wang;EnfangCui;llhuii;XinYao1994;Jie Pu;wei.ji
Merged-by: Vittorio Cozzolino 00609018
E2E-issues: 
Description:
fix incremental_learning bug
- Add docs and code comment
- fix bugs: epoch always be 1, inference result not saved, s3 upload
  fail

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Interface Improvement:
   1. The algorithm of HardExampleMining should be selected by the developer.

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
install.sh: fix LC_BIND_PORT bug

rename the variable LC_BIND_PORT to SENDA_LC_BIND_PORT.

Signed-off-by: llhuii <liulinghui@huawei.com>,
Merge pull request kubeedge#121 from llhuii/fix-install-script-bug

install.sh: fix LC_BIND_PORT bug,
Update interface.py

fix env missing bug

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
fix bug: aggregation of weights should occur in the AggServer

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
fix bug: Cloud worker not exiting

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
gm: refactor all features into independent dir

All controllers are placed into globalmanager/controllers:
1. each feature has the independent subdirectory
2. upstream/downstream are kept as top level.

Common types/utils/worker.go are placed into globalmanager/runtime.

Signed-off-by: llhuii <liulinghui@huawei.com>,
fix PR comment
- clean useless code
- catch server exception in threads

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
gm: refactor upstream controller

Split upstream controller, merge each feature CR logic code into its
controller.

Signed-off-by: llhuii <liulinghui@huawei.com>,
gm: share client/Informer with all controllers

Make all controllers sharing with:
1. kubernetes client, and informerFactory with random resync period.
2. sedna crd client, and informerFactory with random resync period.

This can reduce code and improve slim performance.

Signed-off-by: llhuii <liulinghui@huawei.com>,
gm: add dataset controller

Only handle dataset update from edge.

Signed-off-by: llhuii <liulinghui@huawei.com>,
Merge pull request kubeedge#106 from JoeyHwong-gk/federated

fix federated learning bugs,
gm: split all upstream logic into separate file

Signed-off-by: llhuii <liulinghui@huawei.com>,
gm: split all downstream logic into separate file

Since all CR watch actions are placed into corresponding controller,
controllers/downstream.go is unnecessary.

Signed-off-by: llhuii <liulinghui@huawei.com>,
LC: fix nil pointer dereference bug

It happened in evalTask of incremental job when deploy model hasn't
been synced to LC. evalTask should return error instead of logging
error. And it doesn't need job id info into error, same as trainTask.

Signed-off-by: JimmyYang20 <yangjin39@huawei.com>,
Merge pull request kubeedge#139 from JimmyYang20/fixbug

LC: fix nil pointer dereference bug,
LC: send dataset update to GM only when changed

number of samples has been sent to GM only when adding new data.

Signed-off-by: JimmyYang20 <yangjin39@huawei.com>,
Merge pull request kubeedge#137 from JimmyYang20/main

LC: send dataset update to GM only when changed,
lifelong learning s3 support
- fix file_ops method
- fix kb save bug

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Add object search and tracking docs to docs/proposals/

Add object search and tracking crd samples to build/crd-samples/sedna/

Add object search and tracking type.go files to pkg/apis/sedna/v1alpha1/

Signed-off-by: EnfangCui <17111008@bjtu.edu.cn>,
Merge pull request kubeedge#100 from EnfangCui/add-multi-edge-inference-PR

Add object search and tracking proposals,
gm: more code clean after initial refactor done

1. remove the feature redundant name in all feature controllers(e.g.
'federatedlearningJob' to 'job'), since it has already own independent
package, no need the feature extra name
2. upstream interface optimization
3. fix empty Kind of all CR in downstream
4. add extra doc string
5. fix code style

Signed-off-by: llhuii <liulinghui@huawei.com>,
Fix the problem that kbimage cannot be compiled in Makefile

Signed-off-by: wei.ji <wei.ji@easystack.cn>,
improve lifelong learning docs

1. improve the atc example words
2. fix the broken links in lifelong proposal

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Merge pull request kubeedge#146 from Jw-Jm/main

Fix the problem that kbimage cannot be compiled in Makefile,
make the hard_example_mining alg to be a common interface

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Merge pull request kubeedge#134 from llhuii/refactor-gm

gm: decouple all features into independent package,
Merge pull request kubeedge#107 from JoeyHwong-gk/incremental

[incremental learning] example:keep all  results whether is hardExample or not, fixed the issue of using s3 to save model,
Merge pull request kubeedge#143 from JoeyHwong-gk/lldoc

improve lifelong learning docs,
fix example bug: save result which get from cloud if is hard example

fix message when http connect fail

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
fix pr comment
- make the hard_example_mining alg to be a common interface
- fix get_hem_from_config: raise exception when value is unexpected

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
lc: decouple all features into independent package

Signed-off-by: JimmyYang20 <yangjin39@huawei.com>,
Merge pull request kubeedge#117 from JoeyHwong-gk/joint

joint_inference: bug fix and interface reconstruction,
Merge pull request kubeedge#149 from JimmyYang20/refector-lc

lc: decouple all features into independent package,
fix lifelong issue
- Reduce parameters for initial
- show all interfaces of lifelong learning in example
- fix bugs from deep learning framework

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
fix il doc

Signed-off-by: JimmyYang20 <yangjin39@huawei.com>,
Merge pull request kubeedge#153 from JimmyYang20/fix-doc

Fix rendering issue of example doc in readthedocs,
Merge pull request kubeedge#142 from JoeyHwong-gk/lls3

lifelong learning: issue-driven interface-adjustment and bug fix,
fix the lifelong example problem from backend and constant

- fix sklearn backend: support args in train/eval/infer
- fix lifelong constant

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Automatic push images when publishing a release

A github action is added for pushing image when a new release is created:
1. login docker hub.
2. checkout the project, and run `make push-all`.

Signed-off-by: llhuii <liulinghui@huawei.com>,
Merge pull request kubeedge#154 from JoeyHwong-gk/lifelong

[Lifelong example]: fix the problem from backend and constant,
docs: update install guide

1. add GM/LC links
2. add GM/LC deploy form

Signed-off-by: llhuii <liulinghui@huawei.com>,
Merge pull request kubeedge#155 from llhuii/add-image-push-gh-action

Push images automatically when a new release is created,
Merge pull request kubeedge#156 from llhuii/update-install-doc

docs: update install guide,
IL: LC supports to recover job when restart

Signed-off-by: JimmyYang20 <yangjin39@huawei.com>,
Merge pull request kubeedge#152 from JimmyYang20/fixbug

IL: LC supports to recover job when restart,
Fix IMAGE_REPO in github image-publish action

Using the env 'GITHUB_REPOSITORY' instead of 'GITHUB_ACTOR' to get the right
image repo name i.e. `IMAGE_REPO` in Makefile.

Signed-off-by: llhuii <liulinghui@huawei.com>,
add lib doc

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Improve the docs

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
fix syntax and information in the docs
Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
update lib doc

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Update s3 example docs of IL&JI

Signed-off-by: JimmyYang20 <yangjin39@huawei.com>,
Support websocket reconnection

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Lib support hot model update

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Adjusting the Log of IncrementalLearning example

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Fix codegen verify checker

Note the codegen verify checker should report error

Signed-off-by: llhuii <liulinghui@huawei.com>,
Add the missing gencode for objectsearch/tracking

Signed-off-by: llhuii <liulinghui@huawei.com>,
fix job_kind value in LC_report

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Update dependency of server request in lib
- replace `retry==1.3.3` with `tenacity==8.0.1` because of `retry` no longer maintained.

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Merge pull request kubeedge#158 from llhuii/fix-imagerepo-of-image-publish-action

Fix IMAGE_REPO in github image-publish action,
Merge pull request kubeedge#164 from JoeyHwong-gk/federated

Support websocket reconnection when the server status is abnormal,
Merge pull request kubeedge#166 from llhuii/fix-verify-checker

Fix codegen verify checker,
Add contributing docs

Signed-off-by: llhuii <liulinghui@huawei.com>,
Merge pull request kubeedge#159 from JoeyHwong-gk/libdoc

update lib doc,
Merge pull request kubeedge#148 from llhuii/add-contributing-docs

Add contributing docs,
Merge pull request kubeedge#160 from JimmyYang20/doc-s3

Update s3 example docs of IL&JI,
fix access exceptions when rendering with sphinx

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Merge pull request kubeedge#150 from JoeyHwong-gk/docs

docs improvement,
GM&LC: IL supports model hot updates

Signed-off-by: JimmyYang20 <yangjin39@huawei.com>,
fix pr comment

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Merge pull request kubeedge#138 from JimmyYang20/modelhotupdate

GM&LC: IL supports model hot updates,
Fix s3 example docs of IL&JI

Signed-off-by: JimmyYang20 <yangjin39@huawei.com>,
Merge pull request kubeedge#174 from JimmyYang20/doc-s3

Fix s3 example docs of IL&JI,
Merge pull request kubeedge#157 from JoeyHwong-gk/hot_model

[Lib Support] hot model update,
Upgrade gorilla/websocket from v1.4.0 to v1.4.2

This upgrade fixes a potential DoS vector bug in gorilla/websocket 1.4.0,
see GHSA-jf24-p9p9-4rjh

Signed-off-by: llhuii <liulinghui@huawei.com>,
Merge pull request kubeedge#182 from llhuii/upgrade-websocket

Upgrade gorilla/websocket from v1.4.0 to v1.4.2,
fix lib/requirements
- This upgrade fixes a CSRF error in FastAPI version earlier than 0.65.2,
see GHSA-8h2j-cgx8-6xv7

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Merge pull request kubeedge#183

See merge request butterfly/sedna!9
Labels
- approved: Indicates a PR has been approved by an approver from all required OWNERS files.
- lgtm: Indicates that a PR is ready to be merged.
- size/XL: Denotes a PR that changes 500-999 lines, ignoring generated files.
8 participants