lifelong learning: issue-driven interface-adjustment and bug fix #142
Conversation
JoeyHwong-gk
commented
Jul 31, 2021
- S3 storage support
- Reduce parameters for initialization
- Works with the lifelong learning enhancements tracking issue #85
- fix file_ops method - fix kb save bug Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
700bfd5 to f5b0779
2788f39 to 3d8a9d4
count = 0
num = len(objects)
The variable names "num" and "count" are not explicit enough. I suggest renaming "num" to "obj_num". It is also hard to tell at first sight what "count" is used for.
file_ops is not an interface that developers need to be aware of; this function is used to upload files to OBS. "num" refers to the number of files/directories that need to be uploaded in the current directory.
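To make the naming suggestion in this thread concrete, here is a minimal sketch of an upload loop with more explicit names. It is not the actual file_ops code: `upload_objects`, `upload_one`, `obj_num`, and `uploaded_count` are hypothetical names, the meaning given to the counter is an assumption, and the upload itself is a stand-in callable rather than a real OBS/S3 client.

```python
from typing import Callable, Iterable


def upload_objects(objects: Iterable[str],
                   upload_one: Callable[[str], bool]) -> int:
    """Upload every entry in `objects` and return how many succeeded.

    Hypothetical helper for illustration only; `upload_one` stands in
    for whatever call actually pushes a file or directory to storage.
    """
    objects = list(objects)
    obj_num = len(objects)   # files/directories to upload in this directory
    uploaded_count = 0       # entries uploaded successfully so far (assumed meaning)
    for obj in objects:
        if upload_one(obj):
            uploaded_count += 1
    if uploaded_count != obj_num:
        print(f"uploaded {uploaded_count}/{obj_num} objects")
    return uploaded_count
```

With names like these, a reader can tell the total from the running counter without tracing the loop.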
In this version, some modules and comments are missing. It is highly recommended to address these review comments before a formal merge.
@@ -36,130 +36,126 @@ class MulTaskLearning:

    def __init__(self,
The task allocation module, which serves as the predictor for task definition, is missing. Besides, task mining is meant to reveal the task relations during inference, e.g., as the predictor for task relationship discovery.
In the previous version, we merged the two parts together; they will be split up in a follow-up version.
Good to know. I have created issue #151 to follow up.
Thx
try:
    raw_dict = json.loads(param_str, encoding="utf-8")
except json.JSONDecodeError:
    raw_dict = {}
return raw_dict

def task_definition(self, samples):
def _task_definition(self, samples):
    """
    Task attribute extractor and multi-task definition
Explanations of the input and output are also needed in the docstring, so that method selection or parameter selection can be done properly.
Yep, but not reflected in this PR
Will that be tackled in the future? It would be great if we have related issues in the community.
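As an illustration of the kind of docstring requested in this thread, the sketch below documents inputs and outputs for a task-definition style function. It is not the PR's `_task_definition`: the function name, the `attribute` parameter, the sample layout, and the grouping logic are all hypothetical and chosen only to keep the example self-contained.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence


def task_definition_sketch(samples: Sequence[Dict[str, Any]],
                           attribute: str = "city") -> Dict[Any, List[int]]:
    """Group samples into tasks by the value of one attribute.

    Parameters
    ----------
    samples : sequence of dict
        Training samples; each dict must contain the key `attribute`.
    attribute : str
        Name of the sample attribute used to split tasks (hypothetical).

    Returns
    -------
    dict
        Maps each attribute value to the indices of the samples
        that belong to that task.
    """
    tasks: Dict[Any, List[int]] = defaultdict(list)
    for index, sample in enumerate(samples):
        tasks[sample[attribute]].append(index)
    return dict(tasks)
```

Stating the expected sample layout and the shape of the return value is what lets a caller choose a method or its parameters without reading the body.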
self.task_index_url = Context.get_parameters(
    "MODEL_URLS", '/tmp/index.pkl'
)
self.task_index_url = KBResourceConstant.KB_INDEX_NAME.value
self.min_train_sample = int(Context.get_parameters(
It is difficult to understand the hyper-parameters without any comments here. That is, what do these parameters mean, which components do they affect, etc.?
Same, comments are needed
Currently we do not regard the knowledge base as a public interface. Of course, these comments should also be added in the documentation PR.
Well, this comment is about "Context.get_parameters", not the knowledge base.
Good to learn that it will be followed up. Also, it would be great if we have related issues in the community.
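A minimal sketch of what a commented hyper-parameter could look like follows. It does not use Sedna's `Context`: `get_parameter` is a stand-in that reads the process environment, and the variable name `MIN_TRAIN_SAMPLE`, its default of 10, and the described meaning are assumptions made for illustration.

```python
import os


def get_parameter(name: str, default: str) -> str:
    """Stand-in for a context lookup: read `name` from the environment."""
    return os.environ.get(name, default)


class TrainingConfig:
    """Hypothetical container showing one comment per hyper-parameter."""

    def __init__(self) -> None:
        # Minimum number of samples a task needs before it is trained
        # on its own (assumed meaning); only the task-definition step
        # would read this value.
        self.min_train_sample = int(get_parameter("MIN_TRAIN_SAMPLE", "10"))
```

Each comment answers the two questions raised in the review: what the parameter means and which component it affects.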
# See the License for the specific language governing permissions and
# limitations under the License.

"""Unseen Task detect Algorithms for Lifelong Learning"""
It would be better to name the module with a noun phrase, e.g., "unseen task detection algorithms".
fix
Thx.
@@ -34,40 +33,39 @@ class LifelongLearning(JobBase):

    def __init__(self,
The init function has quite a lot of content (100+ LOC) without any comments, which makes it hard to read. It would be nice to add comments at intervals throughout it.
Yep
So will that be tackled in the future? It would be great if we have related issues in the community.
return res
_index_path = FileOps.join_path(self.save_dir, self.kb_index)
FileOps.dump(task_info, _index_path)
return f"/file/download?files={self.kb_index}&name={self.kb_index}"

def update(self, task: UploadFile = File(...)):
The update function is important in the knowledge base. Comments are needed to state what this function does and what its input and output are.
As above, we will complete this in the documentation supplement PR.
Glad to hear that the comment will be followed up. It would be great if we have related issues in the community.
See each of my comments in the code.
@luosiqi: changing LGTM is restricted to collaborators.
- Reduce parameters for initial - show all interfaces of lifelong learning in example - fix bugs from deep learning framework Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
/lgtm
I'm going to merge this.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: llhuii, luosiqi
Upgrade to v0.4.0
Created-by: Vittorio Cozzolino 00609018
Author-id: 553076
MR-id: 12361350
Commit-by: Vittorio Cozzolino;KubeEdge Bot;JoeyHwong;JimmyYang20;ShiXiaohou;DanLiu;HenryChou;Yutong Wang;EnfangCui;llhuii;XinYao1994;Jie Pu;wei.ji
Merged-by: Vittorio Cozzolino 00609018
E2E-issues:
Description:
- fix incremental_learning bug - Add docs and code comment - fix bugs: epoch always be 1, inference result not saved, s3 upload fail Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- Interface Improvement: 1. The algorithm of HardExampleMining should be seleted by the developer. Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- install.sh: fix LC_BIND_PORT bug rename the variable LC_BIND_PORT to SENDA_LC_BIND_PORT. Signed-off-by: llhuii <liulinghui@huawei.com>
- Merge pull request kubeedge#121 from llhuii/fix-install-script-bug install.sh: fix LC_BIND_PORT bug
- Update interface.py fix env missing bug Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- fix bug: aggregation of weights should occur in the AggServer Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- fix bug: Cloud worker not exiting Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- gm: refactor all features into independent dir All controllers are placed into globalmanager/controllers: 1. each feature has the independent subdirectory 2. upstream/downstream are kept as top level. Commom types/utils/worker.go are placed into globalmanager/runtime. Signed-off-by: llhuii <liulinghui@huawei.com>
- fix PR comment - clean useless code - catch server exception in threads Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- gm: refactor upstream controller Split upstream controller, merge each feature CR logic code into its controller. Signed-off-by: llhuii <liulinghui@huawei.com>
- gm: share client/Informer with all controllers Make all controllers sharing with: 1. kubernetes client, and informerFactory with random resync period. 2. sedna crd client, and informerFactory with random resync period. This can reduce code and improve slim performance. Signed-off-by: llhuii <liulinghui@huawei.com>
- gm: add dataset controller Only handle dataset update from edge. Signed-off-by: llhuii <liulinghui@huawei.com>
- Merge pull request kubeedge#106 from JoeyHwong-gk/federated fix federated learning bugs
- gm: split all upstream logic into separate file Signed-off-by: llhuii <liulinghui@huawei.com>
- gm: split all downstream logic into separate file Since all CR watch actions are placed into corresponding controller, controllers/downstream.go is unnecessary. Signed-off-by: llhuii <liulinghui@huawei.com>
- LC: fix nil pointer dereference bug It happened in evalTask of incremental job when deploy model hasn't been synced to LC. evalTask should return error instead of logging error. And it doesn't need job id info into error, same as trainTask. Signed-off-by: JimmyYang20 <yangjin39@huawei.com>
- Merge pull request kubeedge#139 from JimmyYang20/fixbug LC: fix nil pointer dereference bug
- LC: send dataset update to GM only when changed number of samples has been sent to GM only when adding new data. Signed-off-by: JimmyYang20 <yangjin39@huawei.com>
- Merge pull request kubeedge#137 from JimmyYang20/main LC: send dataset update to GM only when changed
- lifelong learning s3 support - fix file_ops method - fix kb save bug Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- Add object search and tracking docs to docs/proposals/ Add object search and tracking crd samples to build/crd-samples/sedna/ Add object search and tracking type.go files to pkg/apis/sedna/v1alpha1/ Signed-off-by: EnfangCui <17111008@bjtu.edu.cn>
- Merge pull request kubeedge#100 from EnfangCui/add-multi-edge-inference-PR Add object search and tracking proposals
- gm: more code clean after initial refactor done 1. remove the feature redundant name in all feature controllers(e.g. 'federatedlearningJob' to 'job'), since it has already own independent package, no need the feature extra name 2. upstream interface optimizaztion 3. fix empty Kind of all CR in downstream 4. add extra doc string 5. fix code style Signed-off-by: llhuii <liulinghui@huawei.com>
- Fix the problem that kbimage cannot be compiled in Makefile Signed-off-by: wei.ji <wei.ji@easystack.cn>
- improve lifelong learning docs 1. improve the atc example words 2. fix the broken links in lifelong proposal Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- Merge pull request kubeedge#146 from Jw-Jm/main Fix the problem that kbimage cannot be compiled in Makefile
- make the hard_example_mining alg to be a common interface Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- Merge pull request kubeedge#134 from llhuii/refactor-gm gm: decouple all features into independent package
- Merge pull request kubeedge#107 from JoeyHwong-gk/incremental [incremental learning] example:keep all results whether is hardExample or not, fixed the issue of using s3 to save model
- Merge pull request kubeedge#143 from JoeyHwong-gk/lldoc improve lifelong learning docs
- fix example bug: save result which get from cloud if is hard example fix message when http connect fail Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- fix pr comment - make the hard_example_mining alg to be a common interface - fix get_hem_from_config: raise exception when value is unexpected Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- lc: decouple all features into independent package Signed-off-by: JimmyYang20 <yangjin39@huawei.com>
- Merge pull request kubeedge#117 from JoeyHwong-gk/joint joint_inference: bug fix and interface reconstruction
- Merge pull request kubeedge#149 from JimmyYang20/refector-lc lc: decouple all features into independent package
- fix lifelong issue - Reduce parameters for initial - show all interfaces of lifelong learning in example - fix bugs from deep learning framework Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- fix il doc Signed-off-by: JimmyYang20 <yangjin39@huawei.com>
- Merge pull request kubeedge#153 from JimmyYang20/fix-doc Fix rendering issue of example doc in readthedocs
- Merge pull request kubeedge#142 from JoeyHwong-gk/lls3 lifelong learning: issue-driven interface-adjustment and bug fix
- fix the lifelong example problem from backend and constant - fix sklearn backend: support args in train/eval/infer - fix lifelong constant Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- Automatic push images when publishing a release A github action is added for pushing image when a new release is created: 1. login docker hub. 2. checkout the project, and run `make push-all`. Signed-off-by: llhuii <liulinghui@huawei.com>
- Merge pull request kubeedge#154 from JoeyHwong-gk/lifelong [Lifelong example]: fix the problem from backend and constant
- docs: update install guide 1. add GM/LC links 2. add GM/LC deploy form Signed-off-by: llhuii <liulinghui@huawei.com>
- Merge pull request kubeedge#155 from llhuii/add-image-push-gh-action Push images automatically when a new release is created
- Merge pull request kubeedge#156 from llhuii/update-install-doc docs: update install guide
- IL: LC supports to recover job when restart Signed-off-by: JimmyYang20 <yangjin39@huawei.com>
- Merge pull request kubeedge#152 from JimmyYang20/fixbug IL: LC supports to recover job when restart
- Fix IMAGE_REPO in github image-publish action Using the env 'GITHUB_REPOSITORY' instead of 'GITHUB_ACTOR' to get the right image repo name i.e. `IMAGE_REPO` in Makefile. Signed-off-by: llhuii <liulinghui@huawei.com>
- add lib doc Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- Improve the docs Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- fix syntax and information in the docs Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- update lib doc Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- Update s3 example docs of IL&JI Signed-off-by: JimmyYang20 <yangjin39@huawei.com>
- Support websocket reconnection Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- Lib support hot model update Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- Adjusting the Log of IncrementalLearning example Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- Fix codegen verify checker Note the codegen verify checker should report error Signed-off-by: llhuii <liulinghui@huawei.com>
- Add the missing gencode for objectsearch/tracking Signed-off-by: llhuii <liulinghui@huawei.com>
- fix job_kind value in LC_report Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- Update dependency of server request in lib - replace `retry==1.3.3` with `tenacity==8.0.1` because of `retry` no longer maintained. Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- Merge pull request kubeedge#158 from llhuii/fix-imagerepo-of-image-publish-action Fix IMAGE_REPO in github image-publish action
- Merge pull request kubeedge#164 from JoeyHwong-gk/federated Support websocket reconnection when the server status is abnormal
- Merge pull request kubeedge#166 from llhuii/fix-verify-checker Fix codegen verify checker
- Add contributing docs Signed-off-by: llhuii <liulinghui@huawei.com>
- Merge pull request kubeedge#159 from JoeyHwong-gk/libdoc update lib doc
- Merge pull request kubeedge#148 from llhuii/add-contributing-docs Add contributing docs
- Merge pull request kubeedge#160 from JimmyYang20/doc-s3 Update s3 example docs of IL&JI
- fix access exceptions when rendering with sphinx Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- Merge pull request kubeedge#150 from JoeyHwong-gk/docs docs improvement
- GM&LC: IL supports model hot updates Signed-off-by: JimmyYang20 <yangjin39@huawei.com>
- fix pr comment Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- Merge pull request kubeedge#138 from JimmyYang20/modelhotupdate GM&LC: IL supports model hot updates
- Fix s3 example docs of IL&JI Signed-off-by: JimmyYang20 <yangjin39@huawei.com>
- Merge pull request kubeedge#174 from JimmyYang20/doc-s3 Fix s3 example docs of IL&JI
- Merge pull request kubeedge#157 from JoeyHwong-gk/hot_model [Lib Support] hot model update
- Upgrade gorilla/websocket from v1.4.0 to v1.4.2 This upgrade fixes a potential DoS vector bug in gorilla/websocket 1.4.0, see GHSA-jf24-p9p9-4rjh Signed-off-by: llhuii <liulinghui@huawei.com>
- Merge pull request kubeedge#182 from llhuii/upgrade-websocket Upgrade gorilla/websocket from v1.4.0 to v1.4.2
- fix lib/requirements - This upgrade fixes a CSRF error in FastAPI version earlier than 0.65.2, see GHSA-8h2j-cgx8-6xv7 Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
- Merge pull request kubeedge#183
See merge request butterfly/sedna!9