Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[incremental learning] example:keep all results whether is hardExample or not, fixed the issue of using s3 to save model #107

Merged
merged 4 commits into from
Aug 4, 2021

Conversation

JoeyHwong-gk
Copy link
Contributor

@JoeyHwong-gk JoeyHwong-gk commented Jun 17, 2021

  • Add docs and code comment
  • fix bugs: epoch always be 1, inference result not saved, tf s3 upload fail

Signed-off-by: JoeyHwong joeyhwong@gknow.cn

@kubeedge-bot kubeedge-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 17, 2021
whose hard coefficient is less than
threshold_cross_entropy. And its default value is
threshold_cross_entropy=0.5
"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 0.5 the optimal value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on that?

"""

def __init__(self, threshold_img=0.5, threshold_box=0.5, **kwargs):
self.threshold_box = float(threshold_box)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question above. suggest to give optimal value .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on that?


os.environ["AWS_ACCESS_KEY_ID"] = os.getenv("ACCESS_KEY_ID", "")
os.environ["AWS_SECRET_ACCESS_KEY"] = os.getenv("SECRET_ACCESS_KEY", "")
os.environ["S3_ENDPOINT"] = url.netloc

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't use ACCESS_KEY_ID and ACCESS_KEY_ID

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the specified key in the framework(tensorflow)

@@ -28,6 +28,7 @@

os.environ['BACKEND_TYPE'] = 'TENSORFLOW'
LOG = logging.getLogger(__name__)
Copy link

@JimmyYang20 JimmyYang20 Jun 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constant TENSORFLOW in constant.py, e.g., constant.backend_type.TENSORFLOW='TENSORFLOW'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do developers need to know constants in lib?

# See the License for the specific language governing permissions and
# limitations under the License.

"""Hard Example Mining Algorithms"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest to delete this annotation, this doesn't meet the annotation specification


from sedna.common.class_factory import ClassFactory, ClassType

__all__ = ('ThresholdFilter', 'CrossEntropyFilter', 'IBTFilter')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these constants put in constant.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's standard usage in python, used to restrict the import of modules.

@jaypume jaypume self-requested a review June 22, 2021 02:34
@kubeedge-bot kubeedge-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 23, 2021
@JoeyHwong-gk JoeyHwong-gk requested a review from JimmyYang20 June 28, 2021 00:59
@kubeedge-bot kubeedge-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 30, 2021
@@ -30,6 +31,19 @@
Session = tf.Session


_url = os.getenv("S3_ENDPOINT_URL", "http://s3.amazonaws.com")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. please put these s3 codes into a function instead of top level
  2. why only does tensorflow have this config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kubeedge-bot kubeedge-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 30, 2021
@JoeyHwong-gk JoeyHwong-gk changed the title fix incremental_learning bug Code Maintenance:incremental_learning Jul 7, 2021
- Add docs and code comment
- fix bugs: epoch always be 1, inference result not saved, s3 upload
  fail

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
   1. The algorithm of HardExampleMining should be seleted by the developer.

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>

os.environ["AWS_ACCESS_KEY_ID"] = os.getenv("ACCESS_KEY_ID", "")
os.environ["AWS_SECRET_ACCESS_KEY"] = os.getenv("SECRET_ACCESS_KEY", "")
os.environ["S3_ENDPOINT"] = s3_url.netloc

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not env ACCESS_KEY_ID, os.getenv("ACCESS_KEY_ID") will be None,
os.environ["AWS_ACCESS_KEY_ID"] = None will report error
suggest os.environ.setdefault()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix env missing bug

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
@JoeyHwong-gk JoeyHwong-gk changed the title Code Maintenance:incremental_learning incremental learning: bug fix Jul 21, 2021
@JoeyHwong-gk JoeyHwong-gk changed the title incremental learning: bug fix [incremental learning] example:keep all results whether is hardExample or not, fixed the issue of using s3 to save model Jul 28, 2021
@@ -100,10 +105,29 @@ def deal_infer_rsl(model_output):
def run():
camera_address = Context.get_parameters('video_url')

hard_example_name = Context.get_parameters('HEM_NAME', "IBT")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does a developer find the env 'HEM_NAME'? If this is a sedna system environment varibles, it should provide a function or class for the developer, instead of expose an environment name which the developer might find only in documents.

In sedna, there are two seperated parameter entries for deployer and developer, and the parameter should be subject to CRD or LIB.

for example:

# Select one of the next two lines (ignore the name of function or varibles): 
ibt=IL.get_hem_from_crd()  # subject to CRD
ibt=IL.IBT(0.9,0.9)        # subject to LIB

instance = IL(
    estimator=e,
    hem=ibt
)```

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix

Copy link
Member

@jaypume jaypume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an interface that needs to be modified

@JoeyHwong-gk JoeyHwong-gk force-pushed the incremental branch 2 times, most recently from 00b310a to 99b6cb3 Compare July 31, 2021 08:19
@JoeyHwong-gk JoeyHwong-gk requested a review from jaypume July 31, 2021 08:23
@JoeyHwong-gk JoeyHwong-gk requested a review from llhuii August 2, 2021 01:00
@JoeyHwong-gk
Copy link
Contributor Author

/assign @jaypume

}
# get hard exmaple mining algorithm from config e.g.:
# {"method": "IBT", "param": {"threshold_img": 0.9}}
hard_example_mining = IncrementalLearning.get_hem_algorithm(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a good idea to add a note here, it's redundant, how about to rename get_hem_algorthm to get _hem_algorithm_from_config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix

@@ -291,3 +291,27 @@ def get_parameters(cls, param, default=None):
value = cls.parameters.get(
param) or cls.parameters.get(str(param).upper())
return value if value else default

@classmethod
def get_crd_algorithm(cls, algorithm, **param) -> dict:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crd is a very special term, only work in kubernetes, algorithm developer shouldn't understand it. rihgt? how about to rename it get_algorithm_from_api?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>
@ugvddm
Copy link

ugvddm commented Aug 3, 2021

/lgtm

@kubeedge-bot kubeedge-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 3, 2021
@kubeedge-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jaypume

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot kubeedge-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 4, 2021
@jaypume
Copy link
Member

jaypume commented Aug 4, 2021

/lgtm

@kubeedge-bot kubeedge-bot merged commit 16fb975 into kubeedge:main Aug 4, 2021
vcozzolino added a commit to vcozzolino/sedna that referenced this pull request Dec 3, 2021
Upgrade to v0.4.0

Created-by: Vittorio Cozzolino 00609018
Author-id: 553076
MR-id: 12361350
Commit-by: Vittorio Cozzolino;KubeEdge Bot;JoeyHwong;JimmyYang20;ShiXiaohou;DanLiu;HenryChou;Yutong Wang;EnfangCui;llhuii;XinYao1994;Jie Pu;wei.ji
Merged-by: Vittorio Cozzolino 00609018
E2E-issues: 
Description:
fix incremental_learning bug
- Add docs and code comment
- fix bugs: epoch always be 1, inference result not saved, s3 upload
  fail

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Interface Improvement:
   1. The algorithm of HardExampleMining should be seleted by the developer.

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
install.sh: fix LC_BIND_PORT bug

rename the variable LC_BIND_PORT to SENDA_LC_BIND_PORT.

Signed-off-by: llhuii <liulinghui@huawei.com>,
Merge pull request kubeedge#121 from llhuii/fix-install-script-bug

install.sh: fix LC_BIND_PORT bug,
Update interface.py

fix env missing bug

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
fix bug: aggregation of weights should occur in the AggServer

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
fix bug: Cloud worker not exiting

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
gm: refactor all features into independent dir

All controllers are placed into globalmanager/controllers:
1. each feature has the independent subdirectory
2. upstream/downstream are kept as top level.

Commom types/utils/worker.go are placed into globalmanager/runtime.

Signed-off-by: llhuii <liulinghui@huawei.com>,
fix PR comment
- clean useless code
- catch server exception in threads

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
gm: refactor upstream controller

Split upstream controller, merge each feature CR logic code into its
controller.

Signed-off-by: llhuii <liulinghui@huawei.com>,
gm: share client/Informer with all controllers

Make all controllers sharing with:
1. kubernetes client, and informerFactory with random resync period.
2. sedna crd client, and informerFactory with random resync period.

This can reduce code and improve slim performance.

Signed-off-by: llhuii <liulinghui@huawei.com>,
gm: add dataset controller

Only handle dataset update from edge.

Signed-off-by: llhuii <liulinghui@huawei.com>,
Merge pull request kubeedge#106 from JoeyHwong-gk/federated

fix federated learning bugs,
gm: split all upstream logic into separate file

Signed-off-by: llhuii <liulinghui@huawei.com>,
gm: split all downstream logic into separate file

Since all CR watch actions are placed into corresponding controller,
controllers/downstream.go is unnecessary.

Signed-off-by: llhuii <liulinghui@huawei.com>,
LC: fix nil pointer dereference bug

It happened in evalTask of incremental job when deploy model hasn't
been synced to LC. evalTask should return error instead of logging
error. And it doesn't need job id info into error, same as trainTask.

Signed-off-by: JimmyYang20 <yangjin39@huawei.com>,
Merge pull request kubeedge#139 from JimmyYang20/fixbug

LC: fix nil pointer dereference bug,
LC: send dataset update to GM only when changed

number of samples has been sent to GM only when adding new data.

Signed-off-by: JimmyYang20 <yangjin39@huawei.com>,
Merge pull request kubeedge#137 from JimmyYang20/main

LC: send dataset update to GM only when changed,
lifelong learning s3 support
- fix file_ops method
- fix kb save bug

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Add object search and tracking docs to docs/proposals/

Add object search and tracking crd samples to build/crd-samples/sedna/

Add object search and tracking type.go files to pkg/apis/sedna/v1alpha1/

Signed-off-by: EnfangCui <17111008@bjtu.edu.cn>,
Merge pull request kubeedge#100 from EnfangCui/add-multi-edge-inference-PR

Add object search and tracking proposals,
gm: more code clean after initial refactor done

1. remove the feature redundant name in all feature controllers(e.g.
'federatedlearningJob' to 'job'), since it has already own independent
package, no need the feature extra name
2. upstream interface optimizaztion
3. fix empty Kind of all CR in downstream
4. add extra doc string
5. fix code style

Signed-off-by: llhuii <liulinghui@huawei.com>,
Fix the problem that kbimage cannot be compiled in Makefile

Signed-off-by: wei.ji <wei.ji@easystack.cn>,
improve lifelong learning docs

1. improve the atc example words
2. fix the broken links in lifelong proposal

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Merge pull request kubeedge#146 from Jw-Jm/main

Fix the problem that kbimage cannot be compiled in Makefile,
make the hard_example_mining alg to be a common interface

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Merge pull request kubeedge#134 from llhuii/refactor-gm

gm: decouple all features into independent package,
Merge pull request kubeedge#107 from JoeyHwong-gk/incremental

[incremental learning] example:keep all  results whether is hardExample or not, fixed the issue of using s3 to save model,
Merge pull request kubeedge#143 from JoeyHwong-gk/lldoc

improve lifelong learning docs,
fix example bug: save result which get from cloud if is hard example

fix message when http connect fail

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
fix pr comment
- make the hard_example_mining alg to be a common interface
- fix get_hem_from_config: raise exception when value is unexpected

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
lc: decouple all features into independent package

Signed-off-by: JimmyYang20 <yangjin39@huawei.com>,
Merge pull request kubeedge#117 from JoeyHwong-gk/joint

joint_inference: bug fix and interface reconstruction,
Merge pull request kubeedge#149 from JimmyYang20/refector-lc

lc: decouple all features into independent package,
fix lifelong issue
- Reduce parameters for initial
- show all interfaces of lifelong learning in example
- fix bugs from deep learning framework

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
fix il doc

Signed-off-by: JimmyYang20 <yangjin39@huawei.com>,
Merge pull request kubeedge#153 from JimmyYang20/fix-doc

Fix rendering issue of example doc in readthedocs,
Merge pull request kubeedge#142 from JoeyHwong-gk/lls3

lifelong learning: issue-driven interface-adjustment and bug fix,
fix the lifelong example problem from backend and constant

- fix sklearn backend: support args in train/eval/infer
- fix lifelong constant

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Automatic push images when publishing a release

A github action is added for pushing image when a new release is created:
1. login docker hub.
2. checkout the project, and run `make push-all`.

Signed-off-by: llhuii <liulinghui@huawei.com>,
Merge pull request kubeedge#154 from JoeyHwong-gk/lifelong

[Lifelong example]: fix the problem from backend and constant,
docs: update install guide

1. add GM/LC links
2. add GM/LC deploy form

Signed-off-by: llhuii <liulinghui@huawei.com>,
Merge pull request kubeedge#155 from llhuii/add-image-push-gh-action

Push images automatically when a new release is created,
Merge pull request kubeedge#156 from llhuii/update-install-doc

docs: update install guide,
IL: LC supports to recover job when restart

Signed-off-by: JimmyYang20 <yangjin39@huawei.com>,
Merge pull request kubeedge#152 from JimmyYang20/fixbug

IL: LC supports to recover job when restart,
Fix IMAGE_REPO in github image-publish action

Using the env 'GITHUB_REPOSITORY' instead of 'GITHUB_ACTOR' to get the right
image repo name i.e. `IMAGE_REPO` in Makefile.

Signed-off-by: llhuii <liulinghui@huawei.com>,
add lib doc

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Improve the docs

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
fix syntax and information in the docs
Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
update lib doc

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Update s3 example docs of IL&JI

Signed-off-by: JimmyYang20 <yangjin39@huawei.com>,
Support websocket reconnection

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Lib support hot model update

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Adjusting the Log of IncrementalLearning example

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Fix codegen verify checker

Note the codegen verify checker should report error

Signed-off-by: llhuii <liulinghui@huawei.com>,
Add the missing gencode for objectsearch/tracking

Signed-off-by: llhuii <liulinghui@huawei.com>,
fix job_kind value in LC_report

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Update dependency of server request in lib
- replace `retry==1.3.3` with `tenacity==8.0.1` because of `retry` no longer maintained.

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Merge pull request kubeedge#158 from llhuii/fix-imagerepo-of-image-publish-action

Fix IMAGE_REPO in github image-publish action,
Merge pull request kubeedge#164 from JoeyHwong-gk/federated

Support websocket reconnection when the server status is abnormal,
Merge pull request kubeedge#166 from llhuii/fix-verify-checker

Fix codegen verify checker,
Add contributing docs

Signed-off-by: llhuii <liulinghui@huawei.com>,
Merge pull request kubeedge#159 from JoeyHwong-gk/libdoc

update lib doc,
Merge pull request kubeedge#148 from llhuii/add-contributing-docs

Add contributing docs,
Merge pull request kubeedge#160 from JimmyYang20/doc-s3

Update s3 example docs of IL&JI,
fix access exceptions when rendering with sphinx

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Merge pull request kubeedge#150 from JoeyHwong-gk/docs

docs improvement,
GM&LC: IL supports model hot updates

Signed-off-by: JimmyYang20 <yangjin39@huawei.com>,
fix pr comment

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Merge pull request kubeedge#138 from JimmyYang20/modelhotupdate

GM&LC: IL supports model hot updates,
Fix s3 example docs of IL&JI

Signed-off-by: JimmyYang20 <yangjin39@huawei.com>,
Merge pull request kubeedge#174 from JimmyYang20/doc-s3

Fix s3 example docs of IL&JI,
Merge pull request kubeedge#157 from JoeyHwong-gk/hot_model

[Lib Support] hot model update,
Upgrade gorilla/websocket from v1.4.0 to v1.4.2

This upgrade fixes a potential DoS vector bug in gorilla/websocket 1.4.0,
see GHSA-jf24-p9p9-4rjh

Signed-off-by: llhuii <liulinghui@huawei.com>,
Merge pull request kubeedge#182 from llhuii/upgrade-websocket

Upgrade gorilla/websocket from v1.4.0 to v1.4.2,
fix lib/requirements
- This upgrade fixes a CSRF error in FastAPI version earlier than 0.65.2,
see GHSA-8h2j-cgx8-6xv7

Signed-off-by: JoeyHwong <joeyhwong@gknow.cn>,
Merge pull request kubeedge#183

See merge request butterfly/sedna!9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants