This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[v1.7.x] Backport fixing batch_norm and layer_norm for large tensors (#17805) #18261

Merged: TaoLv merged 1 commit into apache:v1.7.x from fix_lt_nightly_v17_layernorm on May 11, 2020

ChaiBapchya
Contributor

@ChaiBapchya ChaiBapchya commented May 8, 2020

Backport of #17805 to fix the large tensor nightly test issue.
Co-authored-by: Rohit Kumar Srivastava srivastava.141@buckeyemail.osu.edu

Verified the changes locally by:

Setting up the infra: https://cwiki.apache.org/confluence/display/MXNET/Reproducing+test+results
Build:

sudo ci/build.py --docker-registry mxnetci --platform ubuntu_nightly_cpu --docker-build-retries 3 --shm-size 500m /work/runtime_functions.sh build_ubuntu_cpu_large_tensor

Test:

sudo ci/build.py --docker-registry mxnetci --platform ubuntu_nightly_cpu --docker-build-retries 3 --shm-size 500m /work/runtime_functions.sh nightly_test_large_vector
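The build and test commands above differ only in the runtime_functions.sh target. A minimal sketch of assembling both from one shared flag list might look like this. The helper names (ci_command, verification_commands) are hypothetical and not part of the MXNet repository; the flags and targets are copied from the commands above. It only prints the commands (a dry run) rather than executing them.

```python
# Flags copied from the two commands in the PR description.
COMMON_ARGS = [
    "ci/build.py",
    "--docker-registry", "mxnetci",
    "--platform", "ubuntu_nightly_cpu",
    "--docker-build-retries", "3",
    "--shm-size", "500m",
]

# runtime_functions.sh targets named in the PR description.
BUILD_TARGET = "build_ubuntu_cpu_large_tensor"
TEST_TARGET = "nightly_test_large_vector"


def ci_command(target, use_sudo=True):
    """Return the argv list for one ci/build.py invocation (hypothetical helper)."""
    prefix = ["sudo"] if use_sudo else []
    return prefix + COMMON_ARGS + ["/work/runtime_functions.sh", target]


def verification_commands():
    """Return the build command first, then the large-vector nightly test."""
    return [ci_command(BUILD_TARGET), ci_command(TEST_TARGET)]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in verification_commands():
        print(" ".join(argv))
```

To actually run the steps, each argv list could be passed to subprocess.run(argv, check=True) from the repository root, assuming Docker and the CI infra from the wiki page are already set up.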

Co-authored-by: Rohit Kumar Srivastava <srivastava.141@buckeyemail.osu.edu>
@mxnet-bot

Hey @ChaiBapchya, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [miscellaneous, centos-cpu, sanity, centos-gpu, website, windows-gpu, unix-cpu, clang, unix-gpu, edge, windows-cpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.
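For illustration, the command syntax described by the bot could be parsed and validated roughly as follows. This is a sketch under assumptions, not the bot's actual implementation: the function name parse_ci_command and the regular expression are hypothetical, and only the job names are taken from the "CI supported jobs" list above. It accepts job lists with or without a space after the comma, since both spellings appear in this thread.

```python
import re

# Job names copied from the bot's "CI supported jobs" list.
SUPPORTED_JOBS = {
    "miscellaneous", "centos-cpu", "sanity", "centos-gpu", "website",
    "windows-gpu", "unix-cpu", "clang", "unix-gpu", "edge", "windows-cpu",
}

# Matches "@mxnet-bot run ci [ ... ]" anywhere in a comment (assumed syntax).
COMMAND_RE = re.compile(r"@mxnet-bot\s+run\s+ci\s*\[([^\]]*)\]")


def parse_ci_command(comment):
    """Return the requested job names, or None if the comment has no command.

    Raises ValueError for job names that are not in SUPPORTED_JOBS.
    """
    match = COMMAND_RE.search(comment)
    if match is None:
        return None
    jobs = [name.strip() for name in match.group(1).split(",") if name.strip()]
    if jobs == ["all"]:
        # "[all]" expands to every supported job.
        return sorted(SUPPORTED_JOBS)
    unknown = [name for name in jobs if name not in SUPPORTED_JOBS]
    if unknown:
        raise ValueError("unsupported CI jobs: " + ", ".join(unknown))
    return jobs
```

For example, parse_ci_command("@mxnet-bot run ci [unix-cpu,unix-gpu]") yields the two job names in the order given, while a comment without a bot command yields None.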

@ChaiBapchya
Contributor Author

Fixes #18246

@ciyongch
Contributor

ciyongch commented May 8, 2020

It seems CI failed to fetch the external packages for TVM; please re-trigger the CI.

@ChaiBapchya
Contributor Author

@mxnet-bot run ci [unix-cpu, unix-gpu, clang]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu, clang, unix-gpu]

@ChaiBapchya
Contributor Author

@mxnet-bot run ci [unix-cpu, unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu, unix-gpu]

@ciyongch
Contributor

It still fails to fetch the clang_llvm package; please trigger it again, thanks! :)

@ChaiBapchya
Contributor Author

@mxnet-bot run ci [unix-cpu,unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu, unix-gpu]

@ciyongch
Contributor

Hi @TaoLv @leezu, please help review and merge the PR, thanks!

Contributor

@ciyongch ciyongch left a comment

LGTM

@TaoLv TaoLv merged commit ceb0f06 into apache:v1.7.x May 11, 2020
@ChaiBapchya
Contributor Author

Since this is merged, I have retriggered NightlyTestsForBinaries for the v1.7.x branch:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/detail/v1.7.x/3/pipeline

Hopefully it passes

@ChaiBapchya ChaiBapchya deleted the fix_lt_nightly_v17_layernorm branch May 11, 2020 05:46
@ciyongch
Contributor

Hi @ChaiBapchya thanks a lot for your prompt fix :)

BTW, do the tests (both NightlyTests and NightlyTestsForBinaries) need to be triggered manually when needed, or are they triggered automatically whenever there is a new commit?
I noticed that NightlyTests has not been triggered for a while, even with new commits: http://jenkins.mxnet-ci.amazon-ml.com/view/Nightly%20Tests/job/NightlyTests/job/v1.7.x/

@ciyongch
Contributor

@ChaiBapchya unfortunately, the failure still occurs in the nightly test; please check the log here. Is something different between the local setup and the Jenkins server?

@ChaiBapchya
Contributor Author

ChaiBapchya commented May 11, 2020

Sorry for the confusion. As you can see here: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/detail/v1.7.x/3/pipeline
the Changes section says "Replayed #2".

By mistake, instead of triggering a build using the "Build Now" tab in Jenkins here: http://jenkins.mxnet-ci.amazon-ml.com/job/NightlyTestsForBinaries/job/v1.7.x/

I replayed build #2 [thinking it would have the same effect]. But no, a replay doesn't pick up new commits, as you can see in build #3 [it says no changes].

Now I have triggered the build correctly as build #4 [http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/detail/v1.7.x/4/pipeline].
This one picks up the 2 updated commits and doesn't say "Replay".
This should work. For completeness, I also started a build on the NightlyTests pipeline, so that both pipelines are green for the latest v1.7.0.
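As a rough sketch of the scripted equivalent of "Build Now": Jenkins' remote access API schedules a fresh build when an authenticated POST is sent to a job's /build endpoint, as opposed to replaying an earlier build. The helper names below (job_url, build_now_url) are hypothetical, the code only constructs the URL and sends nothing, and whether this particular server permits remote triggering is an assumption.

```python
from urllib.parse import quote

# Base URL taken from the Jenkins links in this thread.
JENKINS_BASE = "http://jenkins.mxnet-ci.amazon-ml.com"


def job_url(base, pipeline, branch):
    """URL of one branch job inside a multibranch pipeline (hypothetical helper)."""
    return "{}/job/{}/job/{}/".format(
        base.rstrip("/"), quote(pipeline, safe=""), quote(branch, safe="")
    )


def build_now_url(base, pipeline, branch):
    """Endpoint that schedules a fresh build when it receives an authenticated POST."""
    return job_url(base, pipeline, branch) + "build"


if __name__ == "__main__":
    # Prints the endpoint only; no request is sent.
    print(build_now_url(JENKINS_BASE, "NightlyTestsForBinaries", "v1.7.x"))
```

Sending the request would additionally need credentials (for example a Jenkins user name and API token) with permission to trigger the job.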

@ChaiBapchya
Contributor Author

ChaiBapchya commented May 11, 2020

BTW, do the tests (both NightlyTests and NightlyTestsForBinaries) need to be triggered manually when needed, or are they triggered automatically whenever there is a new commit?

No, it hasn't been configured to trigger on every commit because it is "Nightly". But I will check why it isn't being triggered on a nightly basis either. The configure tab of the pipeline doesn't show any triggering mechanism in place on the Jenkins side. Maybe it's done in some other way. Let me get back to you.

@ciyongch
Contributor

Hi @ChaiBapchya, thanks a lot for your great support in fixing #18246! NightlyTestsForBinaries passes now :) Let's wait a while to see if the NightlyTests pipeline works.
I suppose both jobs will be triggered once per day (nightly) when they are configured correctly, am I right?

@ciyongch
Contributor

Hi @ChaiBapchya, I'm wondering if you got a chance to check the trigger issue for the nightly tests (both NightlyTests and NightlyTestsForBinaries)? It seems they are not being triggered "nightly".
The latest NightlyTests run was on April 27, while the latest NightlyTestsForBinaries run was on May 12.
Thanks a lot!
Ciyong

@ChaiBapchya
Contributor Author

We don't trigger it nightly for the 1.7.x branch.
It's only triggered nightly for master.
Otherwise the nightly runs would keep piling up [e.g. previous release branches like 1.6.x would have to be tested nightly as well].
Right now it's manually triggered for the 1.7.x branch.
What do you advise? @leezu @szha

@ciyongch
Contributor

@ChaiBapchya, can you help trigger both NightlyTests and NightlyTestsForBinaries for the v1.7.x branch, as several PRs have been merged recently?
@leezu @szha since the nightly test for v1.7.x is currently triggered manually, is it possible to add me or @TaoLv to the list of privileged accounts allowed to trigger the nightly test until the release process is done, so that we don't have to keep bothering Chai to do it?
Thanks.

@ChaiBapchya
Contributor Author

Triggered NightlyTests and NightlyTestsForBinaries for v1.7.x

@ChaiBapchya
Contributor Author

@TaoLv is a committer, and committers can trigger pipelines on Jenkins.
@ciyongch I've given "ciyonch" access to trigger jobs as well. Please confirm. Hope that helps. Thanks.

@ciyongch
Contributor

Hi @ChaiBapchya, I appreciate your kind help. I will try triggering the nightly build in the coming days and will let you know if it works :)

@ciyongch
Contributor

ciyongch commented Sep 23, 2020

This PR is missing from the v1.x branch; it needs to be backported to v1.x and v1.8.x. @samskalicky @ChaiBapchya

4 participants