Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a sanity checking release test for Alpa and ray nightly. #32995

Merged
merged 14 commits into from
Mar 14, 2023

Conversation

gjoliver
Copy link
Member

@gjoliver gjoliver commented Mar 3, 2023

Why are these changes needed?

First nightly integration test for Alpa and Ray

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • [*] Release tests
    • This PR is not tested :(

Jun Gong added 5 commits March 8, 2023 06:58
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Jun Gong added 6 commits March 9, 2023 07:26
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Copy link
Member

@jiaodong jiaodong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great start of many more to come !

config = AutoConfig.from_pretrained(model_args.model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(
model_args.model_name_or_path,
use_fast=False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can save some time with use_fast=True right ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't hurt.
the majority of time is spent on running the model. the model is just so slow.
changed.

f"Step... {cur_step} | "
f"Loss: {train_metric['loss'].mean():.4f}, "
f"Throughput: {throughput_tokens:.2f} token/s, "
f"{throughput_tflops:.2f} TFLOP/s"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we continuously track this metrics in dashboard ? And generally speaking we should throw away first 1~2 steps and average the subsequent ones for perf tracking

not blocking and can be next PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea.
I made the change to write a json dict of token and tflops throughput.
will run another test before merge.

Signed-off-by: Jun Gong <jungong@anyscale.com>
Copy link
Member Author

@gjoliver gjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

config = AutoConfig.from_pretrained(model_args.model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(
model_args.model_name_or_path,
use_fast=False,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't hurt.
the majority of time is spent on running the model. the model is just so slow.
changed.

f"Step... {cur_step} | "
f"Loss: {train_metric['loss'].mean():.4f}, "
f"Throughput: {throughput_tokens:.2f} token/s, "
f"{throughput_tflops:.2f} TFLOP/s"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea.
I made the change to write a json dict of token and tflops throughput.
will run another test before merge.

Signed-off-by: Jun Gong <jungong@anyscale.com>
Copy link
Contributor

@krfricke krfricke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just quickly check the python version

release/alpa_tests/app_config.yaml Outdated Show resolved Hide resolved
Comment on lines +2256 to +2258
working_dir: alpa_tests

frequency: nightly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
working_dir: alpa_tests
frequency: nightly
working_dir: alpa_tests
python: "3.9"
frequency: nightly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good to know. keep it in my sleeve for now :)

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>
@gjoliver
Copy link
Member Author

@jiaodong I am gonna merge this now. we get metrics like, and we can always optimize things more later.

[INFO 2023-03-14 11:47:14,557] log.py: 41  Observed the following results:
--
  |  
  | throughput_tokens = 72268.23956628374
  | throughput_tflops = 152.47067618098896

@gjoliver gjoliver merged commit 004cd2b into ray-project:master Mar 14, 2023
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request Mar 21, 2023
…ay-project#32995)

Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jack He <jackhe2345@gmail.com>
edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023
…ay-project#32995)

Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
peytondmurray pushed a commit to peytondmurray/ray that referenced this pull request Mar 22, 2023
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
…ay-project#32995)

Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: elliottower <elliot@elliottower.com>
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
…ay-project#32995)

Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jack He <jackhe2345@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants