Enhance error handling in ETFeeder::lookupNode with detailed exception messages #10

Merged: 1 commit, Feb 6, 2024

changhai0109 (Contributor) commented Dec 12, 2023

Summary

This PR improves error handling in the ETFeeder::lookupNode method. The method now catches std::out_of_range exceptions, so any attempt to access a node ID that does not exist in the dependency graph produces a clear, informative error message.
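
The pattern described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the `Node` and `Feeder` types, the `addNode` helper, and the message wording are hypothetical stand-ins, and the sketch assumes the dependency graph is a `std::unordered_map` keyed by node ID. The lookup uses `at()`, which throws `std::out_of_range` for a missing key, and the handler rethrows with a message that names the offending ID.

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a trace node; not the actual Chakra type.
struct Node {
  std::uint64_t id;
};

// Hypothetical feeder that stores nodes by ID (assumed layout, for illustration).
class Feeder {
 public:
  void addNode(std::uint64_t id) {
    dep_graph_[id] = std::make_shared<Node>(Node{id});
  }

  // Returns the node with the given ID, or throws std::out_of_range
  // with a message that includes the missing ID.
  std::shared_ptr<Node> lookupNode(std::uint64_t node_id) const {
    try {
      // unordered_map::at throws std::out_of_range if the key is absent.
      return dep_graph_.at(node_id);
    } catch (const std::out_of_range&) {
      // Replace the generic library message with one that names the ID.
      throw std::out_of_range(
          "lookupNode: node_id=" + std::to_string(node_id) +
          " is not in the dependency graph");
    }
  }

 private:
  std::unordered_map<std::uint64_t, std::shared_ptr<Node>> dep_graph_;
};
```

With this sketch, looking up an ID that was added returns the node, while looking up an unknown ID raises an exception whose `what()` text contains that ID, which makes a missing dependency easier to trace than the default library message.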

Test Plan

$ cd ~/param
$ cd param/train/comms/pt
$ pip install .
$ cd ../../compute/python
$ pip install -r requirements.txt
$ python setup.py install
$ python tools/trace_link.py --pytorch-et-file ~/llama_pytorch_et/llama_et_0.json --kineto-file ~/llama_kineto/worker0_step_12.1697596714999.pt.trace.json --output-file ~/rank0.json

$ cd ~/chakra
$ pip install .
$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename ~/rank0.json --output_filename ~/rank0.chakra --num_dims 1 
$ cp ~/rank0.chakra chakra.0.et && cp chakra.0.et chakra.1.et

$ cd ~/astra-sim
$ ./build/astra_analytical/build.sh
$ ./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Unaware\
  --workload-configuration=/Users/theo/chakra/chakra\
  --system-configuration=./inputs/system/Switch.json \
  --network-configuration=./inputs/network/analytical/Switch.yml \
  --remote-memory-configuration=./inputs/remote_memory/analytical/no_memory_expansion.json
ring of node 0, id: 0 dimension: local total nodes in ring: 2 index in ring: 0 offset: 1total nodes in ring: 2
ring of node 0, id: 0 dimension: local total nodes in ring: 2 index in ring: 0 offset: 1total nodes in ring: 2
ring of node 0, id: 0 dimension: local total nodes in ring: 2 index in ring: 0 offset: 1total nodes in ring: 2
ring of node 0, id: 0 dimension: local total nodes in ring: 2 index in ring: 0 offset: 1total nodes in ring: 2
sys[0] finished, 7269182000 cycles
sys[1] finished, 7269182000 cycles

@changhai0109 changhai0109 requested a review from a team as a code owner December 12, 2023 00:04

github-actions bot commented Dec 12, 2023

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@TaekyungHeo TaekyungHeo changed the title from "[ETFeeder] fix out of boundary bug in lookupNode func" to "Enhance error handling in ETFeeder::lookupNode with detailed exception messages" Feb 6, 2024
TaekyungHeo (Contributor) commented

Looks good to me. It works.

@TaekyungHeo TaekyungHeo self-requested a review February 6, 2024 19:46
@srinivas212 srinivas212 merged commit c739829 into mlcommons:main Feb 6, 2024
3 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Feb 6, 2024
3 participants