debug for gpu rank for analyser #329

BeachWang · 2024-06-11T11:38:29Z

As the title says.

data_juicer/core/analyser.py

drcege

LGTM. @garyzhang99 please take a look as well.

garyzhang99

LGTM

* Refactor OP & Dataset (#336)
* modelscope-sora news (#323)
* News/modelscope sora (#327)
* modelscope-sora news
* remove empower
* debug for gpu rank for analyser (#329)
* debug for gpu rank for analyser
* spec_numprocs -> num_proc
* Add more unittest (#304)
* add unittest env with gpu
* fix unittest yml
* add environment for unittest
* update workflow trigger
* update install step
* fix install command
* update working dir
* update container
* update working dir
* change working directory
* change working directory
* change working directory
* change working directory
* change unittest
* use test tag
* finish tag support
* support run op with different executro
* fix pre-commit
* add hf mirror
* add hf mirror
* run all test in standalone mode by default
* ignore image face ratio
* update tags
* add ray testcase
* add ray test in workflow
* update ray unittest workflow
* delete old unittest

---------

Co-authored-by: root <panxuchen>

* Add source tag (#317)
* add source tag for some mapper op
* fix no attribute 'current_tag' when executing local tests
* move op process logic from executor to base op
* fix typo
* move export outside op
* init refactor
* update analyser
* fix format
* clean up
* bring back batch mapper
* Improve fault tolerance & Fix Ray executor
* fix wrapper
* fix batched filter
* Remove use_actor as it is not compatible with the refactored OP clas, unless the dataset class is refactored
* make wrappers work with unittests
* Compatible with unit tests and works with ray
* fix unittest
* fix wrappers with ray, map, filter
* unify unittests
* wrap deduplicators
* Compatible with non-batched calls
* Class-level wrappers
  - compatible with dataset.filter
  - bring back nested wrappers
* Instance-level wrappers
* Refined instance-level wrappers
  - Remove incomplete dataset.filter wrappers
  - Simplify code
  - Stack wrappers
* fix use_cuda
* Refactor dataset (#348)
* refactor dataset
* update unittest with DJDataset
* fix unittest
* update ray data load
* add test
* ray read json
* update docker image version
* actor is no longer supported
* Regress filter's stats export logic

---------

Co-authored-by: BeachWang <1400012807@pku.edu.cn>
Co-authored-by: Xuchen Pan <32844285+pan-x-c@users.noreply.github.com>
Co-authored-by: chenhesen <hesen.chs@alibaba-inc.com>
Co-authored-by: garyzhang99 <garyzhang99@163.com>

* minor fix
* fix num_proc default None

---------

Co-authored-by: Ce Ge (戈策) <gece@foxmail.com>
Co-authored-by: BeachWang <1400012807@pku.edu.cn>
Co-authored-by: Xuchen Pan <32844285+pan-x-c@users.noreply.github.com>
Co-authored-by: chenhesen <hesen.chs@alibaba-inc.com>
Co-authored-by: garyzhang99 <garyzhang99@163.com>
Co-authored-by: null <3213204+drcege@users.noreply.github.com>

BeachWang added 2 commits June 11, 2024 19:37

debug for gpu rank for analyser

71158fb

after pre-commit

054ab58

yxdyc requested a review from drcege June 15, 2024 10:09

BeachWang self-assigned this Jun 17, 2024

drcege reviewed Jun 17, 2024

data_juicer/core/analyser.py (outdated, resolved)

BeachWang added 2 commits June 24, 2024 19:34

spec_numprocs -> num_proc

2b9cdfc

save fix

8febfbe

drcege approved these changes Jun 25, 2024

drcege requested a review from garyzhang99 June 25, 2024 06:09

BeachWang merged commit a8305bc into main Jun 25, 2024
4 checks passed

garyzhang99 reviewed Jun 25, 2024

HYLcool deleted the debug/gpu_rank_for_analyser branch December 23, 2024 03:17
