Refactor OP & Dataset (from #336) #359

yxdyc · 2024-07-17T12:36:34Z

Next steps for follow-up PRs:

* Check warnings in Ray mode
* Improve exception handling (see "Implement the basic classes of DJ-Exception and DJ-Monitor", #331)
* Update the docs to reflect the new features from this refactor: interfaces, fault tolerance, ...

* modelscope-sora news (#323)
* News/modelscope sora (#327)
* modelscope-sora news
* remove empower
* debug for gpu rank for analyser (#329)
* debug for gpu rank for analyser
* spec_numprocs -> num_proc
* Add more unittest (#304)
* add unittest env with gpu
* fix unittest yml
* add environment for unittest
* update workflow trigger
* update install step
* fix install command
* update working dir
* update container
* update working dir
* change working directory
* change working directory
* change working directory
* change working directory
* change unittest
* use test tag
* finish tag support
* support run op with different executor
* fix pre-commit
* add hf mirror
* add hf mirror
* run all test in standalone mode by default
* ignore image face ratio
* update tags
* add ray testcase
* add ray test in workflow
* update ray unittest workflow
* delete old unittest

---------

Co-authored-by: root <panxuchen>

* Add source tag (#317)
* add source tag for some mapper op
* fix no attribute 'current_tag' when executing local tests
* move op process logic from executor to base op
* fix typo
* move export outside op
* init refactor
* update analyser
* fix format
* clean up
* bring back batch mapper
* Improve fault tolerance & Fix Ray executor
* fix wrapper
* fix batched filter
* Remove use_actor as it is not compatible with the refactored OP class, unless the dataset class is refactored
* make wrappers work with unittests
* Compatible with unit tests and works with ray
* fix unittest
* fix wrappers with ray, map, filter
* unify unittests
* wrap deduplicators
* Compatible with non-batched calls
* Class-level wrappers
  - compatible with dataset.filter
  - bring back nested wrappers
* Instance-level wrappers
* Refined instance-level wrappers
  - Remove incomplete dataset.filter wrappers
  - Simplify code
  - Stack wrappers
* fix use_cuda
* Refactor dataset (#348)
* refactor dataset
* update unittest with DJDataset
* fix unittest
* update ray data load
* add test
* ray read json
* update docker image version
* actor is no longer supported
* Regress filter's stats export logic

---------

Co-authored-by: BeachWang <1400012807@pku.edu.cn>
Co-authored-by: Xuchen Pan <32844285+pan-x-c@users.noreply.github.com>
Co-authored-by: chenhesen <hesen.chs@alibaba-inc.com>
Co-authored-by: garyzhang99 <garyzhang99@163.com>
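
For readers unfamiliar with the pattern, here is a minimal sketch of what "instance-level wrappers" combined with "move op process logic from executor to base op" and "improve fault tolerance" could look like. It is an illustration only, not the code merged in this PR: `catch_sample_errors`, `Mapper`, `run`, and `LowercaseMapper` are hypothetical names, and the "return the sample unchanged" fallback is an assumed policy.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def catch_sample_errors(method, fallback):
    """Wrap a bound per-sample method: on error, log it and return
    fallback(sample) instead of aborting the whole run."""

    @functools.wraps(method)
    def wrapper(sample, *args, **kwargs):
        try:
            return method(sample, *args, **kwargs)
        except Exception as exc:  # broad on purpose: fault-tolerance boundary
            logger.warning("%s failed on a sample: %s",
                           method.__qualname__, exc)
            return fallback(sample)

    return wrapper


class Mapper:
    """Hypothetical base op; NOT Data-Juicer's actual Mapper class."""

    def __init__(self):
        # Instance-level wrapping: rebind process() on this instance only,
        # so subclasses just override process() and never see the wrapper.
        # Assumed fallback policy: keep the failing sample unchanged.
        self.process = catch_sample_errors(self.process,
                                           fallback=lambda s: s)

    def process(self, sample):
        raise NotImplementedError

    def run(self, samples):
        # The op drives its own loop rather than an external executor.
        return [self.process(s) for s in samples]


class LowercaseMapper(Mapper):
    def process(self, sample):
        return {**sample, "text": sample["text"].lower()}
```

Under these assumptions, `LowercaseMapper().run([{"text": "Hello"}, {"text": None}])` lowercases the first sample and passes the second through untouched, logging a warning instead of raising. Wrapping in `__init__` rather than on the class keeps the subclass definition free of decorators, which is one plausible reason to prefer instance-level over class-level wrappers.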

yxdyc · 2024-07-17T12:55:06Z

Please resolve the conflicts carefully later, @drcege, especially those caused by #354.

yxdyc changed the title ~~Refactor OP & Dataset (#336)~~ Refactor OP & Dataset (from #336) Jul 17, 2024

yxdyc assigned yxdyc and drcege Jul 17, 2024

Merge branch 'main' into refactor/main

2bce68d

drcege had a problem deploying to Testing July 18, 2024 06:04 — with GitHub Actions Failure

minor fix

70409c6

drcege had a problem deploying to Testing July 18, 2024 06:07 — with GitHub Actions Failure

fix num_proc default None

9eacfde

drcege temporarily deployed to Testing July 18, 2024 06:28 — with GitHub Actions Inactive

HYLcool approved these changes Jul 18, 2024


yxdyc merged commit 9f97231 into main Jul 18, 2024
3 checks passed

drcege mentioned this pull request Jul 18, 2024

fix ray batch-sample format #349

Closed

garyzhang99 mentioned this pull request Jul 24, 2024

correct num_proc logic #365

Merged

yxdyc mentioned this pull request Jul 26, 2024

[Feature Request] Implement more streamlined interfaces for users seeking minimal functionality (data_juicer.op.functional) #261

Closed

2 tasks

drcege mentioned this pull request Jul 29, 2024

[Feat]: Add Ray actor support #371

Open
