Refactor OP & Dataset (from #336) #359

Merged
merged 4 commits into from
Jul 18, 2024

yxdyc
Collaborator

@yxdyc yxdyc commented Jul 17, 2024

  • Decouple operator logic from executor. Enabled new interfaces:
    • dataset = dataset.process(op)
    • dataset = dataset.process([op1, op2])
    • dataset = op(dataset)
    • dataset = op.run(dataset)
      ⚠️ Be careful with non-sharable loguru.logger
    • Sample level error catching #325
  • Work with Ray.
    • [skipped] Ray dataset.filter does not have a batched version
  • Work with batched ops. All fault-tolerant dataset.map functions must be batched, so the following methods are now wrapped:
    • Mapper.process
    • Filter.compute_stats
    • Deduplicator.compute_hash
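
The interfaces and the batched, fault-tolerant wrapping listed above can be illustrated with a minimal sketch. It is not the code from this PR: `ToyDataset`, `ToyMapper`, `UppercaseMapper`, and `catch_batch_errors` are hypothetical stand-ins, and dropping a failing batch is an assumed policy chosen only to show how a batch size of 1 gives sample-level error catching.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def catch_batch_errors(batched_fn):
    """Hypothetical wrapper: a failing batch is logged and dropped, not raised."""
    @functools.wraps(batched_fn)
    def wrapper(batch):
        try:
            return batched_fn(batch)
        except Exception as exc:
            logger.warning("Dropping a batch after an error: %s", exc)
            # Assumed policy: return an empty batch with the same columns.
            return {key: [] for key in batch}
    return wrapper


class ToyDataset:
    """Hypothetical stand-in for the dataset class; holds a list of dict samples."""

    def __init__(self, samples):
        self.samples = list(samples)

    def map_batched(self, fn, batch_size=1):
        out = []
        for start in range(0, len(self.samples), batch_size):
            chunk = self.samples[start:start + batch_size]
            # Convert a list of dicts into a dict of lists (a "batch").
            batch = {key: [s[key] for s in chunk] for key in chunk[0]}
            result = fn(batch)
            keys = list(result)
            n = len(result[keys[0]]) if keys else 0
            out.extend({k: result[k][i] for k in keys} for i in range(n))
        return ToyDataset(out)

    def process(self, ops):
        # Accept a single op or a list of ops.
        if not isinstance(ops, (list, tuple)):
            ops = [ops]
        dataset = self
        for op in ops:
            dataset = op.run(dataset)
        return dataset


class ToyMapper:
    """Hypothetical base op: the processing logic lives in the op, not an executor."""

    def __init__(self):
        # Wrap the batched method on the instance so errors are caught per batch.
        self.process = catch_batch_errors(self.process)

    def process(self, batch):
        raise NotImplementedError

    def run(self, dataset):
        return dataset.map_batched(self.process)

    def __call__(self, dataset):
        return self.run(dataset)


class UppercaseMapper(ToyMapper):
    def process(self, batch):
        # Raises AttributeError when a text value is not a string (e.g. None).
        return {**batch, "text": [t.upper() for t in batch["text"]]}


data = ToyDataset([{"text": "a"}, {"text": None}, {"text": "b"}])
op = UppercaseMapper()
result = op(data)  # same as op.run(data), data.process(op), data.process([op])
```

In this sketch the sample with `None` text fails inside its own batch and is dropped, while the other two samples are processed normally. The real operators may handle failed batches differently.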

Next Steps on following PRs:

* modelscope-sora news (#323)

* News/modelscope sora (#327)

* modelscope-sora news

* remove empower

* debug for gpu rank for analyser (#329)

* debug for gpu rank for analyser

* spec_numprocs -> num_proc

* Add more unittest  (#304)

* add unittest env with gpu

* fix unittest yml

* add environment for unittest

* update workflow trigger

* update install step

* fix install command

* update working dir

* update container

* update working dir

* change working directory

* change working directory

* change working directory

* change working directory

* change unittest

* use test tag

* finish tag support

* support run op with different executor

* fix pre-commit

* add hf mirror

* add hf mirror

* run all test in standalone mode by default

* ignore image face ratio

* update tags

* add ray testcase

* add ray test in workflow

* update ray unittest workflow

* delete old unittest

---------

Co-authored-by: root <panxuchen>

* Add source tag (#317)

* add source tag for some mapper op

* fix no attribute 'current_tag' when executing local tests

* move op process logic from executor to base op

* fix typo

* move export outside op

* init refactor

* update analyser

* fix format

* clean up

* bring back batch mapper

* Improve fault tolerance & Fix Ray executor

* fix wrapper

* fix batched filter

* Remove use_actor as it is not compatible with the refactored OP class, unless the dataset class is refactored

* make wrappers work with unittests

* Compatible with unit tests and works with ray

* fix unittest

* fix wrappers with ray, map, filter

* unify unittests

* wrap deduplicators

* Compatible with non-batched calls

* Class-level wrappers

- compatible with dataset.filter
- bring back nested wrappers

* Instance-level wrappers

* Refined instance-level wrappers

- Remove incomplete dataset.filter wrappers
- Simplify code
- Stack wrappers

* fix use_cuda

* Refactor dataset (#348)

* refactor dataset

* update unittest with DJDataset

* fix unittest

* update ray data load

* add test

* ray read json

* update docker image version

* actor is no longer supported

* Regress filter's stats export logic

---------

Co-authored-by: BeachWang <1400012807@pku.edu.cn>
Co-authored-by: Xuchen Pan <32844285+pan-x-c@users.noreply.github.com>
Co-authored-by: chenhesen <hesen.chs@alibaba-inc.com>
Co-authored-by: garyzhang99 <garyzhang99@163.com>
@yxdyc
Collaborator Author

yxdyc commented Jul 17, 2024

Please resolve the conflicts carefully later, @drcege, especially those arising from #354

@yxdyc yxdyc changed the title from "Refactor OP & Dataset (#336)" to "Refactor OP & Dataset (from #336)" on Jul 17, 2024
3 participants