V0.5.0 forward merge #339

Closed
ayushdg wants to merge 12 commits into main from v0.5.0_forward_merge

Conversation

ayushdg
Collaborator

@ayushdg ayushdg commented Oct 30, 2024

Description

Merges the changes from the 0.5.0 release into main.
Deliberately excludes the change from the previous release that pinned our requirements.txt to specific versions.

Also bumps our current version to 0.6.0.dev0 for folks installing from source.

Usage

N/A

Checklist

  • I am familiar with the Contributing Guide.
  • New or Existing tests cover these changes.
  • The documentation is up to date with these changes.

sarahyurick and others added 8 commits October 30, 2024 12:13
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
…NVIDIA#301)

This PR ensures that users can run the PEFT SDG tutorial using arbitrary
API endpoints by exposing the URL that is used for synthetic data
generation.

Signed-off-by: Mehran Maghoumi <Maghoumi@users.noreply.github.com>
Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
* add column param

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>

* read_pickle and black

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>

* optional param

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>

* add parquet comment

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>

---------

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
* keep_filename_column param

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>

* update pytests

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>

* run black

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>

---------

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
* Add partial image implementation

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Refactor requirements

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Fix bugs

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Change from_map to map_partitions

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add super constructor

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add kwargs for load_object_on_worker

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Get proper epoch size

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Complete embedding creation loop

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Change devices

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add device

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Refactor embedding creation and add classifier

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Fix bugs in classifiers

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Refactor model names

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add model name

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Fix classifier bugs

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Allow postprocessing for classifiers

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Fix name and add print

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Fix variable name

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add NSFW

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Update init for import

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Fix embedding size

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add fused classifiers

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Fix missing index

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Update metadata for fused classifiers

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add export to webdataset

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Fix missing id col

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Sort embeddings by id

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add timm

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Update init file

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add autocast to timm

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Update requirements and transform

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add additional interpolation support

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Fix transform normalization

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Remove open_clip

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add index path support to wds

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Address Vibhu's feedback

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add import guard for image dataset

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Change default device

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Remove commented code

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Remove device id

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Fix index issue

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add docstrings and standardize variable names

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add image curation tutorial

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add initial image docs

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Remove tutorial

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add dataset docs

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add embedder documentation

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Revert embedding column name change

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Update user guide for images

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Update README

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Update README with RAPIDS nightly instructions

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Fix formatting issues in image documentation

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Remove extra newline in README

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Address most of Sarah's feedback

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add section summary

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Fix errors and reword GPU bullets in README

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Fix how table of contents displays with new sections

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

---------

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>
Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
Signed-off-by: Ryan Wolf <rywolf@nvidia.com>
Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
* Speedup fuzzy dedup by avoiding merge

Signed-off-by: Vibhu Jawa <vjawa@nvidia.com>

* Remove unused function

Signed-off-by: Vibhu Jawa <vjawa@nvidia.com>

* Clean up PR based on Praateek's reviews

Signed-off-by: Vibhu Jawa <vjawa@nvidia.com>

* style fixes

Signed-off-by: Vibhu Jawa <vjawa@nvidia.com>

* style fixes

Signed-off-by: Vibhu Jawa <vjawa@nvidia.com>

* Remove dangling print

Signed-off-by: Vibhu Jawa <vjawa@nvidia.com>

* Add handling for multiple columns

Signed-off-by: Vibhu Jawa <vjawa@nvidia.com>

* Nuking convert to strings

Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>

* Nuking convert to strings

Signed-off-by: Vibhu Jawa <vjawa@nvidia.com>

* Verify it works on exp-01

Signed-off-by: Vibhu Jawa <vjawa@nvidia.com>

* Add dask profile options and add overwrite

Signed-off-by: Vibhu Jawa <vjawa@nvidia.com>

---------

Signed-off-by: Vibhu Jawa <vjawa@nvidia.com>
Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
@ayushdg ayushdg added the meta General NeMo-Curator maintenance/packaging label Oct 30, 2024
ryantwolf and others added 4 commits October 30, 2024 12:15
Signed-off-by: Ryan Wolf <rywolf@nvidia.com>
Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
Signed-off-by: Ryan Wolf <rywolf@nvidia.com>
Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
* Change download for NSFW model

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Fix model init

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Fix embedding size

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

---------

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>
Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
@ayushdg ayushdg closed this Oct 30, 2024
@ayushdg ayushdg deleted the v0.5.0_forward_merge branch October 30, 2024 19:18
Labels
meta General NeMo-Curator maintenance/packaging
5 participants