PR draft of new YOLO and COCO data set loaders #4527

Closed
wants to merge 20 commits into from

SkalskiP
Contributor

@SkalskiP SkalskiP commented Aug 24, 2021

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

This PR is a major refactoring of the dataset utilities in Ultralytics YOLOv5: it splits utility functions into newly created modules and adds tests to support future development.

📊 Key Changes

  • Renamed utils/datasets.py to utils/datasets_old.py and updated all imports to reflect this change.
  • Introduced new testing files (tests/__init__.py, tests/utils/__init__.py, tests/utils/datasets/__init__.py, tests/utils/datasets/test_coco.py, tests/utils/test_file.py, tests/utils/test_utils.py) to ensure code quality and correctness.
  • Added pytest to requirements.txt to enable the new testing framework.
  • Added new dataset functionality, split into dedicated modules such as utils/datasets/coco.py and utils/datasets/yolo.py.
  • Created utils/datasets/core.py that includes dataset loaders like COCODataset and YOLODataset.
  • Included utility scripts for image and label caching (utils/datasets/image_cache.py and utils/datasets/label_cache.py) to improve performance when working with large datasets.
  • Added a detailed utils/datasets/todo.txt indicating future work and improvements to be made in the datasets module.
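
The COCO and YOLO loaders listed above differ mainly in how bounding boxes are stored. As a minimal sketch (not the code from this PR), the conversion could look like the following, assuming the standard COCO box layout `[x_min, y_min, width, height]` in pixels and the YOLO label layout `class x_center y_center width height` normalized to the image size. The function names `coco_to_yolo_bbox` and `yolo_label_line` are hypothetical.

```python
from typing import List, Tuple


def coco_to_yolo_bbox(
    bbox: List[float], image_width: int, image_height: int
) -> Tuple[float, float, float, float]:
    """Convert a COCO box [x_min, y_min, width, height] in pixels to a
    YOLO box (x_center, y_center, width, height) normalized to [0, 1]."""
    x_min, y_min, width, height = bbox
    x_center = (x_min + width / 2) / image_width
    y_center = (y_min + height / 2) / image_height
    return x_center, y_center, width / image_width, height / image_height


def yolo_label_line(class_index: int, box: Tuple[float, float, float, float]) -> str:
    """Format one object as a line of a YOLO *.txt label file."""
    return f"{class_index} " + " ".join(f"{value:.6f}" for value in box)


# Hypothetical example: a 200x100 box at (100, 50) in a 400x200 image.
box = coco_to_yolo_bbox([100, 50, 200, 100], image_width=400, image_height=200)
print(yolo_label_line(0, box))
```

Note that COCO category ids are not guaranteed to be contiguous, so a real loader would likely also need a mapping from category id to a zero-based class index.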

🎯 Purpose & Impact

  • Code Organization and Clarity: The restructuring into separate modules makes the codebase easier to navigate and maintain, potentially speeding up future development.
  • Improved Testing: With the new test files, continued development of YOLOv5 will be more reliable, since changes can be tested rigorously.
  • Performance Optimization: The new caching mechanisms will likely speed up training on large datasets.
  • Continued Development Readiness: The detailed todo.txt suggests there is ongoing work to improve the datasets utilities further, giving insight into the direction of future updates.
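
To illustrate the kind of label caching mentioned above, here is a rough sketch under stated assumptions. It is not the implementation in utils/datasets/label_cache.py: the helper names `files_fingerprint` and `load_labels_cached`, the pickle cache format, and the invalidation rule (path, size and modification time) are all assumptions made for the example.

```python
import hashlib
import os
import pickle
from typing import Callable, Dict, List


def files_fingerprint(paths: List[str]) -> str:
    """Hash file paths together with their sizes and modification times."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        stat = os.stat(path)
        digest.update(f"{path}:{stat.st_size}:{stat.st_mtime_ns}".encode())
    return digest.hexdigest()


def load_labels_cached(
    label_paths: List[str],
    cache_path: str,
    parse: Callable[[str], list],
) -> Dict[str, list]:
    """Return parsed labels, reusing the cache file while it is still valid."""
    fingerprint = files_fingerprint(label_paths)
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            cached = pickle.load(f)
        # Reuse the cache only if no label file changed since it was written.
        if cached.get("fingerprint") == fingerprint:
            return cached["labels"]
    # Cache is missing or stale: parse every file and rewrite the cache.
    labels = {path: parse(path) for path in label_paths}
    with open(cache_path, "wb") as f:
        pickle.dump({"fingerprint": fingerprint, "labels": labels}, f)
    return labels
```

The saving comes from skipping the parse step on later runs. Pickle files should only be loaded from trusted locations, so a different serialization format may be preferable in practice.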

@glenn-jocher
Member

@SkalskiP I did a minor PR #4548 on datasets.py today, was going to tell you but looks like you've already merged master, so good job, carry on! :)

@SkalskiP
Contributor Author

@glenn-jocher yes sir! I noticed and merged everything into my PR.

@glenn-jocher
Member

glenn-jocher commented Aug 28, 2021

@SkalskiP was scanning our TODO list for more low hanging fruit. I merged item 19 EarlyStopping today in #4576, and saw item 6. native n-channel image support (i.e. 1-ch greyscale, 4-ch or n-ch hyperspectral images), which seems easy enough, but would require updates throughout datasets.py and augmentations.py, which you've got your own updates to now I see.

There's also 4. auto batch-size, and 13. Annotator class I can do.

EDIT: 13. Annotator class is now complete and merged in #4591

@SkalskiP
Contributor Author

@glenn-jocher I read your comment today. I have already merged all the changes you made over the weekend. As for the next steps, it would be great if you could hold off on native n-channel image support (i.e. 1-ch greyscale, 4-ch or n-ch hyperspectral images). It may be easy to implement, but it'll be hard for me to merge all the changes and tie everything up :/

@glenn-jocher
Member

@SkalskiP yeah that's what I figured. No worries, there's plenty more on the TODO list that doesn't interact with datasets.py, I'll tackle those first.

@glenn-jocher glenn-jocher linked an issue Oct 8, 2021 that may be closed by this pull request
@github-actions
Contributor

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐.

@github-actions github-actions bot added the Stale label Nov 18, 2021
@github-actions github-actions bot closed this Nov 25, 2021
@glenn-jocher glenn-jocher deleted the new_data_set_loaders branch August 1, 2022 01:26
Labels
Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

coco annotations possible?
2 participants