Add --ignored Flag to Exclude Specific Files and Directories During Ingestion #1432
Added the --ignored flag because Python projects with __pycache__, and Mac projects in general with .DS_Store, caused UnicodeDecodeError during ingestion.
Great contribution!
It's interesting to note that the test pipeline failed because 'black .' had not been run for code formatting, not because of actual code errors. To prevent such issues, I could integrate 'black' as a step in your GitHub Actions. That way, code is formatted automatically before it is merged, which keeps it consistent and catches formatting issues early without relying on manual execution. Would you merge if I make these changes?
I'd rather add black as a pre-commit action than as a GitHub action. I'm no big fan of GitHub changing code without the developer noticing. Thanks for your feedback!
black still fails. Run |
# Conflicts: # scripts/ingest_folder.py
Ran |
Stale pull request |
Hey @imartinez @smbrine, it looks like the feedback in this PR was addressed. Is there anything else preventing it from being merged? I've been using this branch for some weeks now, and I'd love to see this feature in main 🙂
Thanks for the heads up! I've disabled the GitHub action that closed the PR. Merging it now!
…irectories During Ingestion (zylon-ai#1432)
This pull request introduces the --ignored flag to the ingestion script. The motivation for this change stems from encountering UnicodeDecodeError when the script processes Python-generated __pycache__ folders and MacOS-specific .DS_Store files. These files are not relevant to our data processing and can cause errors due to their format.

The --ignored flag allows users to specify a list of files or directories to exclude from the ingestion process, enhancing the script's flexibility and reliability, especially in Python and Mac environments.
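
For readers who want a feel for how such a flag could work, here is a minimal sketch. It is not the code merged in this PR: the function names iter_ingestable_files and build_parser, the positional folder argument, and the choice to match ignored names against path components are all assumptions made for illustration. Only the --ignored flag name and the __pycache__ / .DS_Store examples come from the description above.

```python
import argparse
from pathlib import Path
from typing import List, Sequence


def iter_ingestable_files(root: Path, ignored: Sequence[str]) -> List[Path]:
    """Return files under root, skipping paths that contain an ignored name.

    Hypothetical helper: a file is skipped when any component of its path
    (relative to root) equals one of the ignored names, so both ignored
    directories and ignored file names are covered.
    """
    ignored_names = set(ignored)
    found: List[Path] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Use the path relative to root so the root folder's own name
        # can never trigger an exclusion.
        if ignored_names.intersection(path.relative_to(root).parts):
            continue
        found.append(path)
    return found


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical CLI parser exposing an --ignored list option."""
    parser = argparse.ArgumentParser(description="Ingestion CLI sketch")
    parser.add_argument("folder", type=Path, help="Folder to ingest")
    parser.add_argument(
        "--ignored",
        nargs="*",
        default=[],
        help="File or directory names to exclude from ingestion",
    )
    return parser
```

With a parser like this, an invocation could pass the folder followed by --ignored __pycache__ .DS_Store, and the resulting list would be handed to the filtering helper before any file is opened. Filtering before reading is what avoids the UnicodeDecodeError, since binary files such as .DS_Store are never decoded as text.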