-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automate Article Collection and README Updates from Blog Source #1208
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Reviewed everything up to 34f80c6 in 18 seconds
More details
- Looked at
166
lines of code in2
files - Skipped
0
files when reviewing. - Skipped posting
6
drafted comments based on config settings.
1. scripts/update_blogs_in_learning_resources.py:20
- Draft comment:
Consider using an HTTP library likerequests
instead of Selenium for fetching articles if JavaScript execution is not required. This can improve performance and reduce resource usage. - Reason this comment was not posted:
Confidence changes required:50%
The script uses Selenium to scrape a webpage, which can be resource-intensive and slow. It would be more efficient to use an HTTP library like requests if the page content can be accessed without JavaScript execution.
2. scripts/update_blogs_in_learning_resources.py:110
- Draft comment:
Consider creating a backup of the README file before modifying it to prevent data loss in case of an error. - Reason this comment was not posted:
Confidence changes required:50%
The script directly modifies the README file without creating a backup. It's a good practice to create a backup before making changes to important files.
3. scripts/update_blogs_in_learning_resources.py:45
- Draft comment:
Avoid using hardcoded blog links if they can be fetched dynamically. This ensures the script always has the most up-to-date information. - Reason this comment was not posted:
Confidence changes required:50%
The script uses hardcoded blog links which might not be necessary if the links can be fetched dynamically. This could lead to outdated information if the hardcoded links change.
4. scripts/update_blogs_in_learning_resources.py:20
- Draft comment:
Thefetch_articles
function is handling both fetching and processing articles. Consider separating the fetching logic from the processing logic to adhere to the Single Responsibility Principle. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment is suggesting a refactor to improve adherence to the Single Responsibility Principle, which is a good practice. The function indeed handles two distinct tasks: fetching data and processing it. Separating these tasks could make the code cleaner and easier to maintain. This aligns with the rules for suggesting code quality refactors.
The function is not overly complex, and the current implementation might be sufficient for its purpose. The suggested refactor could be seen as unnecessary if the function's current form is clear and maintainable.
While the function is not overly complex, adhering to the Single Responsibility Principle is generally beneficial for future maintenance and scalability. The suggestion is reasonable and aligns with best practices.
The comment is valid as it suggests a reasonable refactor to adhere to the Single Responsibility Principle. It should be kept as it provides actionable advice for improving code quality.
5. scripts/update_blogs_in_learning_resources.py:77
- Draft comment:
Ensure function names follow a consistent naming pattern. For example, useget_cutoff_date
consistently with other functions likefetch_articles
andupdate_readme
. - Reason this comment was not posted:
Comment did not seem useful.
6. scripts/update_blogs_in_learning_resources.py:1
- Draft comment:
Consider adding this script to the Sphinx documentation underdocs/
to provide users with guidance on its purpose and usage. - Reason this comment was not posted:
Confidence changes required:50%
The script should be documented in the Sphinx documentation underdocs/
to help users understand its purpose and usage.
Workflow ID: wflow_xRPaLBAeGbNhfbbM
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
return datetime.strptime(date_str, "%Y-%m-%d").date() | ||
except ValueError: | ||
print("Invalid date format. Please use YYYY-MM-DD.") | ||
return get_cutoff_date() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using recursion for input validation can lead to a stack overflow. Consider using a loop to repeatedly prompt for input until a valid date is entered.
Awesome, looking forward to trying this tomorrow / Monday! Otherwise one request -- could you include a "print to standard out" option please? |
Hi @skrawcz. I've added that option. I've also added a docstring in the python script file so that it would be easier for the users to get started with it. |
@sikehish you need to:
Then I can merge this. Otherwise tested locally and it works, thank you! |
Hi @skrawcz |
author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182901 +0530 parent 7b9f52a author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182885 +0530 parent 7b9f52a author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182883 +0530 parent 7b9f52a author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182877 +0530 parent 7b9f52a author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182862 +0530 parent 7b9f52a author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182842 +0530 parent 7b9f52a author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182748 +0530 Update materialization.rst Fix mutate docstring Docstring raised Pytest escape error due to /* so we wrap it in quotes. Improve pipe_output first node naming Current name convention is very prone to name clashes with user naming. We assign the same naming convention using namespace.raw to indicate the first node. Provide dockerx setup for docker builds (DAGWorks-Inc#1194) This change adds a script for multi-platform docker builds as well as a github workflow to automatically build them when a new sf-hamilton-ui version has been published. Squashed commits: * Created buildx_and_push.sh script in ui directory to create multi-platform docker builds * Fixed content related issue in buildx_and_push.sh * buildx_and_push.sh: Added functionality to fecth the latest version from PyPi * buildx_and_push.sh: Added check_buildx_installed * buildx_and_push.sh: Added check_buildx_installed * buildx_and_push.sh: Enhanced error handling(checking if jq exists, curl response handling and docker buildx error handling) * Adding build args to buildx_and_push.sh * buildx_and_push.sh: Changes made to test a new workflow * Created a new workflow: hamilton-ui-build-and-push * buildx_and_push.sh: echo statement added to debug * buildx_and_push.sh: cd'ing to the directory(ui) where the shell script is located to prevent context related errors in workflow. * hamilton-ui-build-and-push.yml: Added Docker Hub login step * buildx_and_push.sh: added dagworks dockerhub username; workflow worked on the fork. * hamilton-ui-build-and-push.yml: Replaced previous version from cache with version tag from Dockerhub image. * buildx_and_push.sh: Changed dockerhub username for testing * hamilton-ui-build-and-push.yml: Minor change in the docker registry URL(version) * hamilton-ui-build-and-push.yml: Minor change in the "Fetch highest version from Docker Hub" step's shell script * hamilton-ui-build-and-push.yml: Replacing deprecated set-output with GITHUB_OUTPUT * hamilton-ui-build-and-push.yml: Conditional execution of steps implemented * Undid change in dockerhub username * Update ui/buildx_and_push.sh * Update ui/buildx_and_push.sh * chore: fix pre-commit whitespace issues --------- Co-authored-by: Stefan Krawczyk <stefan@dagworks.io> Fix `keep_dot` propagation in `Driver` display functions Bumps hamilton version from 1.81.0 to 1.81.1 fix: caching `SQLiteMetadataStore.get_run_ids()` (DAGWorks-Inc#1205) * fixed .get_run_ids() and standardized .get_run() + tests * fixed docstrings formatting errors --------- Co-authored-by: zilto <tjean@DESKTOP-V6JDCS2> Bumps hamilton version from 1.81.1 to 1.81.2 buildx_and_push.sh: Changes made to test a new workflow buildx_and_push.sh: echo statement added to debug buildx_and_push.sh: added dagworks dockerhub username; workflow worked on the fork. buildx_and_push.sh: Changed dockerhub username for testing Undid change in dockerhub username Created a new update external blogs script which updates External Blogs section in learning_resources.md with the latest blogs(with a date cutoff) Added docstring in update_blogs_in_learning_resources.py chore: fix pre-commit whitespace issues update_blogs_in_learning_resources.py: Added a "print to standard out" option (using --print flag) update_blogs_in_learning_resources.py: Argument parser with expaned help text update_blogs_in_learning_resources.py: Expanded docstring Fixes whitespace and small docs typo buildx_and_push.sh: Changes made to test a new workflow buildx_and_push.sh: echo statement added to debug hamilton-ui-build-and-push.yml: Added Docker Hub login step buildx_and_push.sh: added dagworks dockerhub username; workflow worked on the fork. buildx_and_push.sh: Changed dockerhub username for testing hamilton-ui-build-and-push.yml: Minor change in the "Fetch highest version from Docker Hub" step's shell script hamilton-ui-build-and-push.yml: Replacing deprecated set-output with GITHUB_OUTPUT chore: fix pre-commit whitespace issues Update ui/buildx_and_push.sh Update ui/buildx_and_push.sh Fixes whitespace and small docs typo
Thanks @sikehish ! |
This pull request introduces a script for fetching articles from a specified URL(https://blog.dagworks.io/archive), processing the data, and updating the README file with the fetched articles. It addresses issue #1046.
Changes
How I tested this
Checklist