[FEA]: Improve the RSSSourceStage for Sherlock workflows #1274

Closed
2 tasks done
Tracked by #1298
mdemoret-nv opened this issue Oct 16, 2023 · 1 comment · Fixed by #1285
Labels: feature request (New feature or request), sherlock (Issues/PRs related to Sherlock workflows and components)

Comments

@mdemoret-nv
Contributor

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request?

High

Please provide a clear description of the problem this feature solves

There are a few missing features that are necessary to support Sherlock workflows using RSS feeds as a source.

Describe your ideal solution

The following features should be added to the RSSSourceStage:

  1. Ability to specify more than one URL to fetch as a source
  2. Ability to manually specify whether it should run indefinitely or only once
  3. Ability to manually specify a batch size that differs from the pipeline batch size
  4. Ability to cache the results using requests_cache (for testing and to prevent getting blocked for too many requests)
  5. Improve the parsing capability to use different sources if one fails
  6. Not sure why, but sometimes using requests to download the feed and then parsing it with feedparser works better. Add a fallback path when a parsing error occurs to use a secondary method before erroring.
  7. Add ability to continue even when parsing fails.
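
To make the list above concrete, here is a minimal sketch of how multiple URLs, a stage-level batch size, a run-once/run-forever switch, a parser fallback chain, and continue-on-failure could fit together. It is not the prototype's code: `parse_with_fallback`, `fetch_batches`, and all parameter names are hypothetical, and the parsers are injected as plain callables so the sketch stays standard-library only. In a real stage, the first parser might call `feedparser.parse(url)` directly and the second might download the feed with `requests` (optionally through a `requests_cache.CachedSession` to get caching) and hand the text to `feedparser.parse`; that wiring is an assumption and is not shown here.

```python
import logging
import time
from typing import Callable, Dict, Iterator, List, Sequence

logger = logging.getLogger(__name__)

# Hypothetical parser signature: takes a feed URL, returns a list of entry
# dicts, and raises an exception when the feed cannot be parsed.
Parser = Callable[[str], List[Dict]]


def parse_with_fallback(url: str, parsers: Sequence[Parser]) -> List[Dict]:
    """Try each parser in order and return the first successful result."""
    errors = []
    for parser in parsers:
        try:
            return parser(url)
        except Exception as exc:  # any parser failure triggers the next fallback
            errors.append(exc)
            logger.warning("Parser %r failed for %s: %s", parser, url, exc)
    raise RuntimeError(f"All parsers failed for {url}: {errors}")


def fetch_batches(urls: Sequence[str],
                  parsers: Sequence[Parser],
                  batch_size: int,
                  run_indefinitely: bool = False,
                  interval_secs: float = 600.0,
                  strict: bool = False) -> Iterator[List[Dict]]:
    """Yield batches of entries gathered from every URL.

    With strict=False a feed that no parser can handle is logged and skipped;
    with strict=True the error propagates to the caller.
    """
    while True:
        entries: List[Dict] = []
        for url in urls:
            try:
                entries.extend(parse_with_fallback(url, parsers))
            except RuntimeError:
                if strict:
                    raise
                logger.error("Skipping feed %s after all parsers failed", url)

        # batch_size is independent of any pipeline-wide batch size.
        for start in range(0, len(entries), batch_size):
            yield entries[start:start + batch_size]

        if not run_indefinitely:
            break
        time.sleep(interval_secs)  # wait before polling the feeds again
```

Passing the parsers in as callables keeps the fallback order explicit and makes the loop easy to test with stubs; a caching session would live inside the parser that performs the download rather than in this loop.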

Additional context

See prototype implementation of the above changes here: https://github.com/nv-morpheus/Morpheus/blob/1f06493e3d9fe6ec22c0b373f6284cc8369d6f52/morpheus/controllers/rss_controller.py

Code of Conduct

  • I agree to follow this project's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request
@mdemoret-nv mdemoret-nv added the feature request New feature or request label Oct 16, 2023
@mdemoret-nv mdemoret-nv added this to the 23.11 - Sherlock milestone Oct 16, 2023
@mdemoret-nv mdemoret-nv added the sherlock Issues/PRs related to Sherlock workflows and components label Oct 16, 2023
@bsuryadevara bsuryadevara moved this from Todo to In Progress in Morpheus Boards Oct 17, 2023
@bsuryadevara bsuryadevara linked a pull request Oct 17, 2023 that will close this issue
@jarmak-nv jarmak-nv moved this from In Progress to Review - Ready for Review in Morpheus Boards Oct 18, 2023
rapids-bot bot pushed a commit that referenced this issue Oct 19, 2023
- Added fallback parsing functionality to parse RSS feed using BeautifulSoup library
- Updated tests

Closes #1274 

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - Bhargav Suryadevara (https://github.com/bsuryadevara)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: #1285
@mdemoret-nv
Contributor Author

Fixed by #1285

@github-project-automation github-project-automation bot moved this from Review - Ready for Review to Done in Morpheus Boards Oct 20, 2023
Projects
Status: Done
2 participants