fix: handle different body fields #100

Merged
merged 20 commits into from
Mar 27, 2024

@shouya (Owner) commented Mar 26, 2024

This PR clarifies the concept for "body" used in code and config.

Fixes #95 and #96.

Motivation

Previously, I named a generic field in the code "description" to distinguish it from the title. For the RSS format it refers to the description field, and for Atom it refers to the content field. The choice of name and of the selected fields was arbitrary, based on the few example feeds I had on hand. Overall, it is supposed to be the field that ultimately gets displayed in an RSS reader beneath the title.

In this PR I renamed the general term to "body". Unlike the old notion, a post can have multiple body fields. We need this in order to handle every field that an RSS reader may treat as the body. For example, once all body fields are considered, the keep_only and discard filters can correctly match posts against a keyword (#95).

In addition, some feeds do not use the typical body fields. One example is YouTube, which puts the video description in the media:description field under the media:group tag (#92). We want to support filtering on this field as well.

Implementation

First, I removed the single-field accessor for the Post.description field.

Then I provided various APIs for accessing the bodies:

  • Post.bodies_mut
  • Post.bodies
  • Post.modify_bodies
  • Post.first_body
  • Post.first_body_mut
  • Post.create_body
  • Post.ensure_body
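
The accessor names above suggest roughly the following shape. This is a minimal sketch, not the actual implementation: the `Post` struct here is a simplified stand-in that stores bodies as plain strings, and every signature is an assumption inferred from the method names alone.

```rust
/// Simplified, hypothetical stand-in for a feed post.
/// The real `Post` presumably wraps parsed RSS/Atom entries instead.
#[derive(Debug, Default)]
pub struct Post {
    pub title: String,
    /// All body-like fields, in the order they were found.
    bodies: Vec<String>,
}

impl Post {
    /// Read-only view of every body field.
    pub fn bodies(&self) -> Vec<&str> {
        self.bodies.iter().map(String::as_str).collect()
    }

    /// Mutable access to every body field.
    pub fn bodies_mut(&mut self) -> Vec<&mut String> {
        self.bodies.iter_mut().collect()
    }

    /// Apply `f` to each body in place.
    pub fn modify_bodies(&mut self, mut f: impl FnMut(&mut String)) {
        for body in self.bodies.iter_mut() {
            f(body);
        }
    }

    /// The first body, if the post has any.
    pub fn first_body(&self) -> Option<&str> {
        self.bodies.first().map(String::as_str)
    }

    /// Mutable reference to the first body, if the post has any.
    pub fn first_body_mut(&mut self) -> Option<&mut String> {
        self.bodies.first_mut()
    }

    /// Append a new, empty body and return a mutable reference to it.
    pub fn create_body(&mut self) -> &mut String {
        self.bodies.push(String::new());
        self.bodies.last_mut().expect("a body was just pushed")
    }

    /// Return the first body, creating an empty one if the post has none.
    pub fn ensure_body(&mut self) -> &mut String {
        if self.bodies.is_empty() {
            self.bodies.push(String::new());
        }
        &mut self.bodies[0]
    }
}
```

In this sketch, a filter that rewrites content would go through `modify_bodies` so that every body field is updated, while a filter that needs somewhere to write generated content would call `ensure_body`.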

The following fields are treated as body fields:

  • rss
    • content
    • description
    • media:description
    • itunes:summary
  • atom
    • content
    • summary
    • media:description
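
To illustrate why treating all of these as bodies matters for keyword filters, here is a small sketch. The field names are copied from the list above; the flat `(name, value)` representation, `FeedFormat`, `collect_bodies`, and `any_body_contains` are hypothetical helpers invented for this example and are not part of the project's API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FeedFormat {
    Rss,
    Atom,
}

/// Field names treated as body fields, as listed in the PR description.
pub fn body_fields(format: FeedFormat) -> &'static [&'static str] {
    match format {
        FeedFormat::Rss => &["content", "description", "media:description", "itunes:summary"],
        FeedFormat::Atom => &["content", "summary", "media:description"],
    }
}

/// Collect the non-empty values of all body fields from a flat list of
/// (field name, value) pairs, preserving their original order.
pub fn collect_bodies<'a>(format: FeedFormat, fields: &[(&str, &'a str)]) -> Vec<&'a str> {
    fields
        .iter()
        .filter(|(name, value)| body_fields(format).contains(name) && !value.is_empty())
        .map(|(_, value)| *value)
        .collect()
}

/// keep_only-style check: true if any body contains the keyword.
pub fn any_body_contains(bodies: &[&str], keyword: &str) -> bool {
    bodies.iter().any(|body| body.contains(keyword))
}
```

With a single "description" accessor, a keyword that appears only in media:description would be missed; checking every collected body avoids that.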

Config changes

  • Rename the content variant of the field option to body for the keep_only/discard filters.
  • Rename the description_selector field to body_selector for the extract filter.

Both changes are backward compatible. The old fields are currently marked deprecated, and may be removed in a future breaking release.
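
One way such a backward-compatible rename can work is to accept the old name as a deprecated alias for the new one. The sketch below shows the idea for the field option only; the `Field` enum, its `Title` variant, and `parse_field` are assumptions made for illustration and do not reflect how the project actually parses its config.

```rust
/// Hypothetical set of values for the `field` option.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Field {
    Title,
    Body,
}

/// Parse a raw `field` value.
/// Returns the parsed field and a flag telling whether a deprecated name was used,
/// so the caller can emit a warning without rejecting old configs.
pub fn parse_field(raw: &str) -> Result<(Field, bool), String> {
    match raw {
        "title" => Ok((Field::Title, false)),
        "body" => Ok((Field::Body, false)),
        // Old name, still accepted as a deprecated alias for "body".
        "content" => Ok((Field::Body, true)),
        other => Err(format!("unknown field: {}", other)),
    }
}
```

Existing configs that say content keep working, and the deprecation flag gives a single place to hook in a warning before the alias is removed in a breaking release.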

Checklist

  • update filter docs
  • review all usage of the term "description" in code

@shouya shouya merged commit 2cf74d3 into master Mar 27, 2024
2 checks passed
@shouya shouya deleted the improve-post-body-handling branch March 27, 2024 14:39
shouya added a commit that referenced this pull request Apr 5, 2024
Previously, the "Rendered" view was parsed from the raw XML feed on the
front-end. The logic for determining which field to show as a post's body
was arbitrary. In #100, I changed the logic to recognize more post
body types. However, that change was not reflected in the Web UI. This
PR updates the Web UI to show the expected body content.

Besides, I made some visual tweaks to the Rendered view to show more
information, including each post's publication date, the feed's description, etc.

In addition, the status bar now shows the number of posts on
successful fetches.
Successfully merging this pull request may close these issues.

Help is needed as remove_regex is not working as expected