Add validation for <lastmod>, <priority>, and <changefreq> fields in XMLSitemap #195

Open
wants to merge 3 commits into
base: master


@jaric jaric commented May 20, 2024

  • Implemented validation for <lastmod>: Ensured the date follows the W3C date format (YYYY-MM-DD) or the full W3C datetime format (YYYY-MM-DDThh:mm:ss±hh:mm or YYYY-MM-DDThh:mm:ssZ). Added a regex check to validate the date format.

  • Implemented validation for <changefreq>: Restricted the values to the allowed set: {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}. Added a function to check the validity of the changefreq value.

  • Implemented validation for <priority>: Ensured the priority is a float value between 0.0 and 1.0. Added a function to validate the priority value.

  • Updated add_url method:

    • Added checks for the validity of the lastmod, changefreq, and priority parameters.
    • If the values are invalid, they are not included in the sitemap entry, and a warning is logged.
  • Added regex patterns and validation functions:

    • W3C_DATE_REGEX: Matches the date format YYYY-MM-DD.
    • W3C_DATETIME_REGEX: Matches the full datetime format YYYY-MM-DDThh:mm:ss±hh:mm or YYYY-MM-DDThh:mm:ssZ.
    • is_valid_date: Validates whether a given date string matches the W3C date or datetime format.
    • is_valid_changefreq: Checks if changefreq is one of the allowed values.
    • is_valid_priority: Checks if priority is a float between 0.0 and 1.0.
  • Logging:

    • Added logging warnings for invalid lastmod, changefreq, and priority values when they are encountered in the add_url method.

These changes ensure that only correctly formatted values are written to the sitemap, so the generated XML stays compliant with the sitemap protocol.
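
The helpers listed above could look roughly like the following. This is a minimal sketch, not the code from this PR: the regex bodies, the `CHANGEFREQ_VALUES` name, the logger name, and the `keep_valid_fields` wrapper (a stand-in for the checks inside `add_url`) are all assumptions made for illustration. The patterns only check the shape of the string, so a value such as `2024-13-45` would still pass.

```python
import logging
import re

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("sitemap_validation_sketch")

# Date only: YYYY-MM-DD
W3C_DATE_REGEX = re.compile(r"\d{4}-\d{2}-\d{2}")
# Full datetime: YYYY-MM-DDThh:mm:ss followed by "Z" or a +hh:mm / -hh:mm offset
W3C_DATETIME_REGEX = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:Z|[+-]\d{2}:\d{2})"
)

CHANGEFREQ_VALUES = frozenset(
    {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
)


def is_valid_date(value):
    """Return True if value has the shape of a W3C date or datetime string."""
    if not isinstance(value, str):
        return False
    # fullmatch rejects trailing characters, including a trailing newline.
    return bool(
        W3C_DATE_REGEX.fullmatch(value) or W3C_DATETIME_REGEX.fullmatch(value)
    )


def is_valid_changefreq(value):
    """Return True if value is one of the allowed changefreq strings."""
    return value in CHANGEFREQ_VALUES


def is_valid_priority(value):
    """Return True if value converts to a float in the range 0.0 to 1.0."""
    try:
        return 0.0 <= float(value) <= 1.0
    except (TypeError, ValueError):
        return False


def keep_valid_fields(lastmod=None, changefreq=None, priority=None):
    """Return only the fields that pass validation; log a warning for the rest."""
    checks = (
        ("lastmod", lastmod, is_valid_date),
        ("changefreq", changefreq, is_valid_changefreq),
        ("priority", priority, is_valid_priority),
    )
    fields = {}
    for name, value, is_valid in checks:
        if value is None:
            continue  # field not supplied, nothing to validate
        if is_valid(value):
            fields[name] = value
        else:
            logger.warning("Ignoring invalid %s value: %r", name, value)
    return fields
```

With this sketch, `keep_valid_fields(lastmod="2024-05-20", changefreq="sometimes")` keeps only `lastmod` and logs one warning for the rejected `changefreq`.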

@macbre macbre self-assigned this Aug 16, 2024
Contributor

macbre commented Aug 16, 2024

@jaric - thanks for your PR and sorry for such a late reply :-(

Can you please add some comments for the new helpers? pylint complains that:

************* Module xml_sitemap_writer
xml_sitemap_writer.py:67:0: C0301: Line too long (129/100) (line-too-long)
xml_sitemap_writer.py:17:0: C0116: Missing function or method docstring (missing-function-docstring)
xml_sitemap_writer.py:20:0: C0116: Missing function or method docstring (missing-function-docstring)
xml_sitemap_writer.py:23:0: C0116: Missing function or method docstring (missing-function-docstring)
xml_sitemap_writer.py:67:4: C0116: Missing function or method docstring (missing-function-docstring)

Also, can you reformat the code (with the black tool) and add some test coverage for your changes? Thanks!
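
For illustration only, a documented helper with tests for both of its branches might look something like this. It is a sketch under assumptions: `checked_changefreq`, the logger name, and the test class are hypothetical and do not come from the PR or the project's test suite. It uses the standard library's `unittest.TestCase.assertLogs` to check that the warning is actually emitted.

```python
import logging
import unittest

# Hypothetical logger name; the project's real logger may differ.
logger = logging.getLogger("xml_sitemap_writer_sketch")

CHANGEFREQ_VALUES = frozenset(
    {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
)


def checked_changefreq(value):
    """
    Return value if it is an allowed changefreq string.

    Otherwise log a warning and return None so the caller can skip the tag.
    """
    if value in CHANGEFREQ_VALUES:
        return value
    logger.warning("Invalid changefreq value: %r", value)
    return None


class CheckedChangefreqTests(unittest.TestCase):
    """Covers the accepted and the rejected branch of the helper."""

    def test_allowed_value_is_returned(self):
        """An allowed value is passed through unchanged."""
        self.assertEqual(checked_changefreq("weekly"), "weekly")

    def test_invalid_value_is_dropped_and_logged(self):
        """An unknown value yields None and exactly one warning record."""
        with self.assertLogs(logger, level="WARNING") as captured:
            self.assertIsNone(checked_changefreq("sometimes"))
        self.assertEqual(len(captured.records), 1)
        self.assertIn("sometimes", captured.records[0].getMessage())
```

Saved as a test module, this can be run with `python -m unittest`; the short docstrings are there to avoid the C0116 warnings quoted above.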

3 participants