Handle parsing issues in METs packages #549

Open
cristianvasquez opened this issue Oct 14, 2024 · 1 comment · May be fixed by #556
Labels
bug Something isn't working

Comments

@cristianvasquez

Some METS packages have been reported to fail parsing because of problems in their contents. The causes identified so far are:

  1. Character encoding issues: special characters such as & are not escaped


 org.xml.sax.SAXParseException; lineNumber: 24; columnNumber: 48; The entity name must immediately follow the '&' in the entity reference.
  2. HTML markup in the title text, which is not allowed

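Both causes can be reproduced in a few lines. The sketch below uses Python's standard xml.etree.ElementTree rather than the Java SAX parser from the report, so the error messages differ, but the failure is the same: the interpolated title makes the document ill-formed or changes its structure. The namespace URI and the render_title/parses helpers are hypothetical, since the issue does not show the full template.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real cdm namespace is not given in the issue.
CDM_NS = "http://example.org/cdm#"


def render_title(title: str, lang: str) -> str:
    """Interpolate a title into XML without escaping (hypothetical helper)."""
    return (
        f'<cdm:work_title xmlns:cdm="{CDM_NS}" xml:lang="{lang}">'
        f"{title}</cdm:work_title>"
    )


def parses(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Cause 1: a bare '&' is not well-formed XML.
print(parses(render_title("Research & Innovation", "en")))  # False

# Cause 2: HTML markup is parsed as child elements instead of title text...
root = ET.fromstring(render_title("A <i>new</i> framework", "en"))
print(repr(root.text))  # 'A ' (the text is split around the <i> element)

# ...and unclosed HTML tags such as <br> make the document ill-formed.
print(parses(render_title("Line one<br>Line two", "en")))  # False
```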

@cristianvasquez
Author

Apparently the fix is to escape the contents in the XML Jinja template using Jinja's escape filter (e, short for escape):

https://tedboy.github.io/jinja2/templ10.html

For instance,

<cdm:work_title xml:lang="{{ lang }}">{{ work.title[lang] }}</cdm:work_title>

becomes

<cdm:work_title xml:lang="{{ lang }}">{{ work.title[lang] | e }}</cdm:work_title>
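To check that escaping resolves both causes, the sketch below escapes the title before interpolating it and confirms that the result parses and that the original string comes back as plain text. It uses the standard library's html.escape as a stand-in for Jinja's e filter, so it runs without Jinja installed. Both escape &, <, >, " and ', although they spell some entities differently. The namespace URI and render_title_escaped are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real cdm namespace is not given in the issue.
CDM_NS = "http://example.org/cdm#"


def render_title_escaped(title: str, lang: str) -> str:
    """Escape the title before interpolating it, mirroring `| e` in the template.

    html.escape stands in for Jinja's `e` filter here (an assumption for
    illustration); both escape &, <, >, " and '.
    """
    return (
        f'<cdm:work_title xmlns:cdm="{CDM_NS}" xml:lang="{lang}">'
        f"{html.escape(title)}</cdm:work_title>"
    )


for title in ["Research & Innovation", "A <i>new</i> framework", "Line one<br>Line two"]:
    root = ET.fromstring(render_title_escaped(title, "en"))
    # The parser decodes the entities, so the title round-trips as plain text.
    assert root.text == title
    print(root.text)
```

With this approach, HTML markup in a title is kept as literal text rather than rejected or stripped. Whether the project would prefer to strip such markup instead is not settled in the issue. As an alternative to adding | e to every expression, Jinja can also enable autoescaping for a whole environment, e.g. Environment(autoescape=select_autoescape(["xml"])). Whether that fits these templates is an assumption.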
     

cristianvasquez linked a pull request Oct 24, 2024 that will close this issue
rousso added the bug label Nov 4, 2024

2 participants