Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abbr Extension: Definition Sorting and Glossary storage #1467

Merged
merged 7 commits into from
Jun 11, 2024

Conversation

nbanyan
Copy link
Contributor

@nbanyan nbanyan commented Jun 3, 2024

Several changes/features here. All input welcome on each.

  • Sorting: A simple sort by length to process longer abbreviations first solves all instances of smaller abbreviations preventing longer ones from being used.
  • glossary: An option to include a Markdown file with abbreviations to apply to every page.
  • use_last_abbr: An option to only keep the first instance of an abbreviation (useful when using other methods to append glossaries to pages).

The glossary ended up being the more complex of the three and involved replicating some of the code/process from core.py.

list by length before processing the element tree

This ensures that multi-word abbreviations are
implemented even if an abbreviation exists for
one of those component words.

To test this,
added `test_abbr_superset_vs_subset()` to
`tests.test_syntax.extensions.test_abbr`.
handling for empty abbreviations.

- Configuration option `use_last_abbr` allows using
the first instance of a term rather than the last.
This is useful for sites that auto-append a glossary
to pages.
- If an abbreviation does not have a definition it
will skip the term instead of writing abbr tags with
an empty title.
The glossary file also uses the Markdown
abbreviation syntax (`AbbrBlockprocessor` is
used to process the file) and keeps the glossary
definitions separate from the page definitions,
allowing the glossary to be applied to every page
while only being processed once.
@waylan
Copy link
Member

waylan commented Jun 3, 2024

So far I've only taken a cursory look at your changes. Here are my initial thoughts.

  • Sorting: A simple sort by length to process longer abbreviations first solves all instances of smaller abbreviations preventing longer ones from being used.

This is good. I should have caught the issue this fixes with the recent refactor. Thank you.

  • glossary: An option to include a Markdown file with abbreviations to apply to every page.

I'm not sure I am comfortable with this. In our discussion in #1465 I suggested allowing a user to pass in a Python object (such as a dict), not importing a file of Markdown content. I'm inclined to reject this on the basis that extensions already exist which allow Markdown content to be imported.

  • use_last_abbr: An option to only keep the first instance of an abbreviation (useful when using other methods to append glossaries to pages).

As previously explained, I don't see any value in this. You still need to ensure your definitions are in a certain order. That order is now reversed which is counterintuitive (IMO) and works the opposite of every other reference-type syntax (reference links, footnotes, etc) in Markdown. I realize this is an option which if off by default, but with there being no value to changing the setting, then why even offer it? That said, if you can provide a concrete example which demonstrates its value and changes my mind, then I would suggest renaming it use_first_abbr with the default being False. That name more clearly aligns with the description of the option.

to accept a dictionary instead of a file.

Changed `load_glossary` to merge in a dictionary,
overwriting any existing records.

Added a `reset_glossary` method to clear the glossary.

Removed the `use_last_abbr` option, since `glossary`
can be used to resolve the use case `use_last_abbr`
was written for.
@nbanyan nbanyan changed the title Abbr Extension: Sorting, Glossary, and order precedence. Abbr Extension: Definition Sorting and Glossary storage Jun 5, 2024
@nbanyan
Copy link
Contributor Author

nbanyan commented Jun 5, 2024

Discarded use_last_abbr and changed glossary to accept and manage dictionary objects directly.

docs/changelog.md Outdated Show resolved Hide resolved
docs/extensions/abbreviations.md Outdated Show resolved Hide resolved
markdown/extensions/abbr.py Outdated Show resolved Hide resolved
markdown/extensions/abbr.py Outdated Show resolved Hide resolved
markdown/extensions/abbr.py Show resolved Hide resolved
markdown/extensions/abbr.py Outdated Show resolved Hide resolved
tests/test_syntax/extensions/test_abbr.py Show resolved Hide resolved
Completed removal of artifacts from removing
`use_last_abbr` and updated documentation.
Also added testing to avoid having an abbreviation
where `{"": "Definition"}`.
@waylan waylan merged commit 33359fa into Python-Markdown:master Jun 11, 2024
15 checks passed
@waylan
Copy link
Member

waylan commented Jun 11, 2024

@nbanyan thanks for the work on this. Much appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants