Updates to attributes table parser needed as of December 2024 #6

Open
golightlyb opened this issue Dec 16, 2024 · 0 comments
Labels
good first issue Good for newcomers

Here are a few issues with parsing the current spec that have been highlighted by @abhillman's work.

These issues need to be addressed for the updated machine-readable spec to be fully useful.

Current output

The generated JSON files are where the current parser sometimes produces incorrect output.

Issues

Outdated workaround

Remove the workaround in parse.py line 91.

Update global attributes

The list of global attributes in parse.py line 34 needs updating.

This can be done manually for now, but it would be nice to be able to parse this automatically in future.

Handling "the empty string" as an attribute keyword

When parsing attributes, keyword lists such as "true"; "false"; the empty string fail to match the regular expression because of the text "the empty string". Instead, that text should be recognised, and the empty string should be emitted as a value_keywords entry of "".

This leads to suboptimal output, for example in attributes.json line 614:

    "hidden":
    {
        "desc": "Whether the element is relevant",
        "elements": ["HTML"],
        "value_keywords": [],
        "value_type": "\"until-found\"; \"hidden\"; the empty string"
    },

It should instead read:

    "hidden":
    {
        "desc": "Whether the element is relevant",
        "elements": ["HTML"],
        "value_keywords": ["", "until-found", "hidden"],
        "value_type": "Keywords"
    },
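
A minimal sketch of how the keyword parsing could recognise this phrase, assuming the keyword list arrives as a single semicolon-separated string. The names `parse_keywords`, `QUOTED_KEYWORD` and `EMPTY_STRING_PHRASE` are hypothetical and do not come from parse.py. Placing "" first mirrors the expected output above and is also an assumption.

```python
import re

# Hypothetical pattern: one double-quoted keyword such as "until-found".
QUOTED_KEYWORD = re.compile(r'^"([^"]*)"$')
# The literal phrase the spec uses to mean the keyword "".
EMPTY_STRING_PHRASE = "the empty string"


def parse_keywords(value_text):
    """Return the keywords in value_text, or None if it is not a keyword list."""
    keywords = []
    has_empty = False
    for item in value_text.split(";"):
        item = item.strip()
        if item == EMPTY_STRING_PHRASE:
            has_empty = True
            continue
        match = QUOTED_KEYWORD.match(item)
        if match is None:
            # Anything else (e.g. "Valid non-negative integer") is not a keyword list.
            return None
        keywords.append(match.group(1))
    # Assumption: the empty string goes first, then the rest in source order.
    return ([""] if has_empty else []) + keywords


print(parse_keywords('"until-found"; "hidden"; the empty string'))
# ['', 'until-found', 'hidden']
```

When the function returns a list, the caller could set value_type to "Keywords"; when it returns None, the original text would be kept as value_type.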

Parse parentheses in attribute element lists correctly

See WHATWG Attributes

In attributes.json, the element list for the height attribute is parsed from the HTML text "canvas; embed; iframe; img; input; object; source (in picture); video".

Currently it is parsing like so:

    "height":
    {
        "desc": "Vertical dimension",
        "elements":
        [
            "(in",
            "canvas",
            "embed",
            "iframe",
            "img",
            "input",
            "object",
            "video"
        ],
        "value_keywords": [],
        "value_type": "Valid non-negative integer"
    },

The elements array should instead read, with "(in" removed and "source" added:

        "elements":
        [
            "canvas",
            "embed",
            "iframe",
            "img",
            "input",
            "object",
            "source",
            "video"
        ],

value_type should probably also have ". The actual rules are more complicated than indicated" appended.
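
For the element list, one possible approach is to strip any parenthesised qualifier before collecting element names. This is only a sketch: `parse_elements` and `PAREN_QUALIFIER` are hypothetical names, and sorting the result is an assumption based on the alphabetical order seen in the current output.

```python
import re

# Matches a parenthesised qualifier such as " (in picture)".
PAREN_QUALIFIER = re.compile(r"\s*\([^)]*\)")


def parse_elements(elements_text):
    """Return sorted element names from a semicolon-separated list."""
    names = set()
    for item in elements_text.split(";"):
        # Drop qualifiers like "(in picture)" so only the element name remains.
        name = PAREN_QUALIFIER.sub("", item).strip()
        if name:
            names.add(name)
    return sorted(names)


text = "canvas; embed; iframe; img; input; object; source (in picture); video"
print(parse_elements(text))
# ['canvas', 'embed', 'iframe', 'img', 'input', 'object', 'source', 'video']
```

This discards the qualifier entirely. If the "(in picture)" context is worth keeping, it would need a separate field in the output, which the current JSON layout does not appear to have.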

Attribute keyword list chokes on trailing semicolon

attributes.json line 1108 fails to record the correct keywords for the popover attribute because of a trailing semicolon, which should be ignored.

It currently reads:

    "popover":
    {
        "desc": "Makes the element a popover element",
        "elements": ["HTML"],
        "value_keywords": [],
        "value_type": "\"auto\"; \"manual\";"
    },

But it should read:

    "popover":
    {
        "desc": "Makes the element a popover element",
        "elements": ["HTML"],
        "value_keywords": ["auto", "manual"],
        "value_type": "Keywords"
    },
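
A trailing semicolon produces an empty final item when the text is split on ";". A minimal sketch of ignoring it, assuming the list is split by hand and not matched by a single regular expression, could look like this. The function names are hypothetical and not taken from parse.py.

```python
def split_keyword_list(value_text):
    """Split on ';' and drop empty items, such as the one after a trailing ';'."""
    items = [item.strip() for item in value_text.split(";")]
    return [item for item in items if item]


def parse_quoted_keywords(value_text):
    """Return the quoted keywords, or None if any item is not a quoted keyword."""
    keywords = []
    for item in split_keyword_list(value_text):
        if len(item) >= 2 and item.startswith('"') and item.endswith('"'):
            keywords.append(item[1:-1])
        else:
            return None
    # An input with no keywords at all is not treated as a keyword list.
    return keywords or None


print(parse_quoted_keywords('"auto"; "manual";'))
# ['auto', 'manual']
```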

Intellectual property notice updates

COPYING.txt (which is copied into the JSON) should be updated. In particular, there is a new version of the W3C document license that needs linking to. This should also be updated in COPYING.md.

@golightlyb added the good first issue label on Dec 16, 2024