Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

id property is always removed when using <h2> tag #27

Closed
4 tasks done
ben519 opened this issue Sep 11, 2023 · 9 comments
Closed
4 tasks done

id property is always removed when using <h2> tag #27

ben519 opened this issue Sep 11, 2023 · 9 comments
Labels
💪 phase/solved Post is done

Comments

@ben519
Copy link

ben519 commented Sep 11, 2023

Initial checklist

Affected packages and versions

5.0.0

Link to runnable example

No response

Steps to reproduce

This is a weird one as <h1>, <h3>, <h4>, ... tags all work. It's just <h2> that has a problem.

In the below code, I'm attempting to sanitize the html string <h2 id="foo">Hello, world!</h2>. I want the id to be retained (or at least sanitized into user-content-foo). However, it is removed entirely.

import deepmerge from "deepmerge"
import { defaultSchema } from "hast-util-sanitize"
import rehypeParse from "rehype-parse"
import rehypeSanitize from "rehype-sanitize"
import rehypeStringify from "rehype-stringify"
import { unified } from "unified"

const schema = deepmerge(defaultSchema, { attributes: { "*": ["id"] } })

const file = await unified()
  .use(rehypeParse)
  .use(rehypeSanitize, schema)
  .use(rehypeStringify)
  .process('<h2 id="foo">Hello, world!</h2>')

console.log(String(file))  // <h2>Hello, world!</h2>

Expected behavior

<h2 id="user-content-foo">Hello, world!</h2>
(The id property should not be removed.)

Actual behavior

<h2>Hello, world!</h2>
(The id property is removed.)

Affected runtime and version

node v20.5.1

Affected package manager and version

npm 10.0.0

Affected OS and version

mac os 13.5.2 (22G91)

Build and bundle tools

No response

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Sep 11, 2023
@wooorm
Copy link
Member

wooorm commented Sep 11, 2023

@ben519
Copy link
Author

ben519 commented Sep 11, 2023

Oh, interesting. So, <h2>s can only have the id "footnote-label"? Seems oddly restrictive. Can you propose a fix or a workaround for me? I need to allow any id. Thanks!

@wooorm
Copy link
Member

wooorm commented Sep 11, 2023

You can change the h2 stuff in your schema?

The default schema is that way because that's the default on github.

@ben519
Copy link
Author

ben519 commented Sep 11, 2023

Oh I see what you're saying. Something like this

const schema = deepmerge(defaultSchema, { attributes: { "*": ["id"] } })
schema["attributes"]["h2"] = ["id", "className"]

@wooorm
Copy link
Member

wooorm commented Sep 16, 2023

@tommyvn responding here from the other issue, where you said:

Am I reading that correctly, that h2 is restricted to a specific id because github uses it in this way? that's wild.
anyway, this answers my question, sounds like it's a feature (for GH at least) not a bug, so i'm closing this one, I'll follow the advice on that issue to use my own schema.
thanks for the link :)

The scheme, the default handling, is the feature indeed!
I do think this should be solved though. But not 100% on how yet?
It probably makes sense that if a h2 only allows certain IDs, and a * allows all IDs, that practically on h2 also all IDs are allowed. (right?1)
But it all has to be secure, and I might be missing a case where that wouldn’t be the case?

So: there’s a nice workaround as @ben519 shows above, though it probably should be fixed, so if someone wants to do the work and spend some time on it, that’d be appreciated!

@ben519
Copy link
Author

ben519 commented Sep 17, 2023

I agree. It's extra confusing that <h2> doesn't work but <h1>, <h3>, ... do.

I'm not sure what the optimal solution is, but I think it'd be great to call out https://github.com/syntax-tree/hast-util-sanitize/blob/main/lib/schema.js in the docs (README). Would've saved me a lot of headaches debugging.

@wooorm
Copy link
Member

wooorm commented Sep 20, 2023

I’d appreciate a PR linking it directly in the docs!
Probably here https://github.com/syntax-tree/hast-util-sanitize?tab=readme-ov-file#defaultschema

@ben519
Copy link
Author

ben519 commented Sep 20, 2023

Will do, but give me a few days to get to it.

@wooorm wooorm closed this as completed in 63c34b6 Oct 26, 2023
@github-actions

This comment has been minimized.

@wooorm wooorm added the 💪 phase/solved Post is done label Oct 26, 2023
@github-actions github-actions bot removed the 🤞 phase/open Post is being triaged manually label Oct 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💪 phase/solved Post is done
Development

No branches or pull requests

2 participants