
TypeScript definitions #5

Open
mindplay-dk opened this issue Sep 28, 2021 · 9 comments

@mindplay-dk

I noticed this comment in the TypeScript definitions:

// TODO (FIXME): Update typescript annotations
// (Internals changed, now using generator function)
// Postponing till I have decided on a nice API.

I think the API is very nice, but I don't know what else you might have had in mind.

You already tagged a release-candidate, so maybe you just forgot? 🙂

I can try to adjust them to match the current API and submit a PR, if you'd like?

@alwinb
Owner

alwinb commented Sep 28, 2021

It was based on a PR from @dalyIsaac. But as I don't use TypeScript myself, I found that keeping it in sync with the API was a bit cumbersome.

There may well be some more slight changes. I think there's no need to supply an external named-character-reference decoder now, and the built-in one handles only a very small number of them; see Limitations.

Note that this was mostly a research project to get to know the lexical grammar much better (for other projects), and an exercise in minimalism.

Depending on your use case, you may want to check how fast it is for large strings. It might be fine, but I have not checked how well the use of regular expressions here scales. Just a caution.

@mindplay-dk
Author

Yeah, I love minimalism, that's what drew me in here - this is good work! 🙂👍

(I had to move to htmlparser2 though - it's big, but had what I need. Don't prioritize further work on these issues for my sake.)

@alwinb
Owner

alwinb commented Oct 6, 2021

Thank you! You can let me know what things would have helped? I probably won't have time to work on it, but I'd love to know. I always find feedback really helpful. (And in this case it can also help me with my other projects)

@mindplay-dk
Author

I opened issues for the things that would have helped already. 🙂

One area where htmlparser2 does not spark joy is with its non-standard DOM - I don't need a full DOM by any means, but it would have helped if the subset of DOM objects/properties/methods they supported had followed the DOM standard. I've had to write adapters to use them in mocks in unit tests. That's something you could probably do better than them, I guess. 🙂

@alwinb
Owner

alwinb commented Oct 8, 2021

Thanks :)

> it would have helped if the subset of DOM objects/properties/methods they supported had followed the DOM standard

Yeah, I'll keep that in mind.
I can note that I use 'name' over 'tagName' in StartTag and EndTag objects here, so there is that.
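To make the 'name' versus 'tagName' difference concrete, here is a minimal sketch. The token shapes are hypothetical; only the `name` field on start and end tags comes from this thread, and `domTagName` is an illustrative helper, not part of the library. In the DOM standard, `tagName` on an HTML element in an HTML document is upper-cased, so a small adapter along these lines could bridge the two:

```typescript
// Hypothetical token shapes: only the `name` field is taken from the thread.
interface StartTag { type: 'StartTag'; name: string }
interface EndTag { type: 'EndTag'; name: string }
type Tag = StartTag | EndTag

// Illustrative helper (an assumption, not library API): derive a DOM-style
// tagName, which is upper-cased for HTML elements in HTML documents.
function domTagName(tag: Tag): string {
  return tag.name.toUpperCase()
}

const start: StartTag = { type: 'StartTag', name: 'td' }
console.log(domTagName(start)) // prints "TD"
```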

Tree construction for HTML is a very complex thing (I'm working on finding a neat declarative description for it…), so even if you'd not need a full DOM, doing something like that yourself based on a library like this might have been too much.

@mindplay-dk
Author

mindplay-dk commented Oct 11, 2021

> doing something like that yourself based on a library like this might have been too much.

I don't know about that? Something like undom is very little code and works quite well. I wouldn't expect a DOM layer in a minimalist library like this one to have much more in terms of features than that. (Come to think of it, if this was something you wanted to pursue, writing an adapter from your token model to undom might be low hanging fruit...)
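As a rough illustration of how small such an adapter could be for well-formed input, here is a sketch of a stack-based builder. It assumes a hypothetical token model and a hypothetical minimal node type (`Token`, `MiniElement`, `MiniText`); it uses neither this library's actual tokens nor undom's API, and it makes no attempt at HTML's implied-tag rules:

```typescript
// Hypothetical flat token model (an assumption, not the library's types).
type Token =
  | { type: 'start'; name: string }
  | { type: 'end'; name: string }
  | { type: 'text'; data: string }

// Hypothetical minimal tree nodes, far smaller than a real DOM.
interface MiniElement { kind: 'element'; name: string; children: MiniNode[] }
interface MiniText { kind: 'text'; data: string }
type MiniNode = MiniElement | MiniText

// Naive builder: every start tag opens a child of the innermost open element,
// and an end tag closes that element only if the names match.
function buildNaiveTree(tokens: Token[]): MiniElement {
  const root: MiniElement = { kind: 'element', name: '#root', children: [] }
  const stack: MiniElement[] = [root]
  for (const token of tokens) {
    const parent = stack[stack.length - 1]
    if (token.type === 'start') {
      const el: MiniElement = { kind: 'element', name: token.name, children: [] }
      parent.children.push(el)
      stack.push(el)
    } else if (token.type === 'end') {
      // Mismatched end tags are silently ignored; the root is never popped.
      if (stack.length > 1 && parent.name === token.name) stack.pop()
    } else {
      parent.children.push({ kind: 'text', data: token.data })
    }
  }
  return root
}

// Tokens for <p>hi</p>
const tree = buildNaiveTree([
  { type: 'start', name: 'p' },
  { type: 'text', data: 'hi' },
  { type: 'end', name: 'p' },
])
console.log(JSON.stringify(tree))
```

This only gives a sensible tree when every start tag has a matching, properly nested end tag.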

@alwinb
Owner

alwinb commented Oct 11, 2021

Yeah, implementing a minimal DOM data structure is not so much an issue. But building a DOM tree from a 'flat' stream of start-tags, end-tags and text-nodes is, because there are a lot of rules for ignoring tags and/or for adding implicit ones. For example, <table><td>one<tr><td>two</table> 'has' implicit <tr>, </tr> and </td> tags and should produce a table with two rows of one cell each without problems. And that is almost the easiest example :)
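The table example can be sketched as a rewrite of the flat token stream. This is a toy covering only `table`, `tr` and `td`, with a hypothetical token model and a hypothetical function name (`addImpliedTableTags`); it is not the algorithm from the HTML standard, which among other things also adds an implied `<tbody>`:

```typescript
// Hypothetical flat token model (an assumption, not the library's types).
type Token =
  | { type: 'start'; name: string }
  | { type: 'end'; name: string }
  | { type: 'text'; data: string }

// Toy rule set: insert the implied <tr>, </td> and </tr> tags for a single,
// non-nested table. All other tokens pass through unchanged.
function addImpliedTableTags(tokens: Token[]): Token[] {
  const out: Token[] = []
  const open: string[] = [] // currently open table/tr/td elements
  const top = () => open[open.length - 1]
  const openTag = (name: string) => { out.push({ type: 'start', name }); open.push(name) }
  const closeIfOpen = (name: string) => {
    if (top() === name) { out.push({ type: 'end', name }); open.pop() }
  }
  for (const token of tokens) {
    if (token.type === 'start' && token.name === 'table') {
      openTag('table')
    } else if (token.type === 'start' && token.name === 'tr') {
      closeIfOpen('td'); closeIfOpen('tr') // a new row ends the previous cell and row
      openTag('tr')
    } else if (token.type === 'start' && token.name === 'td') {
      closeIfOpen('td')                    // a new cell ends the previous cell
      if (top() === 'table') openTag('tr') // a cell directly in a table implies a row
      openTag('td')
    } else if (token.type === 'end' && (token.name === 'td' || token.name === 'tr')) {
      if (token.name === 'tr') closeIfOpen('td')
      closeIfOpen(token.name)              // stray end tags are dropped
    } else if (token.type === 'end' && token.name === 'table') {
      closeIfOpen('td'); closeIfOpen('tr'); closeIfOpen('table')
    } else {
      out.push(token)
    }
  }
  return out
}

const render = (ts: Token[]): string =>
  ts.map(t => t.type === 'start' ? `<${t.name}>` : t.type === 'end' ? `</${t.name}>` : t.data).join('')

// Tokens for <table><td>one<tr><td>two</table>
const fixed = addImpliedTableTags([
  { type: 'start', name: 'table' },
  { type: 'start', name: 'td' },
  { type: 'text', data: 'one' },
  { type: 'start', name: 'tr' },
  { type: 'start', name: 'td' },
  { type: 'text', data: 'two' },
  { type: 'end', name: 'table' },
])
console.log(render(fixed))
// prints "<table><tr><td>one</td></tr><tr><td>two</td></tr></table>"
```

Even this three-element subset needs a separate rule per tag, which gives a sense of how the full rule set grows.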

@mindplay-dk
Author

Ah, right - the whole error tolerance thing.

Just for fun, I looked at the parser spec and kind of wished I hadn't. 😂

(I think I could write a program to convert all those rules into code faster than I could implement them by hand, haha)

@alwinb
Owner

alwinb commented Oct 11, 2021

Yep…. 😅

I also wish I didn’t, but after saying “no” decidedly a couple of times… it turned into this html-parser project. Not ready for use yet, though!

The parse5 project is really good; it is definitely worth reading the source. I’m just trying another algorithm, hoping to rephrase the rules into a sort of coherent schema.

It is not even just error correction; valid HTML still has a lot of such rules.

We’ll see where it goes :)
