TypeScript definitions #5

mindplay-dk · 2021-09-28T07:10:53Z

I noticed this comment in the TypeScript definitions:

tiny-html-lexer/lib/index.d.ts.bak

Lines 1 to 3 in ec1be86

    
    // TODO (FIXME): Update typescript annotations
    // (Internals changed, now using generator function)
    // Postponing till I have decided on a nice API.

I think the API is very nice, but I don't know what else you might have had in mind.

You already tagged a release-candidate, so maybe you just forgot? 🙂

I can try to adjust them to match the current API and submit a PR, if you'd like?

alwinb · 2021-09-28T08:59:13Z

It was based on a PR from @dalyIsaac. But as I don't use TypeScript myself, I found that keeping it in sync with the API was a bit cumbersome.

There may well be some more slight changes. I think there's no need to supply an external named-character-reference decoder now, and the built-in one handles only a very small number of them; see Limitations.

Note that this was mostly a research project to get to know the lexical grammar much better (for other projects), and an exercise in minimalism.

Depending on your use case, you may want to check how fast it is for large strings. It might be fine, but I have not checked how well the use of regular expressions here scales. Just a caution.

mindplay-dk · 2021-10-06T07:58:37Z

Yeah, I love minimalism, that's what drew me in here - this is good work! 🙂👍

(I had to move to htmlparser2 though - it's big, but had what I need. Don't prioritize further work on these issues for my sake.)

alwinb · 2021-10-06T08:54:00Z

Thank you! You can let me know what things would have helped? I probably won't have time to work on it, but I'd love to know. I always find feedback really helpful. (And in this case it can also help me with my other projects)

mindplay-dk · 2021-10-08T08:45:01Z

I opened issues for the things that would have helped already. 🙂

One area where htmlparser2 does not spark joy is with its non-standard DOM - I don't need a full DOM by any means, but it would have helped if the subset of DOM objects/properties/methods they supported had followed the DOM standard. I've had to write adapters to use them in mocks in unit tests. That's something you could probably do better than them, I guess. 🙂

alwinb · 2021-10-08T08:57:36Z

Thanks :)

it would have helped if the subset of DOM objects/properties/methods they supported had followed the DOM standard

Yeah, I'll keep that in mind.
I can note that I use 'name' over 'tagName' in StartTag and EndTag objects here, so there is that.

Tree construction for HTML is a very complex thing (I'm working on finding a neat declarative description for it…), so even if you'd not need a full DOM, doing something like that yourself based on a library like this might have been too much.

mindplay-dk · 2021-10-11T08:33:14Z

doing something like that yourself based on a library like this might have been too much.

I don't know about that? Something like undom is very little code and works quite well. I wouldn't expect a DOM layer in a minimalist library like this one to have much more in terms of features than that. (Come to think of it, if this was something you wanted to pursue, writing an adapter from your token model to undom might be low hanging fruit...)

alwinb · 2021-10-11T09:10:24Z

Yeah, implementing a minimal DOM data structure is not so much an issue. But building a DOM tree from a 'flat' stream of start-tags, end-tags and text-nodes is, because there are a lot of rules for ignoring tags and/or for adding implicit ones. For example, <table><td>one<tr><td>two</table> 'has' implicit <tr>, </tr> and </td> tags and should produce a table with two rows of one cell each without problems. And that is almost the easiest example :)
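
To make the table example concrete, here is a minimal sketch in TypeScript. It is not code from this library: the `Token` and `ElementNode` shapes, `buildTree` and `toHtml` are hypothetical names made up for illustration, and the two hand-written rules cover only this one example. The actual tree construction algorithm in the HTML standard has many more rules (as far as I know it would also insert an implied `<tbody>` here, which the sketch leaves out).

```typescript
// Hypothetical token shapes for illustration; not this library's token model.
type Token =
  | { type: "start"; name: string }
  | { type: "end"; name: string }
  | { type: "text"; data: string };

interface ElementNode {
  name: string;
  children: Array<ElementNode | string>;
}

// Builds a tree from a flat token list. With applyTableRules = false every
// start tag simply nests inside the innermost open element. With true, two
// ad hoc rules add the implied tags for the <table> example above.
function buildTree(tokens: Token[], applyTableRules: boolean): ElementNode {
  const root: ElementNode = { name: "#root", children: [] };
  const stack: ElementNode[] = [root];
  const top = () => stack[stack.length - 1];
  const open = (name: string) => {
    const node: ElementNode = { name, children: [] };
    top().children.push(node);
    stack.push(node);
  };
  // Close open elements until the innermost one has one of the given names
  // (or only the root is left).
  const popUntilTopIs = (names: string[]) => {
    while (stack.length > 1 && names.indexOf(top().name) === -1) stack.pop();
  };

  for (const token of tokens) {
    if (token.type === "text") {
      top().children.push(token.data);
    } else if (token.type === "start") {
      if (applyTableRules && token.name === "tr") {
        popUntilTopIs(["table"]); // implied </td> and </tr>
      }
      if (applyTableRules && token.name === "td") {
        popUntilTopIs(["tr", "table"]); // implied </td>
        if (top().name === "table") open("tr"); // implied <tr>
      }
      open(token.name);
    } else {
      // End tag: close up to and including the nearest matching open element.
      const index = stack.map(node => node.name).lastIndexOf(token.name);
      if (index > 0) stack.length = index;
    }
  }
  return root;
}

// Serializes the tree with every tag written out explicitly.
function toHtml(node: ElementNode): string {
  const inner = node.children
    .map(child => (typeof child === "string" ? child : toHtml(child)))
    .join("");
  return node.name === "#root"
    ? inner
    : "<" + node.name + ">" + inner + "</" + node.name + ">";
}

// Token stream for <table><td>one<tr><td>two</table>
const example: Token[] = [
  { type: "start", name: "table" },
  { type: "start", name: "td" },
  { type: "text", data: "one" },
  { type: "start", name: "tr" },
  { type: "start", name: "td" },
  { type: "text", data: "two" },
  { type: "end", name: "table" },
];

const naiveHtml = toHtml(buildTree(example, false));
const fixedHtml = toHtml(buildTree(example, true));
```

Without the rules, the second row ends up nested inside the first cell (`<table><td>one<tr><td>two</td></tr></td></table>`). With them, the result is two rows of one cell each (`<table><tr><td>one</td></tr><tr><td>two</td></tr></table>`).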

mindplay-dk · 2021-10-11T15:56:34Z

Ah, right - the whole error tolerance thing.

Just for fun, I looked at the parser spec and kind of wished I hadn't. 😂

(I think I could write a program to convert all those rules into code faster than I could implement them by hand, haha)

alwinb · 2021-10-11T17:01:27Z

Yep…. 😅

I also wish I didn’t, but after saying “no” decidedly a couple of times… it turned into this html-parser project. Not ready for use yet, though!

The parse5 project is really good; it is definitely worth reading the source. I’m just trying another algorithm, hoping to rephrase the rules into a sort of coherent schema.

It is not even just error correction; valid HTML still has a lot of such rules.

We’ll see where it goes :)
