-
Notifications
You must be signed in to change notification settings - Fork 615
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add typescript definition #667
Conversation
index.d.ts
Outdated
charThreshold?: number; | ||
classesToPreserve?: string[]; | ||
keepClasses?: boolean; | ||
serializer?: (node: Node) => string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The serializer
option is used exclusively to obtain the content
property in the parse()
call. It's not necessary for it to produce a string. Whatever you return from the serializer function will be reflected in content
, but I'm not sure how to model that in TypeScript.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, thanks. 44a9c32 should do it
Thanks for the PR. Can you clarify what this does? That is, what is the point of the PR? You left the initial comment here empty, I don't know very much about typescript, and this repository doesn't use it. It seems you're making some kind of type description of the API surface, but it also seems like we're duplicating comments/descriptions of all those API arguments etc. - so my main concerns would be, can we avoid that duplication, and how do we make sure that this stays up-to-date if/when we make API changes. |
This PR adds the type definitions of the library. It is indeed a description of the API surface that's used by typescript. The source code doesn't have to use TS to expose types (as this PR shows) but in order to avoid duplication, it would have to be build with it (or use another build step to somehow generate the types from the comments). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like typescript supports generating API docs from JSDoc comments - https://www.typescriptlang.org/docs/handbook/declaration-files/dts-from-js.html . Could that be used here? I also see https://www.npmjs.com/package/tsd-jsdoc and I don't know if that's different or not...
I understand that getting automation to work reliably is perhaps more work than just writing this definition file once, but ISTM we will have a better chance keeping this up to date if there are straightforward steps to update it if/when the API surface changes -- and improved JSDoc comments are probably useful for other things in addition to typescript support.
OTOH, if the automation support doesn't work well or at all for the readability setup, I'm happy to just merge this without adding support for updating it. But I'd like to understand the difficulty with that better before taking on having to learn typescript definition formats and "manually" updating this every time.
} | ||
): boolean; | ||
|
||
export class Readability<T = string> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused - what other T is supported here? Why not just not have this template-like construction and use string
literally below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean by "literally below". This makes sure that the type of content
will be T
(which is the return type of serializer
and defaults to string
if there is no serializer) (see #667 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, hm. That's... interesting. @danburzo is there a particular reason we shouldn't require it to be a string (even if it can be a non-string right now)? Feels odd to have something called "serializer" that then does other transforms and produces something other than a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a particular reason we shouldn't require it to be a string (even if it can be a non-string right now)?
We introduced serializer
to support other types of DOM serialization (e.g. XHTML/XML for EPUB, #605). In my particular case, where Readability is part of a larger processing pipeline, I found it's actually more efficient to skip an entire serialization/parse pair altogether:
const R = new Readability(dom.window.document, {
// ...
/*
Change Readability's serialization to return
a DOM element (instead of a HTML string)
as the `.content` property returned from `.parse()`
This makes it easier for us to run subsequent
transformations (sanitization, hyphenation, etc.)
without having to parse/serialize the HTML repeatedly.
*/
serializer: el => el
});
In my previous comment, I had suggested the change in the TypeScript definition for the sake of accuracy, but if this detail of the serializer
option obscures the class as a whole, I think we can safely pretend it always returns a string here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. Sorry for not doing a good job at reading before. No-op seems more in keeping (like saying somehow "don't serialize this at all"). We could make it return Node | string
, assuming the typescript definition file allows for that? Then it won't require parametrization/templatization of the class. We should probably clarify this in the jsdoc and/or docs though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it's a bit hard to have the return type of parse
depends on the return type of serializer
without having it in a template. IMO it's more important to have an exact return type for parse
rather than for serializer
.
The template is transparent for the user, they won't ever need to use it so I don't think it's obscuring the class too much?
But I agree it should be clarified in the docs (none of the options are documented at the moment)
So I had a go at the generation of the types from the JS Doc and it requires a lot of changes to the source code (only cosmetic of course) and the template for the I push it here: aec77ec. The types are generating automatically before each release (before running Let me know what you prefer. |
Oof. Yeah, that's unfortunate. Let's go with this version for now then, and we can file a follow-up issue to consider if we can do this in an automated fashion. I'd want to avoid all the additional Apologies for all the back and forth on this. I'm going to accept this PR as-is, but I'd like to understand what to put in the |
It would for sure, but would drop support for some old version of Nodejs and some browsers.
Me neither, I got a really cryptic error without them ¯_(ツ)_/¯
You're right but I tend to think that TS users would have added local types to cover the library (that's what we did) and that use case won't break (the local types will take precedence). I usually release types under a patch version, especially pre 1.0, but I don't have a strong opinion |
(I ended up settling on a patch release as suggested by Mathieu, so this is now in 0.4.1) |
Awesome, could you publish the version on npm? Thanks |
🤔 (it's not actually there, you're absolutely right - but I didn't notice.) |
I re-ran |
No worries, thanks a lot! |
This PR adds the type definitions of the library.