Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add typescript definition #667

Merged
merged 2 commits into from
Jan 5, 2021
Merged

Add typescript definition #667

merged 2 commits into from
Jan 5, 2021

Conversation

mathieudutour
Copy link
Contributor

@mathieudutour mathieudutour commented Jan 1, 2021

This PR adds the type definitions of the library.

index.d.ts Outdated
charThreshold?: number;
classesToPreserve?: string[];
keepClasses?: boolean;
serializer?: (node: Node) => string;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The serializer option is used exclusively to obtain the content property in the parse() call. It's not necessary for it to produce a string. Whatever you return from the serializer function will be reflected in content, but I'm not sure how to model that in TypeScript.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, thanks. 44a9c32 should do it

@gijsk
Copy link
Contributor

gijsk commented Jan 1, 2021

Thanks for the PR. Can you clarify what this does? That is, what is the point of the PR? You left the initial comment here empty, I don't know very much about typescript, and this repository doesn't use it.

It seems you're making some kind of type description of the API surface, but it also seems like we're duplicating comments/descriptions of all those API arguments etc. - so my main concerns would be, can we avoid that duplication, and how do we make sure that this stays up-to-date if/when we make API changes.

@mathieudutour
Copy link
Contributor Author

This PR adds the type definitions of the library. It is indeed a description of the API surface that's used by typescript.

The source code doesn't have to use TS to expose types (as this PR shows) but in order to avoid duplication, it would have to be build with it (or use another build step to somehow generate the types from the comments).

Copy link
Contributor

@gijsk gijsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like typescript supports generating API docs from JSDoc comments - https://www.typescriptlang.org/docs/handbook/declaration-files/dts-from-js.html . Could that be used here? I also see https://www.npmjs.com/package/tsd-jsdoc and I don't know if that's different or not...

I understand that getting automation to work reliably is perhaps more work than just writing this definition file once, but ISTM we will have a better chance keeping this up to date if there are straightforward steps to update it if/when the API surface changes -- and improved JSDoc comments are probably useful for other things in addition to typescript support.

OTOH, if the automation support doesn't work well or at all for the readability setup, I'm happy to just merge this without adding support for updating it. But I'd like to understand the difficulty with that better before taking on having to learn typescript definition formats and "manually" updating this every time.

}
): boolean;

export class Readability<T = string> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused - what other T is supported here? Why not just not have this template-like construction and use string literally below?

Copy link
Contributor Author

@mathieudutour mathieudutour Jan 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by "literally below". This makes sure that the type of content will be T (which is the return type of serializer and defaults to string if there is no serializer) (see #667 (comment))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, hm. That's... interesting. @danburzo is there a particular reason we shouldn't require it to be a string (even if it can be a non-string right now)? Feels odd to have something called "serializer" that then does other transforms and produces something other than a string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a particular reason we shouldn't require it to be a string (even if it can be a non-string right now)?

We introduced serializer to support other types of DOM serialization (e.g. XHTML/XML for EPUB, #605). In my particular case, where Readability is part of a larger processing pipeline, I found it's actually more efficient to skip an entire serialization/parse pair altogether:

const R = new Readability(dom.window.document, {
        // ...
	/*
		Change Readability's serialization to return 
		a DOM element (instead of a HTML string) 
		as the `.content` property returned from `.parse()`

		This makes it easier for us to run subsequent
		transformations (sanitization, hyphenation, etc.)
		without having to parse/serialize the HTML repeatedly.
	 */
	serializer: el => el
});

In my previous comment, I had suggested the change in the TypeScript definition for the sake of accuracy, but if this detail of the serializer option obscures the class as a whole, I think we can safely pretend it always returns a string here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Sorry for not doing a good job at reading before. No-op seems more in keeping (like saying somehow "don't serialize this at all"). We could make it return Node | string, assuming the typescript definition file allows for that? Then it won't require parametrization/templatization of the class. We should probably clarify this in the jsdoc and/or docs though.

Copy link
Contributor Author

@mathieudutour mathieudutour Jan 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's a bit hard to have the return type of parse depends on the return type of serializer without having it in a template. IMO it's more important to have an exact return type for parse rather than for serializer.

The template is transparent for the user, they won't ever need to use it so I don't think it's obscuring the class too much?

But I agree it should be clarified in the docs (none of the options are documented at the moment)

@mathieudutour
Copy link
Contributor Author

So I had a go at the generation of the types from the JS Doc and it requires a lot of changes to the source code (only cosmetic of course) and the template for the serializer isn't working (because of microsoft/TypeScript#29401).

I push it here: aec77ec. The types are generating automatically before each release (before running npm run release) and can be manually generated with npm run generate-typescript-definition.

Let me know what you prefer.

@gijsk
Copy link
Contributor

gijsk commented Jan 5, 2021

So I had a go at the generation of the types from the JS Doc and it requires a lot of changes to the source code (only cosmetic of course) and the template for the serializer isn't working (because of microsoft/TypeScript#29401).

Oof. Yeah, that's unfortunate. Let's go with this version for now then, and we can file a follow-up issue to consider if we can do this in an automated fashion. I'd want to avoid all the additional @private annotations (feels like there should be an opt-in way specifying what methods are (supposed to be) exposed, instead of having to opt-out nearly the entire class...), and ideally all the whitespace/cosmetic changes. Perhaps switching to class Readability instead of the Readability.prototype.foo bit would go better, wrt to that part? Even so, I don't really understand why typescript's generator requires the use of individual property assignments instead of being able to parse N.prototype = { ... }, but that's not a battle we should fight here.

Apologies for all the back and forth on this.

I'm going to accept this PR as-is, but I'd like to understand what to put in the CHANGELOG file and how this will affect versioning. For typescript users, is this an API-surface-breaking change, or not? Specifically, I guess that if you're a typescript user of this library, and now the library comes with a type specification, and your use does not match the type specification, that'd break typescript compilation, right? If my understanding is correct, per semver I guess this requires a minor (rather than patch) rev to the version because we're pre-1.0. OTOH, in theory, the new definition should be compatible with "correct" existing usages (but it's hard to imagine all of those usages, and perhaps people use the not-intentionally-exposed methods on the class from typescript, which will break after this PR is merged, maybe) ? @mathieudutour do you have thoughts on this?

@gijsk gijsk merged commit e1c7905 into mozilla:master Jan 5, 2021
@mathieudutour
Copy link
Contributor Author

Perhaps switching to class Readability instead of the Readability.prototype.foo bit would go better, wrt to that part?

It would for sure, but would drop support for some old version of Nodejs and some browsers.

I don't really understand why typescript's generator requires the use of individual property assignments

Me neither, I got a really cryptic error without them ¯_(ツ)_/¯

I guess that if you're a typescript user of this library, and now the library comes with a type specification, and your use does not match the type specification, that'd break typescript compilation, right?

You're right but I tend to think that TS users would have added local types to cover the library (that's what we did) and that use case won't break (the local types will take precedence). I usually release types under a patch version, especially pre 1.0, but I don't have a strong opinion

gijsk added a commit that referenced this pull request Jan 5, 2021
@gijsk
Copy link
Contributor

gijsk commented Jan 13, 2021

(I ended up settling on a patch release as suggested by Mathieu, so this is now in 0.4.1)

@mathieudutour
Copy link
Contributor Author

Awesome, could you publish the version on npm? Thanks

@gijsk
Copy link
Contributor

gijsk commented Jan 22, 2021

? Publish @mozilla/readability to npm? Yes
? Commit (Release 0.4.1)? Yes
? Tag (0.4.1)? Yes
? Push? Yes
🔗 https://www.npmjs.com/package/@mozilla/readability
🏁 Done (in 23s.)

🤔

(it's not actually there, you're absolutely right - but I didn't notice.)

@gijsk
Copy link
Contributor

gijsk commented Jan 22, 2021

I re-ran npm publish manually from the 0.4.1 tag, and it looks like that did the trick. Apologies for missing that.

@mathieudutour
Copy link
Contributor Author

No worries, thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants