Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add typescript definition #667

Merged
merged 2 commits into from
Jan 5, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 49 additions & 0 deletions index.d.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
/**
* Decides whether or not the document is reader-able without parsing the whole thing.
* @return {boolean} Whether or not we suspect Readability.parse() will suceeed at returning an article object.
*/
export function isProbablyReaderable(
document: Document,
options?: {
/** The minimum node content length used to decide if the document is readerable. */
minContentLength?: number;
/** The minumum cumulated 'score' used to determine if the document is readerable. */
minScore?: number;
/** The function used to determine if a node is visible. */
visibilityChecker?: (node: Node) => boolean;
}
): boolean;

export class Readability<T = string> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused - what other T is supported here? Why not just not have this template-like construction and use string literally below?

Copy link
Contributor Author

@mathieudutour mathieudutour Jan 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by "literally below". This makes sure that the type of content will be T (which is the return type of serializer and defaults to string if there is no serializer) (see #667 (comment))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, hm. That's... interesting. @danburzo is there a particular reason we shouldn't require it to be a string (even if it can be a non-string right now)? Feels odd to have something called "serializer" that then does other transforms and produces something other than a string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a particular reason we shouldn't require it to be a string (even if it can be a non-string right now)?

We introduced serializer to support other types of DOM serialization (e.g. XHTML/XML for EPUB, #605). In my particular case, where Readability is part of a larger processing pipeline, I found it's actually more efficient to skip an entire serialization/parse pair altogether:

const R = new Readability(dom.window.document, {
        // ...
	/*
		Change Readability's serialization to return 
		a DOM element (instead of a HTML string) 
		as the `.content` property returned from `.parse()`

		This makes it easier for us to run subsequent
		transformations (sanitization, hyphenation, etc.)
		without having to parse/serialize the HTML repeatedly.
	 */
	serializer: el => el
});

In my previous comment, I had suggested the change in the TypeScript definition for the sake of accuracy, but if this detail of the serializer option obscures the class as a whole, I think we can safely pretend it always returns a string here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Sorry for not doing a good job at reading before. No-op seems more in keeping (like saying somehow "don't serialize this at all"). We could make it return Node | string, assuming the typescript definition file allows for that? Then it won't require parametrization/templatization of the class. We should probably clarify this in the jsdoc and/or docs though.

Copy link
Contributor Author

@mathieudutour mathieudutour Jan 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's a bit hard to have the return type of parse depends on the return type of serializer without having it in a template. IMO it's more important to have an exact return type for parse rather than for serializer.

The template is transparent for the user, they won't ever need to use it so I don't think it's obscuring the class too much?

But I agree it should be clarified in the docs (none of the options are documented at the moment)

constructor(
document: Document,
options?: {
debug?: boolean;
maxElemsToParse?: number;
nbTopCandidates?: number;
charThreshold?: number;
classesToPreserve?: string[];
keepClasses?: boolean;
serializer?: (node: Node) => T;
disableJSONLD?: boolean;
}
);

parse(): null | {
/** article title */
title: string;
/** author metadata */
byline: string;
/** content direction */
dir: string;
/** HTML of processed article content */
content: T;
/** text content of the article (all HTML removed) */
textContent: string;
/** length of an article, in characters */
length: number;
/** article description, or short excerpt from the content */
excerpt: string;
siteName: string;
};
}
1 change: 1 addition & 0 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
"version": "0.4.0",
"description": "A standalone version of the readability library used for Firefox Reader View.",
"main": "index.js",
"types": "index.d.ts",
"scripts": {
"lint": "eslint .",
"test": "mocha test/test-*.js",
Expand Down