Add the ability to register third-party extensions #399

Closed · wants to merge 3 commits
4 changes: 4 additions & 0 deletions core.d.ts
@@ -369,6 +369,10 @@ declare namespace core {
```
*/
function stream(readableStream: ReadableStream): Promise<core.ReadableStreamWithFileType>

type Detection = (tokenizer: ITokenizer, fileType?: core.FileTypeResult) => Promise<core.FileTypeResult | undefined>;

function addDetection(detection: Detection): void;
Owner: This needs the same docs as the readme.

}

export = core;
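As a rough usage sketch (not part of the diff), this is how a consumer might call the API declared above once the PR lands. The `'unicorn'` extension, its MIME type, and the `UNI\0` magic bytes are made-up placeholders:

```js
const FileType = require('file-type');

// A detection receives the strtok3 tokenizer plus whatever result the
// detections that ran before it produced (or undefined).
FileType.addDetection(async (tokenizer, fileType) => {
	// Keep a result from an earlier detection instead of discarding it.
	if (fileType !== undefined) {
		return fileType;
	}

	// Peek (rather than read) so tokenizer.position stays at 0 for any
	// detection that runs after this one.
	const header = Buffer.alloc(4);
	await tokenizer.peekBuffer(header, {length: 4, mayBeLess: true});

	if (header.toString('binary', 0, 4) === 'UNI\0') {
		return {ext: 'unicorn', mime: 'application/x-unicorn'};
	}

	return undefined;
});

(async () => {
	console.log(await FileType.fromBuffer(Buffer.from('UNI\0rest-of-file')));
	//=> {ext: 'unicorn', mime: 'application/x-unicorn'}
})();
```

Because the custom detection only peeks, the same code works whether the source is a buffer or a stream.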
22 changes: 19 additions & 3 deletions core.js
@@ -10,6 +10,8 @@ const supported = require('./supported');

const minimumBytes = 4100; // A fair amount of file-types are detectable within this range

const detections = [_detectFileType];

async function fromStream(stream) {
const tokenizer = await strtok3.fromStream(stream);
try {
@@ -66,15 +68,28 @@ async function _checkSequence(sequence, tokenizer, ignoreBytes) {

async function fromTokenizer(tokenizer) {
try {
return _fromTokenizer(tokenizer);
return detectFileType(tokenizer);
} catch (error) {
if (!(error instanceof strtok3.EndOfStreamError)) {
throw error;
}
}
}

async function _fromTokenizer(tokenizer) {
async function addDetection(detection) {
detections.push(detection);
}

async function detectFileType(tokenizer) {
let fileType;
for (const detection of detections) {
fileType = await detection(tokenizer, fileType);
}

return fileType;
}

async function _detectFileType(tokenizer) {
let buffer = Buffer.alloc(minimumBytes);
const bytesRead = 12;
const check = (header, options) => _check(buffer, header, options);
@@ -1400,7 +1415,8 @@ const fileType = {
fromStream,
fromTokenizer,
fromBuffer,
stream
stream,
addDetection
};

Object.defineProperty(fileType, 'extensions', {
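The `detectFileType()` loop above threads the intermediate result through every registered detection, so a later detection can confirm, refine, or replace what an earlier one found; whatever the last detection returns wins. A standalone sketch of that behaviour (not library code; the detections are made up):

```js
// Mirrors the loop in detectFileType(): each detection sees the result so far.
async function runDetections(detections, tokenizer) {
	let fileType;
	for (const detection of detections) {
		fileType = await detection(tokenizer, fileType);
	}

	return fileType;
}

// A built-in-style detection followed by a custom one that refines its result
// (for example, recognizing a ZIP container as a DOCX document).
const exampleDetections = [
	async () => ({ext: 'zip', mime: 'application/zip'}),
	async (tokenizer, fileType) =>
		fileType && fileType.mime === 'application/zip'
			? {ext: 'docx', mime: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'}
			: fileType
];

runDetections(exampleDetections, null).then(console.log);
//=> {ext: 'docx', mime: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'}
```

Note that a detection which returns `undefined` also discards any result produced earlier in the chain, so a detection that does not recognize the file should generally return the `fileType` it was given.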
4 changes: 3 additions & 1 deletion index.d.ts
@@ -23,5 +23,7 @@ export {
fromTokenizer,
extensions,
mimeTypes,
stream
stream,
Detection,
addDetection
} from './core';
10 changes: 10 additions & 0 deletions readme.md
@@ -268,6 +268,16 @@ Returns a set of supported file extensions.

Returns a set of supported MIME types.

### FileType.addDetection(detection)

Register a detection function. This function will be called after `file-type` and other detections complete.
Owner: What other detections?

@jonyyz (Oct 26, 2020): Proprietary software file extensions that are not well known. The use case would be a file that could have either a standard MIME type or a proprietary/user-defined MIME type.

Contributor: Or even plain-text files that users are willing to support. Say there's not enough confidence to detect SVG or HTML well enough to include it in the core, but users could accept that trade-off and provide their own detection, which would be enough for their use case.

Owner: Yes, that's the whole point of this PR. That's not what I'm asking. I'm saying that "other detections" is not clear enough in the docs.

@jonyyz (Nov 24, 2020): I think "other" is probably intended to leave it open for extension. Examples could be provided but are probably not going to be exhaustive. Perhaps what you're asking for is some examples? That was the point of my comment and the others.


The `detection(tokenizer, fileType)` function receives two arguments:
- [`tokenizer`](https://github.com/Borewit/strtok3#tokenizer)
- `fileType`: either `undefined` or the result returned by a previously run detection.

If the detector returns `undefined`, `tokenizer.position` should remain 0 so that other detectors can still parse the file.
Owner: Why can't file-type reset the position automatically?

Collaborator (Author): Because the underlying source can be a stream, and a stream can only be read once.

Owner: Then it should be made clear that resetting `tokenizer.position` to 0 should not be done if the source is a stream.
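
A hedged sketch of a detection along the lines discussed above (not part of this PR): it peeks instead of reading, so `tokenizer.position` stays at 0 even when the source is a stream, and the deliberately naive SVG check stands in for whatever a user is willing to accept:

```js
const FileType = require('file-type');

FileType.addDetection(async (tokenizer, fileType) => {
	// Trust whatever the built-in (or an earlier custom) detection found.
	if (fileType !== undefined) {
		return fileType;
	}

	// peekBuffer() does not advance tokenizer.position, so this works for
	// streams, which cannot be rewound.
	const buffer = Buffer.alloc(256);
	await tokenizer.peekBuffer(buffer, {length: 256, mayBeLess: true});

	// Naive text sniff; acceptable only if the user accepts the risk of
	// misdetection, as discussed in the thread above.
	if (buffer.toString('utf8').includes('<svg')) {
		return {ext: 'svg', mime: 'image/svg+xml'};
	}

	return undefined;
});
```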


## Supported file types

- [`jpg`](https://en.wikipedia.org/wiki/JPEG)