Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds node.js logic for networking tasks for PDF.js #8712

Merged
merged 3 commits into from
Aug 24, 2017

Conversation

mukulmishra18
Copy link
Contributor

This PR implements IPDFStream interface in node.js environment. fs.createReadStream is used to create ReadableStream, that can be used to stream PDF data incrementally.

};

function PDFNetworkStreamRangeReader(path, begin, end) {
this._rangeRequest = fs.createReadStream(path, { begin, end, });
Copy link
Contributor

@yurydelendik yurydelendik Jul 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change begin=>start fs.createReadStream(path, { start, end, });

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with new commit.

let fs = require('fs');
let assert = require('assert');

function createPromiseCapabilities() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't duplicate this functionality, just import it from src/shared/util.js instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

Copy link
Contributor

@yurydelendik yurydelendik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few more changes:

  • fs.lstat can be used to get length -- it is async so you can use it as a part of headersReady
  • rename PDFNetworkStream to PDFNodeStream (and class syntax can be used here)
  • at "src/pdf.js", import setPDFNetworkStream and depending on isNodeJS, use it to set PDFNetworkStream or PDFNodeStream.

},

cancel(reason) {
this._fullRequest.cancel(reason);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this._rangeRequeste.destroy(reason);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
PDFNetworkStreamRangeReader.prototype = {
get isStreamingSupported() {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return true;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

},

cancel(reason) {
this._fullRequest.close(reason);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this._fullRequest.destroy(reason);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

},

get isStreamingSupported() {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return true;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

},

get isRangeSupported() {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return true;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

PDFNetworkStreamFullReader.prototype = {

get contentLength() {

Copy link
Contributor

@yurydelendik yurydelendik Jul 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when fs.Stats are received, we will know stream length

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@mukulmishra18 mukulmishra18 force-pushed the node_stream branch 2 times, most recently from 67aaf16 to 712de53 Compare July 28, 2017 20:04
if (this._validateRangeRequestCapabilities()) {
// Setting right content length.
this._contentLength =
parseInt(this.getResponseHeader('Content-Length'), 10);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't just assigning a value, without doing any validation, mean that if the Content-Length header isn't present then this._contentLength will be NaN which probably isn't very good (compare with the old code which uses a !isInt(...) check)?

The following might not matter, but asking anyway since I'm not sure:
With this patch, you're always setting this._contentLength whereas before that was only done when range requests were supported; does this matter at all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In refactoring patches we want to preserve initial logic as much as possible, so I recommend to return multiple values/object from validateRangeRequestCapabilities, e.g. {allowRangeRequests: ..., suggestedLength: ... }

Copy link
Contributor Author

@mukulmishra18 mukulmishra18 Jul 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this._contentLength will be NaN which probably isn't very good (compare with the old code which uses a !isInt(...) check)?

Yeah, you are right. Sorry I didn't noticed that. I will add check for isInt. Thanks.

you're always setting this._contentLength whereas before that was only done when range requests were supported

This commit is not intended to change any functionality of the network.js file(just to extract the logic for range request support). I think setting this._contentLength inside if (validateRangeRequestCapability) {... would be better. Thanks for pointing out.

return multiple values/object from validateRangeRequestCapabilities, e.g. {allowRangeRequests: ..., suggestedLength: ... }

Okay, we'll do this way. Thanks.

@mukulmishra18 mukulmishra18 force-pushed the node_stream branch 2 times, most recently from c30ca67 to de29337 Compare August 3, 2017 19:49
Copy link
Member

@Rob--W Rob--W left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a cursory review, and found some issues.

Some observations:

  • It seems that you have based the logic on src/display/transport_stream.js. This is not necessarily a bad thing, but I wonder whether this indeed makes most sense in Node.js.

  • The implementation does not try to minimize resource usage. At the very least, once a request has finished and the response has yielded, it is probably sufficient to discard the underlying handle to the data (e.g. the IncomingMessage).

  • There are no unit tests that demonstrates the new functionality, let alone that the functionality works as expected. This could have caught some of the bugs that I mentioned below. Please add them. If you need inspiration: Currently we have one Node.js-specific test, namely https://github.com/mozilla/pdf.js/blob/5b5781b45d234666241bf3354c0d390315c31d1a/test/unit/display_svg_spec.js.

this.source = options.source;
this.url = url.parse(this.source.url);
this.isHttp = this.url.protocol === 'http' ||
this.url.protocol === 'https';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the protocol property of a parsed URL ends with a colon, so replace 'http' with 'http:' (similarly for 'https').

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks for pointing out.

method: 'GET',
headers: stream.httpHeaders,
};
this.request = http.request(options, (response) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to check whether this._url.protocol is https:, and if so, use the https.request, and http.request otherwise. Similarly for other places where you use http.request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks for the advice.

method: 'GET',
headers: stream.httpHeaders,
};
this.request = http.request(options, (response) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this.request should be private, this._request at the very least.

And also see my previous comment about http.request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

}

cancel(reason) {
this._rangeRequest.cancel(reason);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_rangeRequest should not be a property of the base class, because the base class does not need to know about the implementation detail, so you should remove the use of _rangeRequest here. Looking at the implementation, you should probably not directly expose _rangeRequest, but possibly store a reference to the .read method, maybe as _read.
Besides, the _rangeRequest object is not the same in the subclasses:

The argument of the callback to http.request is an IncomingMessage, which does not have a cancel method - https://nodejs.org/api/http.html#http_class_http_incomingmessage

In PDFNodeStreamFsRangeReader, the type is a ReadableStream, which does not have a cancel method either.

Copy link
Contributor Author

@mukulmishra18 mukulmishra18 Aug 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http.request is an IncomingMessage, which does not have a cancel method

Sorry for typo. This should be this._rangeRequest.close(reason)

but possibly store a reference to the .read method, maybe as _read.

Sorry, but I really don't understand your concern here. Should we have to remove cancel from BaseRangeReader class, and also not call this._rangeReder.read() directly in read() method ?
Can you please explain it a bit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but possibly store a reference to the .read method, maybe as _read.

As far as I understand, do you mean to store reference of .read method of this._rangeRequest in _read property of PDFNodeStreamRangeReader(same for PDFNodeStreamFsRangeReader), somthing like: this._read = this._rangeRequest.read, and then use this._read inside read() of base class?

this._loaded = 0;
this._readCapability = createPromiseCapability();

this._rangeRequest = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be removed, see my note below about _rangeRequest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -593,8 +569,6 @@ PDFNetworkStreamRangeRequestReader.prototype = {
},
};

setPDFNetworkStreamClass(PDFNetworkStream);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In src/pdf.js, we are trying to detect the environment(using isNodeJS), and then setting PDFNodeStream or PDFNetworkStream, see one of the TODO here: #8712 (review)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in non-Node.js, the stream class is no longer set. That's not good.

It is safe to repeatedly call setPDFNetworkStreamClass, so restore this line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

webpack can package node_stream and network in one file and initialize these in some random order. so it's better to call setPDFNetworkStreamClass from api.js and remove this line from the network.js

},

_onHeadersReceived:
function PDFNetworkStreamFullRequestReader_onHeadersReceived() {
let { allowRangeRequests, suggestedLength, } =
validateRangeRequestCapabilities({
getResponseHeader: this.getResponseHeader.bind(this),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't bind() -- create a function here e.g. getResponseHeader: (name) => { return fullRequestXhr.getResponseHeader(name); },

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.


cancel(reason) {
this._fullRequest.close(reason);
this._fullRequest.destroy(reason);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if we need two of these. can we use either close() or destroy()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this._fullRequest.close(reason);. Thanks.

}

get isStreamingSupported() {
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to have it set to true here, if it's enabled in options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

this._isStreamingSupported = !stream.source.disableStream;
this._isRangeSupported = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to have the above options applied for Fs readers, can we move them into base reader?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

this._isRangeSupported = true;
}
this._length = suggestedLength;
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this entire function body into the handleResponse -- notice that this._headersCapability.resolve() still has to be called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

Copy link
Member

@Rob--W Rob--W left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks better than before. I am still expecting some unit tests to ascertain that the introduced logic works as expected (see my previous review for more details).

@@ -593,8 +569,6 @@ PDFNetworkStreamRangeRequestReader.prototype = {
},
};

setPDFNetworkStreamClass(PDFNetworkStream);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in non-Node.js, the stream class is no longer set. That's not good.

It is safe to repeatedly call setPDFNetworkStreamClass, so restore this line.

this._length = stream.source.length;
this._loaded = 0;

this._fullRequest = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is a bit confusing, because you use the same name elsewhere in the file.
Furthermore, _fullRequest is not an accurate description of the value.

How about renaming _fullRequest of the *FullReader classes to _readableStream? This would also make the code more understandable for those who are already familiar with Node.js streams.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about renaming _fullRequest of the *FullReader classes to _readableStream?

Sounds like a good idea. Do you think we should also rename _rangeRequest with _readableStream(?) in *RangeReader?

this.url = url.parse(this.source.url);
this.isHttp = this.url.protocol === 'http:' ||
this.url.protocol === 'https:';
this.isFsUrl = !this.url.host;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is isFsUrl supposed to say whether a URL refers to the filesystem?
If so, this is not the right way to say so. The file: scheme is commonly used to refer to local files, and it can have a host part too - see e.g. https://en.wikipedia.org/wiki/File_URI_scheme

What are the expected values for url that you intend to support here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the expected values for url that you intend to support here?

I am checking here if url refers to the filesystem or not, so that we can choose between PDFNodeStream***Reader or PDFNodeStreamFs***Reader.

I am trying to parse URL using url.parse, and then checking for host. In most of cases I found that urls are of the form ../....path/to/some/pdf.pdf or if url is absolute home/userName/path/to/some/pdf.pdf or file:///home/user/path/to/some/pdf.pdf-- (and all these cases works fine with above check). The only case that I found failing is file://localhost/path/to/some/pdf.pdf, and I think this can be fix by checking for protocol === file:.

Is it sounds reasonable to you, or you have something else in your mind? If not, can you suggest me something concrete?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The canonical URL of a local file is a file:-URL, so I suggest to check whether the scheme is file:.

@mukulmishra18
Copy link
Contributor Author

mukulmishra18 commented Aug 8, 2017

I am trying to add some tests for node_stream logic[1]. While writing test I find out that following #8712 (comment), breaks rangeReader.read() operation, so we are not getting anything while reading here. To fix this we have to shift read() in **RangeReaders class from BaseRangeReader class, and call this._rangeReader.read() instead.

Although running node_stream_spec separately I am getting all output as expected, but I am facing some problems, mainly here.

  • When running gulp unittest, I am getting __non_webpack_require__ is not defined.
  • When running gulp unittestcli, I am getting different value for pdf2 as excepted(Getting: home/user/pdf.js/build/lib/test/pdfs/tracemonkey.pdf, expected: home/user/pdf.js/test/pdfs/tracemonkey.pdf).

[1]: Note that I am following network_spec file to write tests, as node_stream is just adding IPDFStream for node.js environment.

@@ -14,10 +14,11 @@
*/

import {
assert, createPromiseCapability, globalScope, isInt, MissingPDFException,
assert, createPromiseCapability, globalScope, MissingPDFException,
UnexpectedResponseException
} from '../shared/util';
import { setPDFNetworkStreamClass } from './api';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove setPDFNetworkStreamClass import from this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is some confusion at #8712 (comment). Done with new commit, Thanks.

@mukulmishra18 mukulmishra18 force-pushed the node_stream branch 2 times, most recently from c4bad64 to d3537c0 Compare August 14, 2017 19:06
return Promise.reject(this._reason);
}

let chunk = this._rangeRequest.read();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is duplication of code. The two implementations of read are identical.

What stops you from putting read in the base class?
If -for some reason- you cannot use this._rangeRequest.read() in the base class, then you can create a method in the subclass (e.g. _read() { return this._rangeRequest.read(); } and call this._read(); from the base class instead.

And also, I thought that you had replaced "_rangeRequest" with "_readableStream", but I can not find that change (any more?).

Copy link
Contributor Author

@mukulmishra18 mukulmishra18 Aug 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What stops you from putting read in the base class?

Sorry, but not sure why when I am doing something like #8712 (comment), I am unable to read from the stream, see #8712 (comment).

Also _read() { return this._rangeReques.read(); } creates the same issue.

And also, I thought that you had replaced "_rangeRequest" with "_readableStream", but I can not find that change (any more?).

Done with new commit. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The context is missing from the comment that you've linked. I think that the breakage is unrelated to the method being in the base class. You now literally have two (actually, after looking again THREE, BaseFullReader.prototype.read is almost identical, except with the "length" property in onProgress) identical methods read in the sub classes.

The inheritance tree looks like this:

  • BaseRangeReader
  • PDFNodeStreamRangeReader extends BaseRangeReader
  • PDFNodeStreamFsRangeReader extends BaseRangeReader

Your current code duplicates the read method in the latter two classes (the implementation is byte for byte identical). The runtime behavior is identical regardless of where you put the implementation of the "read" method, so you should put the implementation in the base class (BaseRangeReader), to improve the maintainability of the code.

@Rob--W
Copy link
Member

Rob--W commented Aug 15, 2017

Although running node_stream_spec separately I am getting all output as expected, but I am facing some problems, mainly here.

  • When running gulp unittest, I am getting __non_webpack_require__ is not defined.

It seems that we are not passing the right babelOptions to the transpiler when running tests in the browser.

But even if we do, then the error should still be "require is not defined" or something like "require("fs") is not a module", because it does not make any sense to run the Node.js-only test in the browser.

At minimum, skip the test when you detect that the environment is not Node, e.g. as done here:

if (!isNodeJS()) {
pending('zlib.deflateSync is not supported in non-Node environments.');
}

@Rob--W
Copy link
Member

Rob--W commented Aug 15, 2017

I've submitted a patch for the __non_webpack_require__ issue at #8781. Note that an error is still expected, the true solution is to not load the module in non-Node.js environments.

@mukulmishra18
Copy link
Contributor Author

At minimum, skip the test when you detect that the environment is not Node.

Okay, sure. Tests for node_stream is now removed from gulp unittest and only covered in gulp unittestcli.


this._readableStream = null;
let rangeStr = start + '-' + (end - 1);
stream.httpHeaders['Range'] = 'bytes=' + rangeStr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create new object for httpHeaders (before 'Range' is changed) -- we don't want to pollute original httpHeaders object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in new commit. Thanks,


this._readableStream = null;
let rangeStr = start + '-' + (end - 1);
stream.httpHeaders['Range'] = 'bytes=' + rangeStr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use bytes=${start}-${end - 1} format

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

path: this._url.path,
method: 'GET',
headers: stream.httpHeaders,
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move options creating into separate function (we have similar code above) and add auth and port field, also probably agent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created createRequestOptions function that returns options object. Also tried to add a simple http.Agent, please check if it looks good to you. Thanks.

this._errored = true;
this._reason = reason;
this._readCapability.resolve();
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move all .on('readable'/'end'/'error', into e.g. private _setReadableStream into base class -- it can be use by Fs constructor as well.

The same thing needs to be done for "full" readers too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in new commit.

Note: I have moved this._readableStream property in BaseRangeReader from concrete classes and it works just fine for me.

}
return Promise.resolve({ value: chunk, done: false, });
});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with moving read() and cancal() into base class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in new commit. Thanks.

constructor(stream, start, end) {
super(stream);

this._readableStream = fs.createReadStream(this._url.path, { start, end, });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned above, use this._setReadableStream(fs.createReadStream(this._url.path));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

import { isInt } from '../shared/util';

function validateRangeRequestCapabilities({ getResponseHeader, isHttp,
rangeChunkSize, disableRange, }) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add assert(rangeChunkSize > 0); here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

this._rangeChunkSize = stream.source.rangeChunkSize;
if (!this._rangeChunkSize && !this._disableRange) {
this._disableRange = true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this_disableRange and this._rangeChunkSize logic into base class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.


let { allowRangeRequests, suggestedLength, } =
validateRangeRequestCapabilities({
getResponseHeader: this.getResponseHeader.bind(this),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove getResponseHeader function below and inline the response.headers access here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

keepAlive: true,
maxSockets: 10,
};
let agent = url.protocol === 'http:' ? new http.Agent(agentOptions) :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too complicated, let's remove agent for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

cancel(reason) {
this._readableStream.destroy(reason);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here and at the range base reader, we need to think "what if cancel is called before _readableStream is set". I recommend to extract on('error' body into _error(reason) method and call it when this._readableStream is not set. in _setReadableStream we can check if this._errored = true we can quickly destroy it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in new commit. Thanks.

@yurydelendik
Copy link
Contributor

yurydelendik commented Aug 21, 2017

I see UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): Error: Bad end offset: 65537 when running getinfo.js example (with getDocument(pdfPath) change).

@yurydelendik
Copy link
Contributor

If I pass HTTP URL is the getinfo.js I get the following error:

Error: Cannot find module '/Users/yury/Work/pdf.js/examples/node/http:/mozilla.github.io/pdf.js/web/compressed.tracemonkey-pldi-09.pdf'

@mukulmishra18
Copy link
Contributor Author

mukulmishra18 commented Aug 21, 2017

Error: Cannot find module '/Users/yury/Work/pdf.js/examples/node/http:/mozilla.github.io/pdf.js/web/compressed.tracemonkey-pldi-09.pdf'

Strange! seems like we are getting wrong url in node_stream.js. Although on running getinfo.js locally, I find out that I am getting some initial chunk of data, but not all. I am doing something like:

var pdfjsLib = require('../../build/lib/pdf.js');

pdfjs.getDocument('http://mozilla.github.io/pdf.js/web/compressed.tracemonkey-pldi-09.pdf').
  then((doc) => {...});

Although, earlier running the above version of getinfo.js gives me correct output, seems like using read() and this._readableStream in BaseRangeReader causing this error.

@mukulmishra18 mukulmishra18 force-pushed the node_stream branch 2 times, most recently from 2250094 to 3493f14 Compare August 23, 2017 08:47
@mukulmishra18
Copy link
Contributor Author

Okay, I find out the problem. The headers received by http.requestin node.js environment is in lower case(also mentioned here: https://nodejs.org/api/http.html#http_message_headers), but not in browser(when requesting using XHR or fetch) Example: Accept-Ranges in XHR or fetch whereas accept-ranges in node.js.

This create problem here: mukulmishra18@3493f14#diff-35bcb37ef640f9436dd67f5077c08b55R271 -- (returns undefined when trying to access Accept-Ranges). Changing the requested header to lower case solves the problem.

getinfo.js example is now working fine for me on my local system. @yurydelendik can you please check, if it works for you?


describe('node_stream', function() {
let pdf1 =
'http://mozilla.github.io/pdf.js/web/compressed.tracemonkey-pldi-09.pdf';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This results in an external network request, and slow down the unit test if network is slow, or even cause the unit test to fail if network is somehow unavailable.

Please use a local file server in Node for testing. Don't forget to close the server regardless of test failure. To do so you can use http.createServer in a beforeAllhook (and start the server on a random free port using.listen(0), and call .close()on the server in anafterAllhook. The return value of.listenhas anaddressmethod that contains the random port in theportproperty. After starting the server, you can use something likehttp://127.0.0.1:${port}/tracemonkey.pdf`.

And for the browser... Why does this test have a special case !isNode() if the API that you are testing is only supposed to work on Node? Am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this script is not added to jasmine-bootstrap.js -- let's also add assert(isNode()); just in case somebody will decide to add it.

Copy link
Contributor Author

@mukulmishra18 mukulmishra18 Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this test have a special case !isNode() if the API that you are testing is only supposed to work on Node?

Sorry, removed with new commit, as we don't that any more.

Please use a local file server in Node for testing

Added with new commit. Can you please check if it looks good to you?

let's also add assert(isNode()); just in case somebody will decide to add it.

Done. Thanks.

let path = __non_webpack_require__('path');
let url = __non_webpack_require__('url');
pdf2 =
url.parse(path.join(process.cwd(), './test/pdfs/tracemonkey.pdf')).href;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, pdf2 looks like /path/to/pdf.js/test/pdfs/tracemonkey.pdf.
I was expecting a file:// prefix, because otherwise it would not be a URL. Does the test really pass? Regardless, can you make sure that the url really looks like a URL and not a path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting a file:// prefix, because otherwise it would not be a URL.

Then I think adding file:// with process.cwd() will help, something like : url.parse(path.join('file://' + process.cwd(), './test/pdfs/tracemonkey.pdf'))

super(stream);

this._setReadableStream(
fs.createReadStream(this._url.path, { start, end, }));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have (node:90529) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): Error: Bad end offset: 65537. Node.js operates with end being inclusive. Make it end: end - 1,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in new commit. Thanks.

let range12Reader = stream1.getRangeReader(pdfLength - tailSize, pdfLength);

let range21Reader = stream2.getRangeReader(pdfLength - tailSize - rangeSize,
pdfLength - tailSize - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must be pdfLength - tailSize without - 1 based on our API definition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

Copy link
Contributor

@yurydelendik yurydelendik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, with Rob's requested changes and off-by-one fix in the fs range reader

@yurydelendik
Copy link
Contributor

let's remove fs.readFileSync() from the getinfo.js example to show that we implemented this functionality and it works

import { assert, isNodeJS } from '../../src/shared/util';
import { PDFNodeStream } from '../../src/display/node_stream';

// Make sure that we are not running this script is Node.js environment.
Copy link
Collaborator

@Snuffleupagus Snuffleupagus Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Hasn't this comment accidentally been inverted, since I'm assuming that it should be e.g. Make sure that we are only running this script in Node.js environments.?

Copy link
Contributor Author

@mukulmishra18 mukulmishra18 Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops..., Thanks for pointing out.

@yurydelendik
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://54.215.176.217:8877/c7f9509b9d69410/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://54.67.70.0:8877/2ee68f404042106/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/2ee68f404042106/output.txt

Total script time: 16.50 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/c7f9509b9d69410/output.txt

Total script time: 29.72 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@yurydelendik yurydelendik merged commit e82811a into mozilla:master Aug 24, 2017
@yurydelendik
Copy link
Contributor

Thank you for the patch.

movsb pushed a commit to movsb/pdf.js that referenced this pull request Jul 14, 2018
Adds node.js logic for networking tasks for PDF.js
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants