Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds a eagerTextCapture option. #124

Closed
wants to merge 1 commit into from
Closed

Adds a eagerTextCapture option. #124

wants to merge 1 commit into from

Conversation

jails
Copy link

@jails jails commented Apr 5, 2015

It's an alternative to #123.
Changes:

  • 'ontext' will still the only event
  • 'eagerTextCapture' will control whether the 'ontext' should be triggered as soon as some text is available or need to wait for the whole text sequence.
  • 'onparserinit' can return the options to be used for initializing the Tokenizer. this way it'll be possible to customize the options right from the handler.
  • removes 'onparserinit' call inside parser's reset(). The handler has already been initialized with the parser, since the parser is the same after a reset the handler doesn't need to be initialized twice. 'onreset' is enough for doing some cleanup imo.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.44%) to 96.09% when pulling 7c4a84b on jails:eager-text-capture-option into 9e770fc on fb55:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.43%) to 96.08% when pulling 25d75e7 on jails:eager-text-capture-option into 9e770fc on fb55:master.

@fb55
Copy link
Owner

fb55 commented Apr 6, 2015

Wow, thanks a lot for your effort. This could (with some minor changes) definitely be merged, if the next step for this project wouldn't be a switch to high5 as a tokenizer (see #114) and the behavior would be reverted at that point. Have a look at the high5 branch for the current state of that switch.

In case you would be interested in porting this to high5, here is some feedback for this patch:

  • Not emitting onparserinit on resets would break the API, which I'd like to avoid, especially as there is no real reason for it.
  • The addition of an additional argument is fine, although I'm not sure that options should ever be modified by a handler. What use-case do you have in mind?
  • IMHO there is no need for an additional option. Especially when switching to high5, the behavior of ontext changes anyway (which will be signified by a major release).

@jails
Copy link
Author

jails commented Apr 6, 2015

Thanks for the head up !
The reasoning behind 'onparserinit' was the following. Right now for building a "dom object representation" we need:

  • To initialize the parser with some options.
  • Provide some handlers with compatible options passed to the parser (for example xmlMode or eagerTextCapture change the data send to the handlers).

So I thought it would be interesting to let the 'onparserinit' handler to setup the options the handlers are compatible with. This way it wouldn't be necessary to add another layer like the following:

function DomBuilder() {
  this._parser = new Parser(this, {/* the required options for handlers */});
}
util.inherits(DomBuilder, Parser);
DomBuilder.prototype.ontext = function(text){ // handler logic };
DomBuilder.prototype.onxxx = function(){ // handler logic };
// etc.

var builder = new DomBuilder();
builder.parserComplete(/* some html */);

So to keep the reasoning consistent, if 'onparserinit' has an influence on how the parser needs to be instantiated, calling 'onparserinit' at the end of the parsing should renew the parser to take the options changes into account which was not the case.
Anyhow I probably overthought it. I'll simply leave the 'onparserinit' behavior unchanged since it's not really required.

Otherwise I will definilty have a look at high5, but I didn't understand what you mean by "IMHO there is no need for an additional option". By addionnal option you mean 'eagerTextCapture' ? Because I'm not sure get how to control the ontext() behavior to either "spit out" text as soon as some is available or waiting for the whole text sequence otherwise ?

@fb55
Copy link
Owner

fb55 commented Apr 6, 2015

eagerTextCapture is what I meant. Text should simply be concatenated by default, with no way of opting out. (high5 already has some internal properties used for this in other contexts.)

From a user perspective, having the handler modify options is opaque and unexpected. I'd like to avoid that.

@jails
Copy link
Author

jails commented Apr 6, 2015

Agreed !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants