Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

url: improve parsing speed and code style #419

Closed
wants to merge 1 commit into from
Closed

url: improve parsing speed and code style #419

wants to merge 1 commit into from

Conversation

ysmood
Copy link

@ysmood ysmood commented Jan 14, 2015

All the reg patterns should be defined outside the functions. Minor changes were made.

@mscdex
Copy link
Contributor

mscdex commented Jan 14, 2015

There's still a /\./ on line 235. Maybe it could just be replaced with the string '.' since the regexp was being used in split().

@bnoordhuis
Copy link
Member

I have a hard time believing that actually makes things faster because V8 normally inlines regular expressions at the call site. Can you run the benchmarks and post numbers? The relevant ones are in benchmark/url. There aren't that many currently, maybe you can add a few while you're there.

@ysmood
Copy link
Author

ysmood commented Jan 15, 2015

@bnoordhuis http://jsperf.com/regex-inside-vs-outside
I use the regex userAtHostPattern in the url, you can run it yourself. Before that I also thought V8 should do it for me.

@sonewman
Copy link
Contributor

actually this is probably more realistic: http://jsperf.com/regex-inside-vs-outside/2

So it is faster, but not significantly, i mean if you compare it to the spliceOne trick or var self = this vs fn.bind() which are 100% faster. We could be hunting rats here... 🐀

@ysmood
Copy link
Author

ysmood commented Jan 15, 2015

@mscdex The missed one is fixed: 54c2a9f17873f69dc1906909e3013b0ec0ae24ba

@sonewman
Copy link
Contributor

The other thing is, that by defining them up front, it means that all programs have these regexp compiled when they might not even require them. I mean if you are programming a robot, you might not need to parse many URL's in the process.

@sonewman
Copy link
Contributor

Bare in mind, i am playing devil's advocate here 😉

@ysmood
Copy link
Author

ysmood commented Jan 15, 2015

@sonewman For a running server that may run months, the startup speed trade off is not that important, but this url.parse may be called thousands of time in a single second on a http server. It should be as fast as possible.

@sonewman
Copy link
Contributor

yes this is very true, I have similar considerations myself as it may also need to be called when using http.get(). I'm not saying it shouldn't be merged. Like I said I am playing devils advocate 😈

@ysmood
Copy link
Author

ysmood commented Jan 15, 2015

Beside, this definition convention of regex is used in many other files in io.js. I'm just trying to keep the code style more uniform.

@sonewman
Copy link
Contributor

It's also interesting that Safari is WAY faster!
How about this amendment: http://jsperf.com/regex-inside-vs-outside/2

@sonewman
Copy link
Contributor

Well it is definitely quicker and i'm not going to deny that I the same upfront definitions of regexps in my own code.

But there is no harm in the challenge, it just enforces that we are doing things for the right reason and not for the sake of it.

I guess I was just alluding that there are probably bigger 🐠 to fry.
Anyway LGTM 👍

@mscdex
Copy link
Contributor

mscdex commented Jan 15, 2015

Additionally moving the url formatting code into a separate function which accepts an object (avoiding .call() in urlFormat()) gives a slight speed increase:

With node v0.10.33 + url.js from iojs:

url#oldFormat x 4,501,453 ops/sec ±0.19% (199 runs sampled)
url#newFormat x 4,622,917 ops/sec ±0.11% (198 runs sampled)
Fastest is url#newFormat

@sonewman
Copy link
Contributor

@mscdex That would definitely be worth doing, thats over 100,000 ops/sec more, in comparison to the 14,000 with just the previous code change.

@mscdex
Copy link
Contributor

mscdex commented Jan 15, 2015

It also looks like avoiding the use of the util.* convenience type checking functions gives more:

url#oldFormat x 4,506,078 ops/sec ±0.08% (198 runs sampled)
url#newFormat x 4,714,076 ops/sec ±0.11% (197 runs sampled)
Fastest is url#newFormat

And just for kicks, removing the instanceof Url check and checking on a custom property instead gives:

url#oldFormat x 4,501,381 ops/sec ±0.11% (198 runs sampled)
url#newFormat x 4,759,322 ops/sec ±0.13% (198 runs sampled)
Fastest is url#newFormat

@sonewman
Copy link
Contributor

I often wondered about those methods... it's likely they can't be inlined as they are in a separate file.

@mscdex
Copy link
Contributor

mscdex commented Jan 15, 2015

I just tested my url.format() changes with iojs 1.0.1 (before I was using node v0.10.33 + url.js from iojs 1.0.1) and the speed increase is more significant:

url#oldFormat x 4,218,590 ops/sec ±0.15% (197 runs sampled)
url#newFormat x 6,817,775 ops/sec ±0.13% (191 runs sampled)
Fastest is url#newFormat

@sonewman
Copy link
Contributor

yeah that is pretty insane

@rvagg
Copy link
Member

rvagg commented Jan 27, 2015

this hasn't got any traction amongst collaborators yet so unless someone steps up and wants to sponsor this change by taking responsibility for it I'm inclined to close it.

Now all reg patterns are cached outside functions.
@Fishrock123
Copy link
Contributor

This patch appears marginally faster than the current url.parse, @mscdex's patch is even a bit faster.

https://gist.github.com/Fishrock123/e9e8664ccc0ba3e1a323

cc @bnoordhuis?

(looks like mscdex's format is a lot faster than ours, but that is for another PR)

@bnoordhuis
Copy link
Member

@Fishrock123 Are the results repeatable? The url benchmarks unfortunately (but intrinsically) generate a lot of garbage. One run may produce significantly better numbers than the next one, depending on factors like memory pressure, number of running processes, etc.

@ysmood
Copy link
Author

ysmood commented Jan 27, 2015

At least this PR make the code style more unified, and apparently has no side effect, so why not merge it?

@Fishrock123
Copy link
Contributor

Seems like a style change. If our policy is otherwise, it can be reopened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants