Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache successful paths internally #18

Merged
merged 3 commits into from
Dec 4, 2013
Merged

Cache successful paths internally #18

merged 3 commits into from
Dec 4, 2013

Conversation

gjtorikian
Copy link
Owner

Fixes #17.

Heh, so I tried this out on help.github.com, a largeish site. Dropped from 5m12s to 4m34s. Going to learn myself some Ruby tooling and try to trim this down a bit more.

Don’t recheck already approved links
@parkr
Copy link
Contributor

parkr commented Dec 4, 2013

Hm, what about caching invalid paths? Then any call is cached and doesn't need to be run twice.

@gjtorikian
Copy link
Owner Author

Hm, what about caching invalid paths?

I thought about that, but actually all the paths in help.github.com succeed already (the tests are green). I'll definitely add it, but it isn't the main slowndown in this case.

@parkr
Copy link
Contributor

parkr commented Dec 4, 2013

Ah. But I guess it could help those of us who don't have perfect links 😉

@gjtorikian
Copy link
Owner Author

Can't seem to make this faster. All the slowdown is in the Ethon code. 😿

gjtorikian added a commit that referenced this pull request Dec 4, 2013
Cache successful paths internally
@gjtorikian gjtorikian merged commit c652254 into master Dec 4, 2013
@gjtorikian gjtorikian deleted the cache-matched-urls branch December 4, 2013 05:05
@parkr
Copy link
Contributor

parkr commented Dec 4, 2013

What's the slowdown in the Ethon code? Are there a particular set of commands that could be sped up? /cc @i0rek

@hanshasselberg
Copy link

@parkr thanks for letting me know.

I would be happy to help! Also Typhoeus provides a caching interface: https://github.com/typhoeus/typhoeus#caching. You should be able to use this instead. But thats only a side note.

@parkr
Copy link
Contributor

parkr commented Dec 4, 2013

I knew @i0rek would swoop in and save the day. :superman:

@benbalter
Copy link
Contributor

🤘

@gjtorikian
Copy link
Owner Author

@i0rek Thanks for jumping in!

Using ruby-prof, here are the top few results when executing link checks:

Links failed 1 times
Thread ID: 2210752180
Fiber ID: 2218012100
Total: 16.407398
Sort by: self_time

 %self      total      self      wait     child     calls  name
 18.39      3.511     3.018     0.000     0.493    34264   <Module::Ethon::Curl>#multi_perform 
  7.04      1.260     1.156     0.000     0.105    29091   <Module::Ethon::Curl>#select 
  5.30      0.893     0.869     0.000     0.023       92   <Module::Ethon::Curl>#easy_perform 
  4.59      2.885     0.753     0.000     2.132    33976   Ethon::Multi::Operations#set_fds 
  3.67      1.899     0.603     0.000     1.297   102792  *Class#new 
  2.53     13.298     0.415     0.000    12.883      572   Ethon::Multi::Operations#perform 
  2.48      0.621     0.408     0.000     0.214   101928   FFI::Struct#clear 
  2.46      0.792     0.404     0.000     0.388    17020   URI::Generic#initialize 
  2.25      7.939     0.369     0.000     7.571    34264   Ethon::Multi::Operations#run 
  1.93      3.096     0.316     0.000     2.779    34264   Ethon::Multi::Operations#check 
  1.83      0.599     0.300     0.000     0.299    34264   Ethon::Multi::Operations#get_timeout 
  1.82      0.298     0.298     0.000     0.000   125901   FFI::Enum#from_native 
  1.73      0.284     0.284     0.000     0.000    68533   FFI::MemoryPointer#initialize 
  1.72      0.282     0.282     0.000     0.000     4885   Kernel#sleep 

I'm on Typhoeus 0.6.6, which uses Ethon 0.6.1. I didn't investigate what the multi_perform call does, nor am I certain if it's the real bottleneck or just a red herring.

@parkr
Copy link
Contributor

parkr commented Dec 4, 2013

@gjtorikian Looks like it's delegating out to FFI? @i0rek would know way more here.

@gjtorikian gjtorikian mentioned this pull request Dec 5, 2013
@gjtorikian
Copy link
Owner Author

Hi @i0rek, sorry for the false alarm. I made some speed improvements that dropped the run of these tests from three minutes down to one: #19 (comment)

The change had nothing to do with the URL check, but rather, how the files were iterated over. Thanks for dropping by, though!

@hanshasselberg
Copy link

Glad you figured it out.
The reason why I wasn't surprised to find Ethon at the top is that this gem is used by Typhoeus to actually make requests. multi_perform reaches into libcurl and fetches the url. I hope it is clearer now. Thanks and keep build awesome things!

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Internally cache status of known URLs
4 participants