
Improve syft performance by memoizing filetree #1480

Closed
aeg opened this issue Jan 18, 2023 · 4 comments · Fixed by #1510
Labels
enhancement New feature or request

Comments

@aeg

aeg commented Jan 18, 2023

What would you like to be added:
To speed up `syft packages` processing, it would be effective to memoize `file.Path.Normalize()` in stereoscope/filetree.

Why is this needed:
The filetree in stereoscope, as used by syft, wastes CPU resources.

Additional context:

Profiling `syft packages dir:/` shows that `path.(*lazybuf).append()` and `path.Clean()` consume more than 20% of CPU time at runtime:

      flat  flat%   sum%        cum   cum%
    69.18s 19.52% 19.52%     69.46s 19.60%  path.(*lazybuf).append
    26.48s  7.47% 26.99%     96.49s 27.23%  path.Clean
    21.83s  6.16% 33.15%     75.73s 21.37%  runtime.scanobject
    18.44s  5.20% 38.36%     20.80s  5.87%  runtime.findObject
    13.42s  3.79% 42.15%     13.56s  3.83%  syscall.Syscall
    10.48s  2.96% 45.10%     10.48s  2.96%  aeshashbody
     9.90s  2.79% 47.90%     32.64s  9.21%  runtime.mapaccess1_faststr

I profiled stereoscope and found that the `node()` function in stereoscope's filetree was called far too many times:

func (t *FileTree) node(p file.Path, strategy linkResolutionStrategy) (*filenode.FileNode, error)

In my case, on a system with 55,148 paths, node() was called 58,882,000 times, which is 1,067 calls per path on average.

call flow:
  node() -> file.Path.Normalize()
            -> path.Clean()
               -> path.(*lazybuf).append()

Since file.Path.Normalize() only needs to run once for a given path, I tried a simple memoization of file.Path.Normalize(). Afterwards, path.(*lazybuf).append() no longer appeared at the top of the profile:

      flat  flat%   sum%        cum   cum%
    21.24s  7.55%  7.55%     23.79s  8.45%  runtime.findObject
    20.08s  7.14% 14.68%     75.85s 26.95%  runtime.scanobject
    18.93s  6.73% 21.41%     18.93s  6.73%  runtime.evacuated (inline)
    16.51s  5.87% 27.28%     16.51s  5.87%  aeshashbody
    12.43s  4.42% 31.69%     12.65s  4.50%  syscall.Syscall
    10.41s  3.70% 35.39%     38.29s 13.61%  runtime.mapaccess2_faststr

With this change, execution time drops from 385 seconds to 257 seconds, a speedup of roughly 33%.
Ultimately the filetree processing itself should be improved, but since this workaround alone improves performance by 10%–20% on average, it would be a good idea to include it first.

@Mikcl
Contributor

Mikcl commented Jan 18, 2023

nice change

regarding:

, the filetree process should be improved

I have an attempt at that here: #1463

(not a maintainer)

@aeg
Author

aeg commented Jan 19, 2023

Thanks for your comment, and nice PR!

I think the doublestar search is very costly. In fact, I think a search like `find . -name Pipfile` would be better.

Also, when I run `syft packages dir:/home/myhome`, syft's catalogers search for link information across all directories up to `/home/myhome`. I don't really see the need for this.

@kzantow kzantow added this to OSS Jan 26, 2023
@kzantow
Contributor

kzantow commented Jan 26, 2023

We are addressing this right now #1510

@wagoodman
Contributor

Though #1510 addresses the symptom (a slow cataloging process), it does so not by memoizing Normalize() calls. Instead, we looked at the common call patterns and added indexes to the underlying data structure that can be consulted before searching the tree.
