improve Generating metadata files and linking package files... step #599

Closed
sandrotosi opened this issue Jul 25, 2017 · 4 comments

Comments

@sandrotosi

Hello,
When running the usual mirror -> snapshot -> publish process on a big repo (in this case, I'm preparing a local repo for the stretch 9.1 point release), the "Generating metadata files and linking package files..." step of the publish phase is pretty slow:

Generating metadata files and linking package files...
 13749 / 76155 [=============================>----------------------------------------------------------------------------------------------------------------------------------------]  18.05% 50m26s

And this is on a machine with a fast 4x SSD RAID 5 array.

Looking at the processes on the host, it seems every package is unpacked to extract the metadata, and that this is done sequentially.

Do you think there's a way to improve the performance, like using a tmpfs (/dev/shm?) or running the unpacking in parallel? That would cut down the time for this step by a lot (it could also be an option to enable only on machines with fast I/O and several cores to spare).
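To make the "unpack in parallel" idea concrete, here is a minimal sketch. It is not how aptly works internally: in-memory `.tar.gz` archives stand in for real `.deb` packages, and the names `list_members`, `list_all_parallel` and `make_archive` are hypothetical helpers invented for this illustration.

```python
import io
import tarfile
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def list_members(archive_bytes: bytes) -> List[str]:
    """Return the file paths stored in one tar archive (stand-in for a .deb payload)."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        return [m.name for m in tar.getmembers() if m.isfile()]


def list_all_parallel(archives: Dict[str, bytes], workers: int = 4) -> Dict[str, List[str]]:
    """Scan several archives concurrently instead of one after another."""
    names = list(archives)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with names.
        results = pool.map(list_members, (archives[n] for n in names))
        return dict(zip(names, results))


def make_archive(paths: List[str]) -> bytes:
    """Build a small in-memory .tar.gz so the sketch is self-contained."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for p in paths:
            data = b"x"
            info = tarfile.TarInfo(name=p)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Whether this helps in practice depends on where the time actually goes: if the step is bound by disk or database access rather than decompression, extra workers may change little.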

thanks for considering!

@smira
Contributor

smira commented Jul 25, 2017

Sandro, aptly is generating data for Contents indexes here.

You can disable it with -skip-contents (https://www.aptly.info/doc/aptly/publish/snapshot/).
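For scripted publishes, the flag can be added when the command line is assembled. The sketch below only builds the argument list and does not run anything; `publish_command` and the snapshot name `stretch-9.1` are hypothetical, while `-skip-contents` is the flag mentioned above.

```python
from typing import List


def publish_command(snapshot: str, skip_contents: bool = True) -> List[str]:
    """Build the argv for `aptly publish snapshot`, optionally skipping Contents indexes."""
    cmd = ["aptly", "publish", "snapshot"]
    if skip_contents:
        # Flags go before the positional snapshot name.
        cmd.append("-skip-contents")
    cmd.append(snapshot)
    return cmd


# "stretch-9.1" is a made-up snapshot name used only for illustration.
print(" ".join(publish_command("stretch-9.1")))
```

The resulting list can be handed to `subprocess.run` if the publish should be launched from the same script.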

But even if enabled, data for each package is generated only once (it's cached for subsequent publishes). So it's only slow the first time.
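The "generated once, then cached" behaviour can be pictured roughly as follows. This is an illustrative sketch, not aptly's actual storage layer: `ContentsCache`, the `extract` callback and the package keys are all hypothetical.

```python
from typing import Callable, Dict, List


class ContentsCache:
    """Remember the file list of each package so it is extracted at most once."""

    def __init__(self, extract: Callable[[str], List[str]]):
        self._extract = extract              # expensive step: unpack and list files
        self._store: Dict[str, List[str]] = {}
        self.misses = 0                      # how many times extraction really ran

    def get(self, package_key: str) -> List[str]:
        if package_key not in self._store:
            self.misses += 1
            self._store[package_key] = self._extract(package_key)
        return self._store[package_key]


def publish(cache: ContentsCache, package_keys: List[str]) -> Dict[str, List[str]]:
    """Collect contents for every package, reusing cached entries where possible."""
    return {key: cache.get(key) for key in package_keys}
```

With a cache like this, a second publish of the same package set triggers no extraction at all, which matches the observation that only the first run is slow.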

@smira smira added the question label Sep 29, 2017
@smira
Contributor

smira commented Sep 29, 2017

Closing on inactivity, please feel free to re-open

@smira smira closed this as completed Sep 29, 2017
@sunilbhogulkar

sunilbhogulkar commented Oct 11, 2018

Hello,

We have aptly installed on an EC2 instance where the aptly root directory is mounted on an EBS volume, and aptly publishes to an S3 endpoint. We have observed that, when publishing snapshots, the "Generating metadata files and linking package files..." stage consistently takes around 2.5 hours, even with the -skip-contents option set to true and no changes between the new and previously published snapshots. We are currently using aptly 1.3.0, and the published repository is ~160GB.

Any pointers on what could be wrong with the configuration?

Thanks

@smira
Contributor

smira commented Oct 12, 2018

@sunilbhogulkar hard to say what is going on there. Contents generation itself (you have it disabled, but still) was significantly improved in #707.

What aptly does at this step:

  1. Scans the database for packages to publish, loading information about each package to be published.
  2. Generates temporary files with package indexes (in the temporary directory).
  3. Verifies that each file exists in S3 and, if not, uploads it. As you have no changes, this step should be a no-op, since even the verification is done by listing the bucket contents once.
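Step 3 above can be sketched as a set lookup against a single bucket listing. This is a simplified illustration that only checks for presence by key; `plan_uploads` and the example paths are hypothetical, and no real S3 calls are made.

```python
from typing import Dict, Iterable, List


def plan_uploads(local_files: Dict[str, str], bucket_keys: Iterable[str]) -> List[str]:
    """Return the keys that still need uploading.

    local_files maps an S3 key to a local path; bucket_keys is the result
    of listing the bucket once, up front.
    """
    existing = set(bucket_keys)  # one listing, then constant-time membership checks
    return sorted(key for key in local_files if key not in existing)


# Illustrative data only: nothing changed, so nothing should be uploaded.
local = {
    "pool/main/f/foo/foo_1.0_amd64.deb": "/srv/aptly/pool/aa/bb/foo_1.0_amd64.deb",
    "pool/main/b/bar/bar_2.0_amd64.deb": "/srv/aptly/pool/cc/dd/bar_2.0_amd64.deb",
}
print(plan_uploads(local, list(local)))
```

If nothing changed between snapshots, the planned upload list is empty, which is why this step is expected to be close to free.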

I have two suggestions:

  1. Verify that the temp directory is not on the EBS volume.
  2. Try running strace against aptly and analyze the output to figure out which operation takes most of the time. 2.5 hours is not expected for a no-op publish.
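Both suggestions can be checked with a few lines of scripting. The sketch below rests on two assumptions that are not confirmed in this thread: that aptly picks its temp directory the way Go's `os.TempDir` does on Unix (`$TMPDIR`, falling back to `/tmp`), and that the trace was captured with strace's `-T` option, which appends the time spent in each call as `<seconds>`. All function names are hypothetical.

```python
import os
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Optional, Tuple


def go_style_temp_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Mimic Go's os.TempDir on Unix: $TMPDIR if non-empty, else /tmp."""
    env = os.environ if env is None else env
    return env.get("TMPDIR") or "/tmp"


def mount_point(path: str) -> str:
    """Walk up from path until a mount point is reached."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


# Matches e.g. `write(3, "abc", 3) = 3 <0.000120>`, optionally prefixed by a pid.
_STRACE_LINE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(.*<(\d+\.\d+)>\s*$")


def time_per_syscall(lines: Iterable[str]) -> List[Tuple[str, float]]:
    """Sum the `-T` timings per syscall name, slowest first."""
    totals: Dict[str, float] = defaultdict(float)
    for line in lines:
        match = _STRACE_LINE.match(line)
        if match:  # skip signals, unfinished calls and other non-matching lines
            totals[match.group(1)] += float(match.group(2))
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    tmp = go_style_temp_dir()
    print("temp dir:", tmp, "on mount:", mount_point(tmp))
```

Comparing the printed mount point with the one holding the aptly root directory shows whether temporary files land on the same EBS volume. For the trace, a capture such as `strace -f -T -p <pid> -o trace.txt` can be fed line by line into `time_per_syscall`.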
