Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

salt yumpkg implementation painfully slow in some circumstances #26129

Closed
GreatSnoopy opened this issue Aug 8, 2015 · 20 comments
Closed

salt yumpkg implementation painfully slow in some circumstances #26129

GreatSnoopy opened this issue Aug 8, 2015 · 20 comments
Assignees
Labels
Bug broken, incorrect, or confusing behavior Execution-Module fixed-pls-verify fix is linked, bug author to confirm fix help-wanted Community help is needed to resolve this P1 Priority 1 Platform Relates to OS, containers, platform-based utilities like FS, system based apps Regression The issue is a bug that breaks functionality known to work in previous releases. severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Milestone

Comments

@GreatSnoopy
Copy link

In certain configurations, salt's yumpkg implementation is painfully slow when having to deal with multiple package installs despite using the pkgs list instead of names.
This happens because the current implementation calls repoquery on a package by package basis.
Now, when you have plugins installed and some of them are slow you end up waiting a few good seconds for each repoquery invocation. If your list to pkg.installed has tens of packages, you end up waiting minutes for the state to be applied, even if no installation is to be made at all.
Problem resides in yumpkg.py in function check_db :

    for name in names:
        ret.setdefault(name, {})['found'] = name in avail
        if not ret[name]['found']:
            repoquery_cmd = repoquery_base + ' {0}'.format(name)
            provides = sorted(
                set(x.name for x in _repoquery_pkginfo(repoquery_cmd))
            )

In the current implementation, for a list of packages I would get :

        comment:
            57 targeted packages were installed/updated. 30 targeted packages were already installed.
        duration:
            1409453.224

I propose the following workaround : for the first name in names, call repoquery as default, to make it refresh its caches. For any subsequent packages in the list tell it to use its local cache.
The following patch implements this

705a706
>     cache_control_params=" "
709c710
<             repoquery_cmd = repoquery_base + ' {0}'.format(name)

---
>             repoquery_cmd = repoquery_base + ' {0} {1}'.format(name,cache_control_params) 
717a719
>         cache_control_params=" --cache "

This may not be the best solution, as it's not very clear if a first run of repoquery will actually recreate all the metadata for all the packages in its local cache so that all subsequent queries return meaningful results.
The best solution would be to actually call repoquery once with ALL the names in the command line - just as the documentation for parameter pkgs pretends it would do:

        Unlike ``pkgs``, the ``names`` parameter cannot specify a version.
        In addition, it makes a separate call to the package management
        frontend to install each package, whereas ``pkgs`` makes just a
        single call. It is therefore recommended to use ``pkgs`` instead of
        ``names`` to install multiple packages, both for the additional
        features and the performance improvement that it brings.

This obviously does not happen now. It may call yum install with all the needed packages, but querying package details is still going one by one and this is very slow when you have yum plugins that each takes a few seconds to execute, or when network is slow or unavailable (and you are waiting for sockets to timeout at each repoquery invocation)

@DanyC97
Copy link

DanyC97 commented Aug 9, 2015

nice catch!

@GreatSnoopy
Copy link
Author

Well, it was hard to miss after staring at the screen for so long waiting for the thing to finish :)

@jfindlay jfindlay added Execution-Module Bug broken, incorrect, or confusing behavior severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around P3 Priority 3 Platform Relates to OS, containers, platform-based utilities like FS, system based apps labels Aug 10, 2015
@jfindlay jfindlay added this to the Approved milestone Aug 10, 2015
@jfindlay
Copy link
Contributor

@GreatSnoopy, thanks for the report. I do not know much about repoquery and haven't used it before to (re)generate the package cache, so I am unqualified to say whether it necessarily regenerates the complete cache for all packages in all installed repositories. I do know that yum makecache does claim to do this, at least according to the manpage.

@jfindlay jfindlay added the help-wanted Community help is needed to resolve this label Aug 10, 2015
@jfindlay
Copy link
Contributor

I wonder if yum makecache would be a better solution here if this is true. In any case, you are very welcome to submit your patch to 2015.5. Thanks.

@GreatSnoopy
Copy link
Author

I'll look into it the following days, if yum make cache really is the way to go we could instead call yum makecache once and add --cache as a default parameter for repoquery, along with the already present --plugins. Until then, the current workaround should work.

@jfindlay
Copy link
Contributor

@GreatSnoopy, that sounds like a good plan to me, thanks. Let me know if you can't get a pull request in and I'll do it.

@ghost
Copy link

ghost commented Aug 19, 2015

I'm not sure I understand why it is required to use the high level "yum" as opposed to the far faster "rpm" when simply checking to see if a package is installed.

@fbretel
Copy link
Contributor

fbretel commented Oct 14, 2015

@GreatSnoopy I found out about skip_suggestions. This requires strict naming of packages, but it probably what we want anyway.

@danlsgiga
Copy link
Contributor

I've noticed that after I updated to 2015.8.1 my packages lookup were taking forever to run (basically the repoquery process was taking too long to run). After setting the skip_suggestions: True in my pkg.installed state everything ran really fast. Is there any change in the yumpkg module that is causing this? A simple pkg.installed is taking ~1.5 minutes to run and this is killing performance.

Then I downgraded to 2015.8.0 and the performance hit is not there, even with skip_suggestions set to False.

@jfindlay jfindlay added Regression The issue is a bug that breaks functionality known to work in previous releases. P1 Priority 1 and removed P3 Priority 3 labels Oct 16, 2015
@GreatSnoopy
Copy link
Author

The problem I raised initially should not be related to anything with package suggestions. The issue is due to the yum module querying package details for each package one by one, and making this a remote call (not using cache). I will look to see if using skip_suggestions will make yum lookups more inclined to use the existing cache, however I doubt it. Core issue is making too many yum calls one by one while each yum call also gets online to (re)download its cache.
On my environments, i still use 2015.05 to avoid surprises, and I overrided the yum module with one patched as above. Due to pressing job issues, I did not even verify if the above workaround was integrated in 2015.08, maybe one of the core devs can chime in ?

@aflat
Copy link

aflat commented Oct 23, 2015

I hit the same issue in 2015.8 as well. I was using 2014.7, installing about 25 rpms, and moving up to 2015.8 has increased my configure time more then 1.5 minutes. Since I'm building jenkins machines in AWS on the fly, this adds up, especially for my machines that require more rpms. Each repoquery adds 3+ seconds.

@vitaliyf
Copy link
Contributor

Ran experiments, definitely "broken" on 2015.8.1 and not broken on 2015.8.0. I also tried latest yumpkg.py from 2015.8 branch (which had some reverts done to it) and it's also slow.

@themalkolm
Copy link
Contributor

So I can just drop yumpkg.py from 2015.8.0 to my _modules and it should solve the issue for 2015.8.1, right?

@danlsgiga
Copy link
Contributor

@themalkolm, that's how I did and it works! Just make sure to run salt '*' saltutil.sync_all refresh=True

@variia
Copy link

variia commented Nov 5, 2015

I have just upgraded from 2015.5.2 to 2015.5.6 from SaltStack repo and it is broken there too. I was unable to see references about this regarding 2015.5.6 hence reporting here.

@vitaliyf
Copy link
Contributor

I can confirm that this issue was fixed by the fix in #28526

@jfindlay
Copy link
Contributor

jfindlay commented Dec 3, 2015

@variia, this seems to have been fixed on 2015.8, but we should probably backport it to 2015.5.

@variia
Copy link

variia commented Dec 3, 2015

@jfindlay thanks, i agree. what is the ref for the fix? in the meantime, the skip_suggestion = True flag does solve the problem and in my view, that should be default.

@jfindlay
Copy link
Contributor

jfindlay commented Dec 4, 2015

@variia, #29439.

@clarkperkins
Copy link
Contributor

So, this was fixed in the 2015.8 line, but somehow it got reverted back to the slow version in this commit. 410da78

This affects at least 2015.8.7 if not earlier versions.

@jfindlay jfindlay added the fixed-pls-verify fix is linked, bug author to confirm fix label Jun 1, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior Execution-Module fixed-pls-verify fix is linked, bug author to confirm fix help-wanted Community help is needed to resolve this P1 Priority 1 Platform Relates to OS, containers, platform-based utilities like FS, system based apps Regression The issue is a bug that breaks functionality known to work in previous releases. severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Projects
None yet
Development

No branches or pull requests