Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify user_ldap checkPassword to not fetch records from ldap each time because of extremely high CPU usage! #35867

Merged
merged 1 commit into from
Aug 16, 2024

Conversation

akhil1508
Copy link
Contributor

@akhil1508 akhil1508 commented Dec 22, 2022

Summary

  • Currently on each checkPassword call, the function getLDAPUserByLoginName is called

  • This calls fetchListOfUsers which is defined here

  • So on each authentication request:

  • DAV mobile clients(for example DAVx5) don't support token auth or oauth yet. This means that they pass the username and password on every single sync request

  • We have an NC instance serving around 100 DAV sync requests every second

  • For every sync request the LDAP logs look like(admin bind + search for user):

conn=1687813954 fd=21 ACCEPT from IP=*** (IP=***)
conn=1687813954 op=0 BIND dn="***" method=128
conn=1687813954 op=0 BIND dn="***" mech=SIMPLE bind_ssf=0 ssf=0
conn=1687813954 op=0 RESULT tag=97 err=0 qtime=0.000014 etime=0.000156 text=
conn=1687813954 op=1 SRCH base="***" scope=2 deref=0 filter="***"
conn=1687813954 op=1 SRCH attr=entryuuid nsuniqueid objectguid guid ipauniqueid dn uid samaccountname memberof 
conn=1687813954 op=1 SEARCH RESULT tag=101 err=0 qtime=0.000027 etime=0.000332 nentries=1 text=
  • followed by BIND(to check user credentials)
conn=1687813956 fd=51 ACCEPT from IP=*** (IP=***)
conn=1687813956 op=0 BIND dn="***" method=128
conn=1687813956 op=0 BIND dn="***" mech=SIMPLE bind_ssf=0 ssf=0
conn=1687813956 op=0 RESULT tag=97 err=0 qtime=0.000012 etime=0.004391 text=
  • CPU Usage by openldap is extremely high! CPU usage by database is similarly extremely high but lower compared to LDAP

Proposed solution here

  • Get username and dn from cache if existing
    • Use the loginName2UserName function that will fetch the ldap record and update attributes if it is not in cache
  • Check credentials with BIND but don't process attributes again
  • Will save ~0.5ms per auth request if user already cached
    • Doesn't sound like too big a deal but with 100 auth requests per second, this can definitely help the load issue
  • Will also avoid ~12 SQL UPDATE operations on each authentication request
    • ~6 if user is not cached yet
    • UNBIND happens in the destructor of the Connection class so pretty sure it is done only after all these SQL commands are completed, which means the LDAP bind stays open till then

Please do let me know if there are any pitfalls to this and if the ldap record fetching and processing on every single authentication request happens for an unavoidable reason I am not aware of..

Checklist

@szaimen szaimen added the 3. to review Waiting for reviews label Dec 22, 2022
@szaimen szaimen added this to the Nextcloud 26 milestone Dec 22, 2022
@szaimen szaimen requested review from come-nc, ChristophWurst, blizzz, a team and icewind1991 and removed request for a team December 22, 2022 14:56
@ChristophWurst ChristophWurst removed their request for review December 22, 2022 15:14
@come-nc
Copy link
Contributor

come-nc commented Jan 2, 2023

Hello, thanks for looking into this!

Proposed solution here

  • Get username and dn from cache if existing

    • Use the loginName2UserName function that will fetch the ldap record and update attributes if it is not in cache

This means that the new users will not be able to login until cache expires, which may lead to worse user experience on some setups.

  • Check credentials with BIND but don't process attributes again

Same thing here, forcing attributes update from LDAP on login is most likely to improve user experience and avoid having to wait on background job to see a change applied in Nextcloud.

So, I think it was on purpose that cache is not used in checkPassword. That said, it’s of course a big problem if that gets triggered on each request for DAV requests in some cases.

We should try to see if we can find a middle ground here, maybe look into cache but if user does not exist, force an LDAP search?
For the update of DB data I’m not sure, maybe we can not force it, but it does seem handy to be able to force refresh by logging out then in. But if the data is from cache anyway it does make sense to push to DB though…

@blizzz Thoughts?

@akhil1508
Copy link
Contributor Author

akhil1508 commented Jan 2, 2023

@come-nc

This means that the new users will not be able to login until cache expires, which may lead to worse user experience on some setups.

This is fine imo(new users will be able to login) as the function loginName2Username does first check cache and if user not existing, it fetches from LDAP as seen over here

Same thing here, forcing attributes update from LDAP on login is most likely to improve user experience and avoid having to wait on background job to see a change applied in Nextcloud.

This is true for sure; my only counter-point is that the cache usually does reset much quicker than the background job run time(especially with lots of users) so the user is fetched from LDAP much more frequently than the bg job(just not everytime). Otherwise, maybe we could even set a config value in config.php for this and if set to true(defaulting to false), we can fetch from cache or otherwise fetch and update attributes on every single login.

We should try to see if we can find a middle ground here, maybe look into cache but if user does not exist, force an LDAP search?

This is already done, please see my point about the loginName2Username function above :)

For the update of DB data I’m not sure, maybe we can not force it, but it does seem handy to be able to force refresh by logging out then in. But if the data is from cache anyway it does make sense to push to DB though…

For me its okay to update once on every cache reset but if other users want the update every time, we can use a config option for this as mentioned above.

@come-nc
Copy link
Contributor

come-nc commented Jan 3, 2023

This means that the new users will not be able to login until cache expires, which may lead to worse user experience on some setups.

This is fine imo(new users will be able to login) as the function loginName2Username does first check cache and if user not existing, it fetches from LDAP as seen over here

No, if the user does not exists false is cached:

$this->access->connection->writeToCache($cacheKey, false);

On the next call if false is cached it is returned without LDAP request:

if ($username !== null) {
return $username;
}

@akhil1508
Copy link
Contributor Author

akhil1508 commented Jan 3, 2023

@come-nc

On the next call if false is cached it is returned without LDAP request

Fair point. My bad, didn't notice the strict comparison against null. I try to see how we can achieve correct behaviour where we fetch if not cached.

But also, this seems like an edge case(an important one maybe?) to me. How often can we expect users to try login with a non-existent username and then a new user with that same username is added within the cache timeout time?

@akhil1508
Copy link
Contributor Author

akhil1508 commented Jan 4, 2023

@come-nc What about something like:

public function loginName2UserName($loginName, bool $overrideCacheIfFalse = false) {
		$cacheKey = 'loginName2UserName-' . $loginName;
		$username = $this->access->connection->getFromCache($cacheKey);

		$overrideCache = $overrideCacheIfFalse && $username === false;
		if ($username !== null && !$overrideCache) {
			return $username;
		}
               ....
}

And then:

public function checkPassword($uid, $password) {
	$username = $this->loginName2UserName($uid, true);
        ...

@akhil1508
Copy link
Contributor Author

@come-nc @blizzz @icewind1991 Anything? Please comment if possible if it is okay overall(does not break authentication) so I can test on our servers to see if load goes down a lot by this..

Thanks a lot :)

@come-nc
Copy link
Contributor

come-nc commented Jan 12, 2023

@come-nc @blizzz @icewind1991 Anything? Please comment if possible if it is okay overall(does not break authentication) so I can test on our servers to see if load goes down a lot by this..

Thanks a lot :)

Yes it looks good enough that you can test it.

@blizzz
Copy link
Member

blizzz commented Jan 12, 2023

Mh, Attributes are not processed i.e. not updated anymore now on login despite no cache hit. It will be done only for newly mapped users. Unless I oversaw something?

@akhil1508
Copy link
Contributor Author

akhil1508 commented Jan 12, 2023

Mh, Attributes are not processed i.e. not updated anymore now on login despite no cache hit. It will be done only for newly mapped users. Unless I oversaw something?

@blizzz If no cache hit we proceed to

$ldapRecord = $this->getLDAPUserByLoginName($loginName);
so they are processed

Looks like I am mistaken.. Seems like over here, an empty array is sent even after cache invalidation.

$this->batchApplyUserAttributes($recordsToUpdate);

You are right, let me see how this can be handled.

@akhil1508
Copy link
Contributor Author

akhil1508 commented Jan 12, 2023

Mh, Attributes are not processed i.e. not updated anymore now on login despite no cache hit. It will be done only for newly mapped users. Unless I oversaw something?

@blizzz Handled now. Please check.

apps/user_ldap/lib/User_LDAP.php Outdated Show resolved Hide resolved
apps/user_ldap/lib/User_LDAP.php Outdated Show resolved Hide resolved
@akhil1508 akhil1508 requested review from blizzz and removed request for come-nc and icewind1991 January 16, 2023 10:49
@akhil1508
Copy link
Contributor Author

akhil1508 commented Jan 17, 2023

@blizzz Seems like I removed reviewers by mistake and am unable to re-add :/

Please do re-add @come-nc and @icewind1991 when possible

@akhil1508
Copy link
Contributor Author

@come-nc @blizzz @icewind1991 It seems good on test servers. Do you think it's okay to put into production as it is currently as server load is crazy high..

@blizzz blizzz mentioned this pull request Feb 1, 2023
@skjnldsv skjnldsv mentioned this pull request Feb 23, 2023
@blizzz blizzz mentioned this pull request Mar 7, 2023
This was referenced Jul 30, 2024
This was referenced Aug 5, 2024
@skjnldsv skjnldsv mentioned this pull request Aug 13, 2024
@skjnldsv skjnldsv closed this Aug 14, 2024
@skjnldsv skjnldsv removed this from the Nextcloud 30 milestone Aug 14, 2024
@sorbaugh sorbaugh reopened this Aug 16, 2024
@sorbaugh sorbaugh removed the stale Ticket or PR with no recent activity label Aug 16, 2024
@sorbaugh
Copy link
Contributor

sorbaugh commented Aug 16, 2024

Nice PR! Let's see that we get it added to 31 🚀

@akhil1508
Copy link
Contributor Author

Nice PR! Let's see that we get it added to 30 🚀

@sorbaugh Do you need anything from my side to make the merge?

@skjnldsv
Copy link
Member

@akhil1508 if possible, could you fix the ldap test failure in apps/user_ldap/tests/User_LDAPTest.php:235 ?
Do you need help with it? :)

1) OCA\User_LDAP\Tests\User_LDAPTest::testCheckPasswordUidReturn
TypeError: Mock_Access_94083a34::areCredentialsValid(): Argument #1 ($name) must be of type string, null given, called in /home/runner/actions-runner/_work/server/server/apps/user_ldap/lib/User_LDAP.php on line 162

/home/runner/actions-runner/_work/server/server/apps/user_ldap/lib/User_LDAP.php:162
/home/runner/actions-runner/_work/server/server/apps/user_ldap/tests/User_LDAPTest.php:184

2) OCA\User_LDAP\Tests\User_LDAPTest::testCheckPasswordPublicAPI
TypeError: Mock_Access_94083a34::areCredentialsValid(): Argument #1 ($name) must be of type string, null given, called in /home/runner/actions-runner/_work/server/server/apps/user_ldap/lib/User_LDAP.php on line 162

/home/runner/actions-runner/_work/server/server/apps/user_ldap/lib/User_LDAP.php:162
/home/runner/actions-runner/_work/server/server/lib/private/User/Manager.php:231
/home/runner/actions-runner/_work/server/server/lib/private/User/Manager.php:201
/home/runner/actions-runner/_work/server/server/apps/user_ldap/tests/User_LDAPTest.php:235

@akhil1508
Copy link
Contributor Author

Do you need help with it? :)

@skjnldsv Yes please. It doesn't seem like my changes should break unit tests. Is it maybe related to being up to date with master? Should I merge master into the branch again?

@skjnldsv
Copy link
Member

skjnldsv commented Aug 16, 2024

@akhil1508 could you enable us to make modifications to this branch?
There should be an option at the bottom right
image

If possible avoid merging master into this branch, it's preferable if you rebase instead (to avoid creating a merge commit) :)

@akhil1508
Copy link
Contributor Author

@skjnldsv Is it OK if I add a new PR from a personal fork?

Organization forks don't allow maintainer modifications https://github.com/orgs/community/discussions/5634

@skjnldsv
Copy link
Member

skjnldsv commented Aug 16, 2024

@skjnldsv Is it OK if I add a new PR from a personal fork?
Organization forks don't allow maintainer modifications github.com/orgs/community/discussions/5634

Oh! I didn't know that :)
It's alright, I think rebasing to master and addressing my last comment should do it :)

@skjnldsv skjnldsv self-assigned this Aug 16, 2024
Signed-off-by: Akhil <akhil@e.email>
@akhil1508
Copy link
Contributor Author

@skjnldsv Rebased and also attempted to fix the unit test

@skjnldsv
Copy link
Member

@skjnldsv Rebased and also attempted to fix the unit test

Amazing! Looks good! I approved the tests, let's see if they pass 🤞

@skjnldsv
Copy link
Member

skjnldsv commented Aug 16, 2024

Cypress & Performance testing failure expected
We'll ignore the unconventional commits for this one 😉
If you create future PRs, please follow the conventional commits pattern 😜

@skjnldsv skjnldsv added 4. to release Ready to be released and/or waiting for tests to finish and removed 2. developing Work in progress labels Aug 16, 2024
@skjnldsv skjnldsv merged commit d63148e into nextcloud:master Aug 16, 2024
157 of 161 checks passed
@skjnldsv
Copy link
Member

Thank you for your patience and commitment @akhil1508 !! 🎉 🎉

@akhil1508
Copy link
Contributor Author

@skjnldsv

Thank you for your patience and commitment @akhil1508 !! 🎉 🎉

And thanks to you too! :)

@skjnldsv skjnldsv added this to the Nextcloud 31 milestone Aug 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4. to release Ready to be released and/or waiting for tests to finish performance 🚀
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants