Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

user-directory API wrong limit and limited (off by 1) #10840

Closed
douph1 opened this issue Sep 17, 2021 · 3 comments · Fixed by #14631
Closed

user-directory API wrong limit and limited (off by 1) #10840

douph1 opened this issue Sep 17, 2021 · 3 comments · Fixed by #14631
Labels
P4 (OBSOLETE: use S- labels.) Okay backlog: will not schedule, will accept patches S-Tolerable Minor significance, cosmetic issues, low or no impact to users. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@douph1
Copy link

douph1 commented Sep 17, 2021

Description

https://matrix.org/docs/spec/client_server/latest#user-directory

Response:
limited | boolean | Required. Indicates if the result list has been truncated by the limit.

The limited response attribut is wrongly reported
If I search for a term with a limit of : 50 this query can return 2 elements and the result is not truncated : "limited": false : ok
But if I limit the search to 1 element. The result show me the two elements and respond with "limited": true which is twice wrong as the result is complete and I have asked to limit the response to 1...

If I don't specify limit in query I can get 11 response but the doc says :

The maximum number of results to return. Defaults to 10.

If I use ask to limit to more than 50 the result is truncated at 51 elements.
This hard limit (not configurable)

limit = min(limit, 50)

is not documented in the api (and off by 1)
https://matrix.org/docs/spec/client_server/latest#user-directory

curl -s --location --request POST 'https://hostname/_matrix/client/r0/user_directory/search' --header 'Authorization: Bearer mybearer' --header 'Content-Type: application/json' --data-raw '{
  "limit": 20,
  "search_term": "ab"}' | jq
{
  "limited": false,
  "results": [
    {
      "user_id": "@ab61f215-a3ea-442b-8c7c-49cee57aad66:podo-dev",
      "display_name": "ab61f215-a3ea-442b-8c7c-49cee57aad66",
      "avatar_url": null
    },
    {
      "user_id": "@ab32d61e-5774-4716-b1f2-14cb9b07cb00:podo-dev",
      "display_name": "ab32d61e-5774-4716-b1f2-14cb9b07cb00",
      "avatar_url": null
    }
  ]
}

return 2 result : ok , limited : false : ok

then limit by 1 in query:

curl -s --location --request POST 'https://hostname/_matrix/client/r0/user_directory/search' --header 'Authorization: Bearer mybearer' --header 'Content-Type: application/json' --data-raw '{
  "limit": 1,
  "search_term": "ab"}' | jq
{
  "limited": true,
  "results": [
    {
      "user_id": "@ab61f215-a3ea-442b-8c7c-49cee57aad66:podo-dev",
      "display_name": "ab61f215-a3ea-442b-8c7c-49cee57aad66",
      "avatar_url": null
    },
    {
      "user_id": "@ab32d61e-5774-4716-b1f2-14cb9b07cb00:podo-dev",
      "display_name": "ab32d61e-5774-4716-b1f2-14cb9b07cb00",
      "avatar_url": null
    }
  ]
}
  • return 2 elements instead of 1
  • pretend the result is not complete: "limited": true .. but this is wrong

Version information

This was discover in a old version but the source code of the directory_search and rest api endpoint wich limit to 50(51) does not have change a lot since

apt-cache policy matrix-synapse-py3
matrix-synapse-py3:
  Installé : 1.31.0+bionic1
  Candidat : 1.31.0+bionic1
 Table de version :
 *** 1.31.0+bionic1 500
        500 https://matrix.org/packages/debian bionic/main amd64 Packages

Discoverd in our homeserver

curl http://localhost:8008/_synapse/admin/v1/server_version
{"server_version":"1.31.0","python_version":"3.6.9"}
  • Platform:
    Ubuntu
@DMRobertson
Copy link
Contributor

Thanks for the report! We should check to see if this is reproducible with current versions. Speaking of versions, please be aware that we recently released Synapse 1.41.1 which includes security fixes. See here for more details.

I remember the user directory storage functions request limit + 1 elements from the database, to see if there are extra elements that we are going to limit. If so, it sounds like we haven't dropped that extra +1 result from the response.

@DMRobertson DMRobertson added P4 (OBSOLETE: use S- labels.) Okay backlog: will not schedule, will accept patches S-Tolerable Minor significance, cosmetic issues, low or no impact to users. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. labels Sep 17, 2021
@anoadragon453
Copy link
Member

The following was returned from matrix.org running c5ba1d6 today:

curl 'https://matrix-client.matrix.org/_matrix/client/r0/user_directory/search' -X POST -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:97.0) Gecko/20100101 Firefox/97.0' -H 'Accept: application/json' -H 'Content-Type: application/json' -H 'Authorization: Bearer $$$' --data '{"search_term":"mic"}' --silent  | jq .
{
  "limited": true,
  "results": []
}

Presumably limited should be false.

@DMRobertson
Copy link
Contributor

The following was returned from matrix.org running c5ba1d6 today:

curl 'https://matrix-client.matrix.org/_matrix/client/r0/user_directory/search' -X POST -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:97.0) Gecko/20100101 Firefox/97.0' -H 'Accept: application/json' -H 'Content-Type: application/json' -H 'Authorization: Bearer $$$' --data '{"search_term":"mic"}' --silent  | jq .
{
  "limited": true,
  "results": []
}

Presumably limited should be false.

This is an unfortunate interaction between the spam checking logic and the database logic.

results = await self.store.search_user_dir(user_id, search_term, limit)
# Remove any spammy users from the results.
non_spammy_users = []
for user in results["results"]:
if not await self.spam_checker.check_username_for_spam(user):
non_spammy_users.append(user)
results["results"] = non_spammy_users
return results

In this example:

  • The DB returned limited: True because it only produced 10 users, but there was at least one more matching the search query.
  • The spam checker excluded all of these users from the results.
  • limited: True remains in the results object.

limited: True is still correct in principle here: if you asked the DB to look up say 30 users, the spam checker might have deemed users 11--30 to be acceptable and returned you something. You can imagine a world where we make a second DB query if len(non_spammy_users) < len(results["results]"). But I think that's probably overkill.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
P4 (OBSOLETE: use S- labels.) Okay backlog: will not schedule, will accept patches S-Tolerable Minor significance, cosmetic issues, low or no impact to users. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
3 participants