Skip to content

Commit

Permalink
Spec the v2 lookup API
Browse files Browse the repository at this point in the history
Spec for [MSC2134](#2134)
  • Loading branch information
turt2live committed Sep 10, 2019
1 parent a24bcc2 commit 6cfd761
Show file tree
Hide file tree
Showing 3 changed files with 291 additions and 2 deletions.
2 changes: 1 addition & 1 deletion api/identity/lookup.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@
# limitations under the License.
swagger: '2.0'
info:
title: "Matrix Identity Service Lookup API"
title: "Matrix Identity Service Lookup API"
version: "1.0.0"
host: localhost:8090
schemes:
Expand Down
145 changes: 145 additions & 0 deletions api/identity/v2_lookup.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
# Copyright 2016 OpenMarket Ltd
# Copyright 2017 Kamax.io
# Copyright 2017 New Vector Ltd
# Copyright 2018 New Vector Ltd
# Copyright 2019 The Matrix.org Foundation C.I.C.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
swagger: '2.0'
info:
title: "Matrix Identity Service Lookup API"
version: "2.0.0"
host: localhost:8090
schemes:
- https
basePath: /_matrix/identity/v2
consumes:
- application/json
produces:
- application/json
securityDefinitions:
$ref: definitions/security.yaml
paths:
"/hash_details":
get:
summary: Gets hash function information from the server.
description: |-
Gets parameters for hashing identifiers from the server. This can include
any of the algorithms defined in this specification.
operationId: getHashDetails
security:
- accessToken: []
parameters: []
responses:
200:
description: The hash function information.
examples:
application/json: {
"lookup_pepper": "matrixrocks",
"algorithms": ["none", "sha256"]
}
schema:
type: object
properties:
lookup_pepper:
type: string
description: |-
The pepper the client MUST use in hashing identifiers, and MUST
supply to the ``/lookup`` endpoint when performing lookups.
Servers SHOULD rotate this string often.
algorithms:
type: array
items:
type: string
description: |-
The algorithms the server supports. Must contain at least ``sha256``.
required: ['lookup_pepper', 'algorithms']
"/lookup":
post:
summary: Look up Matrix User IDs for a set of 3PIDs.
description: |-
Looks up the set of Matrix User IDs which have bound the 3PIDs given, if
bindings are available. Note that the format of the addresses is defined
later in this specification.
operationId: lookupUsersV2
security:
- accessToken: []
parameters:
- in: body
name: body
schema:
type: object
properties:
algorithm:
type: string
description: |-
The algorithm the client is using to encode the ``addresses``. This
should be one of the available options from ``/hash_details``.
example: "sha256"
pepper:
type: string
description: |-
The pepper from ``/hash_details``. This is required even when the
``algorithm`` does not make use of it.
example: "matrixrocks"
addresses:
type: array
items:
type: string
description: |-
The addresses to look up. The format of the entries here depend on
the ``algorithm`` used. Note that queries which have been incorrectly
hashed or formatted will lead to no matches.
example: [
"4kenr7N9drpCJ4AfalmlGQVsOn3o2RHjkADUpXJWZUc",
"nlo35_T5fzSGZzJApqu8lgIudJvmOQtDaHtr-I4rU7I"
]
required: ['algorithm', 'pepper', 'addresses']
responses:
200:
description:
The associations for any matched ``addresses``.
examples:
application/json: {
"mappings": {
"4kenr7N9drpCJ4AfalmlGQVsOn3o2RHjkADUpXJWZUc": "@alice:example.org"
}
}
schema:
type: object
properties:
mappings:
type: object
description: |-
Any applicable mappings of ``addresses`` to Matrix User IDs. Addresses
which do not have associations will not be included, which can make
this property be an empty object.
title: AssociatedMappings
additionalProperties:
type: string
required: ['mappings']
400:
description:
The client's request was invalid in some way. One possible problem could
be the ``pepper`` being invalid after the server has rotated it - this is
presented with the ``M_INVALID_PEPPER`` error code. Clients SHOULD make
a call to ``/hash_details`` to get a new pepper in this scenario, being
careful to avoid retry loops.
examples:
application/json: {
"errcode": "M_INVALID_PEPPER",
"error": "Unknown or invalid pepper - has it been rotated?"
}
schema:
$ref: "../client-server/definitions/errors/error.yaml"
146 changes: 145 additions & 1 deletion specification/identity_service_api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -155,6 +155,23 @@ should allow a 3PID to be mapped to a Matrix user identity, but not in the other
direction (i.e. one should not be able to get all 3PIDs associated with a Matrix
user ID, or get all 3PIDs associated with a 3PID).

Version 1 API deprecation
-------------------------

.. TODO: Remove this section when the v1 API is removed.
As described on each of the version 1 endpoints, the v1 API is deprecated in
favour of the v2 API described here. The major difference, with the exception
of a few isolated cases, is that the v2 API requires authentication to ensure
the user has given permission for the identity server to operate on their data.

The v1 API is planned to be removed from the specification in a future version.

Clients SHOULD attempt the v2 endpoints first, and if they receive a ``404``,
``400``, or similar error they should try the v1 endpoint or fail the operation.
Clients are strongly encouraged to warn the user of the risks in using the v1 API,
if they are planning on using it.

Web browser clients
-------------------

Expand Down Expand Up @@ -258,7 +275,134 @@ Association lookup

{{lookup_is_http_api}}

.. TODO: TravisR - Add v2 lookup API in future PR
{{v2_lookup_is_http_api}}

Client behaviour
~~~~~~~~~~~~~~~~

.. TODO: Remove this note when v1 is removed completely
.. Note::
This section only covers the v2 lookup endpoint. The v1 endpoint is described
in isolation above.

Prior to performing a lookup clients SHOULD make a request to the ``/hash_details``
endpoint to determine what algorithms the server supports (described in more detail
below). The client then uses this information to form a ``/lookup`` request and
receive known bindings from the server.

Clients MUST support at least the ``sha256`` algorithm.

Server behaviour
~~~~~~~~~~~~~~~~

.. TODO: Remove this note when v1 is removed completely
.. Note::
This section only covers the v2 lookup endpoint. The v1 endpoint is described
in isolation above.

Servers, upon receipt of a ``/lookup`` request, will compare the query against
known bindings it has, hashing the identifiers it knows about as needed to
verify exact matches to the request.

Servers MUST support at least the ``sha256`` algorithm.

Algorithms
~~~~~~~~~~

Some algorithms are defined as part of the specification, however other formats
can be negotiated between the client and server using ``/hash_details``.

``sha256``
++++++++++

This algorithm MUST be supported by clients and servers at a minimum. It is
additionally the preferred algorithm for lookups.

When using this algorithm, the client converts the query first into strings
separated by spaces in the format ``<address> <medium> <pepper>``. The ``<pepper>``
is retrieved from ``/hash_details``, the ``<medium>`` is typically ``email`` or
``msisdn`` (both lowercase), and the ``<address>`` is the 3PID to search for.
For example, if the client wanted to know about ``alice@example.org``'s bindings,
it would first format the query as ``alice@example.org email ThePepperGoesHere``.

.. admonition:: Rationale

Mediums and peppers are appended to the address to prevent a common prefix
for each 3PID, helping prevent attackers from pre-computing the internal state
of the hash function.

After formatting each query, the string is run through SHA-256 as defined by
`RFC 4634 <https://tools.ietf.org/html/rfc4634>`_. The resulting bytes are then
encoded using URL-Safe `Unpadded Base64`_ (similar to `room version 4's
event ID format <../../rooms/v4.html#event-ids>`_).

An example set of queries when using the pepper ``matrixrocks`` would be::

"alice@example.com email matrixrocks" -> "4kenr7N9drpCJ4AfalmlGQVsOn3o2RHjkADUpXJWZUc"
"bob@example.com email matrixrocks" -> "LJwSazmv46n0hlMlsb_iYxI0_HXEqy_yj6Jm636cdT8"
"18005552067 msisdn matrixrocks" -> "nlo35_T5fzSGZzJApqu8lgIudJvmOQtDaHtr-I4rU7I"


The set of hashes is then given as the ``addresses`` array in ``/lookup``. Note
that the pepper used MUST be supplied as ``pepper`` in the ``/lookup`` request.

``none``
++++++++

This algorithm performs plaintext lookups on the identity server. Typically this
algorithm should not be used due to the security concerns of unhashed identifiers,
however some scenarios (such as LDAP-backed identity servers) prevent the use of
hashed identifiers. Identity servers (and optionally clients) can use this algorithm
to perform those kinds of lookups.

Similar to the ``sha256`` algorithm, the client converts the queries into strings
separated by spaces in the format ``<address> <medium>`` - note the lack of ``<pepper>``.
For example, if the client wanted to know about ``alice@example.org``'s bindings,
it would format the query as ``alice@example.org email``.

The formatted strings are then given as the ``addresses`` in ``/lookup``. Note that
the ``pepper`` is still required, and must be provided to ensure the client has made
an appropriate request to ``/hash_details`` first.

Security considerations
~~~~~~~~~~~~~~~~~~~~~~~

.. Note::
`MSC2134 <https://github.com/matrix-org/matrix-doc/pull/2134>`_ has much more
information about the security considerations made for this section of the
specification. This section covers the high-level details for why the specification
is the way it is.

Typically the lookup endpoint is used when a client has an unknown 3PID it wants to
find a Matrix User ID for. Clients normally do this kind of lookup when inviting new
users to a room or searching a user's address book to find any Matrix users they may
not have discovered yet. Rogue or malicious identity servers could harvest this
unknown information and do nefarious things with it if it were sent in plain text.
In order to protect the privacy of users who might not have a Matrix identifier bound
to their 3PID addresses, the specification attempts to make it difficult to harvest
3PIDs.

.. admonition:: Rationale

Hashing identifiers, while not perfect, helps make the effort required to harvest
identifiers significantly higher. Phone numbers in particular are still difficult
to protect with hashing, however hashing is objectively better than not.

An alternative to hashing would be using bcrypt or similar with many rounds, however
by nature of needing to serve mobile clients and clients on limited hardware the
solution needs be kept relatively lightweight.

Clients should be cautious of servers not rotating their pepper very often, and
potentially of servers which use a weak pepper - these servers may be attempting to
brute force the identifiers or use rainbow tables to mine the addresses. Similarly,
clients which support the ``none`` algorithm should consider at least warning the user
of the risks in sending identifiers in plain text to the identity server.

Addresses are still potentially reversable using a calculated rainbow table given
some identifiers, such as phone numbers, common email address domains, and leaked
addresses are easily calculated. For example, phone numbers can have roughly 12
digits to them, making them an easier target for attack than email addresses.


Establishing associations
-------------------------
Expand Down

0 comments on commit 6cfd761

Please sign in to comment.