matrix-org · tulir · May 1, 2023 · Aug 9, 2023 · Aug 9, 2023 · Feb 27, 2024
@@ -0,0 +1 @@
+Clarify that arbitrary Unicode is allowed in user/room IDs and room aliases.
@@ -611,11 +611,20 @@ characters permitted in user ID localparts. There are currently active
 users whose user IDs do not conform to the permitted character set, and
 a number of rooms whose history includes events with a `sender` which
 does not conform. In order to handle these rooms successfully, clients
-and servers MUST accept user IDs with localparts from the expanded
-character set:
+and servers MUST accept user IDs with localparts consisting of any legal
+Unicode codepoints except `:`, including empty localparts. Localparts
+MUST be valid UTF-8 sequences.
+
+Servers SHOULD NOT produce user IDs with localparts outside of the following
+character set, and SHOULD NOT forward such user IDs to clients when referenced
+outside the context of an event. For example, device list updates from "invalid"
+user IDs would be dropped by the receiving server.
-Servers SHOULD NOT produce user IDs with localparts outside of the following
-character set, and SHOULD NOT forward such user IDs to clients when referenced
-outside the context of an event. For example, device list updates from "invalid"
-user IDs would be dropped by the receiving server.
+User IDs with localparts containing characters outside the range U+0021 to U+007E, or with
+an empty localpart, are considered non-compliant. For current room versions, servers MUST
+still accept events using such user IDs over federation; however, they SHOULD NOT forward
+such user IDs to clients when referenced outside the context of an event. For example,
+device list updates from non-compliant user IDs would be dropped by the receiving server.
 
     extended_user_id_char = %x21-39 / %x3B-7E  ; all ASCII printing chars except :
 
+A future room version may prevent users whose IDs use the historical character
+set from participating. Use of the historical character set is *deprecated*.
+
 ##### Mapping from other character sets
 
 In certain circumstances it will be desirable to map from a wider
@@ -663,6 +672,10 @@ Room IDs are case-sensitive. They are not meant to be
 human-readable. They are intended to be treated as fully opaque strings
 by clients.
 
+The localpart of a room ID (`opaque_id` above) may contain any valid
+Unicode codepoints except `:`, but it is recommended to use only
+ASCII letters and digits when generating room IDs.
+
 #### Room Aliases
 
 A room may have zero or more aliases. A room alias has the format:
@@ -673,8 +686,11 @@ The `domain` of a room alias is the [server name](#server-name) of the
 homeserver which created the alias. Other servers may contact this
 homeserver to look up the alias.
 
-Room aliases MUST NOT exceed 255 bytes (including the `#` sigil and the
-domain).
+The localpart of a room alias may contain any valid Unicode codepoints
+except `:`.
+
+Room aliases MUST NOT exceed 255 bytes when encoded as UTF-8 (including
+the `#` sigil and the domain).
 
 #### Event IDs
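
For reviewers, here is a minimal Python sketch of how the acceptance rules in this diff might be checked. It is an illustration only, not part of the spec text. The function names and the three labels (`ok`, `non_compliant`, `rejected`) are hypothetical, and the sketch assumes the U+0021 to U+007E range from the suggested wording above.

```python
def classify_localpart(localpart: str) -> str:
    """Classify a user ID localpart under the rules proposed in this diff.

    Labels are hypothetical:
      "rejected"      - contains ':' or cannot be encoded as valid UTF-8
      "non_compliant" - must still be accepted, but is empty or has
                        characters outside U+0021..U+007E
      "ok"            - non-empty and entirely within U+0021..U+007E
    """
    if ":" in localpart:
        return "rejected"
    try:
        # Lone surrogates cannot be encoded, so they fail the UTF-8 requirement.
        localpart.encode("utf-8")
    except UnicodeEncodeError:
        return "rejected"
    if localpart and all(0x21 <= ord(ch) <= 0x7E for ch in localpart):
        return "ok"
    return "non_compliant"


def alias_within_limit(alias: str, limit: int = 255) -> bool:
    """Check the room alias length limit, counted in UTF-8 bytes.

    The whole alias is measured, including the '#' sigil and the domain.
    Assumes `alias` is already a valid (encodable) string.
    """
    return len(alias.encode("utf-8")) <= limit
```

Counting bytes rather than characters matters once non-ASCII localparts are allowed: an alias of 133 characters can exceed the 255-byte limit if most of its characters encode to two bytes each.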