Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define the supported HTML subset for message events #1562

Merged
merged 6 commits into from
Aug 28, 2018
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelogs/client_server/newsfragments/1562.clarification
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Clarify the supported HTML features for room messages.
4 changes: 3 additions & 1 deletion event-schemas/examples/m.room.message#m.notice
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,9 @@
"age": 242352,
"content": {
"body": "This is an example notice",
"msgtype": "m.notice"
"msgtype": "m.notice",
"format": "org.matrix.custom.html",
"formatted_body": "This is an <strong>example</strong> notice"
},
"origin_server_ts": 1431961217939,
"event_id": "$WLGTSEFSEF:localhost",
Expand Down
48 changes: 48 additions & 0 deletions specification/modules/instant_messaging.rst
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,54 @@ of message being sent. Each type has their own required and optional keys, as
outlined below. If a client cannot display the given ``msgtype`` then it SHOULD
display the fallback plain text ``body`` key instead.

Some message types support HTML in the event content that clients should prefer
to display if available. Currently ``m.text``, ``m.emote``, and ``m.notice``
support an additional ``format`` parameter of ``org.matrix.custom.html``. When
this field is present, a ``formatted_body`` with the HTML must be provided. The
plain text version of the HTML should be provided in the ``body``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth a informational link through to #1225 as an FYI on how this might evolve in general, to reassure people we're not baking in the crappy formatted_body thing forever. (do we have a way of doing informational blocks in the spec atm, which aren't normative but just FYI?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do a "note" which ends up being a dark gray box of text.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a note further down


Clients should limit the HTML they render to avoid Cross-Site Scripting, HTML
injection, and similar attacks. The strongly suggested set of HTML tags to permit,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should also specify a maximum nesting of the tags, to avoid thousands of nested b-tags crashing browsers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does arbitrarily picking the number 42 seem reasonable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ftr went with 100 further down

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(i was worried with 42 being too small given scope for legitimate nesting from RTE copy-paste antics)

denying the use and rendering of anything else, is: ``font``, ``del``, ``h1``,
``h2``, ``h3``, ``h4``, ``h5``, ``h6``, ``blockquote``, ``p``, ``a``, ``ul``,
``ol``, ``sup``, ``sub``, ``nl``, ``li``, ``b``, ``i``, ``u``, ``strong``, ``em``,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have no idea what nl is doing in there. i'd be inclined to remove it.

``strike``, ``code``, ``hr``, ``br``, ``div``, ``table``, ``thead``, ``tbody``,
``tr``, ``th``, ``td``, ``caption``, ``pre``, ``span``, ``img``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we remove the table stuff (table, thead, tbody, tr, th, td, caption) from this list as nobody uses it, and it makes implementation much harder on non-web-based clients. We could include it as a "you could always whitelist this if your client can render it safely, but it's not currently recommended" perhaps.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use tables all the time in various alerts from things. It's the closest we have to arbitrary fields like slack.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay.


Not all attributes on those tags should be permitted as they may be avenues for
other disruption attempts, such as adding ``onclick`` handlers or excessively
large text. Clients should only permit the attributes listed for the tags below.
Where ``data-mx-bg-color`` and ``data-mx-color`` are listed, clients should
translate the value (a 6-character hex color code) to the appropriate CSS/attributes
for the tag.


:``font``:
``data-mx-bg-color``, ``data-mx-color``

:``span``:
``data-mx-bg-color``, ``data-mx-color``

:``a``:
``name``, ``target``, ``href`` (provided the value is not relative and has a scheme
matching one of: ``https``, ``http``, ``ftp``, ``mailto``, ``magnet``)

:``img``:
``width``, ``height``, ``alt``, ``title``, ``src`` (provided it is a Matrix Content
URI)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"provided it uses the mxc:// scheme" perhaps

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that people generally don't know how the media repository works, or what an "mxc://" scheme is. To be fair, a "Matrix Content URI" also doesn't click.

A link on mxc:// maybe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that making "Matrix Content URI" to the section where it's defined (#module-content) would be helpful. In that section, it's called "Matrix Content (MXC) URI", so I think we should go with that.


:``ol``:
``start``

:``code``:
``class`` (only classes which start with ``language-`` for syntax highlighting)


Additionally, clients should ensure that *all* ``a`` tags get a ``rel="noopener"``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/clients/web clients/

Copy link
Contributor

@Half-Shot Half-Shot Aug 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to keeping tables, lots of bots n things use it. The discord bridge will be using it to display discords tables in clients.


Apparently FastHub (android client) has rendered all inline comments in one thread, will move this comment when at a PC

to prevent the target page from referencing the client's tab/window.



{{msgtype_events}}


Expand Down