Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve [Matrix] badge generation #2527

Merged
merged 30 commits into from
Dec 20, 2018

Conversation

babolivier
Copy link
Contributor

@babolivier babolivier commented Dec 13, 2018

Fixes #2524

This PR addresses the issues expressed in #2524, in that it:

  • checks if a server has advertised a FQDN it can be reached at and if that FQDN hosts Matrix's client APIs
  • uses room aliases instead of room IDs, in order to avoid a badge being impossible to generate if the server that created the room leaves it

It also changes the URI syntax for badges from /matrix/ROOM_ID_LOCALPART/SERVER_NAME.svg to /matrix/STRIPPED_ROOM_ALIAS.svg, where STRIPPED_ROOM_ALIAS is an alias of the room that has had its hash (#) stripped from it.

Finally, it also adds the use of an in-memory store for access tokens, in order to prevent registering a new user each time a badge is requested.

I also updated the documentation and tests, and added a new test case, of a failed room alias lookup.

FYI tests are failing on my laptop, and couldn't find out why despite an hour spent looking into it. Let's see whether it goes well with the CI. If not I'll have another look at them in the next few days.

Also please keep in mind that I haven't been writing JavaScript for a loooong time, so I might be a bit rusty 🙂

And finally, huge shout out and thanks to @fr1kin and the reviewers who did an amazing job at getting the first version of badge generation for Matrix written up. We (New Vector, the company behind Riot and most of the work on Matrix) really appreciate it 😃

Some Matrix homeserver only use the address in the SRV record for federation, which uses its own set of routes (and can sometimes use self-signed certificates, or certs using the server's name rather than the address in their SRV record).
Therefore, we check if the client can contact the client APIs beforehand, and fallback to the server's name in case of an error.
@shields-ci
Copy link

shields-ci commented Dec 13, 2018

Messages
📖 ✨ Thanks for your contribution to Shields, @babolivier!

Generated by 🚫 dangerJS against 61f7194

@paulmelnikow paulmelnikow added the service-badge Accepted and actionable changes, features, and bugs label Dec 14, 2018
@PyvesB
Copy link
Member

PyvesB commented Dec 14, 2018

@fr1kin, as you are the author of our Matrix badge, would you like to review this pull request and share your thoughts? 😉

@chris48s chris48s changed the title Improve Matrix badge generation Improve [matrix] badge generation Dec 16, 2018
@chris48s
Copy link
Member

I haven't reviewed this fully, but one problem I can see here is that in some cases we're going to need to make 3 or 4 (sequential) HTTP requests in order to render a single badge image.

The most common place our badges are used is on GitHub where any image that takes more than 4 seconds to load will render as a broken image (even if the call eventually returns an image). We try to avoid badges which make multiple calls because the additional network latency of multiple requests can often causes the badges not to render in the environment where they are most commonly used.

A couple of suggestions we might consider here:

  1. Instead of adding additional network I/O to the transaction, could this issue be better addressed by adding an additional (optional?) parameter to allow the user to manually specify the FQDN if it is not the same as the server name. Its nice to infer it, but there is a hard limit on performance to consider.
  2. Is there any scope to try and do some of this network I/O in parallel to reduce wait time? For example, instead of trying this.registerAccount({ host, guest: true }) and then this.registerAccount({ host, guest: false }) if it fails, could we fire this.registerAccount({ host, guest: true }) and this.registerAccount({ host, guest: false }) at the same time and just use the results of whichever yields a valid access_token first? Even if we end up making a larger number of requests, we may be able to get it all done in a smaller elapsed time which would result in better performance for the end user.

Having run the test suite, there are some test failures which need to be addressed. Also as proposed, this PR changes the route without considering backwards-compatibility for existing users but how we address performance probably impacts on those issues, so lets think about that first.

@babolivier
Copy link
Contributor Author

babolivier commented Dec 16, 2018

we're going to need to make 3 or 4 (sequential) HTTP requests in order to render a single badge image

More likely 2 to 5 requests, because best case scenario we already have an access token for the server in the in-memory store, and worst case scenario we have to check if the client APIs can be reached on the provided FQDN, we don't have an access token for the server and it doesn't accept guest registration. But I get your point nonetheless.

For the record, a test call I just did using the #matrix:matrix.org alias, matching a big rooms with thousands of users from a server that's known to be slow due to heavy load, with no access token stored in memory, and 4 HTTP requests (which would, in my opinion, be quite close to the worst case scenario), takes 2.5s to return, which is imho quite far from the 4s threshold.

Instead of adding additional network I/O to the transaction, could this issue be better addressed by adding an additional (optional?) parameter to allow the user to manually specify the FQDN if it is not the same as the server name. Its nice to infer it, but there is a hard limit on performance to consider.

This would indeed be interesting, and my favourite solution between both of them. It should be kept optional, though, in order not to make it harder to grasp how to build a badge's URL, imho.

Even if we end up making a larger number of requests, we may be able to get it all done in a smaller elapsed time which would result in better performance for the end user.

My concern isn't so much the amount of requests, here, but rather not flooding server databases with a ton of throwaway accounts, half of them being useless. The in-memory store for access tokens I included in my PR is also an effort towards that end.

Having run the test suite, there are some test failures which need to be addressed

As I mentioned in my initial comment, some tests would fail with very non-explicit error messages, and for reasons I couldn't understand despite spending a whole hour trying to. I figured I'd open the PR anyway and see if the tests passed with the CI in case it would be caused by my local setup. They passed when I opened the PR, so I didn't think much more about them.

As far as I understand, they've been restarted and are now failing (which I discover just now) with the same error messages as on my local setup, so I'll try and investigate further. Is there a communication channel I can ask for help in in case I don't manage to make them pass?

Also as proposed, this PR changes the route without considering backwards-compatibility for existing users

I didn't mention it in my initial comment, but I considered this when working on it. My take was that, since it's a badge that has only been available for around a week ago, and didn't get a lot of attention when announced at the time, I assumed that "existing users" would only match a very small amount of people, if any (but of course I might be wrong).

Switching from room IDs to room aliases breaks backward compatibility in any case (except if we want to support both, which seems a bit ugly imho), so I thought it would be be an improvement to use a format that would be more obvious to Matrix users (as I personally had some troubles understanding how badge URLs should be built when I discovered it).

Finally, though that's only me being picky here, regarding the renaming of the PR: "[matrix]" is only the project's logo, which name is actually "Matrix" (or "Matrix.org"), as used on https://matrix.org/, hence me using "Matrix" in the PR's name.

@chris48s
Copy link
Member

This would indeed be interesting, and my favourite solution between both of them

Mine too. Sounds like a plan then. We can assume the server name matches the FQDN by default but accept a FQDN as an optional param for the cases where it doesn't to save 1 network call.

The in-memory store for access tokens I included in my PR is also an effort towards that end

👍 Some kind of token persistance to cut down on the level of network I/O needed to render a badge after the first request is a good plan, but I suspect this would be better implemented using the same redis token peristance layer we're using for GH tokens. That way multiple server instances can share the same cache (if you do it as an in-memory object every server in the cluster will have its own different cache). It will also persist across restarts. Unfortunately I've not done much work with that. Maybe @paulmelnikow can provide some better advice on this.

If that ends up being complex, it might also be useful to split that work out into a different PR to improve performance in order to avoid blocking the main bugfix on that.

regarding the renaming of the PR...

I renamed your PR so that we would run the tests. See: https://github.com/badges/shields/blob/master/doc/service-tests.md#pull-requests . The square brackets are there so that we can parse the PR title and work out what tests to run in the CI build. If you want to change the capitalisation within the square brackets, it is case-insensitive - edit it as you like.

By completely unrelated coincidence, it appears that the logo for this service also contains square brackets.

As far as I understand, they've been restarted and are now failing

The first time no service tests were run. I restarted the build after renaming the PR as described above which is why the build is now failing.

Is there a communication channel I can ask for help in in case I don't manage to make them pass?

Yep: https://discord.gg/HjJCwm5 Feel free to ask in #dev

Also check out the service test docs: https://github.com/badges/shields/blob/master/doc/service-tests.md

If you want more info on test failures, you can run the tests with extra debug info using npm run test:services:trace -- --only=matrix

@babolivier
Copy link
Contributor Author

babolivier commented Dec 16, 2018

Mine too. Sounds like a plan then. We can assume the server name matches the FQDN by default but accept a FQDN as an optional param for the cases where it doesn't to save 1 network call.

Right, I'll have a look at it soon then.

+1 Some kind of token persistance to cut down on the level of network I/O needed to render a badge after the first request is a good plan, but I suspect this would be better implemented using the same redis token peristance layer we're using for GH tokens. That way multiple server instances can share the same cache (if you do it as an in-memory object every server in the cluster will have its own different cache). It will also persist across restarts. Unfortunately I've not done much work with that. Maybe @paulmelnikow can provide some better advice on this.

I did this in-memory because I didn't see from a quick glance any caching system in use (which I'd have looked into if I had noticed it), and totally agree on that it's better to have some kind of persistence and to be able to share tokens between instances. I'll investigate it further then.

I renamed your PR so that we would run the tests. See: https://github.com/badges/shields/blob/master/doc/service-tests.md#pull-requests . The square brackets are there so that we can parse the PR title and work out what tests to run in the CI build. If you want to change the capitalisation within the square brackets, it is case-insensitive - edit it as you like.

Oh, that explains why the tests were not run by the CI. I mistaken this with a common mistake people sometimes make by calling Matrix "[matrix]", but apparently didn't read the tests doc well enough. Not that I really matters, though, I was just being picky.

Yep: https://discord.gg/HjJCwm5 Feel free to ask in #dev

Also check out the service test docs: https://github.com/badges/shields/blob/master/doc/service-tests.md

If you want more info on test failures, you can run the tests with extra debug info using npm run test:services:trace -- --only=matrix

Roger that, thanks for all of this useful info!

@paulmelnikow
Copy link
Member

Hi! Thanks so much for working on this!

Long discussion below, which feels out of scope for this PR and may be more than you'd like to take on. Tldr, we could either merge the in-module cache and clean it up later, or implement it sans cache and add the caching when the work above can be done.

I'm inclined to implement sans cache because it's easier to say no than it is to untangle things later, and the performance boost is a good motivation to finish this work "at quality."

I did this in-memory because I didn't see from a quick glance any caching system in use (which I'd have looked into if I had noticed it), and totally agree on that it's better to have some kind of persistence and to be able to share tokens between instances. I'll investigate it further then.

Apart from GitHub, we don't have anything like this in place today. That said, I agree we should share the tokens between the servers, and I'm happy to nudge this forward along with #1848 and the yak shave that #1205 has been.

It probably makes sense to create an abstraction layer as discussed at #1848 (comment) so the Redis support can be unit tested and turned on in production, and tests can function with an on-disk or in-memory store.

Another small concern related to this is testing. In the context of testing, running a service needs to be side-effect-free. The way this is handled now is via the reset() function in server.js (note that code is in flight in #2519).

Probably at this point we should create a reusable chunk of persistence code. FsTokenPersistence and RedisTokenPersistence are almost the right thing, but need to be generalized slightly so they can handle key-value assignment.

Then an instance of the persistence can be created in lib/server.js (see #2519) for each service which needs it, injected to the service via register(), and reset in reset(). GitHubConstellation can continue to create an instance, as it does now.

I can prioritize this work after #2496 done. Probably #1848 and #1806 need to happen first.

@babolivier
Copy link
Contributor Author

babolivier commented Dec 17, 2018

I'm inclined to implement sans cache because it's easier to say no than it is to untangle things later, and the performance boost is a good motivation to finish this work "at quality."

Makes sense. Then what's left for me is to focus on removing the in-memory cache from my PR and making tests pass. I'll try to keep an eye on the caching work after that and contribute where/if I can (but can't make any promise on that).

@babolivier
Copy link
Contributor Author

babolivier commented Dec 18, 2018

Tests are now fixed (thanks to @paulmelnikow), the PR is ready for review 🙂

}
}

static get examples() {
return [
{
title: 'Matrix',
exampleUrl: '!ltIjvaLydYAWZyihee/matrix.org',
pattern: ':roomId/:host',
exampleUrl: 'twim:matrix.org',
Copy link
Member

@calebcartwright calebcartwright Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use namedParams here instead of exampleUrl? See #2308 for more info

I think it would be something like this if I'm reading the route pattern correctly

namedParams: {
  roomAlias: 'twin:matrix.org',
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 3ac4a25 (and d655fa1)

host = splitAlias[2] + splitAlias[3]
break
default:
throw new errors.InvalidParameter()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this be immediately obvious to users what the error is with the value they have provided for roomAlias or would it be helpful to add a prettyMessage? For example:

...InvalidParameter({ prettyMessage: ' whatever makes sense here' })

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, right, that was before the second (optional) parameter was added, so it was obvious back then, but not that much now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 75b799d

@babolivier babolivier changed the title Improve [matrix] badge generation Improve [Matrix] badge generation Dec 20, 2018
@@ -8,7 +8,7 @@ const t = new ServiceTester({ id: 'matrix', title: 'Matrix' })
module.exports = t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're trying to switch to a shorthand for all the service tests. Would you mind adding the following line:

const t = (module.exports = require('../create-service-tester')())

and and remove lines 4, 7, & 8?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in fad3add

host = splitAlias[2] + splitAlias[3]
break
default:
throw new errors.InvalidParameter({ prettyMessage: 'invalid alias' })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've got a solid suite of tests already 👏, but I noticed there's no tests for these latter two cases (invalid alias and when room alias includes a port). Would it be feasible to add tests for these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

Copy link
Contributor Author

@babolivier babolivier Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 8bb5ebe (which was apparently necessary as I hadn't implemented the latter case correctly 😛 )

Copy link
Member

@calebcartwright calebcartwright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple minor feedback items (which you have already fixed 😄) but it is looking good!

I haven't gone through the above discussions on auth/networking/etc. in detail so I'll defer approval on this to a core team member (who will be needed to merge this anyway).

Thanks so much for this!

</br>
If that is the case of the homeserver that created the room alias used for generating the badge, you will need to add an extra parameter to the URL, by adding a forward slash followed by the server's FQDN (fully qualified domain name) between the room alias and the <code>.svg</code> extension.
</br>
The final badge URL should then look something like this <code>/matrix/mysuperroom:example.com/matrix.example.com.svg</code>.
Copy link
Member

@paulmelnikow paulmelnikow Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option we have with the API is to put the FQDN in the query string, like the way we handle custom NPM registries. Would that be better?

Usually obscure parameters are handled that way, whereas commonly used parameters are placed into the path.

I don't have an opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I didn't think about putting it there. I agree on that it belongs there rather than the path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gitlab pipeline status badge is a good modern example. Are you up for updating that? I'm good to get this merged after that! Really appreciate all your work on this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 410015e

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, didn't see your messages in the meantime. I based my work on the NPM badges (in a much simpler version though).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No prob :) I just pushed a commit with validation, and switching to snake case which for better or worse is what we've used for query params in services.

I'll merge this in a second if all looks good!

Copy link
Member

@paulmelnikow paulmelnikow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for picking up this work, @baboliver, and for jumping on the review @calebcartwright!

} else throw e
async lookupRoomAlias({ host, roomAlias, accessToken }) {
return this._requestJson({
url: `https://${host}/_matrix/client/r0/directory/room/%23${roomAlias}`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What characters are valid for a room alias? Would it be better to use

`https://${host}/_matrix/client/r0/directory/room/${encodeURIComponent(`#${roomAlias}`)}`

in case there is some other character that needs URL-encoding?

Copy link
Contributor Author

@babolivier babolivier Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be fairly okay without it, but better safe than sorry. Good catch, thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 2a94b15

@paulmelnikow
Copy link
Member

I'm okay with breaking backward compatibility here. It's not something we usually do, though this badge is very new so hopefully folks will be able to take note quickly and update their badges.

@paulmelnikow paulmelnikow merged commit da5ca74 into badges:master Dec 20, 2018
@shields-deployment
Copy link

This pull request was merged to master branch. This change is now waiting for deployment, which will usually happen within a few days. Stay tuned by joining our #ops channel on Discord!

After deployment, changes are copied to gh-pages branch:

@chris48s
Copy link
Member

I'm okay with breaking backward compatibility here. It's not something we usually do, though this badge is very new so hopefully folks will be able to take note quickly and update their badges.

Doing a quick search, I think there are only 2 keen users who are affected by this:

https://github.com/CromFr/matrix-doge-bot/blob/master/README.md
https://github.com/fr1kin/ForgeHax/blob/master/README.md

@babolivier
Copy link
Contributor Author

https://github.com/CromFr/matrix-doge-bot/blob/master/README.md
https://github.com/fr1kin/ForgeHax/blob/master/README.md

The first one is a friend of mine, the second one is the initial contributor of the Matrix badge who got pinged earlier in the discussion, so I think we should be fine.

@paulmelnikow
Copy link
Member

cc @fr1kin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
service-badge Accepted and actionable changes, features, and bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants