Skip to content
This repository has been archived by the owner on Sep 11, 2024. It is now read-only.

Add support for platform specific event indexing and search #3550

Merged
merged 64 commits into from
Nov 20, 2019

Conversation

poljar
Copy link
Contributor

@poljar poljar commented Oct 11, 2019

This patch-set adds support for search in E2E encrypted rooms.

We start of by extending our BasePlatform to have the necessary function prototypes. After that, support to create an index and start room history crawling. Followed by adding live events, deleting the index, and finally extending our search to make use of our event index.

This is crucially still missing any type of UI but is in a usable and seemingly mostly-stable state and ready for a first look.

This patch adds support to create an event index if the clients platform
supports it and starts an event crawler.

The event crawler goes through the room history of encrypted rooms and
eventually indexes the whole room history of such rooms.

It does this by first creating crawling checkpoints and storing them
inside a database. A checkpoint consists of a room_id, direction and
token.

After the checkpoints are added the client starts a crawler method in
the background. The crawler goes through checkpoints in a round-robin
way and uses them to fetch historic room messages using the
rooms/roomId/messages API endpoint.

Every time messages are fetched a new checkpoint is created that will
be stored in the database with the fetched events in an atomic way, the
old checkpoint is deleted at the same time as well.
A sync call may not have all events that happened since the last time
the client synced. In such a case the room is marked as limited and
events need to be fetched separately.

When such a sync call happens our event index will have a gap. To
close the gap checkpoints are added to start crawling our room again.
Unnecessary full re-crawls are prevented by checking if our current
/room/roomId/messages request contains only events that were already
present in our event index.
This patch extends our search to include our platform specific event
index.

There are 3 search scenarios and are handled differently when platform
support for indexing is present:

    - Search a single non-encrypted room: Use the server-side search
        like before.
    - Search a single encrypted room: Search using our platform specific
        event index.
    - Search across all rooms: Search encrypted rooms using our local
        event index. Search non-encrypted rooms using the classic
        server-side search. Combine the results.

The combined search will result in having twice the amount of search
results since comparing the scores fairly wasn't deemed sensible.
Copy link
Contributor

@bwindels bwindels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial style comments

@@ -151,4 +151,44 @@ export default class BasePlatform {
async setMinimizeToTrayEnabled(enabled: boolean): void {
throw new Error("Unimplemented");
}

supportsEventIndexing(): boolean {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: perhaps we can put all the methods here related to event indexing into an object returned from the platform? E.g. platform.getEventIndexingManager or whatever, returning null on unsupported platforms?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, one way to abstract it. Depends on if we want to declare the event plumbing (convert the MatrixEvent into a plain json object) platform specific to Seshat or if we want it to be more of a guidance for other indexer implementations ("hey your platform specific indexer needs to accept these types and return such a search result").

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree with @bwindels... Since it's expected that all these methods would be either implemented or unimplemented as a set, it would be good to package them in single object and have one accessor here to retrieve it.

@@ -1262,6 +1270,8 @@ export default createReactClass({
// to do the first sync
this.firstSyncComplete = false;
this.firstSyncPromise = Promise.defer();
this.crawlerChekpoints = [];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crawlerCheckpoints?

console.log("Request to reset timeline in room ", roomId, " viewing:", self.state.currentRoomId);
// TODO is there a better place to plug this in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's the Room.timelineReset event on the client you could listen for. It's probably best to avoid making this callback async as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that event but I don't think it fired reliably, or maybe it doesn't like multiple event listeners? I can retry moving it there. Any more info why not make it async?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that event but I don't think it fired reliably

Hmm, interesting. It does seem to be fired unconditionally in EventTimelineSet. Any particular case you found it didn't fire?

Any more info why not make it async?

Just feels like this "informational" callback shouldn't have too many side-effects, let alone async ones. There's a slight risk of introducing race conditions as well in case multiple syncs would come in fast. And it would need some plumbing in the js-sdk, as the callback currently isn't expected to return a promise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really remember the specifics anymore, might have imagined it or done something stupid. Will change it up and see if it works.

@@ -1930,4 +2058,220 @@ export default createReactClass({
{view}
</ErrorBoundary>;
},

async addLiveEventToIndex(ev) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would make sense to move most of this code that's been added to MatrixChat into a separate file/class, given how large it is already, and how independent the seshat integration seems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that should happen. It's kept mostly in a single file for this PR to make it easier to review. @dbkr mentioned that already and I agreed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely separate this out to a src/EventIndexing.js or similar file.

@bwindels bwindels requested a review from a team October 28, 2019 17:11
@jryans jryans changed the title [RFC] Add support for platform specific event indexing and search. Add support for platform specific event indexing and search Oct 28, 2019
@turt2live turt2live requested review from bwindels and jryans and removed request for a team November 4, 2019 17:22
@jryans jryans removed the request for review from bwindels November 6, 2019 12:01
Copy link
Collaborator

@jryans jryans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good start! 😁 It may take a few rounds to get the right architecture in place. I'm happy chat in the room etc. if that would help.

Once we've got the right design, we should be able to merge this React SDK side, since it's all guarded on having an indexer from the platform.

@@ -151,4 +151,44 @@ export default class BasePlatform {
async setMinimizeToTrayEnabled(enabled: boolean): void {
throw new Error("Unimplemented");
}

supportsEventIndexing(): boolean {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree with @bwindels... Since it's expected that all these methods would be either implemented or unimplemented as a set, it would be good to package them in single object and have one accessor here to retrieve it.

@@ -222,6 +223,9 @@ class MatrixClientPeg {

this.matrixClient = createMatrixClient(opts);

const platform = PlatformPeg.get();
if (platform.supportsEventIndexing()) platform.initEventIndex(creds.userId);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming you don't need this to happen precisely here for some reason, I'd suggest moving this to Lifecycle.js which has a startMatrixClient function that should work well for starting a service such as this.

You could perhaps craft a start method on the new indexer object suggested above, and then internally call MatrixClientPeg.get().getUserId() and pass it to initEventIndex.

if (platform.supportsEventIndexing()) {
console.log("Seshat: Deleting event index.");
this.crawlerRef.cancel();
await platform.deleteEventIndex();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The delete index step would be more natural in Lifecycle.js as part of the stopMatrixClient. As above, maybe it happens in a new stop method on the indexer object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stopMatrixClient is called on a soft logout as well, there is a function called onLoggedOut in Lifecycle.js which clears the storage. Would that be an ok place to put it as well?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, fair enough... We do have a bit of mess of lifecycle hooks... 😓 I guess _clearStorage (called by onLoggedOut) is likely the best place, as that's where we'd also clear the existing storage of message keys, etc.

@@ -1930,4 +2058,220 @@ export default createReactClass({
{view}
</ErrorBoundary>;
},

async addLiveEventToIndex(ev) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely separate this out to a src/EventIndexing.js or similar file.

@@ -1140,12 +1141,116 @@ module.exports = createReactClass({
}

debuglog("sending search request");
const platform = PlatformPeg.get();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These bits would also be better off in some kind of "searching" module so that here we're just calling search effectively and the change will approximately be switching to the new module instead of the client directly.

@poljar poljar requested a review from jryans November 19, 2019 13:34
@poljar
Copy link
Contributor Author

poljar commented Nov 19, 2019

Everything should be fixed/updated, except the "M of N" log line and publicizing of the private js-sdk methods.

The merge conflicts should be easy to fix since they seem to be related to labs flags only.

src/indexing/EventIndex.js Outdated Show resolved Hide resolved
src/indexing/EventIndex.js Outdated Show resolved Hide resolved
src/settings/Settings.js Outdated Show resolved Hide resolved
// here.
await sleep(this._crawlerTimeout);

console.log("EventIndex: Running the crawler loop.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sounds like Matthew, Nad, and others will help advise on what's needed. Now that we have the feature flag, it feels safer to land with out it for now, since you have to opt-in.

Copy link
Collaborator

@jryans jryans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the continued effort here! 😁

This looks ready to merge from my side (only the feature flag level seems like a required fix). Since we have both the feature flag and testing the platform at runtime, it's decoupled from the riot-web portion and disabled by default, so should be safe to land when you're ready.

import PlatformPeg from "../PlatformPeg";
import MatrixClientPeg from "../MatrixClientPeg";

/**
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it's tagging the property functions at this class level... I think the JSDoc lint is a bit confused, but maybe you can workaround it by making this not a JSDoc comment by changing this to a single star here...?

@poljar poljar merged commit 4bd46f9 into develop Nov 20, 2019
@poljar poljar deleted the poljar/seshat-pr branch June 21, 2020 17:54
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants