-
-
Notifications
You must be signed in to change notification settings - Fork 828
Add support for platform specific event indexing and search #3550
Conversation
This patch adds support to create an event index if the clients platform supports it and starts an event crawler. The event crawler goes through the room history of encrypted rooms and eventually indexes the whole room history of such rooms. It does this by first creating crawling checkpoints and storing them inside a database. A checkpoint consists of a room_id, direction and token. After the checkpoints are added the client starts a crawler method in the background. The crawler goes through checkpoints in a round-robin way and uses them to fetch historic room messages using the rooms/roomId/messages API endpoint. Every time messages are fetched a new checkpoint is created that will be stored in the database with the fetched events in an atomic way, the old checkpoint is deleted at the same time as well.
A sync call may not have all events that happened since the last time the client synced. In such a case the room is marked as limited and events need to be fetched separately. When such a sync call happens our event index will have a gap. To close the gap checkpoints are added to start crawling our room again. Unnecessary full re-crawls are prevented by checking if our current /room/roomId/messages request contains only events that were already present in our event index.
This patch extends our search to include our platform specific event index. There are 3 search scenarios and are handled differently when platform support for indexing is present: - Search a single non-encrypted room: Use the server-side search like before. - Search a single encrypted room: Search using our platform specific event index. - Search across all rooms: Search encrypted rooms using our local event index. Search non-encrypted rooms using the classic server-side search. Combine the results. The combined search will result in having twice the amount of search results since comparing the scores fairly wasn't deemed sensible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial style comments
src/BasePlatform.js
Outdated
@@ -151,4 +151,44 @@ export default class BasePlatform { | |||
async setMinimizeToTrayEnabled(enabled: boolean): void { | |||
throw new Error("Unimplemented"); | |||
} | |||
|
|||
supportsEventIndexing(): boolean { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: perhaps we can put all the methods here related to event indexing into an object returned from the platform? E.g. platform.getEventIndexingManager
or whatever, returning null on unsupported platforms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, one way to abstract it. Depends on if we want to declare the event plumbing (convert the MatrixEvent into a plain json object) platform specific to Seshat or if we want it to be more of a guidance for other indexer implementations ("hey your platform specific indexer needs to accept these types and return such a search result").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree with @bwindels... Since it's expected that all these methods would be either implemented or unimplemented as a set, it would be good to package them in single object and have one accessor here to retrieve it.
@@ -1262,6 +1270,8 @@ export default createReactClass({ | |||
// to do the first sync | |||
this.firstSyncComplete = false; | |||
this.firstSyncPromise = Promise.defer(); | |||
this.crawlerChekpoints = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crawlerCheckpoints?
console.log("Request to reset timeline in room ", roomId, " viewing:", self.state.currentRoomId); | ||
// TODO is there a better place to plug this in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's the Room.timelineReset
event on the client you could listen for. It's probably best to avoid making this callback async as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that event but I don't think it fired reliably, or maybe it doesn't like multiple event listeners? I can retry moving it there. Any more info why not make it async?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that event but I don't think it fired reliably
Hmm, interesting. It does seem to be fired unconditionally in EventTimelineSet. Any particular case you found it didn't fire?
Any more info why not make it async?
Just feels like this "informational" callback shouldn't have too many side-effects, let alone async ones. There's a slight risk of introducing race conditions as well in case multiple syncs would come in fast. And it would need some plumbing in the js-sdk, as the callback currently isn't expected to return a promise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really remember the specifics anymore, might have imagined it or done something stupid. Will change it up and see if it works.
@@ -1930,4 +2058,220 @@ export default createReactClass({ | |||
{view} | |||
</ErrorBoundary>; | |||
}, | |||
|
|||
async addLiveEventToIndex(ev) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would make sense to move most of this code that's been added to MatrixChat
into a separate file/class, given how large it is already, and how independent the seshat integration seems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that should happen. It's kept mostly in a single file for this PR to make it easier to review. @dbkr mentioned that already and I agreed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should definitely separate this out to a src/EventIndexing.js
or similar file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a good start! 😁 It may take a few rounds to get the right architecture in place. I'm happy chat in the room etc. if that would help.
Once we've got the right design, we should be able to merge this React SDK side, since it's all guarded on having an indexer from the platform.
src/BasePlatform.js
Outdated
@@ -151,4 +151,44 @@ export default class BasePlatform { | |||
async setMinimizeToTrayEnabled(enabled: boolean): void { | |||
throw new Error("Unimplemented"); | |||
} | |||
|
|||
supportsEventIndexing(): boolean { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree with @bwindels... Since it's expected that all these methods would be either implemented or unimplemented as a set, it would be good to package them in single object and have one accessor here to retrieve it.
src/MatrixClientPeg.js
Outdated
@@ -222,6 +223,9 @@ class MatrixClientPeg { | |||
|
|||
this.matrixClient = createMatrixClient(opts); | |||
|
|||
const platform = PlatformPeg.get(); | |||
if (platform.supportsEventIndexing()) platform.initEventIndex(creds.userId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming you don't need this to happen precisely here for some reason, I'd suggest moving this to Lifecycle.js
which has a startMatrixClient
function that should work well for starting a service such as this.
You could perhaps craft a start
method on the new indexer object suggested above, and then internally call MatrixClientPeg.get().getUserId()
and pass it to initEventIndex
.
if (platform.supportsEventIndexing()) { | ||
console.log("Seshat: Deleting event index."); | ||
this.crawlerRef.cancel(); | ||
await platform.deleteEventIndex(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The delete index step would be more natural in Lifecycle.js
as part of the stopMatrixClient
. As above, maybe it happens in a new stop
method on the indexer object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stopMatrixClient
is called on a soft logout as well, there is a function called onLoggedOut
in Lifecycle.js
which clears the storage. Would that be an ok place to put it as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, fair enough... We do have a bit of mess of lifecycle hooks... 😓 I guess _clearStorage
(called by onLoggedOut
) is likely the best place, as that's where we'd also clear the existing storage of message keys, etc.
@@ -1930,4 +2058,220 @@ export default createReactClass({ | |||
{view} | |||
</ErrorBoundary>; | |||
}, | |||
|
|||
async addLiveEventToIndex(ev) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should definitely separate this out to a src/EventIndexing.js
or similar file.
@@ -1140,12 +1141,116 @@ module.exports = createReactClass({ | |||
} | |||
|
|||
debuglog("sending search request"); | |||
const platform = PlatformPeg.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These bits would also be better off in some kind of "searching" module so that here we're just calling search effectively and the change will approximately be switching to the new module instead of the client directly.
We need to support the deletion of the event index even if it's not currently initialized, therefore the deletion ended up in the EventIndexPeg class.
Everything should be fixed/updated, except the "M of N" log line and publicizing of the private js-sdk methods. The merge conflicts should be easy to fix since they seem to be related to labs flags only. |
src/EventIndexing.js
Outdated
// here. | ||
await sleep(this._crawlerTimeout); | ||
|
||
console.log("EventIndex: Running the crawler loop."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, sounds like Matthew, Nad, and others will help advise on what's needed. Now that we have the feature flag, it feels safer to land with out it for now, since you have to opt-in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the continued effort here! 😁
This looks ready to merge from my side (only the feature flag level seems like a required fix). Since we have both the feature flag and testing the platform at runtime, it's decoupled from the riot-web
portion and disabled by default, so should be safe to land when you're ready.
src/indexing/EventIndex.js
Outdated
import PlatformPeg from "../PlatformPeg"; | ||
import MatrixClientPeg from "../MatrixClientPeg"; | ||
|
||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it's tagging the property functions at this class level... I think the JSDoc lint is a bit confused, but maybe you can workaround it by making this not a JSDoc comment by changing this to a single star here...?
This patch-set adds support for search in E2E encrypted rooms.
We start of by extending our BasePlatform to have the necessary function prototypes. After that, support to create an index and start room history crawling. Followed by adding live events, deleting the index, and finally extending our search to make use of our event index.
This is crucially still missing any type of UI but is in a usable and seemingly mostly-stable state and ready for a first look.