-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rust: MemoryStore wedging? #77
Comments
Just before the encryption event response is returned, https://github.com/matrix-org/matrix-rust-sdk/blob/main/crates/matrix-sdk/src/sliding_sync/mod.rs#L294 is logged, which later on acquires the offending lock https://github.com/matrix-org/matrix-rust-sdk/blob/main/crates/matrix-sdk/src/sliding_sync/mod.rs#L331 so I'm fairly sure this is still blocked and is blocking |
There was another flake in https://github.com/matrix-org/matrix-rust-sdk/actions/runs/9503969571/job/26195597830?pr=3539 - this time we got past the encryption state and see "Sending encrypted event because the room is encrypted" for Charlie's message send, but then wedge after getting the let _sync_lock = self.sync_lock().lock().await;
let mut room_info = room.clone_info();
room_info.mark_members_synced();
changes.add_room(room_info);
self.store.save_changes(&changes).await?;
self.apply_changes(&changes, false); ..once again implicating either the sync lock or the data store (via |
The sync lock seems to be consistently held throughout calls to |
For https://github.com/matrix-org/matrix-rust-sdk/actions/runs/9684285011/job/26721675685 - same thing, wedging after |
* Add trace logging to memory store To help debug matrix-org/complement-crypto#77 * Missing import * Review comments
It reproduced with trace logging. It deadlocks somewhere in this code block |
..and then it wedges. So it blocks on |
This implicates this block of code: async fn get_state_events(
&self,
room_id: &RoomId,
event_type: StateEventType,
) -> Result<Vec<RawAnySyncOrStrippedState>> {
fn get_events<T>(
state_map: &HashMap<OwnedRoomId, HashMap<StateEventType, HashMap<String, Raw<T>>>>,
room_id: &RoomId,
event_type: &StateEventType,
to_enum: fn(Raw<T>) -> RawAnySyncOrStrippedState,
) -> Option<Vec<RawAnySyncOrStrippedState>> {
let state_events = state_map.get(room_id)?.get(event_type)?;
Some(state_events.values().cloned().map(to_enum).collect())
}
Ok(get_events(
&self.stripped_room_state.read().unwrap(),
room_id,
&event_type,
RawAnySyncOrStrippedState::Stripped,
)
.or_else(|| {
get_events(
&self.room_state.read().unwrap(),
room_id,
&event_type,
RawAnySyncOrStrippedState::Sync,
)
})
.unwrap_or_default())
} because this block acquires a read lock on stripped_room_state then potentially tries to acquire a read lock on room_state which it can't do - it's an ABBA deadlock. I need confirmation from rust folks if this is how the locking works. |
@poljar confirms that the stripped_room_state lock is still held when the room_state lock is acquired. As such, this would cause a deadlock. |
#115 moves us away from using the memory store |
See https://github.com/matrix-org/matrix-rust-sdk/actions/runs/9351153389/job/25736195990?pr=3496
Times out on
pushNotifEventID = bob.SendMessage(t, roomID, text)
.Upon investigation, the logs show that the rust SDK did not send the event. It seems to get stuck after
GET /state/m.room.encryption/
as there is no log line for "Sending encrypted event because the room is encrypted" but there is a 200 OK response to that GET. There is 1 lock that is taken between the two bits of code:https://github.com/matrix-org/matrix-rust-sdk/blob/2cf36c7a5ed60c75824bea80422a8dbf829b1bed/crates/matrix-sdk/src/room/mod.rs#L486
I guess something is stopping the
send_raw
span from taking that lock, probably thenext_sync_with_lock > sync_once{pos="1"}
span which seems to be doing stuff at the same time.The text was updated successfully, but these errors were encountered: