Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple issues after client restart #20862

Closed
dbkr opened this issue Feb 2, 2022 · 17 comments · Fixed by matrix-org/matrix-js-sdk#2150
Closed

Multiple issues after client restart #20862

dbkr opened this issue Feb 2, 2022 · 17 comments · Fixed by matrix-org/matrix-js-sdk#2150
Assignees
Labels
A-Threads O-Frequent Affects or can be seen by most users regularly or impacts most users' first experience S-Critical Prevents work, causes data loss and/or has no workaround T-Defect Z-Labs

Comments

@dbkr
Copy link
Member

dbkr commented Feb 2, 2022

Steps to reproduce

  1. Reload app from one of the develop builds on 02/02 to develop as of now (10:00 03/02)

Multiple reports of this have come in now, some reloading from yesterday's nightly build and some reloading develop. I saw it when restarting nightly this morning.

Clear cache & reload fixes it (which technically is a workaround, which would make this 'minor', but I'm going to apply common sense here and make it critical)

Outcome

  • Many rooms have lost their names (shown as 'empty room')
  • Often directed to old versions of rooms first, and then have to click through manually to their newest version
  • Room list is missing message previews until I switch to the room
  • Receiving notifications for strange rooms that don't usually notify me

Operating system

No response

Browser information

No response

URL for webapp

No response

Application version

No response

Homeserver

No response

Will you send logs?

No

@dbkr dbkr added T-Defect S-Critical Prevents work, causes data loss and/or has no workaround X-Release-Blocker O-Frequent Affects or can be seen by most users regularly or impacts most users' first experience labels Feb 2, 2022
@jryans
Copy link
Collaborator

jryans commented Feb 2, 2022

I believe this list of commits has all the new things I applied this morning when the issues started, though I am not 100% certain.

@dbkr
Copy link
Member Author

dbkr commented Feb 2, 2022

Thanks. I got the bug with the nightly build which finished at 8:47: before any of the 3 commits today, so that rules out those three.

@jryans
Copy link
Collaborator

jryans commented Feb 2, 2022

...though after scanning those changes, I can't see how they could trigger anything like these issues... 😰

@jryans
Copy link
Collaborator

jryans commented Feb 2, 2022

Given the areas affected, I would have expected JS SDK changes, but I do not recall seeing any in the changelog before updating...

@jryans
Copy link
Collaborator

jryans commented Feb 2, 2022

Using /devtools to try viewing room state makes it clear that state events seem to just be... missing from the room model.

@jryans
Copy link
Collaborator

jryans commented Feb 2, 2022

I do notice that the IndexedDB store worker appears to startup twice, which doesn’t seem to happen on app.

@jryans
Copy link
Collaborator

jryans commented Feb 2, 2022

The sync data in IndexedDB for the affected session is extremely minimal: it only seems to have some read receipts... perhaps 2 copies of the store logic are racing?

@jryans
Copy link
Collaborator

jryans commented Feb 2, 2022

I'm only noticing duplicate runs of the store worker in my main Firefox session, though some are affected on the desktop build as well.

@jryans
Copy link
Collaborator

jryans commented Feb 2, 2022

...and now the worker only starts up once. 😵

@jryans
Copy link
Collaborator

jryans commented Feb 2, 2022

Well, I've run out of theories to investigate, so I suppose I'll clear cache now.

@turt2live
Copy link
Member

Fwiw, I haven't yet checked nightly or web, but mobile (android) is suddenly using default naming on a handful of rooms and also lost their avatar. This might be a server issue.

@dbkr
Copy link
Member Author

dbkr commented Feb 2, 2022

Huh. I was just looking through the rageshake and one of them has an error while processing the cached sync:

2022-02-02T08:45:50.453Z E Error processing cached sync MatrixError@https://develop.element.io/bundles/ee543a11b18ddad85cd5/init.js:38884:5
parseErrorResponse@https://develop.element.io/bundles/ee543a11b18ddad85cd5/init.js:38823:13
requestCallback/<@https://develop.element.io/bundles/ee543a11b18ddad85cd5/init.js:38773:17
doRequest/req<@https://develop.element.io/bundles/ee543a11b18ddad85cd5/init.js:38697:18
on_end@https://develop.element.io/bundles/ee543a11b18ddad85cd5/vendors~init.js:210017:13
on_state_change@https://develop.element.io/bundles/ee543a11b18ddad85cd5/vendors~init.js:209967:7
instrumentXHR/</</</<@https://develop.element.io/bundles/ee543a11b18ddad85cd5/vendors~init.js:192313:41

It looks like when this happens, it isn't handled particularly gracefully. These are locations in http-api, so it looks like somewhere during processing a sync response, something's making an API call, all of which would be less than ideal.

@t3chguy
Copy link
Member

t3chguy commented Feb 2, 2022

{"source":"webpack:///matrix-js-sdk/src/http-api.ts","line":1041,"column":8,"name":null}
{"source":"webpack:///matrix-js-sdk/src/http-api.ts","line":975,"column":18,"name":null}
...
{"source":"webpack:///matrix-js-sdk/src/http-api.ts","line":842,"column":30,"name":"err"}

So it looks to be due to a web request made during parsing the cached sync, which I think only happens for threads code

The stack trace unfortunately begins in the guts of browser-request

{"source":"webpack:///matrix-js-sdk/node_modules/browser-request/index.js","line":294,"column":0,"name":null}

@t3chguy
Copy link
Member

t3chguy commented Feb 2, 2022

processThreadEvents can throw if there's no connection/any server failure and will explode the syncFromCache method causing the Error processing cached sync log line

@t3chguy
Copy link
Member

t3chguy commented Feb 2, 2022

I'm also not sure how I feel about restoring a cached sync needing to hit any Matrix APIs at all

@dbkr
Copy link
Member Author

dbkr commented Feb 2, 2022

ah, I found it making calls to /members to get room members for encrypted rooms, but those were all caught sensibly. Threads might do it though...

and yeah, shudder at things making API calls wile processing a cached sync :/

@dbkr
Copy link
Member Author

dbkr commented Feb 2, 2022

Right, successfully reproed by:

  1. sending an event
  2. sending 50 messages in that room to push that event out of the cached sync window
  3. making a thread from the first event
  4. waiting long enough for the client to save a cached sync response
  5. blocking the /event/ API in chrome dev tools
  6. reloading

It's https://github.com/matrix-org/matrix-js-sdk/blob/develop/src/models/room.ts#L1375 which makes an API call but doesn't catch errors

su-ex added a commit to SchildiChat/matrix-js-sdk that referenced this issue Feb 15, 2022
* Fix issue with rooms not getting marked as unread ([\matrix-org#2163](matrix-org#2163)). Fixes element-hq/element-web#20971.
* Don't store streams that are only used once ([\matrix-org#2157](matrix-org#2157)). Fixes element-hq/element-web#20932. Contributed by @SimonBrandner.
* Fix edge cases around RR calculations ([\matrix-org#2160](matrix-org#2160)). Fixes element-hq/element-web#20922.
* Account for encryption in `maySendMessage()` ([\matrix-org#2159](matrix-org#2159)). Contributed by @SimonBrandner.
* Send references to thread root to threads, even out of order ([\matrix-org#2156](matrix-org#2156)).
* Fix initial sync fail when event fetching unsuccessful ([\matrix-org#2150](matrix-org#2150)). Fixes element-hq/element-web#20862.
* Don't decrypt redacted messages ([\matrix-org#2143](matrix-org#2143)). Contributed by @SimonBrandner.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Threads O-Frequent Affects or can be seen by most users regularly or impacts most users' first experience S-Critical Prevents work, causes data loss and/or has no workaround T-Defect Z-Labs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants