Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #2277 #2294

Merged
merged 26 commits into from
Sep 5, 2024
Merged

Fix #2277 #2294

merged 26 commits into from
Sep 5, 2024

Conversation

corrideat
Copy link
Member

  • Fix reference count errors
  • Implement re-syncing contracts with foreign keys on OP_KEY_SHARE on foreign contract

@corrideat
Copy link
Member Author

corrideat commented Aug 11, 2024

TODO: Add comments explaining changes made

@corrideat corrideat closed this Aug 11, 2024
@corrideat corrideat reopened this Aug 11, 2024
Copy link

cypress bot commented Aug 11, 2024

group-income    Run #3028

Run Properties:  status check passed Passed #3028  •  git commit 35d59ee63e ℹ️: Merge 818d3b36914e94b90dedfa55c0a26d16fb5a27a5 into 9114ed8d686fdc2f6dd20a1ac805...
Project group-income
Branch Review refs/pull/2294/merge
Run status status check passed Passed #3028
Run duration 10m 28s
Commit git commit 35d59ee63e ℹ️: Merge 818d3b36914e94b90dedfa55c0a26d16fb5a27a5 into 9114ed8d686fdc2f6dd20a1ac805...
Committer Ricardo Iván Vieitez Parra
View all properties for this run ↗︎

Test results
Tests that failed  Failures 0
Tests that were flaky  Flaky 0
Tests that did not run due to a developer annotating a test with .skip  Pending 10
Tests that did not run due to a failure in a mocha hook  Skipped 0
Tests that passed  Passing 111
View all changes introduced in this branch ↗︎

- Fix reference count errors
- Implement re-syncing contracts with foreign keys on
  `OP_KEY_SHARE` on foreign contract
Copy link
Member

@taoeffect taoeffect left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preliminary review

frontend/model/contracts/group.js Outdated Show resolved Hide resolved
Comment on lines 1284 to 1323
// The following queued invocation handles reference counting. It uses
// some transient (non-persistent) state to see whether the
// pairing `retain` to `release` was called. This situation could
// arise when syncing a group from scratch (e.g., from a new device)
// for chatrooms that have been joined and then left. In that case,
// there'll be no call to `retain` and therefore calling `release`
// is incorrect.
sbp('chelonia/queueInvocation', contractID, async () => {
// Has `retain` been skipped?
// Note: we check for 'skipped' instead of 'called' because
// side-effects cannot mutate state. The 'skipped' case can be
// checked without persistent storage, while the 'called' case
// would require some kind of persistent state (since the reference
// count is itself persisted)
const skippedRetain = sbp('okTurtles.data/get', `gi.contracts/group/chatroom-skipped-${contractID}-${data.chatRoomID}-${data.reference}`)
// There's no use of this transient value, so it's deleted to
// preserve memory
sbp('okTurtles.data/delete', `gi.contracts/group/chatroom-skipped-${contractID}-${data.chatRoomID}-${data.reference}`)

const { identityContractID } = sbp('state/vuex/state').loggedIn
// New user session
if (memberID !== identityContractID) {
return
}

const state = await sbp('chelonia/contract/state', contractID)

if (
state.chatRooms?.[data.chatRoomID]?.members[memberID]?.status === PROFILE_STATUS.ACTIVE ||
state.chatRooms?.[data.chatRoomID]?.members[memberID]?.joinedHeight !== data.reference ||
skippedRetain
) {
return
}

// Release chatroom if we're sure `retain` was called
sbp('chelonia/contract/release', data.chatRoomID).catch(e => {
console.error(`[gi.contracts/group/leaveChatRoom/sideEffect] Error releasing chatroom ${data.chatRoomID}`, e)
})
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you say that it might be possible to simplify this code by adding a new feature to okTurtles.eventQueue? If so, let's do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have specific suggestions right now for the changes needed to eventQueue. I can work on a specific proposal.

Copy link
Member Author

@corrideat corrideat Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much of an issue is this code? I originally thought that changing the eventQueue could work, but upon closer examination, I think that that alone can't fix it.

I've already described the situation in Slack, but the gist of this is that either:

  1. We need to increment / decrement the reference count without using any event queue, which would simplify the code but result in potentially unnecessary contract syncs (for example, you join and leave a group; the retain will result in syncing that group only for the release causing it to be removed, and the situation is worsened for multiple join / leaves, and also by chatrooms that are synced as a side effect of groups), or
  2. We keep on using an event queue to alleviate this concern. However, for the reference count to be correct, we need to know that retain was called before calling release. This needs to be somewhere in the state, and since side-effects can't change the contract state, I'm using okTurtles.data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have any questions about what the specific issue is, please let me know, because understanding it well is crucial to discussing a solution.

Copy link
Member Author

@corrideat corrideat Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an alternative solution is something I discussed earlier (which would, however, complicate reference counting).

The solution in this case would be that references, instead of being a number, which needs to be managed very carefully, are an identifier. This way, you have an explicit 'reason' for being subscribed, which can be removed. This is less error prone, but perhaps sounds more complicated (or not).

In this particular case, since the reference already tracks the reason, we can call /release unconditionally, and it doesn't matter if /retain wasn't called. We could either make it not throw, or we could ignore the error.

Needless to say, this would be a breaking change.

frontend/model/contracts/group.js Show resolved Hide resolved
Copy link
Member

@taoeffect taoeffect left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moar comments!

shared/domains/chelonia/internals.js Show resolved Hide resolved
throw new Error(`Cannot join already joined group ${groupContractID}`)
}

const inviteSecretId = await sbp('chelonia/crypto/keyId', new Secret(inviteSecret))

state.groups[groupContractID] = { hash, inviteSecretId }
state.groups[groupContractID] = { hash, inviteSecretId, active: true }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a rather huge / significant change.

What is this active thing, why is it being added, and doesn't this have the potential to break all existing groups that have been created and joined and do not have this property?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For correctness, state.groups[groupContractID] is no longer deleted. This is needed for the leave action to work correctly, as the reference property needs to be available, which it won't be if the state is removed.

doesn't this have the potential to break all existing groups that have been created and joined and do not have this property?

Yes, but I'd say all of the changes in this PR are breaking. Thanks for pointing that out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the easiest change right now is to make this a negative property (e.g., inactive / left) so that it doesn't break existing groups.

'state/vuex/postUpgradeVerification' seems inappropriate to solve this because:

  • It deals with Vuex state instead of contract state
  • It is executed in the browsing context. This would need to run in the service worker.
  • It is entirely unaware of contract versions. Doing this correctly needs awareness of contract versions.

Implementing #2108 would address this to some extent. However, I think that the preferred solution to this would require a solid in-contract upgrade mechanism. Looking at how contracts are handled now, I see several deficiencies:

  1. It don't think versions are checked at all, which means that someone could downgrade a contract. This should probably be prevented.
  2. Because contracts can be downgraded, they need to be forward and backwards compatible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It don't think versions are checked at all, which means that someone could downgrade a contract. This should probably be prevented.

Yep there's #1957 to address exactly that.

Because contracts can be downgraded, they need to be forward and backwards compatible.

It's fine to assume contracts will never be downgraded in the sense that contracts should never be downgraded and we will never downgrade them. If someone hacks the app to do this they are responsible for any app breakage, not us.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See new commit addressing this by using a negated property.

Copy link
Member

@taoeffect taoeffect left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions

frontend/controller/app/group.js Show resolved Hide resolved
frontend/model/contracts/chatroom.js Show resolved Hide resolved
memberID: optional(stringMax(MAX_HASH_LEN), 'memberID')
memberID: optional(stringMax(MAX_HASH_LEN), 'memberID'),
// `joinedHeight` is the height used in the corresponding join action
joinedHeight: numberRange(1, Number.MAX_SAFE_INTEGER)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... in this action we can expect joinedHeight to be in data, but below where it says:

state.chatRooms[data.chatRoomID].members[memberID].joinedHeight !== data.joinedHeight

(And similar)

Is this a problem for existing clients who did not set joinedHeight because they used an older version of 'gi.contracts/group/joinChatRoom'?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I think it'd indeed be an issue. A way to possibly fix this could be to test for presence of this attribute (also, it won't be in data for old contracts). Those contracts may have issues with reference counts without this information.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way would be to implement contract upgrades. In this case, we'd set joinedHeight to the height of the update.

Comment on lines 1298 to 1301
const skippedRetain = sbp('okTurtles.data/get', `gi.contracts/group/chatroom-skipped-${contractID}-${data.chatRoomID}-${data.joinedHeight}`)
// There's no use of this transient value, so it's deleted to
// preserve memory
sbp('okTurtles.data/delete', `gi.contracts/group/chatroom-skipped-${contractID}-${data.chatRoomID}-${data.joinedHeight}`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this gi.contracts/group/chatroom-skipped-${contractID}-${data.chatRoomID}-${data.joinedHeight} value should be saved to a const instead of re-typed here just to be safe and also DRY things

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

// preserve memory
sbp('okTurtles.data/delete', `gi.contracts/group/chatroom-skipped-${contractID}-${data.chatRoomID}-${data.joinedHeight}`)

const { identityContractID } = sbp('state/vuex/state').loggedIn
Copy link
Member

@taoeffect taoeffect Aug 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You already grabbed this value above... why are you re-grabbing it? Either shouldn't be done, or comment needs to justify/explain

EDIT: I see a comment below, but I would move that above here and elaborate further. e.g. this seems like a Cypress thing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like a Cypress thing

This is not a Cypress thing (other than the fact that users will likely not be that fast).

Comment on lines 305 to 307
const skippedRetain = sbp('okTurtles.data/get', `gi.contracts/identity/group-skipped-${groupContractID}-${reference}`)
// No use for this value beyond this point.
sbp('okTurtles.data/delete', `gi.contracts/identity/group-skipped-${groupContractID}-${reference}`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here I'd DRY gi.contracts/identity/group-skipped-${groupContractID}-${reference}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@taoeffect
Copy link
Member

taoeffect commented Aug 19, 2024

Based on conversion, another consideration is that this PR changes how the identity contract is retained, and so it's possible that old versions of the app could remove their own identity contract when they are leaving a room. This can be mitigated by checking if we're upgrading an old versions of the app to this new version of the app (whatever version number we give it), and then issuing an extra retain on the identity contract in, for example, postUpgradeVerification.

Alternatively, in postUpgradeVerification we can check if joinedHeight isn't defined and if so issue an action to the contract to perform the upgrade and in the side effect issue the retain.

Yet a third option may be reverting the changes in this PR and keeping the old behavior of retain/release on the user's identityContractID, this would avoid the need to implement a sideEffect on the upgrade action that's sent, and seems like the simplest option.

@corrideat
Copy link
Member Author

As I continue to work on this PR, I see some additional issues with correctness and the way we do state management.

We've had this conversation on Slack already about why I think the model of only process can change the state is limiting in the context of reference counting. I've found another situation that is even harder to handle correctly.

Let's say that you are leaving a group. When you leave a group, the group contract reference is decremented, and the leave action also decrements the reference count on the group members. I think I may have got that more or less working.

However, let's say that now the group contract is removed (by calling chelonia/contract/remove). In this case, the group contract should undo all of the changes it made to references, but doing so is rather difficult to do without knowing exactly which how the reference count was affected. One potential solution to this is making the remove method async, which would make removeImmediately also async, but I'm still unsure this solves the issue of correctly changing reference counts, because this needs to be done in a way that is stateless.

For example, let's say you're a member of the group, then you leave and then the group contract is removed.

In this case, by the time the leave process function is called, you're no longer an active member, and the state looks the same to both the sideEffect for the leave action and the remove function, which means that they both have to rely on one of them having reduced the reference count. I think in this case, it makes sense of it to be done in the leave action.

On the other hand, if the group contract is removed without first having left the group, it needs to look at the state and decrease the reference count for chatrooms and members. For this to work, the remove function would need to push something on the queue. If this isn't done, those references will be left dangling.

I'll be looking further into how to keep a correct reference count in all of these edge cases.

@corrideat
Copy link
Member Author

To update on this issue: I'm working on a fuzzer to properly design and test a long term solution.

@corrideat
Copy link
Member Author

I've fixed a bug that came from wrongly-done refactoring. Now the tests are passing again (12 times in a row, see b1b9c9c, a00ed0d and 9d0de13).

I think this is once again ready for review.

Copy link
Member

@taoeffect taoeffect left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question

Comment on lines 351 to 367
if (count && count !== Math.sign(count)) {
console.warn(`[${selector}] Unexpected value`, parentContractID, childContractID, count)
if (process.env.CI) {
Promise.reject(new Error(`Unexpected value ${parentContractID} ${childContractID}: ${count}`))
}
}
Copy link
Member

@taoeffect taoeffect Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this count !== Math.sign(count) a bit presumptive...?

It sounds like it's expecting referenceTally to be called only in situations where there are "guaranteed matching pairs of increments/decrements" or something to that effect.

Is that really always going to be the case wherever this function is ever used?

EDIT: I see - ok that makes sense. In that case can you add the ${selector} to the error message in the Error object? It might make it easier to track down on failure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about changing the error message to be more descriptive, like Unexpected reference tally value? I could add the selector to the error message, but if you look closely, that is already logged above, so I think it should be easy to find.

Regardless, I'll add a comment explaining why an exception is thrown.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah Unexpected reference tally value sounds good 👍

@@ -29,6 +29,7 @@ const initialState = {
contracts: {}, // contractIDs => { type:string, HEAD:string, height:number } (for contracts we've successfully subscribed to)
loggedIn: false, // false | { username: string, identityContractID: string }
namespaceLookups: Object.create(null), // { [username]: sbp('namespace/lookup') }
reverseNamespaceLookups: Object.create(null), // { [contractID]: username }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think down below in postUpgradeVerification you are missing adding this reverseNamespaceLookups

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right

Copy link
Member

@taoeffect taoeffect left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along with other comments (see above) left a few more:

Comment on lines 13 to 14
const cache = sbp('state/vuex/state').reverseNamespaceLookups
return cache?.[id] ?? null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After implementing postUpgradeVerification should be able to rewrite this as a 1-liner:

return sbp('state/vuex/state').reverseNamespaceLookups[id]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is essentially a copy-paste of the function above, handling the possibility of reverseNamespaceLookups (or namespaceLookups) not being defined. That was because the side-effects in the identity contract call the lookupCached selector, and I thought that an incorrect state should not cause the side-effect to fail. The ??null part also coerces undefined into null, so that the function returns only two types instead of three (string, undefined, null).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I can get with ?? null but not cache?.[id]. That should just be cache[id].

Comment on lines +112 to +113
// End exclude contracts
sbp('state/vuex/postUpgradeVerification', state)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes. Scary this was missed by both of us.

Yet another reason all of this must be cleaned up (#2271).

Object.fromEntries(
Object.entries(state.chatRooms[chatRoomID].members)
.map(([memberKey, profile]) => {
if (memberKey === member && (profile: any)?.status === PROFILE_STATUS.ACTIVE) {
return [memberKey, { ...profile, status: PROFILE_STATUS.REMOVED }]
return [memberKey, { ...profile, status: ourselvesLeaving ? PROFILE_STATUS.LEFT_GROUP : PROFILE_STATUS.REMOVED }]
Copy link
Member

@taoeffect taoeffect Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this function written in such a bizarre way?

Wouldn't a straightforward implementation be a significantly more efficient O(1) 1-liner?

state.chatRooms[chatRoomID].members[member].status = ourselvesLeaving ? PROFILE_STATUS.LEFT_GROUP : PROFILE_STATUS.REMOVED

Also, did you check all related code that checks the status for whether or not it handles LEFT_GROUP appropriately?

For example, below on line 322 we have a check for status !== PROFILE_STATUS.REMOVED, but what if the status is LEFT_GROUP?

Likewise, on 1262 we have another check:

state.chatRooms?.[data.chatRoomID]?.members[memberID]?.status === PROFILE_STATUS.REMOVED

It's worth asking here: do we need this distinction between REMOVED and LEFT_GROUP? Or could we just have a single variable here to act as an opposite to ACTIVE? The logic could be greatly simplified in that case and these kinds of bugs would be less likely to occur.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't a straightforward implementation be a significantly more efficient O(1) 1-liner?

Probably. I think the idea was not to mutate data, but yeah, your suggestion is simpler and more performant.

did you check all related code that checks the status for whether or not it handles LEFT_GROUP appropriately?

Yes. Most of that code is fine as is because this will only be set on our own profile. However, this change prevents us from sending our own leave actions to chatrooms when we are leaving a group (instead, someone else will need to send them). I think this was the intended change, but I'm not sure it's still relevant given the latest changes.

It's worth asking here: do we need this distinction between REMOVED and LEFT_GROUP?

As noted, the change was part of an earlier solution, and might no longer be needed. I'll try that out.

@@ -1030,7 +1030,7 @@ export default (sbp('sbp/selectors/register', {
...encryptedAction('gi.actions/group/updateSettings', L('Failed to update group settings.')),
...encryptedAction('gi.actions/group/updateAllVotingRules', (params, e) => L('Failed to update voting rules. {codeError}', { codeError: e.message })),
...encryptedAction('gi.actions/group/updateDistributionDate', L('Failed to update group distribution date.')),
...encryptedAction('gi.actions/group/upgradeFrom1.0.6', L('Failed to upgrade from version 1.0.6')),
...encryptedAction('gi.actions/group/upgradeFrom1.0.7', L('Failed to upgrade from version 1.0.6')),
Copy link
Member

@taoeffect taoeffect Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, forgot to bump the error message too 👆

Copy link
Member

@taoeffect taoeffect left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a tough one — amazing work @corrideat! 💪

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants