Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Hashing #1027

Merged
merged 4 commits into from
Dec 13, 2023
Merged

Use Hashing #1027

merged 4 commits into from
Dec 13, 2023

Conversation

scott-rc
Copy link
Collaborator

@scott-rc scott-rc commented Dec 2, 2023

This improves how we detect and resolve file changes when ggt sync starts.

I've split this PR into 2 commits. I recommend reading them individually so the changes are easier to follow.

Before this PR

To detect local changes, we compared the largest mtime on the local filesystem to the largest mtime we saw when ggt sync stopped. If the largest mtime on the local filesystem was greater than the largest mtime we last saw, we assumed local files had changed.

To detect gadget (remote) changes, we compared the latest files version to the last files version we saw when ggt sync stopped. If the latest files version was greater than our local files version, we assumed gadget (remote) files had changed.

If local changes were detected, we prompted the user to either reset/undo all their local changes or keep their local changes and merge them with any gadget (remote) changes.

After this PR

First, we fetch the hashes of the files version we last saw when ggt sync stopped.

To detect local changes, we compare the files version hashes to the hashes on the local filesystem. If the hashes are different, we know local files have changed.

To detect gadget changes, we do the same thing and compare the files version hashes to the hashes of the latest files version. If the hashes are different, we know gadget files have changed.

If both the local filesystem and gadget changed the same file in conflicting ways, i.e.:

  • both updated the same file with different content
  • one deleted the file while the other updated it

Then we prompt the user to either keep their local conflicting changes or keep gadgets conflicting changes. All non-conflicting changes are always merged.

Copy link

changeset-bot bot commented Dec 2, 2023

⚠️ No Changeset found

Latest commit: 69dedaf

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@scott-rc scott-rc changed the title Improve report setter types Use Hashing Dec 2, 2023
@scott-rc scott-rc changed the base branch from sc/more-pre-hashing to sc/context December 2, 2023 22:18
@scott-rc scott-rc force-pushed the sc/use-hashes branch 4 times, most recently from 7a6888d to 3cc236a Compare December 5, 2023 04:29
@scott-rc scott-rc changed the base branch from sc/context to sc/more-pre-hashing December 5, 2023 04:29
@scott-rc scott-rc mentioned this pull request Dec 5, 2023
Base automatically changed from sc/more-pre-hashing to main December 5, 2023 05:14
@scott-rc scott-rc force-pushed the sc/use-hashes branch 14 times, most recently from 64d840d to 6d39212 Compare December 8, 2023 20:11
@scott-rc scott-rc force-pushed the sc/use-hashes branch 2 times, most recently from 2d269f1 to 496773c Compare December 12, 2023 16:21
@scott-rc scott-rc marked this pull request as ready for review December 12, 2023 19:56
this.editGraphQL.query({ query: FILE_SYNC_HASHES_QUERY }).then((data) => ({
gadgetFilesVersion: BigInt(data.fileSyncHashes.filesVersion),
gadgetHashes: data.fileSyncHashes.hashes,
})),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your local files version matches the files version in Gadget, do we generate the hashes twice?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this was just one graphQL query where we passed along the localFilesVersion we could skip generating 50% of the hashes a lot of the time

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, yeah we can turn this into a single query 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw new TooManySyncAttemptsError(attempt);
}

const { filesVersionHashes, localHashes, gadgetHashes, gadgetFilesVersion } = await this._getHashes();
Copy link

@angelini angelini Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I getting this right? If you start a sync and there is 1 file that has been modified we will hash the project server side 4 times?

  • Attempt = 0, filesVersionHashes
  • Attempt = 0, gadgetFilesVersionHashes
  • Attempt = 1, filesVersionHashes
  • Attempt = 1, gadgetFilesVersionHashes

Because we always iterate through this at least once if there has been any change either locally or upstream?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's right.

choices: [Action.CANCEL, Action.MERGE, Action.RESET],
});
localChanges = withoutUnnecessaryChanges({ changes: localChanges, existing: gadgetHashes });
gadgetChanges = withoutUnnecessaryChanges({ changes: gadgetChanges, existing: localHashes });

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to remove unnecessary changes before generating the conflicts between them?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The end result shouldn't change, but yes we could do this before checking for conflicts.

In fact, we could do this inside of getChanges if we made it take existing:

let localChanges = getChanges({ from: filesVersionHashes, to: localHashes, existing: gadetHashes, ignore: [".gadget/"] });
let gadgetChanges = getChanges({ from: filesVersionHashes, to: gadgetHashes, existing: localHashes });

What do you think?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think moving it into getChanges is a good call.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that we wanted to drop unnecessary changes before asking the user about conflicts in case an unnecessary change was a conflict - but moving this into getConflicts achieves the same thing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@airhorns airhorns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I am a little unfamiliar but I couldnt see any issues. The core sync loop is nice and terse and digestable, great job!

process.exit(0);
}
// recursively call this function until we're in sync
return this.sync({ attempt: ++attempt });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The attempt limit is low enough that this won't ever stack overflow right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested this, but I can't think of a scenario where this would run into a stack overflow with 100 attempts let alone 10.

Although we could prevent a stack overflow from ever happening if we change the function to this:

async sync(): Promise<void> {
  const maxAttempts = 10;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const hashes = await this._getHashes();
    this.log.debug("syncing", hashes);

    if (isEqualHashes(hashes.localHashes, hashes.gadgetHashes)) {
      this.log.info("filesystem is in sync");
      await this._save(hashes.gadgetFilesVersion);
      return;
    }

    await this._sync(hashes);
  }

  throw new TooManySyncAttemptsError(maxAttempts);
}

private async _sync({
  filesVersionHashes,
  localHashes,
  gadgetHashes,
  gadgetFilesVersion,
}: {
  filesVersionHashes: Hashes;
  localHashes: Hashes;
  gadgetHashes: Hashes;
  gadgetFilesVersion: bigint;
}): Promise<void> {
  let localChanges = getChanges({ from: filesVersionHashes, to: localHashes, existing: gadgetHashes, ignore: [".gadget/"] });
  let gadgetChanges = getChanges({ from: filesVersionHashes, to: gadgetHashes, existing: localHashes });

  if (localChanges.size === 0 && gadgetChanges.size === 0) {
    // the local filesystem is missing .gadget/ files
    gadgetChanges = getChanges({ from: localHashes, to: gadgetHashes });
    assertAllGadgetFiles({ gadgetChanges });
  }

  const conflicts = getConflicts({ localChanges, gadgetChanges });
  if (conflicts.size > 0) {
    this.log.debug("conflicts detected", { conflicts });
    printConflicts({
      message: sprint`{bold You have conflicting changes with Gadget}`,
      conflicts,
    });

    const preference = await select({
      message: "How would you like to resolve these conflicts?",
      choices: Object.values(ConflictPreference),
    });

    switch (preference) {
      case ConflictPreference.CANCEL: {
        process.exit(0);
        break;
      }
      case ConflictPreference.LOCAL: {
        gadgetChanges = withoutConflictingChanges({ conflicts, changes: gadgetChanges });
        break;
      }
      case ConflictPreference.GADGET: {
        localChanges = withoutConflictingChanges({ conflicts, changes: localChanges });
        break;
      }
    }
  }

  assert(localChanges.size > 0 || gadgetChanges.size > 0, "there must be changes if hashes don't match");

  if (gadgetChanges.size > 0) {
    await this._getChangesFromGadget({ changes: gadgetChanges, filesVersion: gadgetFilesVersion });
  }

  if (localChanges.size > 0) {
    await this._sendChangesToGadget({ changes: localChanges, expectedFilesVersion: gadgetFilesVersion });
  }
}

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enh I think its fine as is

Copy link

@angelini angelini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

Do we want to do any more in house testing before this becomes the default?

@scott-rc
Copy link
Collaborator Author

This looks good to me.

Do we want to do any more in house testing before this becomes the default?

I feel confident enough to merge this into main, but I won't ship this until I've done another round of testing. I'll ship another experimental version and test that one while I wait for docs to finish.

@scott-rc scott-rc merged commit 6a5c88a into main Dec 13, 2023
7 checks passed
@scott-rc scott-rc deleted the sc/use-hashes branch December 13, 2023 19:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants