Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement repeated backup #3640

Closed
andrewkozlik opened this issue Mar 22, 2024 · 12 comments · Fixed by #3743
Closed

Implement repeated backup #3640

andrewkozlik opened this issue Mar 22, 2024 · 12 comments · Fixed by #3743
Assignees
Labels
core Trezor Core firmware. Runs on Trezor Model T and T2B1. feature Product related issue visible for end user protobuf Structure of messages exchanged between Trezor and the host

Comments

@andrewkozlik
Copy link
Contributor

andrewkozlik commented Mar 22, 2024

Currently the firmware allows seed backup to be created only once and only if the seed was generated on Trezor, i.e. not recovered. We want to make it possible for the user to create a backup repeatedly, even from a recovered seed. Before being allowed to do so, the user would be required to perform dry-run recovery. In case of BIP-39 this feature does not make sense, but it does make sense in case of SLIP-39 where the user might want to change their backup solution from, say, 1-of-1 to 2-of-3.

One question to consider is, should we allow this process for BIP-39? There is no security reason to not allow it, but there is also no reason to support it, since the user obviously has the backup if they were able to complete dry-run recovery. So supporting it is just confusing, e.g. the user thinks they will be able to convert their seed to SLIP-39, which is not possible. I would disallow it.

The main implementation problem is that the device can be powered-down during dry-run recovery and disconnected from the Suite instance. So in general we cannot expect the entire process (dry-run + repeated backup) to take place in one workflow or even in one session with Suite.

We have these options to consider:

  1. Do not allow powering down the device during this process.
    • Not user friendly and inconsistent with normal dry-run experience, but it can make the overall implementation easier.
    • The entire flow could simply be initiated by the BackupDevice message, possibly with a boolean flag repeated_backup set to true, but it's not necessary to have this flag, because it is implied if backup already took place (Features.needs_backup=False).
  2. Allow powering down during the dry-run part of the process, but once dry-run is complete, the repeated backup needs to take place immediately, i.e. no other workflows are allowed in between.
    • The flow would start with RecoveryDevice followed by BackupDevice.
  3. Allow powering down during the dry-run part of the process, but once dry-run is complete, the repeated backup needs to take place before powering down, i.e. other workflows are allowed in between.
    • The flow would start with RecoveryDevice followed by BackupDevice.
    • Introduce a homescreen banner "BACKUP AVAILABLE" and a global variable indicating that backup is available, which would trigger this banner.
  4. Allow powering down at any time during the entire process, i.e. the repeated backup can take place at any time in the future and other workflows are allowed in between the two phases.
    • The flow would start with RecoveryDevice followed by BackupDevice.
    • Extend the _NEEDS_BACKUP storage entry from boolean to a string "enum" {BACKUP_DONE="\x00", BACKUP_NEEDED="\x01", BACKUP_AVAILABLE="\x02"} and rename it to something like _BACKUP_STATE.
    • Introduce a homescreen banner "BACKUP AVAILABLE" which shows up when _BACKUP_STATE is BACKUP_AVAILABLE. The banner should be cancellable and when cancelled, _BACKUP_STATE should be set to BACKUP_DONE.

In cases 3 and 4 the device is essentially vulnerable to seed theft if left unattended, so if we want to allow the workflow to be split into two independent parts, we need to indicate this to the user in a banner notification.

In cases 2, 3 and 4 we should make some adjustments to the RecoveryDevice message. Currently we have:

message RecoveryDevice {
    ...
    optional bool dry_run = 10;  // perform dry-run recovery workflow (for safe mnemonic validation)
	...
}

We should make it possible to specify what kind of dry-run is taking place, because:

  • I think we shouldn't automatically activate the repeated backup at the end of just any dry-run. That could be unexpected and could lead to seed theft, if the user doesn't realize this.
  • We might want to adjust the messaging to differentiate it from regular dry-run, but it doesn't actually seem necessary.

Options:

  • Change the dry-run field to an enum {RECOVERY=0, DRY_RUN=1, UNLOCK_REPEATED_BACKUP=2}, which should be backwards compatible.
  • Add a new field optional bool unlock_repeated_backup = ...;.

Related to #3636.

@andrewkozlik andrewkozlik added core Trezor Core firmware. Runs on Trezor Model T and T2B1. protobuf Structure of messages exchanged between Trezor and the host feature Product related issue visible for end user labels Mar 22, 2024
@matejcik
Copy link
Contributor

@andrewkozlik's option 4' (more like 3+):

Allow powering down during the dry-run part of the process, but once dry-run is complete, the repeated backup needs to take place at worst, one reboot later.
IOW, the "Backup available" banner stays up until you unplug the device, plus on the next reboot until you unplug.

From my POV, this should be "1 backup or 1 reboot", whichever happens sooner, so:

  • once you create a backup, you can't do it again,
  • if you don't create a backup, the mode will disable itself after the second unplug (implementation-wise we'll drop the storage item on first boot-up, but keep the mode enabled in cache)

@andrewkozlik
Copy link
Contributor Author

Meeting notes from 2024-03-26 with @matejcik and @Hannsek.

We excluded Options 1 and 3 and agreed that Options 2 and 4, possibly 3+ mentioned in @matejcik's post, are on the table.

I think the main problem with Option 2 is that it is somewhat counterintuitive, given that the user can power down in between entering recovery shares, it might be surprising that they can't power down before the repeated recovery.

The problem with Option 4 is that it takes us even further away from Trezor's security model, which relies on not being able to pull out the seed from Trezor once onboarding is complete.

We also clarified some details about Option 2, which also applies to Option 5 below. Once dry-run is complete, Trezor would offer the user to do on-device recovery, i.e. an on-screen choice between basic and advanced. Trezor would await user input or a BackupDevice message from Suite with the backup details. Thus it could work in both detached state, e.g. with a power bank, as well as in a Suite-attached state.

In the future @Hannsek is considering making it possible to do Shamir backup of individual shares the same way as we do recovery, i.e. making it possible to unplug the device between the back-up of each share. (This idea needs more thought.) In light of these considerations I propose one more option:

  1. Allow powering down at any time during the entire process, but no workflows are allowed in between.

So it boils down to two questions:

  • Allow power-down after dry-run and allow continuing with repeated backup after powering up later?
    • Yes: Options 4 and 5.
    • No: Options 1, 2 and 3.
    • Possibly limit this to at most one power down (Option 3+).
  • Allow other workflows in between dry-run and repeated backup?
    • Yes: Options 3 and 4.
    • No: Options 1, 2 and 5.
  • Change the dry-run field to an enum {RECOVERY=0, DRY_RUN=1, UNLOCK_REPEATED_BACKUP=2}, which should be backwards compatible.

Let's go with this option.

@Hannsek Hannsek moved this to 🎯 To do in Firmware Apr 10, 2024
@andrewkozlik
Copy link
Contributor Author

andrewkozlik commented Apr 16, 2024

Meeting notes from 2024-04-16.

We narrowed down the options to Option 2 and Option 5, i.e. do not allow other workflows in between dry-run and repeated backup, but we did not reach a concensus about whether Trezor should continue with repeated backup if the user powers down after dry-run. Proposals that were discussed:

  • Allow continuing with repeated backup always. There is a chance that the user will accidentally leave the Trezor in this state. The obvious advantage is that they can do dry-run in airgapped mode and then they are able to relocate as well as connect to a computer to complete the repeated backup.
  • Allow continuing with repeated backup after power-cycling only if PIN is configured. Perhaps this is a separate issue and there are other scenarios which should only be allowed if PIN is configured.
  • Show a dialog offering the user to "Back up now", "Back up later" or "Cancel". "Back up later" would enable continuing with repeated backup after power-cycle and ask the user to unplug the device. Disconnecting without choosing anything would be equivalent to "Cancel".
  • Abort repeated backup if the user power-cycles after completing dry-run recovery and deferr any sort of improvements to a later PR.

Suite should be able to clearly identify from the Features message what state the Trezor is in

  • Normal operation.
  • Recovery in progress. Maybe even distinguish whether it's a normal recovery, dry-run recovery or unlock-repeated-backup dry-run?
  • Dry-run recovery complete and ready to accept a BackupDevice message.

Right now we have a flag to identify the recovery mode:

message Features {
    ...
    optional bool recovery_mode = 29;           // is recovery mode in progress
    ...
}

Perhaps we should extend this to an enum such as {NONE=0, RECOVERY=1, DRY_RUN=2, UNLOCK_REPEATED_BACKUP=3, REPEATED_BACKUP_READY=4}.

@ibz
Copy link
Contributor

ibz commented Apr 18, 2024

Seems pretty clear to me.

Perhaps the main question is how to signal that we want to start a "dry run" recovery followed by a backup.

It was suggested to:

Change the dry-run field to an enum {RECOVERY=0, DRY_RUN=1, UNLOCK_REPEATED_BACKUP=2}, which should be backwards compatible.

What I don't like about this is that we would still call that field "dry-run". Which it is not, really... "dry-run" implies nothing happens... it's more of a sanity check. Which in this case, it is not.

How if we see this process more like a backup request preceded by a verification rather than a recovery followed by a back up? Would it be more clear? So it would all happen using backup, not using recovery?

On the other hand, "recovery" does kinda sound right to me as well, and we could indeed also use that.

Just not "dry run" recovery!

Perhaps this is better then?

Add a new field optional bool unlock_repeated_backup = ...;.

If so, this should probably be orthogonal to and mutually exclusive with dry-run?

So basically a "recovery" request would either be:

A) normal recovery
B) dry run. A way to double-check that the backup is fine. This is something most people should do straight away after setting up a Trezor.
C) repeated backup [perhaps without unlock_ word - which makes sense only if coupling this flow with the dry run] This is something people would optionally do later on - definitely not right after setting up a new Trezor.

@matejcik
Copy link
Contributor

Change the dry-run field to an enum {RECOVERY=0, DRY_RUN=1, UNLOCK_REPEATED_BACKUP=2}, which should be backwards compatible.

renaming fields has no compatibility costs so we can just rename the field to recovery_action or something

If so, this should probably be orthogonal to and mutually exclusive with dry-run?

which is why introducing a new field is the strictly worse option

@ibz
Copy link
Contributor

ibz commented Apr 18, 2024

I see, renaming fields without changing the index means the wire protocol stays the same. Great then!

@x13215465
Copy link

x13215465 commented May 5, 2024

Before being allowed to do so, the user would be required to perform dry-run recovery.

why?

@prusnak
Copy link
Member

prusnak commented May 5, 2024

why?

To prove ownership of the seed stored in the device.

@x13215465
Copy link

why?

To prove ownership of the seed stored in the device.

you prove it by holding the device physically, providing the PIN and optionally password for the given wallet

if you require the dry run this important feature is devalued

yes, one will be able to change the backup scheme or create an alternative one

but you will not allow to "clone" the device which is very important and totally ignored scenario - the user should never be forced to touch his shares unless his last working device is dead

@prusnak
Copy link
Member

prusnak commented May 5, 2024

you prove it by holding the device physically, providing the PIN and optionally password for the given wallet

This changes the original threat model and we are not doing this. To create another set of shares you need to prove ownership of the seed. Having the device and PIN is not the same and is not enough.

the user should never be forced to touch his shares unless his last working device is dead

Says you, but we disagree here.

@x13215465
Copy link

you prove it by holding the device physically, providing the PIN and optionally password for the given wallet

This changes the original threat model and we are not doing this. To create another set of shares you need to prove ownership of the seed. Having the device and PIN is not the same and is not enough.

device possession and PIN is enough move the coins (from standard wallet) - hence it's equivalent to holding the seed in any form, I don't see any incompatibility with the thread model but you are the experts here

the user should never be forced to touch his shares unless his last working device is dead

Says you, but we disagree here.

what should a serious user do having two (primary and backup) devices when one of them stops working - travel (potentially to another continent) to fetch the shares and compromise their storage? or move the UTXOs, pay the fees, mess with coin control and redistribute the shares?
or is it reasonable to expect that one should be able to use his working device to clone new backup device?
it's not 2014, things are getting serious, people function globally, implement disaster recovery schemes and there are many sensible scenarios to consider

one would expect that bitcoiners will understand that protecting people against their own mistakes causes often more damage than the threat itself... just saying

@prusnak
Copy link
Member

prusnak commented May 5, 2024

I don't see any incompatibility with the thread model

threat model is modified - with device and PIN possession I can only steal coins now; if I could generate the seed I would be able to steal coins at any time in the future (even if I do not have access to the device+PIN anymore)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Trezor Core firmware. Runs on Trezor Model T and T2B1. feature Product related issue visible for end user protobuf Structure of messages exchanged between Trezor and the host
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants