Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend the CopyFile resource to copy directories and work with assets and archives #414

Closed
wants to merge 4 commits into from

Conversation

thomas11
Copy link
Contributor

@thomas11 thomas11 commented Apr 23, 2024

Superceded by #423


This PR allows the CopyFile resource to recursively copy directories as well, similar to scp -r.

This PR enhances the CopyFile resource in a few ways.

  • It can now copy directories as well, recursively with their contents.
  • It now takes the standard Pulumi asserts and archives as input, allowing for seamless interop with other resources.
  • It now checks whether the specified file or directory have changed (in content, not timestamp) and copies only if they did.

In light of these changes, I've renamed CopyFile to just Copy.

Resolves #23
Resolves #33
Resolves #42

TODO

  • The preview operation doesn't tell the user whether the resource will copy or not, rendering it rather useless. Probably need to add an output property.
  • The current implementation fails as soon as a file or directory already exists on the remote, like the previous CopyFile. Before GA-ing the provider, we should think about whether users might need other behaviors and how to expose them, e.g., continue if exists or clear the whole remote destination before copying.
  • Before GA, should we rename CopyFile to Copy or RemoteCopy or insert your proposal here?
  • The integration test copies a bunch of EC2 setup code with TestEc2RemoteTs, which we should share instead. What would be a good way to do that, given the code is in the TS tests, not in the Go driver?

Design questions

  • Should we expose knobs such as "clear remote directory before copying" (to avoid leftover files from previous copies) or should we even do that by default? Impacts behavior and should be resolved before GA.
  • Should we expose a flag to allow the user to fail the copy if the remote exists, rather than silently overwriting?

Implementation notes

  • I've looked around for open source Go packages implementing scp of folders, but to my surprise, I couldn't find one with an acceptable license that 1) wasn't ancient and 2) was cross-platform.
  • The current implementation copies files sequentially and is therefore probably slow for large trees. If we replaced the implementation, I don't think the shape of the resource would change, so this shouldn't be a GA blocker.

@thomas11 thomas11 requested a review from a team April 23, 2024 14:09
@EronWright
Copy link

EronWright commented Apr 23, 2024

Sorry for the drive-by commentary, but I want to call out the potential ambiguity in the "from" and "to" arguments of a function that is able to copy files and/or directories. I don't know if this affects us here, sorry!

The COPY command of a Dockerfile has pretty clear semantics ("rules"), check it out:
https://docs.docker.com/reference/dockerfile/#copy

@thomas11 thomas11 requested a review from a team April 24, 2024 04:20
@thomas11 thomas11 force-pushed the tkappler/copy-dir branch from a23ef61 to cdaf251 Compare April 24, 2024 08:49
@t0yv0
Copy link
Member

t0yv0 commented Apr 25, 2024

The implementation looks good, great tests too thank you!

I want to point out there's https://github.com/pulumi/pulumi-synced-folder which looks similar on the tin.

Also syncing a LOT of files QUICKLY is a big and annoying problem, where I'd be reaching for a tool like rsync, I don't know if we have appetite to vendor it though or assume it on the host. Perhaps detecting on the host could kick it in if it can accomplish the same result. Notably pulumi-sync-folder can use gsutil -m rsync which presumably also does incremental sync.

This all can happily go on the backlog .

@t0yv0
Copy link
Member

t0yv0 commented Apr 25, 2024

The current implementation fails as soon as a file or directory already exists on the remote, like the previous CopyFile

I'd think the intuition is usually "I want to sync from here to there" which would rather want you to be idempotent and overwrite or leave identical file in-place.

@thomas11 thomas11 force-pushed the tkappler/copy-dir branch from cdaf251 to feb9ca2 Compare April 30, 2024 09:53
@thomas11 thomas11 changed the title Extend the CopyFile resource to copy directories Extend the CopyFile resource to copy directories and work with assets and archives May 7, 2024
@thomas11 thomas11 closed this May 7, 2024
@thomas11 thomas11 deleted the tkappler/copy-dir branch June 27, 2024 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants