-
Notifications
You must be signed in to change notification settings - Fork 346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add config file option to speed up uploading #400
Conversation
Pull Request Test Coverage Report for Build 416
💛 - Coveralls |
I suggest to rename the new config to something like Feel free to suggest other names 😃 |
Thank you for the quick review! |
lib/asset_sync/storage.rb
Outdated
@@ -58,6 +62,32 @@ def local_files | |||
(get_local_files + config.additional_local_file_paths).uniq | |||
end | |||
|
|||
def remote_files | |||
return @remote_files if @remote_files | |||
return @remote_files = [] if ignore_existing_remote_files? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe when ignore_existing_remote_files?
is true
, remote_files
should be an empty array no matter the value of remote_file_list_cache_file_path
.
Tell me if I should do otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your statement is correct.
But I think the code should be
return [] if ignore_existing_remote_files?
return @remote_files if @remote_files
# ...
@remote_files = get_remote_files
Since the original logic is ignore_existing_remote_files? ? [] : get_remote_files
which equals
return [] if ignore_existing_remote_files?
get_remote_files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought those two are identical in this case, but your suggestion is easier to understand.
Thanks.
@PikachuEXE would you take a look? |
lib/asset_sync/storage.rb
Outdated
@@ -58,6 +62,32 @@ def local_files | |||
(get_local_files + config.additional_local_file_paths).uniq | |||
end | |||
|
|||
def remote_files | |||
return @remote_files if @remote_files | |||
return @remote_files = [] if ignore_existing_remote_files? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your statement is correct.
But I think the code should be
return [] if ignore_existing_remote_files?
return @remote_files if @remote_files
# ...
@remote_files = get_remote_files
Since the original logic is ignore_existing_remote_files? ? [] : get_remote_files
which equals
return [] if ignore_existing_remote_files?
get_remote_files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks fine now
Can you squash the commits?
Don't have to be 1 commit only but I would like to avoid commits like "fix this", "fix that"
When remote has many objects, `get_remote_files` takes more than several minutes. Using `remote_file_list_cache_file_path`, we can store uploaded file list at our local to save time to fully scan the remote.
9bccd0f
to
6b3da43
Compare
Squashing into one commit seemed to be the easiest way. I just made it a single commit. |
Releasing tomorrow or Friday |
|
WHY
Using asset_sync for more than 6 years in production in a repo with 800k lines of Rails, we faced a problem with uploading files to s3.
It takes more than 5 min to run asset sync, which is too long for us who deploy 30 times a day.
After a bit of investigation, we've found that
get_remote_files
is the bottleneck which takes 4 min.As a workaround, we started using a patched version of this gem, whose diff is here.
With this patch, our assets_sync finished less than 30s.
It basically caches a list of files that already uploaded to remote buckets, so that we don't have to scan the bucket in the next run.
WHAT
After using the patch for more than 1.5 years in production in a big project, I started thinking this is worth feedbacking to the upstream.
Maybe I should have created an issue first but I've already got a patch that's working. Creating a Pull Request might be a good idea.
If there is any concern or request to change please let me know.
Tasks left