Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add config file option to speed up uploading #400

Merged
merged 1 commit into from
Mar 11, 2020

Conversation

potsbo
Copy link
Contributor

@potsbo potsbo commented Mar 10, 2020

WHY

Using asset_sync for more than 6 years in production in a repo with 800k lines of Rails, we faced a problem with uploading files to s3.
It takes more than 5 min to run asset sync, which is too long for us who deploy 30 times a day.
After a bit of investigation, we've found that get_remote_files is the bottleneck which takes 4 min.
As a workaround, we started using a patched version of this gem, whose diff is here.
With this patch, our assets_sync finished less than 30s.

It basically caches a list of files that already uploaded to remote buckets, so that we don't have to scan the bucket in the next run.

WHAT

After using the patch for more than 1.5 years in production in a big project, I started thinking this is worth feedbacking to the upstream.
Maybe I should have created an issue first but I've already got a patch that's working. Creating a Pull Request might be a good idea.

If there is any concern or request to change please let me know.

Tasks left

  • Update readme
  • Add specs

@coveralls
Copy link

coveralls commented Mar 10, 2020

Pull Request Test Coverage Report for Build 416

  • 20 of 21 (95.24%) changed or added relevant lines in 3 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+1.009%) to 65.94%

Changes Missing Coverage Covered Lines Changed/Added Lines %
lib/asset_sync/engine.rb 0 1 0.0%
Files with Coverage Reduction New Missed Lines %
lib/asset_sync/engine.rb 1 0%
Totals Coverage Status
Change from base Build 407: 1.009%
Covered Lines: 393
Relevant Lines: 596

💛 - Coveralls

@PikachuEXE
Copy link
Member

I suggest to rename the new config to something like remote_file_list_cache_file_path
cache_file doesn't explain it should store a file path or what it represents
(file path of a cache file which caches remote file list)

Feel free to suggest other names 😃

@potsbo
Copy link
Contributor Author

potsbo commented Mar 11, 2020

Thank you for the quick review!
remote_file_list_cache_file_path sounds great to me, bit long but I suppose no name can be too verbose in this context.

@potsbo potsbo changed the title [WIP] Add config file option to speed up uploading Add config file option to speed up uploading Mar 11, 2020
@@ -58,6 +62,32 @@ def local_files
(get_local_files + config.additional_local_file_paths).uniq
end

def remote_files
return @remote_files if @remote_files
return @remote_files = [] if ignore_existing_remote_files?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe when ignore_existing_remote_files? is true, remote_files should be an empty array no matter the value of remote_file_list_cache_file_path.
Tell me if I should do otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your statement is correct.

But I think the code should be

return [] if ignore_existing_remote_files?
return @remote_files if @remote_files
# ...
@remote_files = get_remote_files

Since the original logic is ignore_existing_remote_files? ? [] : get_remote_files
which equals

return [] if ignore_existing_remote_files?

get_remote_files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought those two are identical in this case, but your suggestion is easier to understand.
Thanks.

@potsbo potsbo marked this pull request as ready for review March 11, 2020 04:45
@potsbo
Copy link
Contributor Author

potsbo commented Mar 11, 2020

@PikachuEXE would you take a look?

@potsbo potsbo requested a review from PikachuEXE March 11, 2020 05:07
@@ -58,6 +62,32 @@ def local_files
(get_local_files + config.additional_local_file_paths).uniq
end

def remote_files
return @remote_files if @remote_files
return @remote_files = [] if ignore_existing_remote_files?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your statement is correct.

But I think the code should be

return [] if ignore_existing_remote_files?
return @remote_files if @remote_files
# ...
@remote_files = get_remote_files

Since the original logic is ignore_existing_remote_files? ? [] : get_remote_files
which equals

return [] if ignore_existing_remote_files?

get_remote_files

Copy link
Member

@PikachuEXE PikachuEXE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks fine now
Can you squash the commits?
Don't have to be 1 commit only but I would like to avoid commits like "fix this", "fix that"

When remote has many objects, `get_remote_files` takes more than several
minutes.
Using `remote_file_list_cache_file_path`, we can store uploaded file
list at our local to save time to fully scan the remote.
@potsbo
Copy link
Contributor Author

potsbo commented Mar 11, 2020

Squashing into one commit seemed to be the easiest way.

I just made it a single commit.
Thanks for the review.

@PikachuEXE PikachuEXE merged commit cc9d971 into AssetSync:master Mar 11, 2020
@PikachuEXE
Copy link
Member

Releasing tomorrow or Friday

@potsbo potsbo deleted the feedbackto-upstream branch March 11, 2020 08:53
@PikachuEXE
Copy link
Member

2.11.0 released guys

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants