Skip to content
This repository has been archived by the owner on Jun 23, 2022. It is now read-only.

refactor(backup): straighten the execution path of download_file #631

Merged
merged 12 commits into from
Oct 22, 2020

Conversation

neverchanje
Copy link
Contributor

@neverchanje neverchanje commented Sep 22, 2020

The block_service API provides only async interfaces. It causes the callers must write code from bottom-up unnaturally.
This PR turns the async calls (block_file::download) in block_service_manager::download_file into sync functions (download_block_file_sync).
It doesn't change the original behavior, because download_file is actually synchronous at all. But it reverses the code to top-down, which is more readable for humans and more debuggable.

Another significant change in this PR is that download_file does not verify checksum with remote FS anymore.

A md5 verification of local file with remote:

        // redundant process:
        std::string current_md5;
        error_code e = utils::filesystem::md5sum(local_file_name, current_md5);
        if (e != ERR_OK) {
            derror_f("calculate file({}) md5 failed", local_file_name);
            download_err = e;
            return;
        }
        if (current_md5 != bf->get_md5sum()) { // md5 of the remote file 
            derror_f("local file({}) is different from remote file({}), download failed, md5: "
                     "local({}) VS remote({})",
                     local_file_name,
                     bf->file_name(),
                     current_md5,
                     bf->get_md5sum());
            download_err = ERR_CORRUPTION;
            return;
        }

Since in both restore and bulk-load, there's a meta file storing the checksums, which we use to compare with the local file checksum after the download completes. We do not need another verification between the local file and the remote file.

The remote file md5 is generally provided by the filesystem service.

But it doesn't mean that every filesystem has such capability to produce md5. Some may use by native an uncommon checksum algorithm, HDFS for example: https://blog.woopi.org/wordpress/files/hadoop-2.6.0-javadoc/org/apache/hadoop/fs/MD5MD5CRC32FileChecksum.html.

So md5 should not be in our presumption of the filesystem abstraction. I may remove block_file::get_md5sum in future PR.

hycdong
hycdong previously approved these changes Sep 25, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants