AWS S3 cache miss should not issue warning #1404

petamas · 2024-05-09T15:32:54Z

This fixes microsoft/vcpkg#31932

Previously, when using the AWS S3 binary cache, every cache miss triggered the following message:

warning: error: aws failed with exit code: (1).

This was both noisy and misleading (because it was not an actual error).

Now AWS cache misses are treated silently, the same way as cache misses from the local cache are.

Note

I don't have access to either Google Cloud Storage or Tencent Cloud Object Storage sources, so I don't know whether the same kind of issue happens there too; in any case, fixing those should be easy using the newly introduced interface. (Their stat() and download_file() methods need to be adapted to return CacheResult::miss in the relevant cases.)

petamas · 2024-05-09T15:35:22Z

@microsoft-github-policy-service agree company="Formlabs"

BillyONeal · 2024-05-14T18:01:19Z

src/vcpkg/binarycaching.cpp

@@ -951,14 +951,22 @@ namespace
        static constexpr StringLiteral m_accept_header = "Accept: application/json;api-version=6.0-preview.1";
    };

+    enum class CacheResult


Can this use RestoreResult instead of making up a new value?

Absolutely, I just didn't find that enum.

BillyONeal · 2024-05-14T18:04:36Z

src/vcpkg/binarycaching.cpp

    struct IObjectStorageTool
    {
        virtual ~IObjectStorageTool() = default;

        virtual LocalizedString restored_message(size_t count,
                                                 std::chrono::high_resolution_clock::duration elapsed) const = 0;
-        virtual ExpectedL<Unit> stat(StringView url) const = 0;
-        virtual ExpectedL<Unit> download_file(StringView object, const Path& archive) const = 0;
+        virtual ExpectedL<CacheResult> stat(StringView url) const = 0;


Should stat return CacheAvailability instead?

Do you mean ExpectedL<CacheAvailability>, or plain CacheAvailability? In either case, how should we treat unknown in the call sites?

OK, I've found your suggestion for precheck(), but it's still unclear to me how to handle it in AwsStorageTool::download_file(). Should I simply return RestoreResult:: unavailable?

Do you mean ExpectedL<CacheAvailability>, or plain CacheAvailability? In either case, how should we treat unknown in the call sites?

Sorry, I meant ExpectedL<CacheAvailability> :)

BillyONeal · 2024-05-14T18:07:32Z

src/vcpkg/binarycaching.cpp

@@ -990,11 +998,11 @@ namespace
                const auto& abi = action.package_abi().value_or_exit(VCPKG_LINE_INFO);
                auto tmp = make_temp_archive_path(m_buildtrees, action.spec);
                auto res = m_tool->download_file(make_object_path(m_prefix, abi), tmp);
-                if (res)
+                if (res.value_or(CacheResult::miss) == CacheResult::hit)


I think this value_or is less clear than just spelling it out:

if (auto cache_result = res.get()) { if (*cache_result == CacheResult::hit) { out_zip_paths[idx].emplace(std::move(tmp), RemoveWhen::always); return; } } else { out_sink.println_warning(res.error()); }

BillyONeal · 2024-05-14T18:09:51Z

src/vcpkg/binarycaching.cpp

@@ -1007,7 +1015,8 @@ namespace
            {
                auto&& action = *actions[idx];
                const auto& abi = action.package_abi().value_or_exit(VCPKG_LINE_INFO);
-                if (m_tool->stat(make_object_path(m_prefix, abi)))
+                auto res = m_tool->stat(make_object_path(m_prefix, abi));


If stat returns CacheAvailability then this can be:

auto maybe_res = m_tool->stat(make_object_path(m_prefix, abi)); if (auto res = maybe_res.get()) { cache_status[idx] = *res; }

BillyONeal · 2024-05-14T18:15:04Z

src/vcpkg/binarycaching.cpp

-                cmd_execute_and_capture_output(Command{m_tool}.string_arg("-q").string_arg("stat").string_arg(url)),
-                Tools::GSUTIL);
+            auto cmd = Command{m_tool}.string_arg("-q").string_arg("stat").string_arg(url);
+            return flatten(cmd_execute_and_capture_output(cmd), Tools::GSUTIL).map(unit_to_cache_hit);


Is this a case where we should be looking at the output to confirm we are in the case we expect?

ExpectedL<CacheResult> stat_flatten_pseudocode(const ExpectedL<ExitCodeAndOutput>& maybe_exit, StringView tool_name) { if (auto exit = maybe_exit.get()) { if (exit->exit_code == 0) { return {CacheResult::hit}; } if (exit->exit_code == 1 && exit->output.empty()) { return {CacheResult::miss}; } return {msg::format_error( msgProgramReturnedNonzeroExitCode, msg::tool_name = tool_name, msg::exit_code = exit->exit_code) .append_raw('\n') .append_raw(exit->output)}; } return {msg::format_error(msgLaunchingProgramFailed, msg::tool_name = tool_name) .append_raw(' ') .append_raw(maybe_exit.error().to_string())}; }

(this would of course remove the need for unit_to_cache_hit)

I'm a bit nervous to treat all potentially failing exit codes as cache miss. For example, if there's an authentication failure, that probably prints something about what happened or similar, and probably also returns EXIT_FAILURE. But with this PR's suggestion that cache provider would be silently broken.

I think you might've misread my code. I also wouldn't want to treat all nonzero exit codes as cache misses, that's why I only changed the behaviour in the AWS provider, and only in case the output is empty, and the exit code is 0 or 1.

This specifically is the GCS provider, for which I tried to keep the original behaviour: it's either a cache hit, or an error with an error message. flatten() treats every nonzero exit code as an error, and returns a "right-filled" Expected with an error message, which I return as-is. Zero exit codes result in a Unit, which I treat as a hit like before. This function cannot return a cache miss at all, because I don't know what the right condition for that is in GCS caches.

Your function is almost equivalent to my AWS stat version, except that I treat "0 exit code, but empty stdout" as a cache miss, to handle if AWS maintainers decide to change aws ls behaviour to not return an error when there are no hits. However, I don't know how GCS reacts in the "not found" case, so I didn't want to change the behaviour from what it was before.

I see. Yes, I was reviewing what this was trying to do based on what your PR description said and I didn't notice that this wasn't the AWS one. (Or if I did notice, I didn't internalize the difference...)

GitHub only showing like 2 lines of context == review pain :(

BillyONeal · 2024-05-14T18:15:32Z

src/vcpkg/binarycaching.cpp

+            // When the file is not found, "aws s3 ls" prints nothing, and returns exit code 1.
+            // flatten() would treat this as an error, but we want to treat it as a (silent) cache miss instead.
+            // See https://github.com/aws/aws-cli/issues/5544 for the related aws-cli bug report.
+            if (auto exit = maybe_exit.get())
+            {
+                // We want to return CacheResult::miss even if aws-cli starts to return success
+                const bool success_or_exit_code_1 = exit->exit_code == 0 || exit->exit_code == 1;
+                const bool output_is_empty = Strings::trim(exit->output) == "";
+                if (success_or_exit_code_1 && output_is_empty)
+                {
+                    return CacheResult::miss;
+                }
+            }


Ditto stat_flatten_pseudocode above

BillyONeal · 2024-05-14T18:15:58Z

Thanks for the fix!

petamas · 2024-05-14T19:27:03Z

Thanks for your detailed review! I'll address your comments in a couple days. I added some questions in replies where something was unclear to me.

petamas · 2024-05-14T19:31:43Z

@BillyONeal
What's your preference on changes addressing review comments? Should I force-push the branch to rewrite history with the "correct" version incorporating the changes, or should I simply add my changes on top of the current ones in a separate commit?

BillyONeal · 2024-05-14T20:41:06Z

@BillyONeal What's your preference on changes addressing review comments? Should I force-push the branch to rewrite history with the "correct" version incorporating the changes, or should I simply add my changes on top of the current ones in a separate commit?

There isn't widespread surgery here so I think you can do whatever you want :)

petamas · 2024-05-16T17:59:03Z

@BillyONeal: I've updated my branch. I split the changes into two commits:

The first one does not change the behaviour at all, only introduces the ability to return CacheAvailability/RestoreResult, but all providers always either return available or error, just like before.
The second one makes AWS's stat() return unavailable instead of an error in case the standard output is empty, and the exit code is either 1 (current behaviour of aws-cli) or 0 (possible future behaviour if they decide the address this issue).

If you have a better name for flatten_generic(), I'm open to suggestions.

petamas · 2024-05-17T13:14:01Z

src/vcpkg/binarycaching.cpp

+                auto maybe_res = m_tool->stat(make_object_path(m_prefix, abi));
+                if (auto res = maybe_res.get())
                {
-                    cache_status[idx] = CacheAvailability::unavailable;
+                    cache_status[idx] = *res;
                }


Just realized that this is not an equivalent transformation: if stat() returns an error, the original version sets cache_status[idx] to CacheAvailability::unavailable, while the new version leaves it as CacheAvailability::unknown. Should I add an else branch to keep the behaviour the same as before? (My understanding is that leaving it unknown may at worst cause an extra query to the provider, but I'm not 100% sure I'm reading the rest of the code right.)

GitHub's not showing the diff properly.

Original:

if (m_tool->stat(make_object_path(m_prefix, abi))) { cache_status[idx] = CacheAvailability::available; } else { cache_status[idx] = CacheAvailability::unavailable; }

Current:

auto maybe_res = m_tool->stat(make_object_path(m_prefix, abi)); if (auto res = maybe_res.get()) { cache_status[idx] = *res; }

Possible fix to keep old behaviour:

auto maybe_res = m_tool->stat(make_object_path(m_prefix, abi)); if (auto res = maybe_res.get()) { cache_status[idx] = *res; } else { cache_status[idx] = CacheAvailability::unavailable; }

I decided to go with the real equivalent change (i.e. I readded the else branch), and force-pushed.

This commit does not change any behavior, just prepares the way for the next commit.

BillyONeal · 2024-06-05T19:29:18Z

I learned today that you can do this "note" thing

BillyONeal

This is great, thank you!

BillyONeal requested changes May 14, 2024

View reviewed changes

petamas force-pushed the aws-s3-cache-miss-should-not-issue-warning branch from c3c6df5 to d4b3097 Compare May 16, 2024 17:53

petamas commented May 17, 2024

View reviewed changes

petamas added 2 commits May 22, 2024 12:17

Enable differentiating cache misses from errors

6f3b762

This commit does not change any behavior, just prepares the way for the next commit.

AWS S3 binary cache miss should not issue warning

80dcf3f

petamas force-pushed the aws-s3-cache-miss-should-not-issue-warning branch from d4b3097 to 80dcf3f Compare May 22, 2024 10:18

petamas requested a review from BillyONeal May 22, 2024 10:19

empty nitpick

9353f16

BillyONeal approved these changes Jun 7, 2024

View reviewed changes

BillyONeal enabled auto-merge (squash) June 7, 2024 00:55

BillyONeal merged commit 071c997 into microsoft:main Jun 7, 2024
5 checks passed

petamas mentioned this pull request Jun 7, 2024

[vcpkg-tool] AWS CLI tool error when trying to download non-existing cache from AWS microsoft/vcpkg#31932

Closed

petamas deleted the aws-s3-cache-miss-should-not-issue-warning branch July 8, 2024 17:53

petamas mentioned this pull request Jul 8, 2024

aws s3 ls on non-existent prefix returns error code 1 aws/aws-cli#5544

Closed

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AWS S3 cache miss should not issue warning #1404

AWS S3 cache miss should not issue warning #1404

petamas commented May 9, 2024

petamas commented May 9, 2024

BillyONeal May 14, 2024

petamas May 14, 2024

BillyONeal May 14, 2024

petamas May 14, 2024

petamas May 14, 2024

BillyONeal May 14, 2024

BillyONeal May 14, 2024

BillyONeal May 14, 2024

BillyONeal May 14, 2024

petamas May 14, 2024

BillyONeal May 14, 2024

BillyONeal May 14, 2024

BillyONeal commented May 14, 2024

petamas commented May 14, 2024

petamas commented May 14, 2024

BillyONeal commented May 14, 2024

petamas commented May 16, 2024

petamas May 17, 2024

petamas May 17, 2024

petamas May 22, 2024

BillyONeal commented Jun 5, 2024 •

edited

Loading

BillyONeal left a comment

AWS S3 cache miss should not issue warning #1404

AWS S3 cache miss should not issue warning #1404

Conversation

petamas commented May 9, 2024

petamas commented May 9, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BillyONeal commented May 14, 2024

petamas commented May 14, 2024

petamas commented May 14, 2024

BillyONeal commented May 14, 2024

petamas commented May 16, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BillyONeal commented Jun 5, 2024 • edited Loading

BillyONeal left a comment

Choose a reason for hiding this comment

BillyONeal commented Jun 5, 2024 •

edited

Loading