Binary cache: async push_success #908

autoantwort · 2023-02-15T13:52:30Z

This results in ~10-20% faster build times on my machine.

For example, building Boost on my M1 Mac went down from 2.948 min to 2.375 min.
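For the Boost example, the relative reduction in wall-clock time and the corresponding speedup work out as:

$$
\frac{2.948 - 2.375}{2.948} = \frac{0.573}{2.948} \approx 0.194,
\qquad
\frac{2.948}{2.375} \approx 1.241
$$

That is about 19% less time, or a speedup of roughly 1.24×, which sits at the upper end of the ~10-20% estimate.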

autoantwort · 2023-02-15T21:19:32Z

How or when should "upload messages" (like "Uploaded binaries to {count} {vendor}.") be printed?

Thomas1664 · 2023-02-16T14:24:53Z

Doesn't this have the same problem as #694 that the working thread might exit due to calls to check_exit or value_or_exit?

src/vcpkg/binarycaching.cpp

autoantwort · 2023-02-16T14:27:43Z

Doesn't this have the same problem as #694 that the working thread might exit due to calls to check_exit or value_or_exit?

Kind of. In general we need an option to decide if a binary cache failure should be a hard error or only a warning

Thomas1664 · 2023-02-16T14:44:30Z

Kind of. In general we need an option to decide if a binary cache failure should be a hard error or only a warning

The problem is that we almost never can be sure that there isn't some nested API call that exits on failure. But it seems like #909 at least partially addresses this issue.

autoantwort · 2023-02-16T14:46:12Z

Yeah, but there are almost no hard exits in the binary cache. It currently also only prints warnings.


ras0219-msft

I like this direction; unblocking I/O work has great potential for making vcpkg much faster.

However we need to be very careful about the impacts of concurrency -- deadlocks suck :(

include/vcpkg/binarycaching.h

src/vcpkg/install.cpp

src/vcpkg/binarycaching.cpp

include/vcpkg/binarycaching.h

ras0219-msft · 2023-03-01T01:29:11Z

src/vcpkg.cpp

@@ -156,6 +157,7 @@ namespace vcpkg::Checks
    // Implements link seam from basic_checks.h
    void on_final_cleanup_and_exit()
    {
+        BinaryCache::wait_for_async_complete();


I do not think we can do this here. This is on the critical path for ctrl-c handling and should only be used for extremely fast, emergency tear-down behavior (like restoring the console).

If there happens to be an exit anywhere in any BinaryCache implementation, this would deadlock. Importantly, this includes any sort of assertion we might want to do, like checking pointers for null.

Unfortunately, the only path forward I see is to call this (or appropriately scope the BinaryCache itself) at the relevant callers. The consequence of possibly not uploading some set of binary caches in the case of some unhandled program error (such as permissions issue on a directory expected to be writable) is vastly preferable to deadlocks.

I have changed the BinaryCache::wait_for_async_complete() implementation so it does not deadlock anymore.

I also moved the call to Checks::exit_with_code, which is not called when Ctrl+C is handled. (I personally would like to have a way to terminate vcpkg but wait until the binary cache is done so that I don't lose progress.)

And I prefer it when built packages are uploaded to the binary caches before vcpkg exits because of an error; otherwise I have to build the already built packages again at a later point, when there is no cache entry.

Agreed that it is desirable to finish uploading on "understood" errors, for example if a package failed to build or failed to be installed.

I was also wrong about my original assessment of a deadlock. My concern was the call path of the binary upload thread calling Checks::unreachable() or .value_or_exit(), but it seems that std::thread::join() does have a carve-out for this specific case: it throws a std::system_error with error code resource_deadlock_would_occur if you try to join yourself.
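A small, self-contained check of that carve-out (the function name `self_join_throws` is hypothetical, not vcpkg code): a thread is handed its own `std::thread` object and calls `join()` on it. The standard lists `resource_deadlock_would_occur` as the error when `get_id() == std::this_thread::get_id()`, so this should return `true` rather than hang. The second promise makes the main thread wait until the self-join attempt is over before it joins, because two overlapping `join()` calls on one object would be the very data race cppreference warns about.

```cpp
#include <cassert>
#include <future>
#include <system_error>
#include <thread>

// Returns true if a thread that calls join() on its own std::thread object
// receives std::system_error with errc::resource_deadlock_would_occur
// instead of blocking forever.
bool self_join_throws() {
    std::promise<std::thread*> handle_promise;  // hands the worker its own handle
    std::future<std::thread*> handle_future = handle_promise.get_future();
    std::promise<void> attempt_done;            // signals the self-join attempt finished
    std::future<void> attempt_done_future = attempt_done.get_future();
    bool threw = false;

    std::thread worker([&] {
        std::thread* self = handle_future.get();
        try {
            self->join();  // joining ourselves
        } catch (const std::system_error& e) {
            threw = (e.code() == std::errc::resource_deadlock_would_occur);
        }
        attempt_done.set_value();
    });

    handle_promise.set_value(&worker);
    // Wait for the worker's attempt before joining, so the two join() calls on
    // `worker` never overlap (concurrent join() on one object is a data race).
    attempt_done_future.wait();
    worker.join();
    return threw;
}
```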

I've put some other concerns below, but I don't want those to distract from my main point: We must make it as trivial / correct-by-construction as possible to guarantee that the binary cache thread NEVER attempts to wait on itself. I think the best approach for vcpkg right now is to add calls from Install::perform() etc to BinaryCache::wait_for_async_complete() before any "user-facing" error, such as the exit guarded by result.code != BuildResult::SUCCEEDED && keep_going == KeepGoing::NO. This is motivated by the perspective that it's always safer to terminate than to join and possibly deadlock / race condition / etc.

There's still a UB data race if the main thread and binary upload thread attempt to exit at the same time:

Concurrently calling join() on the same thread object from multiple threads constitutes a data race that results in undefined behavior.
-- https://en.cppreference.com/w/cpp/thread/thread/join

There's also a serious "scalability" problem if we ever want a second background thread for whatever reason, because BGThread A would join on BGThread B, while BGThread B tries to join on BGThread A. This might be solvable with ever more complex structures, such as a thread ownership DAG that gets threads to join only on their direct children, but I don't think the benefit is worth the cost.

The UB and the self-join could simply be prevented with a check like `if (std::this_thread::get_id() == instance->push_thread.get_id())`. My concern with the explicit approach is that it is easy to forget to call the waiting function of the BinaryCache: every time you want to exit, you have to remember to call it. This seems to me to be very prone to human error.
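A minimal sketch of such a guard, assuming a hypothetical `BackgroundPusher` class in place of BinaryCache. It swaps in one technique change: instead of comparing against `push_thread.get_id()`, which would read the `std::thread` object while another thread may be inside `join()` on it (itself a data race), the background thread marks itself with a `thread_local` flag. A mutex then serializes the remaining `join()` callers, so the main thread and any other exiting thread cannot join concurrently.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for the BinaryCache push thread (not vcpkg's code).
class BackgroundPusher {
public:
    template <class Work>
    explicit BackgroundPusher(Work work)
        : m_push_thread([work = std::move(work)]() mutable {
              t_is_push_thread = true;  // mark this thread so it never joins itself
              work();
          }) {}

    // Must not be destroyed from the push thread itself: the guard below would
    // skip the join and ~thread() on a joinable thread calls std::terminate.
    ~BackgroundPusher() { wait_for_async_complete(); }

    // Returns true only for the call that actually joined the thread.
    bool wait_for_async_complete() {
        // Self-join guard, checked before locking so the push thread can never
        // block on a mutex held by a thread that is currently joining it.
        if (t_is_push_thread) return false;
        std::lock_guard<std::mutex> lock(m_join_lock);  // one join() at a time
        if (!m_push_thread.joinable()) return false;
        m_push_thread.join();
        return true;
    }

private:
    static thread_local bool t_is_push_thread;
    std::mutex m_join_lock;      // declared first, so it exists before the thread starts
    std::thread m_push_thread;
};

thread_local bool BackgroundPusher::t_is_push_thread = false;
```

Because the destructor also waits, an early return from a caller still finishes the uploads, which addresses the "easy to forget" concern without relying on every exit path calling the wait function.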

I have now implemented your request

@ras0219-msft Is there anything left that is preventing this PR from being merged?

src/vcpkg/binarycaching.cpp


include/vcpkg/base/batch-quere.h


Thomas1664

Mostly nitpicks.
I think we should use size_t instead of int when we're effectively using a variable as an index inside a loop.
Furthermore, I noticed that some variable names are just single letters. I find this confusing because a few lines later I no longer know what the variable is for.

Thomas1664 · 2023-10-04T08:35:51Z

include/vcpkg/binarycaching.h

+
+        BGMessageSink m_bg_msg_sink;
+        BGThreadBatchQueue<ActionToPush> m_actions_to_push;
+        std::atomic_int m_remaining_packages_to_push = 0;


Suggested change

std::atomic_int m_remaining_packages_to_push = 0;

std::atomic_size_t m_remaining_packages_to_push = 0;

I think this should be size_t, because the number of packages is naturally a size_t.

This member existing at all is a smell to me. The number of items in the queue should be a part of the queue, not something tracked externally. Moreover, the queue already contains locks and stuff so I'm not sure why we need an atomic here.

src/vcpkg/binarycaching.cpp

src/vcpkg/base/message_sinks.cpp

BillyONeal

1. I'm going to tackle the structural error-reporting problem that I think this change is really exposing / crying out for. In particular, I think we should consider a design that more formally associates chunks of console output with the package in question, rather than a free-for-all on the message sink type that requires a lot of 'reverse engineer from stuff the caller already knew' kind of behavior.
2. I think the discussion of how we are achieving thread safety across several large components like this needs a comment like the one I described in a review comment.
3. The comment about 'BinaryCache is supposed to hide unique_ptr' should get fixed, which will also eliminate a lot of the changes in this PR attempting to adapt to that change.

When I have a solution for (1) are you interested in me just pushing changes into this PR or do you want a PR for your PR?

(By and large I think this change is structurally good)

BillyONeal · 2023-10-23T21:55:52Z

include/vcpkg/binarycaching.h


        BinaryCache(const Filesystem& fs);
        BinaryCache(const BinaryCache&) = delete;
-        BinaryCache(BinaryCache&&) = default;
+        BinaryCache(BinaryCache&&) = delete;


I disagree with this design change. BinaryCache itself already contains several unique_ptr-alikes, and intends to firewall that storage decision from its customers. If there are immovable bits the unique_ptr juggling should happen inside rather than outside.

The comment about 'BinaryCache is supposed to hide unique_ptr' should get fixed, which will also eliminate a lot of the changes in this PR attempting to adapt to that change.

How would you implement that? The function push_thread_main uses the members of the class, which get moved away if you move the BinaryCache. So I guess using pimpl is the only solution here?

I have now put all data in an extra struct that is held via a std::unique_ptr by the BinaryCache class.
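A minimal sketch of that pimpl arrangement, assuming a simplified stand-in class (`MovableUploader`, `State`, and `push_async` are hypothetical names, and an `int` stands in for a package). The background thread only ever holds a pointer to the heap-allocated `State`, so moving the owning object moves the `std::unique_ptr` and the `std::thread` handle but leaves the state the thread is using where it is.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical illustration, not vcpkg's BinaryCache.
class MovableUploader {
    // Everything the background thread touches lives here, on the heap.
    struct State {
        std::mutex lock;
        std::vector<int> uploaded;  // stands in for pushed packages
    };

public:
    MovableUploader() : m_state(std::make_unique<State>()) {}
    MovableUploader(MovableUploader&&) = default;  // moves the pointer, not the State
    ~MovableUploader() { wait(); }

    void push_async(int package_id) {
        wait();  // keep the sketch simple: at most one upload thread in flight
        State* state = m_state.get();  // stable address, unaffected by moves
        m_thread = std::thread([state, package_id] {
            std::lock_guard<std::mutex> lk(state->lock);
            state->uploaded.push_back(package_id);
        });
    }

    void wait() {
        if (m_thread.joinable()) m_thread.join();
    }

    std::vector<int> uploaded() {
        wait();
        std::lock_guard<std::mutex> lk(m_state->lock);
        return m_state->uploaded;
    }

private:
    std::unique_ptr<State> m_state;
    std::thread m_thread;
};
```

Moving an uploader while its first upload may still be running is safe in this sketch; the moved-from object must not be used for new pushes, because its `m_state` is now null.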

BillyONeal · 2023-10-23T22:06:13Z

include/vcpkg/binarycaching.h

@@ -196,23 +200,41 @@ namespace vcpkg

    struct BinaryCache : ReadOnlyBinaryCache


When adding threading functionality to the codebase that is not effectively 'do a single function but faster', I think there needs to be a discussion in a comment of 'this is how the threads and communication between them work'.

For instance:

// Compression and upload of binary cache entries happen on a single 'background' thread, `m_push_thread`.
// Thread safety is achieved within the binary cache providers by:
// 1. Only using one thread in the background for this work.
// 2. Forming a queue of work for that thread to consume in `m_actions_to_push`, which maintains its own thread safety.
// 3. Sending any replies from the background thread through `m_bg_msg_sink`.
// 4. Ensuring any supporting data, such as tool exes, is provided before the background thread is started.
// 5. Ensuring that work is not submitted to the background thread until the corresponding `packages` directory to upload is no longer being actively touched by the foreground thread.
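Point 3 of that comment routes background output through `m_bg_msg_sink`. A minimal sketch of such a sink, assuming a hypothetical `BackgroundMessageSink` that uses plain strings instead of vcpkg's colored messages: the background thread appends text, complete lines are published once a newline arrives, and the foreground thread takes the published lines between package installs.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical sketch, not vcpkg's BGMessageSink.
class BackgroundMessageSink {
public:
    // Called from the background thread; text is buffered until a newline.
    void print(const std::string& text) {
        std::lock_guard<std::mutex> lock(m_lock);
        m_partial += text;
        std::string::size_type newline;
        while ((newline = m_partial.find('\n')) != std::string::npos) {
            m_published.push_back(m_partial.substr(0, newline));
            m_partial.erase(0, newline + 1);
        }
    }

    // Called from the foreground thread; returns and clears complete lines.
    std::vector<std::string> take_published() {
        std::lock_guard<std::mutex> lock(m_lock);
        std::vector<std::string> lines;
        lines.swap(m_published);
        return lines;
    }

private:
    std::mutex m_lock;                    // guards both members below
    std::string m_partial;                // text after the last newline
    std::vector<std::string> m_published; // complete lines not yet printed
};
```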

Does your approach here survive #1076? Hard links mean that we have a lot less certainty about the packages directory being 'hermetic'.

#1076 does not have an impact here. The files "copied" to the installed dir are not later changed by another package. Even if they are overwritten, the hard link in the packages folder would still link to the same original file.
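The claim that replacing a file in the installed tree leaves the packages copy untouched can be checked with `std::filesystem`, assuming "overwritten" means the installed name is removed and recreated. If a file were instead rewritten in place through the installed name, both names would see the new contents, since they refer to the same file. The directory and file names below are illustrative only.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

namespace fs = std::filesystem;

void write_text(const fs::path& path, const std::string& text) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out << text;
}

std::string read_text(const fs::path& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}

// Creates packages/lib.a, hard-links it as installed/lib.a, then "overwrites"
// the installed name by removing and recreating it. Returns what the packages
// copy contains afterwards.
std::string packages_copy_after_replace(const fs::path& root) {
    fs::create_directories(root / "packages");
    fs::create_directories(root / "installed");
    write_text(root / "packages" / "lib.a", "original");
    fs::create_hard_link(root / "packages" / "lib.a", root / "installed" / "lib.a");
    fs::remove(root / "installed" / "lib.a");         // unlinks only this name
    write_text(root / "installed" / "lib.a", "new");  // creates a brand-new file
    return read_text(root / "packages" / "lib.a");
}
```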

It does have an impact on Windows if #802 gets merged before this: while the packages folder is being compressed by 7z, the next feature gets tested and files get removed from the installed dir, and on Windows you can't remove a hard link even if the linked file is only opened via another hard link -.-

PS: I'm not sure, but I noticed this behavior only on the Windows Dev Drive; maybe the normal filesystem was simply too slow for this to ever happen.

I was finally able to catch this situation, which happens when this PR is combined with #802:

Interestingly, the catch seems to happen only with this specific rocksdb.lib file, and the error almost only happens with .lib files.

And in general it only happens on the Dev Drive and not on a normal NTFS drive. Strange 😕

BillyONeal · 2023-10-23T22:13:56Z

include/vcpkg/base/message_sinks.h

+        // buffers messages until newline is reached
+        // guarded by m_print_directly_lock
+        std::vector<std::pair<Color, std::string>> m_unpublished;


Something like that? Maybe? I think what this really argues is that we really need a semi-standardized 'document' error type @ras0219-msft has been asking for ages... we're almost done getting rid of ParseControlErrorInfo which means we will finally have a unified way of handling errors and can finally look at messing with that.

BillyONeal · 2023-10-23T22:16:44Z

include/vcpkg/binarycaching.h

+
+        BGMessageSink m_bg_msg_sink;
+        BGThreadBatchQueue<ActionToPush> m_actions_to_push;
+        std::atomic_int m_remaining_packages_to_push = 0;


This member existing at all is a smell to me. The number of items in the queue should be a part of the queue, not something tracked externally. Moreover, the queue already contains locks and stuff so I'm not sure why we need an atomic here.

BillyONeal · 2023-10-23T22:18:36Z

include/vcpkg/base/batch-queue.h

+
+    bool empty() const { return forward.empty(); }
+
+    void pop(std::vector<T>& out)


I don't like this thing being called queue given that this is how it works. Given that we expect this to be a multi producer single consumer queue, can we instead put the vector inside and note that only one thread may call pop but any number of threads may call push? That would also resolve the criticism over separate tracking atomics below.

BatchQueue alone is not thread safe.

Given that we expect this to be a multi producer single consumer queue, can we instead put the vector inside and note that only one thread may call pop but any number of threads may call push?

I don't get what you have in mind here 😅

I don't like this thing being called queue given that this is how it works.

Do you have an idea for a better name? :)
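One possible reading of the multi-producer/single-consumer suggestion, as a hypothetical `MpscBatchQueue` rather than the PR's `BGThreadBatchQueue`: the pending items live inside the queue behind its own mutex, any thread may call `push()`, a single consumer thread takes everything pending in one batch, and `size()` comes from the queue itself instead of a separately maintained atomic.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical sketch. Any number of threads may call push() and size();
// only ONE thread may call wait_and_pop_batch().
template <class T>
class MpscBatchQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(m_lock);
            m_pending.push_back(std::move(item));
        }
        m_cv.notify_one();
    }

    // Blocks until an item is pending or close() was called, then moves every
    // pending item into `out` (cleared first). Returns false once closed and drained.
    bool wait_and_pop_batch(std::vector<T>& out) {
        out.clear();
        std::unique_lock<std::mutex> lock(m_lock);
        m_cv.wait(lock, [this] { return !m_pending.empty() || m_closed; });
        if (m_pending.empty()) return false;
        out.swap(m_pending);  // O(1) hand-off of the whole batch
        return true;
    }

    // Number of items pushed but not yet handed to the consumer.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(m_lock);
        return m_pending.size();
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(m_lock);
            m_closed = true;
        }
        m_cv.notify_all();
    }

private:
    mutable std::mutex m_lock;
    std::condition_variable m_cv;
    std::vector<T> m_pending;
    bool m_closed = false;
};
```

Note that `size()` only counts items not yet handed to the consumer; if "remaining packages to push" must also include the batch currently being uploaded, the consumer would have to track that under the same mutex.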

BillyONeal · 2023-10-23T22:19:13Z

src/vcpkg/base/message_sinks.cpp

+        std::lock_guard<std::mutex> print_lk(m_print_directly_lock);
+        std::lock_guard<std::mutex> lk(m_published_lock);


Suggested change

std::lock_guard<std::mutex> print_lk(m_print_directly_lock);

std::lock_guard<std::mutex> lk(m_published_lock);

std::lock_guard<std::mutex, std::mutex> lk(m_print_directly_lock, m_published_lock);

if this survives

I think you actually mean std::scoped_lock.
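For reference, `std::lock_guard` has a single template parameter, so `std::lock_guard<std::mutex, std::mutex>` does not compile; the variadic form is C++17's `std::scoped_lock`, which acquires all of its mutexes through `std::lock` and its deadlock-avoidance algorithm. A minimal sketch reusing the two mutex names from the diff (the `TwoLockSink` struct itself is hypothetical):

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

struct TwoLockSink {
    std::mutex m_print_directly_lock;
    std::mutex m_published_lock;
    std::vector<std::string> m_published;

    void publish(std::string line) {
        // Locks both mutexes at once; unlike two nested lock_guards, this does
        // not rely on every caller acquiring them in the same order.
        std::scoped_lock lk(m_print_directly_lock, m_published_lock);
        m_published.push_back(std::move(line));
    }  // both mutexes are released here
};
```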

autoantwort · 2023-10-23T22:35:24Z

When I have a solution for (1) are you interested in me just pushing changes into this PR or do you want a PR for your PR?

You can just push to this PR. I will look through the other comments in the following days :)


…ome clear that we need a thread safe channel through which errors and warnings can both be reasonably reported. Now that microsoft#1279 has landed and functionally everything in the codebase uses ExpectedL, we can look at what the new thing that fixes these issues should be. Consider the following:

```c++
ExpectedL<T> example_api(int a);
ExpectedL<std::unique_ptr<SourceControlFile>> try_load_port_manifest_text(StringView text, StringView control_path, MessageSink& warning_sink);
```

The reason this can't return the warnings through the ExpectedL channel is that we don't want the 'error' state to be engaged when there are merely warnings. Moreover, because these are different channels, situations that might want to return errors and warnings together, as happens when parsing files, lose the order relationship between errors and warnings. It is probably a good idea in general to put warnings and errors about the same location next to each other in the output, but that's hard to do with this interface.

Rather than multiplexing everything through the return value, this proposal is to multiplex only the success or failure through the return value, and report any specific error information through an out parameter:

1. Distinguish whether an overall operation succeeded or failed in the return value, but
2. record any errors or warnings via an out parameter.

Applying this to the above gives:

```c++
Optional<T> example_api(MessageContext& context, int a);
// unique_ptr is already 'optional'
std::unique_ptr<SourceControlFile> try_load_port_manifest_text(MessageContext& context, StringView text, StringView control_path);
```

Issues this new mechanism fixes:

* Errors and warnings can share the same channel and thus be printed together.
* The interface between code wanting to report events and the code wanting to consume them is a natural thread synchronization boundary. Other attempts to fix this have either been incorrect, by synchronizing individual print calls (microsoft#1290), or complex enough that we are not sure they are correct, by trying to recover boundaries by reparsing our own error output (microsoft#908).
* This shuts down the "error: error:" and similar bugs where it isn't clear who is formatting the overall error message vs. talking about individual components.

Known issues that are not fixed by this change:

* This still doesn't make it easy for callers to programmatically handle specific types of errors. Currently, we have some APIs that still use explicit `std::error_code` because they want to do different things for 'file does not exist' vs. 'there was an I/O error'. Given that this condition isn't well served by the ExpectedL mechanism, I don't want to wait until we have a better solution to it to proceed.
* Because we aren't making the context parameter the 'success carrier', it's more complex to implement 'warnings as errors' or similar functionality where the caller decides how 'important' something is. I would be in favor of moving all success tests to the context parameter, but I'm not proposing that because the other vcpkg maintainers do not like it.
* Contextual information / stack problems aren't solved. However, the context parameter might be extended in the future to help with this.
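A minimal sketch of the proposed out-parameter shape, with a hypothetical `DiagnosticCollector` standing in for the context type, `std::optional` standing in for vcpkg's `Optional`, and a made-up `parse_port_version` function (none of this is vcpkg's actual code). Success or failure is carried by the return value alone, while warnings and errors go into a single ordered, mutex-protected channel, so a warning never engages the error state.

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical context: errors and warnings share one ordered channel, and
// the mutex makes report() a thread synchronization boundary.
class DiagnosticCollector {
public:
    enum class Kind { Warning, Error };
    struct Entry {
        Kind kind;
        std::string text;
    };

    void report(Kind kind, std::string text) {
        std::lock_guard<std::mutex> lock(m_lock);
        m_entries.push_back(Entry{kind, std::move(text)});
    }

    std::vector<Entry> entries() const {
        std::lock_guard<std::mutex> lock(m_lock);
        return m_entries;
    }

private:
    mutable std::mutex m_lock;
    std::vector<Entry> m_entries;
};

// Hypothetical API: only success or failure travels in the return value;
// the details of what went wrong (or merely looked odd) go to the context.
std::optional<int> parse_port_version(DiagnosticCollector& context, const std::string& text) {
    if (text.empty() || text.size() > 9) {  // 9 digits always fit in an int
        context.report(DiagnosticCollector::Kind::Error, "port-version must have 1 to 9 digits");
        return std::nullopt;
    }
    int value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') {
            context.report(DiagnosticCollector::Kind::Error, "port-version must be a non-negative integer");
            return std::nullopt;
        }
        value = value * 10 + (c - '0');
    }
    if (text.size() > 1 && text[0] == '0') {
        // A warning is recorded, but the result is still a success.
        context.report(DiagnosticCollector::Kind::Warning, "port-version has leading zeros");
    }
    return value;
}
```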

Binary cache: async push_success

95f0438

autoantwort marked this pull request as draft February 15, 2023 20:04

Thomas1664 reviewed Feb 16, 2023


src/vcpkg/binarycaching.cpp (outdated)

autoantwort mentioned this pull request Feb 16, 2023

Unified object provider backend #911

Draft

Merge branch 'main' into feature/async-binary-cache-push-success

9d999d8

# Conflicts: # src/vcpkg.cpp

ras0219-msft requested changes Mar 1, 2023


autoantwort and others added 14 commits March 2, 2023 21:34

Merge branch 'main' into feature/async-binary-cache-push-success

163d9cd

Apply suggestions from code review

2a54205

Co-authored-by: Robert Schumacher <roschuma@microsoft.com>

Adapt code review

0912655

Update src/vcpkg/binarycaching.cpp

5d7288c

Co-authored-by: Robert Schumacher <roschuma@microsoft.com>

Adapt code review

10189ac

Remove unnecessary actions_to_push_notifier.notify_all()

2567607

Prevent deadlock and don't be on the crtl+c path

ecdd000

Add and use BGMessageSink to print IBinaryProvider::push_success messages between package installs

8e7ae61

Co-authored-by: Robert Schumacher <roschuma@microsoft.com>

Restore old upload message

850d7c9

Don't join yourself

548be38

Print messages about remaining packages to upload

6dbbf06

Localization

74b86fd

Improve messages

5171d3e

No singleton and explicit calls to wait_for_async_complete()

d69ed8f

See microsoft#908 (comment)

autoantwort marked this pull request as ready for review March 5, 2023 20:28

autoantwort added 4 commits March 8, 2023 19:22

Merge branch 'main' into feature/async-binary-cache-push-success

2df42d5

# Conflicts: # src/vcpkg/build.cpp

Merge branch 'main' into feature/async-binary-cache-push-success

5f1786e

# Conflicts: # src/vcpkg/base/messages.cpp

Merge branch 'main' into feature/async-binary-cache-push-success

93303c3

Merge branch 'main' into feature/async-binary-cache-push-success

8a26c8b

# Conflicts: # include/vcpkg/base/messages.h # src/vcpkg/base/messages.cpp

BillyONeal removed the depends:different-pr label (This PR depends on a different PR which has been filed) Jun 22, 2023

autoantwort mentioned this pull request Jul 2, 2023

[vcpkg-ci-llvm] Find cacheable feature sets microsoft/vcpkg#32352

Merged


Merge branch 'main' into feature/async-binary-cache-push-success

ca5f2b1

# Conflicts: # src/vcpkg/commands.ci.cpp

Thomas1664 reviewed Aug 20, 2023


include/vcpkg/base/batch-quere.h (outdated)

include/vcpkg/base/batch-quere.h (outdated)

include/vcpkg/base/batch-quere.h (outdated)

autoantwort added 3 commits August 24, 2023 16:37

Fix typo

eccd9ee

Fix typo in file name

e7837e0

Merge branch 'main' into feature/async-binary-cache-push-success

969e7fc

# Conflicts: # src/vcpkg/commands.ci.cpp # src/vcpkg/commands.install.cpp # src/vcpkg/commands.set-installed.cpp # src/vcpkg/commands.upgrade.cpp

autoantwort mentioned this pull request Sep 5, 2023

Test features in ci #802

Open

BillyONeal self-assigned this Oct 4, 2023

Thomas1664 reviewed Oct 4, 2023


autoantwort added 3 commits October 11, 2023 15:31

Merge branch 'main' into feature/async-binary-cache-push-success

2d5586f

Renamings

809d0b6

format

455e29b

BillyONeal reviewed Oct 23, 2023


autoantwort added 2 commits November 4, 2023 15:47

Merge branch 'main' into feature/async-binary-cache-push-success

03fdfea

# Conflicts: # include/vcpkg/base/message-data.inc.h # locales/messages.json

BinaryCache and std::unique_ptr

f4bad8c

autoantwort marked this pull request as draft November 12, 2023 02:19

autoantwort added 4 commits November 12, 2023 03:27

Merge branch 'main' into feature/async-binary-cache-push-success

26bbbd5

BinaryCache: save data in std::unique_ptr so that the object can be moved while the data is accessed from a thread

814e434

fix

290e586

Merge branch 'main' into feature/async-binary-cache-push-success

3cc3378

# Conflicts: # src/vcpkg/binarycaching.cpp

BillyONeal mentioned this pull request Jan 10, 2024

Introduce DiagnosticContext, an 'error document' like type. #1323

Open

Thomas1664 mentioned this pull request Apr 30, 2024

Make individual print calls thread safe #1290

Open

BillyONeal mentioned this pull request Jun 7, 2024

Binary Cache: Add write-back support #1406

Open

Merge branch 'main' into feature/async-binary-cache-push-success

978ceae


autoantwort commented Feb 15, 2023 (edited)

autoantwort commented Feb 15, 2023

Thomas1664 commented Feb 16, 2023

autoantwort commented Feb 16, 2023

Thomas1664 commented Feb 16, 2023

autoantwort commented Feb 16, 2023

ras0219-msft left a comment

ras0219-msft Mar 1, 2023

autoantwort Mar 2, 2023

ras0219-msft Mar 3, 2023 (edited)

autoantwort Mar 4, 2023

autoantwort Mar 5, 2023 (edited)

autoantwort Mar 18, 2023

Thomas1664 left a comment

Thomas1664 Oct 4, 2023

BillyONeal Oct 23, 2023

BillyONeal left a comment

BillyONeal Oct 23, 2023

autoantwort Nov 14, 2023

autoantwort Nov 14, 2023

BillyONeal Oct 23, 2023

BillyONeal Oct 23, 2023

autoantwort Nov 14, 2023

autoantwort Nov 25, 2023 (edited)

autoantwort Dec 2, 2023

autoantwort Dec 3, 2023

BillyONeal Oct 23, 2023

BillyONeal Oct 23, 2023

BillyONeal Oct 23, 2023

autoantwort Nov 4, 2023

BillyONeal Oct 23, 2023

Thomas1664 Nov 7, 2023
