-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Bazel cache download retry #11238
Conversation
8d976d3
to
80846af
Compare
changelog_begin changelog_end
b80d840
to
1628c9e
Compare
This reverts commit b62f16c.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, thank you so much for all your work on this!
+ return; | ||
+ } | ||
+ } | ||
+ downloadSucceeded = full_content || partial_content; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just use an else
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The downloadSucceeded
variable is used further down in Bazel's code. So, we'll still need the binding.
I was considering to move the out = ...
part into the else
. But, I decided against it because it makes it clearer that this is specifically about the failure case. However, I don't feel strongly about this. So, happy to be convinced otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I see, makes sense to keep it the way it is.
This PR has been created by a script, which is not very smart and does not have all the context. Please do double-check that the version prefix is correct before merging. @akshayshirahatti-da is in charge of this release. Commit log: ``` e474b2d [JSON-API] Websockets fix for matchedQueries (#11361) 4a34b68 KV: port V2 errors to self-service errors framework [KVL-1143] (#11326) 811a6d3 Fixed AuthorizationInterceptorSpec again (#11418) c8006b8 ScenarioRunner: enrich incomplete transactions (#11384) d9c7031 ACS testing - payload support [DPP-661] (#11308) d87d3d4 deal with deadlocks while fetching contracts in json-api Oracle (#11391) 8212c0b Make submission ID optional [KVL-1107] (#11011) 3587eb8 Use Timestamp instead of Instant (#11356) ea5f09e sandbox: Deprecate the `--eager-package-loading` flag. (#11404) 9f882f2 remove search index on json fields that harm insert and pruning performance (#11041) 70b90f4 optimize max event_sequential_id query for oracle (#11297) b1fed31 Fix missing script results (#11395) 03db0aa Auto run/check security evidence generation in ./fmt.sh (#11407) c928f0e [Short] Typo (#11400) ba6c2be Add missing TransactionId to com.daml.error.ErrorResource (#11396) a2a1571 Generate security evidence by documenting security testcases (#11306) 8d17882 Allocate parties sequentially in script export tests (#11389) 1309c2f DPP-587 Use Timestamp instead of Instant (#11183) 82f9873 Rotate release rotation (#11394) b14077a Fix AuthorizationInterceptorSpec flake (#11387) 7090f2d update NOTICES file (#11367) ad42dfa Update gRPC to the latest (1.41.0) and Protobuf (#11380) 54c400a Update wording in Deploying to a generic Daml ledger (#11327) 4461ed1 Fix log output (#11374) 613aac3 Add support for non-star-kinded type synonyms in data-dependencies (#11293) f89ecc6 interfaces: add an experimental `toTypeRep` builtin. (#11378) 5654d5c fix es ingest for missing files (#11375) 03cfd12 Configurable assertions in Ledger API test tool by feature descriptors (#11328) 96b7b58 [DPP-648][Self-service error codes] Adopt ApiPartyManagementService (#11338) 9e94ae0 LF: move repl exception-auth test from dev to stable (#11369) 5365d68 LF: Remove PartialTransaction out from ScenarioRunner/IdeLedgerClient (#11368) 79037c8 [DPP-646][Self-service error codes] Adopt ApiPackageManagementService (#11314) 0ee59f5 Command submission in the ledger-api-bench-tool. (#11296) 8d5cab5 LF: Simplify seeds generation in scenario runnner (#11353) 9e5b788 Speedup daml repl integration tests (#11335) 3bc0db3 fix contract_tpid_fkey-related race condition (#11330) ab8a863 [docs] Add Daml Driver for VMBC to the commercial integrations section (#11360) c95db72 Fix Bazel cache download retry (#11238) e8d0ccb [DPP-611][Self-service error codes] Adapt ApiCommandService (#11325) a89079b [DPP-647][Self-service error codes] Adopt ApiParticipantPruningService (#11324) cc8ec28 [Self-service error codes] Adapt GrpcHealthService (#11354) c60c94b [DPP-645][Self-service error codes] Adapt ApiConfigManagementService (#11312) e6da1f7 Add step in ghc-lib guide for getting submodules to work (#11351) f3057ea Increase timeout for non-repudation tests on Postgres (#11340) 176f470 interface: adding interfaces to the TS codegen (#11280) 355352f DPP-650 Remove the mutating schema (#11211) 443b64d [DPP-621][Self-service error codes] Adopt error codes in ApiVersionService (#11302) ed9dbed interfaces: Add fixed choice collision check in typechecker (Haskell) (#11337) c37ecd1 [Short] Pass correct loggingContext to withValidatedPackageId (#11307) 0d305cf [Short] Move field before logging statement (ApiTimeService) (#11313) 73c94b5 Increase timeout for non-repudation test (#11281) 88c607b [Self-service error codes] Adapt ApiTransactionService [DPP-613] (#11094) 07ad3e0 Suport multi-party readAs in triggers (#11299) 76eb165 Interface fixed choices: ghc parser (#11275) da27a1e [DPP-619][Self-service error codes] Adopt error codes in ApiVersionService (#11303) 5f5af30 [DPP-628][Self-service error codes] Adapt error codes in ApiTimeService (#11295) f9e67ad [Self-service error codes] Adapt error responses in ledger-api-auth [DPP-617] (#11223) 7282965 Fix component status for triggers (#11311) 17776f3 Factor out npm install step of create-daml-app tests (#11294) 3a8b685 [Short] Fix docs for Dispatcher#startingAt (#11304) f315a90 release 1.18.0-snapshot.20211019.8113.0.8ff347d8 (#11300) 8a3abce [DPP-618][Self-service error codes] Adapt error codes in ApiPackageService (#11284) 50ea92f Use ApiTypes.Party instead of String in the trigger runner (#11298) 2267429 [DPP-656] Assert on self-service error code details in ErrorFactoriesSpec (#11289) c06faf2 LF: remove imperative environment from Speedy compiler (#11285) d3dad75 [DPP-592] Generate docs for self-service error codes. (#11129) ``` Changelog: ``` [JSON-API] fixes a bug related to the matchedQueries value returned for websocket multiqueries, this only happens for patterns where the multiqueries contain a mixture of queries with and without offsets. - [Integration Kit] - ledger-api-bench-tool can generate test contracts with configurable payload size. - [Integration Kit] - Added multi-template support for command submission in the ledger-api-bench-tool - [Sandbox] The ``--eager-package-loading`` flag has been deprecated. It hasn't actually done anything for many releases; packages are always loaded eagerly. This does not affect Daml on SQL, which does support lazy package loading. - [Daml Studio] Fix a bug where script results in Daml Studio sometimes do not show up. - [Integration Kit] - The ledger-api-bench-tool is now capable of generating test contracts for testing purposes. - [JSON API] Fixed a rare error that manifested as ‘violates foreign key constraint "contract_tpid_fkey" Detail: Key (tpid)=(...) is not present in table’ when attempting to run queries and goes away on JSON API restart. See `issue #11330 <https://github.com/digital-asset/daml/pull/11330>`__. - [Participant] All participants now use the new append-only schema. Existing databases will automatically upgrade to the new schema the first time a participant/ledger is started. - [Daml Triggers] Triggers now support readAs parties. They can be specified via `--ledger-readas a,b,c`. As part of this change ``testRule`` gained an extra argument to specify the `readAs` parties. If you previously used ``` testRule trigger party acsBuilder commandsInFlight s ``` you now need to use ``` testRule trigger party [] acsBuilder commandsInFlight s ``` ``` CHANGELOG_BEGIN CHANGELOG_END
This PR has been created by a script, which is not very smart and does not have all the context. Please do double-check that the version prefix is correct before merging. @akshayshirahatti-da is in charge of this release. Commit log: ``` e474b2d [JSON-API] Websockets fix for matchedQueries (#11361) 4a34b68 KV: port V2 errors to self-service errors framework [KVL-1143] (#11326) 811a6d3 Fixed AuthorizationInterceptorSpec again (#11418) c8006b8 ScenarioRunner: enrich incomplete transactions (#11384) d9c7031 ACS testing - payload support [DPP-661] (#11308) d87d3d4 deal with deadlocks while fetching contracts in json-api Oracle (#11391) 8212c0b Make submission ID optional [KVL-1107] (#11011) 3587eb8 Use Timestamp instead of Instant (#11356) ea5f09e sandbox: Deprecate the `--eager-package-loading` flag. (#11404) 9f882f2 remove search index on json fields that harm insert and pruning performance (#11041) 70b90f4 optimize max event_sequential_id query for oracle (#11297) b1fed31 Fix missing script results (#11395) 03db0aa Auto run/check security evidence generation in ./fmt.sh (#11407) c928f0e [Short] Typo (#11400) ba6c2be Add missing TransactionId to com.daml.error.ErrorResource (#11396) a2a1571 Generate security evidence by documenting security testcases (#11306) 8d17882 Allocate parties sequentially in script export tests (#11389) 1309c2f DPP-587 Use Timestamp instead of Instant (#11183) 82f9873 Rotate release rotation (#11394) b14077a Fix AuthorizationInterceptorSpec flake (#11387) 7090f2d update NOTICES file (#11367) ad42dfa Update gRPC to the latest (1.41.0) and Protobuf (#11380) 54c400a Update wording in Deploying to a generic Daml ledger (#11327) 4461ed1 Fix log output (#11374) 613aac3 Add support for non-star-kinded type synonyms in data-dependencies (#11293) f89ecc6 interfaces: add an experimental `toTypeRep` builtin. (#11378) 5654d5c fix es ingest for missing files (#11375) 03cfd12 Configurable assertions in Ledger API test tool by feature descriptors (#11328) 96b7b58 [DPP-648][Self-service error codes] Adopt ApiPartyManagementService (#11338) 9e94ae0 LF: move repl exception-auth test from dev to stable (#11369) 5365d68 LF: Remove PartialTransaction out from ScenarioRunner/IdeLedgerClient (#11368) 79037c8 [DPP-646][Self-service error codes] Adopt ApiPackageManagementService (#11314) 0ee59f5 Command submission in the ledger-api-bench-tool. (#11296) 8d5cab5 LF: Simplify seeds generation in scenario runnner (#11353) 9e5b788 Speedup daml repl integration tests (#11335) 3bc0db3 fix contract_tpid_fkey-related race condition (#11330) ab8a863 [docs] Add Daml Driver for VMBC to the commercial integrations section (#11360) c95db72 Fix Bazel cache download retry (#11238) e8d0ccb [DPP-611][Self-service error codes] Adapt ApiCommandService (#11325) a89079b [DPP-647][Self-service error codes] Adopt ApiParticipantPruningService (#11324) cc8ec28 [Self-service error codes] Adapt GrpcHealthService (#11354) c60c94b [DPP-645][Self-service error codes] Adapt ApiConfigManagementService (#11312) e6da1f7 Add step in ghc-lib guide for getting submodules to work (#11351) f3057ea Increase timeout for non-repudation tests on Postgres (#11340) 176f470 interface: adding interfaces to the TS codegen (#11280) 355352f DPP-650 Remove the mutating schema (#11211) 443b64d [DPP-621][Self-service error codes] Adopt error codes in ApiVersionService (#11302) ed9dbed interfaces: Add fixed choice collision check in typechecker (Haskell) (#11337) c37ecd1 [Short] Pass correct loggingContext to withValidatedPackageId (#11307) 0d305cf [Short] Move field before logging statement (ApiTimeService) (#11313) 73c94b5 Increase timeout for non-repudation test (#11281) 88c607b [Self-service error codes] Adapt ApiTransactionService [DPP-613] (#11094) 07ad3e0 Suport multi-party readAs in triggers (#11299) 76eb165 Interface fixed choices: ghc parser (#11275) da27a1e [DPP-619][Self-service error codes] Adopt error codes in ApiVersionService (#11303) 5f5af30 [DPP-628][Self-service error codes] Adapt error codes in ApiTimeService (#11295) f9e67ad [Self-service error codes] Adapt error responses in ledger-api-auth [DPP-617] (#11223) 7282965 Fix component status for triggers (#11311) 17776f3 Factor out npm install step of create-daml-app tests (#11294) 3a8b685 [Short] Fix docs for Dispatcher#startingAt (#11304) f315a90 release 1.18.0-snapshot.20211019.8113.0.8ff347d8 (#11300) 8a3abce [DPP-618][Self-service error codes] Adapt error codes in ApiPackageService (#11284) 50ea92f Use ApiTypes.Party instead of String in the trigger runner (#11298) 2267429 [DPP-656] Assert on self-service error code details in ErrorFactoriesSpec (#11289) c06faf2 LF: remove imperative environment from Speedy compiler (#11285) d3dad75 [DPP-592] Generate docs for self-service error codes. (#11129) ``` Changelog: ``` [JSON-API] fixes a bug related to the matchedQueries value returned for websocket multiqueries, this only happens for patterns where the multiqueries contain a mixture of queries with and without offsets. - [Integration Kit] - ledger-api-bench-tool can generate test contracts with configurable payload size. - [Integration Kit] - Added multi-template support for command submission in the ledger-api-bench-tool - [Sandbox] The ``--eager-package-loading`` flag has been deprecated. It hasn't actually done anything for many releases; packages are always loaded eagerly. This does not affect Daml on SQL, which does support lazy package loading. - [Daml Studio] Fix a bug where script results in Daml Studio sometimes do not show up. - [Integration Kit] - The ledger-api-bench-tool is now capable of generating test contracts for testing purposes. - [JSON API] Fixed a rare error that manifested as ‘violates foreign key constraint "contract_tpid_fkey" Detail: Key (tpid)=(...) is not present in table’ when attempting to run queries and goes away on JSON API restart. See `issue #11330 <https://github.com/digital-asset/daml/pull/11330>`__. - [Participant] All participants now use the new append-only schema. Existing databases will automatically upgrade to the new schema the first time a participant/ledger is started. - [Daml Triggers] Triggers now support readAs parties. They can be specified via `--ledger-readas a,b,c`. As part of this change ``testRule`` gained an extra argument to specify the `readAs` parties. If you previously used ``` testRule trigger party acsBuilder commandsInFlight s ``` you now need to use ``` testRule trigger party [] acsBuilder commandsInFlight s ``` ``` CHANGELOG_BEGIN CHANGELOG_END Co-authored-by: Azure Pipelines Daml Build <support@digitalasset.com>
Bazel's previous behavior was to rebuild an artifact locally if fetching it from an HTTP remote cache failed. This behavior is different from GRPC remote cache case where Bazel will retry the fetch. The lack of retry is an issue for multiple reasons: On one hand rebuilding locally can be slower than fetching from the remote cache, on the other hand if a build action is not bit reproducible, as is the case with some compilers, then the local rebuild will trigger cache misses on further build actions that depend on the current artifact. This change aims to avoid theses issues by retrying the fetch in the HTTP cache case similarly to how the GRPC cache client does it. Some care needs to be taken due to the design of Bazel's internal remote cache client API. For a fetch the client is given an `OutputStream` object that it is expected to write the fetched data to. This may be a temporary file on disk that will be moved to the final location after the fetch completed. On retry, we need to be careful to not duplicate previously written data when writing into this `OutputStream`. Due to the generality of the `OutputStream` interface we cannot reset the file handle or write pointer to start fresh. Instead, this change follows the same pattern used in the GRPC cache client. Namely, keep track of the data previously written and continue from that offset on retry. With this change the HTTP cache client will attempt to fetch the data from the remote cache via an HTTP range request. So that the server only needs to send the data that is still missing. If the server replies with a 206 Partial Content response, then we write the received data directly into the output stream, if the server does not support range requests and instead replies with the full data, then we drop the duplicate prefix and only write into the output stream from the required offset. This patch has been running successfully in production [here](digital-asset/daml#11238). cc @cocreature Closes #14258. PiperOrigin-RevId: 508604846 Change-Id: I10a5d2a658e9c32a9d9fcd6bd29f6a0b95e84566
Bazel's previous behavior was to rebuild an artifact locally if fetching it from an HTTP remote cache failed. This behavior is different from GRPC remote cache case where Bazel will retry the fetch. The lack of retry is an issue for multiple reasons: On one hand rebuilding locally can be slower than fetching from the remote cache, on the other hand if a build action is not bit reproducible, as is the case with some compilers, then the local rebuild will trigger cache misses on further build actions that depend on the current artifact. This change aims to avoid theses issues by retrying the fetch in the HTTP cache case similarly to how the GRPC cache client does it. Some care needs to be taken due to the design of Bazel's internal remote cache client API. For a fetch the client is given an `OutputStream` object that it is expected to write the fetched data to. This may be a temporary file on disk that will be moved to the final location after the fetch completed. On retry, we need to be careful to not duplicate previously written data when writing into this `OutputStream`. Due to the generality of the `OutputStream` interface we cannot reset the file handle or write pointer to start fresh. Instead, this change follows the same pattern used in the GRPC cache client. Namely, keep track of the data previously written and continue from that offset on retry. With this change the HTTP cache client will attempt to fetch the data from the remote cache via an HTTP range request. So that the server only needs to send the data that is still missing. If the server replies with a 206 Partial Content response, then we write the received data directly into the output stream, if the server does not support range requests and instead replies with the full data, then we drop the duplicate prefix and only write into the output stream from the required offset. This patch has been running successfully in production [here](digital-asset/daml#11238). cc @cocreature Closes #14258. PiperOrigin-RevId: 508604846 Change-Id: I10a5d2a658e9c32a9d9fcd6bd29f6a0b95e84566
Bazel's previous behavior was to rebuild an artifact locally if fetching it from an HTTP remote cache failed. This behavior is different from GRPC remote cache case where Bazel will retry the fetch. The lack of retry is an issue for multiple reasons: On one hand rebuilding locally can be slower than fetching from the remote cache, on the other hand if a build action is not bit reproducible, as is the case with some compilers, then the local rebuild will trigger cache misses on further build actions that depend on the current artifact. This change aims to avoid theses issues by retrying the fetch in the HTTP cache case similarly to how the GRPC cache client does it. Some care needs to be taken due to the design of Bazel's internal remote cache client API. For a fetch the client is given an `OutputStream` object that it is expected to write the fetched data to. This may be a temporary file on disk that will be moved to the final location after the fetch completed. On retry, we need to be careful to not duplicate previously written data when writing into this `OutputStream`. Due to the generality of the `OutputStream` interface we cannot reset the file handle or write pointer to start fresh. Instead, this change follows the same pattern used in the GRPC cache client. Namely, keep track of the data previously written and continue from that offset on retry. With this change the HTTP cache client will attempt to fetch the data from the remote cache via an HTTP range request. So that the server only needs to send the data that is still missing. If the server replies with a 206 Partial Content response, then we write the received data directly into the output stream, if the server does not support range requests and instead replies with the full data, then we drop the duplicate prefix and only write into the output stream from the required offset. This patch has been running successfully in production [here](digital-asset/daml#11238). cc @cocreature Closes bazelbuild#14258. PiperOrigin-RevId: 508604846 Change-Id: I10a5d2a658e9c32a9d9fcd6bd29f6a0b95e84566
Bazel's HTTP remote cache client does not support retry on download (or upload) failure. Instead, it will rebuild the artifact locally. This is potentially slower than fetching from cache, and worse if the build action is not bit reproducible (like GHC compilation tends to be) then the local rebuild will trigger cache misses due to changed inputs down the line.
We already have a patch in place to enable retry in the HTTP cache client. Unfortunately, it has a bug: Bazel continues appending to the same file (
OutputStream
in general) without resetting the write offset to the beginning of the file. Meaning, after retry the file contains a duplicate prefix and Bazel's digest check fails and Bazel falls back to a local rebuild after all.This PR fixes that issue. Bazel's internal API is structured in a way that doesn't allow to reset the write offset. Instead, this PR takes the same approach as Bazel's already existing GRPC cache client and only downloads the missing data with a range request, or discards the duplicate prefix if a range request is not supported by the HTTP server.
I have tested this patch locally against a bazel-remote HTTP cache (which doesn't support range requests) and an nginx HTTP cache (which does support range requests). I have also tested this patch on daml CI here. Retries are logged with the prefix
RETRYING
. E.g.Errors that don't trigger a retry are logged with the prefix
NOT RETRYING
(apart from cache misses). No such cases are expected and also not found in the CI logs. However, if they ever appear we can consider extending the retry logic to cover these errors as well.This PR changes a patch, which can be awkward to review. The full diff against Bazel can be seen here.
changelog_begin
changelog_end
Pull Request Checklist
CHANGELOG_BEGIN
andCHANGELOG_END
tagsNOTE: CI is not automatically run on non-members pull-requests for security
reasons. The reviewer will have to comment with
/AzurePipelines run
totrigger the build.