Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't Sync() on every upload #67

Open
buchgr opened this issue Jan 12, 2019 · 0 comments
Open

Don't Sync() on every upload #67

buchgr opened this issue Jan 12, 2019 · 0 comments

Comments

@buchgr
Copy link
Owner

buchgr commented Jan 12, 2019

We found that on some hardware this absolutely kills the performance of the system e.g. a Mac Pro became completely unresponsive. In general, there is no need for calling Sync() on every write as in the worst case it's ok to lose data. We should only ensure that on loading the entries from disk that there is data integrity.

@buchgr buchgr added this to the 1.0 milestone Jan 12, 2019
mostynb added a commit to mostynb/bazel-remote that referenced this issue Dec 19, 2019
os.File.Sync() was inneffective on mac in go prior to 1.12. In 1.12 onwards
it is effective, but slow: golang/go#26650

This change disables Sync for CAS uploads on mac, because we can (potentially)
verify this later. Let's see if this helps performance on mac: buchgr#67 (while
being less risky than buchgr#68).
mostynb added a commit to mostynb/bazel-remote that referenced this issue Dec 30, 2019
os.File.Sync() was inneffective on mac in go prior to 1.12. In 1.12 onwards
it is effective, but slow: golang/go#26650

This change disables Sync for CAS uploads on mac, because we can (potentially)
verify this later. Let's see if this helps performance on mac: buchgr#67 (while
being less risky than buchgr#68).
mostynb added a commit to mostynb/bazel-remote that referenced this issue Jan 1, 2020
os.File.Sync() was inneffective on mac in go prior to 1.12. In 1.12 onwards
it is effective, but slow: golang/go#26650

This change disables Sync for CAS uploads on mac, because we can (potentially)
verify this later. Let's see if this helps performance on mac: buchgr#67 (while
being less risky than buchgr#68).
mostynb added a commit to mostynb/bazel-remote that referenced this issue Jan 28, 2020
This pattern is used in the filepath.Walk godocs.

Discovered while investigating buchgr#67, committing separately so it doesn't
get lost in a larger PR.
mostynb added a commit that referenced this issue Jan 28, 2020
This pattern is used in the filepath.Walk godocs.

Discovered while investigating #67, committing separately so it doesn't
get lost in a larger PR.
mostynb added a commit to mostynb/bazel-remote that referenced this issue Feb 10, 2020
The header is made up of three fields:
1) Little-endian int32 (4 bytes) representing the REAPIv2
   DigestFunction.
2) Little-endian int64 (8 bytes) representing the number
   of bytes in the blob.
3) The hash bytes from the digest, length determined by
   the particular DigestFunction.
   (32 for SHA256. 20 for SHA1, 16 for MD5).

Note that we currently only support SHA256, however.

This header is simple to parse, and does not require buffering the
entire blob in memory if you just want the data.

To distinguish blobs with and without this header, we use new
directories for the affected blobs: ac.v2/ instead of ac/ and
similarly for raw/.

We do not use this header to actually verify data yet, and we
still os.File.Sync() after file writes (buchgr#67).

This also includes a slightly refactored version of PR buchgr#123
(load the items from disk concurrently) by @bdittmer.
mostynb added a commit to mostynb/bazel-remote that referenced this issue Feb 10, 2020
The header is made up of three fields:
1) Little-endian int32 (4 bytes) representing the REAPIv2
   DigestFunction.
2) Little-endian int64 (8 bytes) representing the number
   of bytes in the blob.
3) The hash bytes from the digest, length determined by
   the particular DigestFunction.
   (32 for SHA256. 20 for SHA1, 16 for MD5).

Note that we currently only support SHA256, however.

This header is simple to parse, and does not require buffering the
entire blob in memory if you just want the data.

To distinguish blobs with and without this header, we use new
directories for the affected blobs: ac.v2/ instead of ac/ and
similarly for raw/.

We do not use this header to actually verify data yet, and we
still os.File.Sync() after file writes (buchgr#67).

This also includes a slightly refactored version of PR buchgr#123
(load the items from disk concurrently) by @bdittmer.
mostynb added a commit to mostynb/bazel-remote that referenced this issue Feb 10, 2020
The header is made up of three fields:
1) Little-endian int32 (4 bytes) representing the REAPIv2
   DigestFunction.
2) Little-endian int64 (8 bytes) representing the number
   of bytes in the blob.
3) The hash bytes from the digest, length determined by
   the particular DigestFunction.
   (32 for SHA256. 20 for SHA1, 16 for MD5).

Note that we currently only support SHA256, however.

This header is simple to parse, and does not require buffering the
entire blob in memory if you just want the data.

To distinguish blobs with and without this header, we use new
directories for the affected blobs: ac.v2/ instead of ac/ and
similarly for raw/.

We do not use this header to actually verify data yet, and we
still os.File.Sync() after file writes (buchgr#67).

This also includes a slightly refactored version of PR buchgr#123
(load the items from disk concurrently) by @bdittmer.
mostynb added a commit to mostynb/bazel-remote that referenced this issue Feb 10, 2020
The header is made up of three fields:
1) Little-endian int32 (4 bytes) representing the REAPIv2
   DigestFunction.
2) Little-endian int64 (8 bytes) representing the number
   of bytes in the blob.
3) The hash bytes from the digest, length determined by
   the particular DigestFunction.
   (32 for SHA256. 20 for SHA1, 16 for MD5).

Note that we currently only support SHA256, however.

This header is simple to parse, and does not require buffering the
entire blob in memory if you just want the data.

To distinguish blobs with and without this header, we use new
directories for the affected blobs: ac.v2/ instead of ac/ and
similarly for raw/.

We do not use this header to actually verify data yet, and we
still os.File.Sync() after file writes (buchgr#67).

This also includes a slightly refactored version of PR buchgr#123
(load the items from disk concurrently) by @bdittmer.
mostynb added a commit to mostynb/bazel-remote that referenced this issue Feb 11, 2020
The header is made up of three fields:
1) Little-endian int32 (4 bytes) representing the REAPIv2
   DigestFunction.
2) Little-endian int64 (8 bytes) representing the number
   of bytes in the blob.
3) The hash bytes from the digest, length determined by
   the particular DigestFunction.
   (32 for SHA256. 20 for SHA1, 16 for MD5).

Note that we currently only support SHA256, however.

This header is simple to parse, and does not require buffering the
entire blob in memory if you just want the data.

To distinguish blobs with and without this header, we use new
directories for the affected blobs: ac.v2/ instead of ac/ and
similarly for raw/.

We do not use this header to actually verify data yet, and we
still os.File.Sync() after file writes (buchgr#67).

This also includes a slightly refactored version of PR buchgr#123
(load the items from disk concurrently) by @bdittmer.
mostynb added a commit to mostynb/bazel-remote that referenced this issue Feb 14, 2020
The header is made up of three fields:
1) Little-endian int32 (4 bytes) representing the REAPIv2
   DigestFunction.
2) Little-endian int64 (8 bytes) representing the number
   of bytes in the blob.
3) The hash bytes from the digest, length determined by
   the particular DigestFunction.
   (32 for SHA256. 20 for SHA1, 16 for MD5).

Note that we currently only support SHA256, however.

This header is simple to parse, and does not require buffering the
entire blob in memory if you just want the data.

To distinguish blobs with and without this header, we use new
directories for the affected blobs: ac.v2/ instead of ac/ and
similarly for raw/.

We do not use this header to actually verify data yet, and we
still os.File.Sync() after file writes (buchgr#67).

This also includes a slightly refactored version of PR buchgr#123
(load the items from disk concurrently) by @bdittmer.
mostynb added a commit to mostynb/bazel-remote that referenced this issue Feb 14, 2020
The header is made up of three fields:
1) Little-endian int32 (4 bytes) representing the REAPIv2
   DigestFunction.
2) Little-endian int64 (8 bytes) representing the number
   of bytes in the blob.
3) The hash bytes from the digest, length determined by
   the particular DigestFunction.
   (32 for SHA256. 20 for SHA1, 16 for MD5).

Note that we currently only support SHA256, however.

This header is simple to parse, and does not require buffering the
entire blob in memory if you just want the data.

To distinguish blobs with and without this header, we use new
directories for the affected blobs: ac.v2/ instead of ac/ and
similarly for raw/.

We do not use this header to actually verify data yet, and we
still os.File.Sync() after file writes (buchgr#67).

This also includes a slightly refactored version of PR buchgr#123
(load the items from disk concurrently) by @bdittmer.
mostynb added a commit to mostynb/bazel-remote that referenced this issue Feb 18, 2020
The header is made up of three fields:
1) Little-endian int32 (4 bytes) representing the REAPIv2
   DigestFunction.
2) Little-endian int64 (8 bytes) representing the number
   of bytes in the blob.
3) The hash bytes from the digest, length determined by
   the particular DigestFunction.
   (32 for SHA256. 20 for SHA1, 16 for MD5).

Note that we currently only support SHA256, however.

This header is simple to parse, and does not require buffering the
entire blob in memory if you just want the data.

To distinguish blobs with and without this header, we use new
directories for the affected blobs: ac.v2/ instead of ac/ and
similarly for raw/.

We do not use this header to actually verify data yet, and we
still os.File.Sync() after file writes (buchgr#67).

This also includes a slightly refactored version of PR buchgr#123
(load the items from disk concurrently) by @bdittmer.
mostynb added a commit to mostynb/bazel-remote that referenced this issue Feb 26, 2020
The header is made up of three fields:
1) Little-endian int32 (4 bytes) representing the REAPIv2
   DigestFunction.
2) Little-endian int64 (8 bytes) representing the number
   of bytes in the blob.
3) The hash bytes from the digest, length determined by
   the particular DigestFunction.
   (32 for SHA256. 20 for SHA1, 16 for MD5).

Note that we currently only support SHA256, however.

This header is simple to parse, and does not require buffering the
entire blob in memory if you just want the data.

To distinguish blobs with and without this header, we use new
directories for the affected blobs: ac.v2/ instead of ac/ and
similarly for raw/.

We do not use this header to actually verify data yet, and we
still os.File.Sync() after file writes (buchgr#67).

This also includes a slightly refactored version of PR buchgr#123
(load the items from disk concurrently) by @bdittmer.
mostynb added a commit to mostynb/bazel-remote that referenced this issue Feb 26, 2020
The header is made up of three fields:
1) Little-endian int32 (4 bytes) representing the REAPIv2
   DigestFunction.
2) Little-endian int64 (8 bytes) representing the number
   of bytes in the blob.
3) The hash bytes from the digest, length determined by
   the particular DigestFunction.
   (32 for SHA256. 20 for SHA1, 16 for MD5).

Note that we currently only support SHA256, however.

This header is simple to parse, and does not require buffering the
entire blob in memory if you just want the data.

To distinguish blobs with and without this header, we use new
directories for the affected blobs: ac.v2/ instead of ac/ and
similarly for raw/.

We do not use this header to actually verify data yet, and we
still os.File.Sync() after file writes (buchgr#67).

This also includes a slightly refactored version of PR buchgr#123
(load the items from disk concurrently) by @bdittmer.
mostynb added a commit to mostynb/bazel-remote that referenced this issue Mar 4, 2020
The header is made up of three fields:
1) Little-endian int32 (4 bytes) representing the REAPIv2
   DigestFunction.
2) Little-endian int64 (8 bytes) representing the number
   of bytes in the blob.
3) The hash bytes from the digest, length determined by
   the particular DigestFunction.
   (32 for SHA256. 20 for SHA1, 16 for MD5).

Note that we currently only support SHA256, however.

This header is simple to parse, and does not require buffering the
entire blob in memory if you just want the data.

To distinguish blobs with and without this header, we use new
directories for the affected blobs: ac.v2/ instead of ac/ and
similarly for raw/.

We do not use this header to actually verify data yet, and we
still os.File.Sync() after file writes (buchgr#67).

This also includes a slightly refactored version of PR buchgr#123
(load the items from disk concurrently) by @bdittmer.
mostynb added a commit to mostynb/bazel-remote that referenced this issue Mar 8, 2020
The header is made up of three fields:
1) Little-endian int32 (4 bytes) representing the REAPIv2
   DigestFunction.
2) Little-endian int64 (8 bytes) representing the number
   of bytes in the blob.
3) The hash bytes from the digest, length determined by
   the particular DigestFunction.
   (32 for SHA256. 20 for SHA1, 16 for MD5).

Note that we currently only support SHA256, however.

This header is simple to parse, and does not require buffering the
entire blob in memory if you just want the data.

To distinguish blobs with and without this header, we use new
directories for the affected blobs: ac.v2/ instead of ac/ and
similarly for raw/.

We do not use this header to actually verify data yet, and we
still os.File.Sync() after file writes (buchgr#67).

This also includes a slightly refactored version of PR buchgr#123
(load the items from disk concurrently) by @bdittmer.
@mostynb mostynb removed this from the 1.0 milestone Mar 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants