disk cache: store a data integrity header for non-CAS blobs
The header is made up of three fields:
1) Little-endian int32 (4 bytes) representing the REAPIv2
   DigestFunction.
2) Little-endian int64 (8 bytes) representing the number
   of bytes in the blob.
3) The hash bytes from the digest, length determined by
   the particular DigestFunction.
   (32 bytes for SHA256, 20 for SHA1, 16 for MD5).

Note that we currently only support SHA256, however.

This header is simple to parse, and does not require buffering the
entire blob in memory if you just want the data.

To distinguish blobs with and without this header, we use new
directories for the affected blobs: ac.v2/ instead of ac/ and
similarly for raw/.

We do not use this header to actually verify data yet, and we
still call os.File.Sync() after file writes (#67).

This also includes a slightly refactored version of PR #123
(load the items from disk concurrently) by @bdittmer.
mostynb committed Feb 14, 2020
1 parent 40bd979 commit 6c1660d
Showing 5 changed files with 845 additions and 150 deletions.
2 changes: 2 additions & 0 deletions cache/disk/BUILD.bazel
@@ -4,6 +4,7 @@ go_library(
name = "go_default_library",
srcs = [
"disk.go",
"load.go",
"lru.go",
],
importpath = "github.com/buchgr/bazel-remote/cache/disk",
@@ -29,5 +30,6 @@ go_test(
"//cache:go_default_library",
"//cache/http:go_default_library",
"//utils:go_default_library",
"@com_github_bazelbuild_remote_apis//build/bazel/remote/execution/v2:go_default_library",
],
)