Merge pull request #14 from oreiche/release-1.3
Release 1.3
oreiche authored May 8, 2024
2 parents c631da7 + a7be241 commit f120428
Showing 10 changed files with 198 additions and 154 deletions.
25 changes: 19 additions & 6 deletions CHANGELOG.md
@@ -1,4 +1,4 @@
## Release `1.3.0` (UNRELEASED)
## Release `1.3.0` (2024-05-08)

A feature release on top of `1.2.0`, backwards compatible.

@@ -9,8 +9,8 @@ A feature release on top of `1.2.0`, backwards compatible.
- `just-mr` is able to back up and retrieve distribution files
from a remote execution endpoint. This simplifies usage in an
environment with restricted internet access.
- `just execute` now supports blob splitting as new RPC call. `just
install` uses this call to reduce traffic if the remote-execution
- `just execute` now supports blob splitting as new RPC call.
`just install` uses this call to reduce traffic if the remote-execution
endpoint supports blob splitting and the `--remember` option is given.
In this way, traffic from the remote-execution endpoint can be reduced
when subsequently installing artifacts with only small local
@@ -36,10 +36,14 @@ A feature release on top of `1.2.0`, backwards compatible.
- Support for fetching archives from FTP and TFTP was added to `just-mr`
if it was built with bundled curl. For package builds, libcurl
supports whatever protocols the distro considers suitable.
- The `gc` subcommand supports an option `--no-rotate` to carry out
only local clean up.
- The `gc` subcommand supports an option `--no-rotate` to carry
out only local clean-up. Part of that local clean-up, which is
also done as part of a full `gc`, is splitting large files. Note
that stable versions before `1.3.0` cannot use those split files.
Hence a downgrade after a `gc` with `1.3.0` (or higher) requires
cleaning of cache and CAS.
- The expression language has been extended and, in particular,
allows indexed access to an arry (basically using it as a tuple)
allows indexed access to an array (basically using it as a tuple)
and a generic form of assertion (to report user errors).
- The `analyse` subcommand supports a new flag `--dump-result` to dump
the analysis result to a file or stdout (if `-` is given).
@@ -100,6 +104,15 @@ A feature release on top of `1.2.0`, backwards compatible.
repeated multiple times to list all the properties, but only the
last one was retained. This is fixed now.

### Changes since `1.3.0~beta1`

- The `["CC/pkgconfig", "system_library"]` rule now propagates
`ENV` correctly, fixing the build on systems where the default
paths pulled in by `env` do not contain `cat`.
- In case of a build failure, the description of the failing action
in the error message is now more verbose, including the environment.
- Various minor fixes in the documentation.

## Release `1.3.0~beta1` (2024-05-02)

First beta release for the upcoming `1.3.0` release; see release
110 changes: 110 additions & 0 deletions doc/concepts/garbage.md
@@ -84,3 +84,113 @@ starve local garbage collection. Moreover, it should be noted that the
actual removal of no-longer-needed files from the file system happens
without any lock being held. Hence the disturbance of builds caused by
garbage collection is small.

Compactification as part of garbage collection
----------------------------------------------

When building locally, all intermediate artifacts end up in the
local CAS. Depending on the nature of the artifacts built, this
can include large ones (like disk images) that only differ in small
parts from the ones created in previous builds for a code base
with only small changes. In this way, the disk can fill up quite
quickly. Reclaiming this disk space by the described generation
rotation would limit how far the cache reaches back. Therefore,
large blobs are split in a content-defined way; when storing the
chunks the large blobs are composed of, duplicate chunks have to be
stored only once.
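
To make this concrete, the following is a minimal sketch of a
content-defined chunker in the style of FastCDC (gear hash); the gear
table, the mask, and the size bounds are illustrative assumptions, not
necessarily the parameters used by the actual implementation. The
essential property is that a boundary decision depends only on nearby
bytes, so a local edit shifts at most the chunks it overlaps.

```cpp
// Minimal sketch of content-defined chunking (FastCDC-style gear hash).
// A chunk boundary is declared whenever the low bits of a rolling hash
// are zero, so unchanged regions keep their chunk boundaries and the
// corresponding chunks need to be stored only once.
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

std::vector<std::string> SplitContentDefined(std::string const& data) {
    // Pseudo-random "gear" table; the values just have to be fixed.
    static std::array<std::uint64_t, 256> const kGear = [] {
        std::array<std::uint64_t, 256> g{};
        std::uint64_t x = 0x9E3779B97F4A7C15ULL;
        for (auto& v : g) {
            x ^= x << 13U; x ^= x >> 7U; x ^= x << 17U;  // xorshift64
            v = x;
        }
        return g;
    }();
    constexpr std::size_t kMinChunk = 128 * 1024;       // assumed lower bound
    constexpr std::size_t kMaxChunk = 2 * 1024 * 1024;  // assumed upper bound
    constexpr std::uint64_t kMask = (1ULL << 20U) - 1;  // ~1 MiB average

    std::vector<std::string> chunks{};
    std::size_t start = 0;
    std::uint64_t hash = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        hash = (hash << 1U) + kGear[static_cast<unsigned char>(data[i])];
        std::size_t length = i - start + 1;
        if ((length >= kMinChunk and (hash & kMask) == 0) or length == kMaxChunk) {
            chunks.emplace_back(data, start, length);
            start = i + 1;
            hash = 0;
        }
    }
    if (start < data.size()) {
        chunks.emplace_back(data, start, data.size() - start);
    }
    return chunks;
}
```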

### Large-objects CAS

Large objects are stored in a separate form of CAS, called the
large-objects CAS. It follows
the same generation regime as the regular CAS; more precisely,
next to the `casf`, `casx`, and `cast` entries, two additional
entries are generated, `cas-large-f` and `cas-large-t`, where the
latter is only available in native mode (i.e., when trees are hashed
differently than blobs).

The entries in the large-objects CAS are keyed by the hash of the
large object and the value of an entry is the concatenation of the
hashes of the chunks the large object is composed of. An entry in
the large-objects CAS promises
- that the chunks the large object is composed of are in the main
CAS, more precisely in `casf` of the same generation,
- that the concatenation of the specified chunks indeed gives the
requested object,
- that, if the object is a tree, the parts are also in the same
generation, in the main or large-objects CAS, and
- that the object is strictly larger than the maximal size a chunk
obtained by splitting can have.

Here, the last promise ensures that the chunks of a large object
cannot themselves later be replaced by a large-objects entry.
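
As an illustration of this entry format, the following sketch splices
an object from such an entry, assuming hex-encoded chunk hashes of a
fixed length and a caller-provided lookup into the main CAS; the actual
interfaces in the code base differ.

```cpp
// Sketch of interpreting a large-objects entry: its content is the
// concatenation of the (here: hex-encoded, fixed-length) hashes of the
// chunks; splicing the object back means reading each chunk from the
// main CAS and concatenating their contents in order.
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

using ChunkLookup =
    std::function<std::optional<std::string>(std::string const& hash)>;

auto SpliceLargeObject(std::string const& entry,
                       ChunkLookup const& read_chunk,
                       std::size_t hash_length = 64)
    -> std::optional<std::string> {
    if (entry.size() % hash_length != 0) {
        return std::nullopt;  // malformed entry
    }
    std::string object{};
    for (std::size_t pos = 0; pos < entry.size(); pos += hash_length) {
        auto chunk = read_chunk(entry.substr(pos, hash_length));
        if (not chunk) {
            return std::nullopt;  // invariant violated: chunk not in main CAS
        }
        object += *chunk;
    }
    return object;
}
```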

### Using objects from the large-objects CAS

Whenever an object is not found in the main CAS, the large-objects
CAS is inspected. If found there, then, in this order,
- if the entry is not already in the youngest generation, the chunks
are promoted to the youngest generation,
- the object itself is spliced on disk in a temporary file,
- if the object is a tree, the parts are promoted to the youngest
generation (only necessary if the large-object entry was not
found in the youngest generation anyway),
- the large object is added to the respective main CAS, and finally
- the large-objects entry is added to the youngest generation (if
not already there).

The promoting of the chunks ensures that they are already present
in the youngest generation at almost no storage cost, as promoting
is implemented using hard links. Therefore, the overhead of a later
splitting of that blob is minimal.
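
The following sketch shows the order of those steps for a blob; the
`Storage` interface and its methods are hypothetical stand-ins for the
real generation handling, and the additional promotion of tree parts is
omitted.

```cpp
// Sketch of the fallback path when an object is missing from the main CAS.
// All names below are hypothetical; only the order of the steps mirrors
// the description above.
#include <filesystem>
#include <optional>
#include <string>

struct Storage {
    virtual ~Storage() = default;
    // Main CAS of the youngest generation.
    virtual auto FindInMainCas(std::string const& hash)
        -> std::optional<std::filesystem::path> = 0;
    // Large-objects entry, searched through all generations; reports the
    // generation (0 = youngest) it was found in.
    virtual auto FindLargeEntry(std::string const& hash, int* generation)
        -> std::optional<std::string> = 0;
    virtual void PromoteChunksToYoungest(std::string const& entry) = 0;
    virtual auto SpliceToTempFile(std::string const& entry)
        -> std::filesystem::path = 0;
    virtual auto AddToMainCas(std::filesystem::path const& file)
        -> std::filesystem::path = 0;
    virtual void AddLargeEntryToYoungest(std::string const& hash,
                                         std::string const& entry) = 0;
};

auto GetObject(Storage& cas, std::string const& hash)
    -> std::optional<std::filesystem::path> {
    if (auto path = cas.FindInMainCas(hash)) {
        return path;  // fast path: object already present
    }
    int generation = 0;
    auto entry = cas.FindLargeEntry(hash, &generation);
    if (not entry) {
        return std::nullopt;  // not known as a large object either
    }
    if (generation != 0) {
        cas.PromoteChunksToYoungest(*entry);  // cheap: hard links only
    }
    auto tmp = cas.SpliceToTempFile(*entry);  // assemble the chunks on disk
    auto path = cas.AddToMainCas(tmp);        // make the object available
    if (generation != 0) {
        cas.AddLargeEntryToYoungest(hash, *entry);
    }
    return path;
}
```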

### Blob splitting uses large-objects CAS as cache

When `just execute` is asked to split an object, first the
large-objects CAS is inspected. If the entry is present in the
youngest generation, the answer is directly served from there;
if found in an older generation, it is served from there after
appropriate promotion (chunks; parts of the tree, if the object to
split was a tree; large-objects entry) to the youngest generation.

When `just execute` actually performs a split and the object that
was to be split was larger than the maximal size of a chunk,
after having added the chunks to the main CAS, it will also write
a corresponding entry to the large-objects CAS. In this way,
subsequent requests for the same object can be served from there
without having to inspect the object again.

Similarly, if a blob is split in order to be transferred to an
endpoint that supports blob-splicing, a corresponding entry is
added to the large-objects CAS (provided the original blob was
larger than the maximal size of a chunk, but we will transfer by
blob splice only for those objects anyway).
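
A sketch of this caching behaviour is given below; the `SplitStorage`
interface is hypothetical, the promotion from older generations is
omitted, and `kMaxChunkSize` merely stands in for the maximal size a
chunk obtained by splitting can have.

```cpp
// Sketch of answering a split request with the large-objects CAS as cache.
// All names are hypothetical stand-ins for the real storage interfaces.
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

constexpr std::size_t kMaxChunkSize = 2 * 1024 * 1024;  // assumed bound

struct SplitStorage {
    virtual ~SplitStorage() = default;
    virtual auto LookupEntry(std::string const& hash)
        -> std::optional<std::vector<std::string>> = 0;  // cached chunk hashes
    virtual auto ReadObject(std::string const& hash) -> std::string = 0;
    virtual auto SplitAndStoreChunks(std::string const& data)
        -> std::vector<std::string> = 0;                 // returns chunk hashes
    virtual void StoreEntry(std::string const& hash,
                            std::vector<std::string> const& chunks) = 0;
};

auto SplitBlob(SplitStorage& cas, std::string const& hash)
    -> std::vector<std::string> {
    // Serve from the large-objects CAS if this object was split before.
    if (auto cached = cas.LookupEntry(hash)) {
        return *cached;
    }
    auto data = cas.ReadObject(hash);
    auto chunks = cas.SplitAndStoreChunks(data);
    // Only record an entry if the object is strictly larger than the
    // maximal chunk size; this keeps the invariant described above.
    if (data.size() > kMaxChunkSize) {
        cas.StoreEntry(hash, chunks);
    }
    return chunks;
}
```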

### Compactification of a CAS generation

During garbage collection, while already holding the exclusive
garbage-collection lock, the following compactification steps are
performed on the youngest generation before doing the generation
rotation.
- For every entry in the large-objects CAS the corresponding entries
in the main CAS are removed (for an entry in `cas-large-f` the
entries in both `casf` and `casx` are removed, as files and
executable files are both hashed the same way, as blobs).
- For every entry in the main CAS that is larger than the
compactification threshold, the object is split, the chunks are
added to the main CAS, the list of chunks the object is composed
of is added to the large-objects CAS, and finally the object is
removed from the main CAS.

It should be noted that these steps do not modify the objects
that can be obtained from that CAS generation. In particular, all
invariants are kept.

As compactification threshold we chose a fixed value larger than
the maximal size a chunk obtained from splitting can have. More
precisely, we take the maximal value where we can still transfer
a blob via `grpc` without having to use the streaming interface,
i.e., we chose 2MB. In this way, blobs are already split correctly
for transfer to an endpoint that supports blob splicing.
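
The two compactification steps can be summarized by the following
sketch, run while the exclusive garbage-collection lock is held; the
`Generation` interface is hypothetical and only the order of the steps
follows the description above.

```cpp
// Sketch of the compactification pass over the youngest generation.
#include <cstddef>
#include <string>
#include <vector>

struct Generation {
    virtual ~Generation() = default;
    virtual auto LargeEntries() -> std::vector<std::string> = 0;    // hashes
    virtual auto MainCasEntries() -> std::vector<std::string> = 0;  // hashes
    virtual auto ObjectSize(std::string const& hash) -> std::size_t = 0;
    // Splits the object, stores the chunks in the main CAS, and returns
    // the resulting large-objects entry (the list of chunk hashes).
    virtual auto SplitAndStoreChunks(std::string const& hash)
        -> std::string = 0;
    virtual void AddLargeEntry(std::string const& hash,
                               std::string const& entry) = 0;
    // Removes the object from casf and casx (files and executables are
    // hashed the same way, as blobs).
    virtual void RemoveFromMainCas(std::string const& hash) = 0;
};

void Compactify(Generation& gen, std::size_t threshold) {
    // Step 1: objects that already have a large-objects entry are
    // redundant in the main CAS; the entry plus the chunks restore them.
    for (auto const& hash : gen.LargeEntries()) {
        gen.RemoveFromMainCas(hash);
    }
    // Step 2: split the remaining objects above the threshold and keep
    // only the chunks plus the entry describing how to reassemble them.
    for (auto const& hash : gen.MainCasEntries()) {
        if (gen.ObjectSize(hash) > threshold) {
            auto entry = gen.SplitAndStoreChunks(hash);
            gen.AddLargeEntry(hash, entry);
            gen.RemoveFromMainCas(hash);
        }
    }
}
```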

The compactification step will also be carried out if the `--no-rotate`
option is given to `gc`.
115 changes: 0 additions & 115 deletions doc/future-designs/large-blobs.md

This file was deleted.

1 change: 1 addition & 0 deletions rules/CC/pkgconfig/EXPRESSIONS
@@ -162,6 +162,7 @@
]
}
]
, "env": {"type": "var", "name": "ENV"}
, "outs": [{"type": "var", "name": "ldflags-filename"}]
}
}
11 changes: 11 additions & 0 deletions share/man/just.1.md
@@ -277,6 +277,17 @@ to the youngest generation; therefore, upon a call to **`gc`** everything
not referenced since the last call to **`gc`** is purged and the
corresponding disk space reclaimed.

Additionally, and before doing generation rotation,
- left-over temporary directories (e.g., from interrupted `just`
invocations) are removed, and
- large files are split and only the chunks and the information
on how to assemble the file from the chunks are kept; in this way
disk space is saved without losing information.

As the non-rotating tasks can be useful in their own right, the
`--no-rotate` option can be used to request only the clean-up tasks
that do not lose information.

**`execute`**
-------------

50 changes: 27 additions & 23 deletions src/buildtool/execution_engine/executor/executor.hpp
@@ -524,7 +524,7 @@ class ExecutorImpl {
}
progress->TaskTracker().Stop(action->Content().Id());

PrintInfo(logger, action->Command(), response);
PrintInfo(logger, action, response);
bool should_fail_outputs = false;
for (auto const& [local_path, node] : action->Dependencies()) {
should_fail_outputs |= node->Content().Info()->failed;
@@ -573,34 +573,36 @@

/// \brief Report an error if the response is empty; otherwise, write
/// out standard error/output if present
void static PrintInfo(Logger const& logger,
std::vector<std::string> const& command,
IExecutionResponse::Ptr const& response) noexcept {
void static PrintInfo(
Logger const& logger,
gsl::not_null<DependencyGraph::ActionNode const*> const& action,
IExecutionResponse::Ptr const& response) noexcept {
if (!response) {
logger.Emit(LogLevel::Error, "response is empty");
return;
}
auto const has_err = response->HasStdErr();
auto const has_out = response->HasStdOut();
auto build_message =
[has_err, has_out, &logger, &command, &response]() {
using namespace std::string_literals;
auto message = ""s;
if (has_err or has_out) {
message += (has_err and has_out ? "Stdout and stderr"s
: has_out ? "Stdout"s
: "Stderr"s) +
" of command: ";
}
message += nlohmann::json(command).dump() + "\n";
if (response->HasStdOut()) {
message += response->StdOut();
}
if (response->HasStdErr()) {
message += response->StdErr();
}
return message;
};
auto build_message = [has_err, has_out, &logger, &action, &response]() {
using namespace std::string_literals;
auto message = ""s;
if (has_err or has_out) {
message += (has_err and has_out ? "Stdout and stderr"s
: has_out ? "Stdout"s
: "Stderr"s) +
" of command ";
}
message += nlohmann::json(action->Command()).dump() +
" in environment " +
nlohmann::json(action->Env()).dump() + "\n";
if (response->HasStdOut()) {
message += response->StdOut();
}
if (response->HasStdErr()) {
message += response->StdErr();
}
return message;
};
logger.Emit((has_err or has_out) ? LogLevel::Info : LogLevel::Debug,
std::move(build_message));
}
@@ -612,6 +614,8 @@ class ExecutorImpl {
std::ostringstream msg{};
msg << "Failed to execute command ";
msg << nlohmann::json(action->Command()).dump();
msg << " in evenironment ";
msg << nlohmann::json(action->Env()).dump();
auto const& origin_map = progress->OriginMap();
auto origins = origin_map.find(action->Content().Id());
if (origins != origin_map.end() and !origins->second.empty()) {