module: implement flushCompileCache() #54971

Closed
wants to merge 3 commits
23 changes: 23 additions & 0 deletions doc/api/module.md
@@ -199,6 +199,13 @@ Compilation cache generated by one version of Node.js can not be reused by a different
version of Node.js. Cache generated by different versions of Node.js will be stored
separately if the same base directory is used to persist the cache, so they can co-exist.

At the moment, when the compile cache is enabled and a module is loaded afresh, the
code cache is generated from the compiled code immediately, but will only be written
to disk when the Node.js instance is about to exit. This is subject to change. The
[`module.flushCompileCache()`][] method can be used to ensure the accumulated code cache
is flushed to disk in case the application wants to spawn other Node.js instances
and let them share the cache long before the parent exits.

### `module.getCompileCacheDir()`

<!-- YAML
@@ -1101,6 +1108,21 @@ added:
`path` is the resolved path for the file for which a corresponding source map
should be fetched.

### `module.flushCompileCache()`

<!-- YAML
added:
- REPLACEME
-->

> Stability: 1.1 - Active Development

Flush the [module compile cache][] accumulated from modules already loaded
in the current Node.js instance to disk. This returns after all the file
system operations performed for the flush have completed, whether they succeed
or not. If there are any errors, this fails silently, since compile cache
misses should not interfere with the actual operation of the application.

### Class: `module.SourceMap`

<!-- YAML
@@ -1216,6 +1238,7 @@ returned object contains the following keys:
[`initialize`]: #initialize
[`module.constants.compileCacheStatus`]: #moduleconstantscompilecachestatus
[`module.enableCompileCache()`]: #moduleenablecompilecachecachedir
[`module.flushCompileCache()`]: #moduleflushcompilecache
[`module.getCompileCacheDir()`]: #modulegetcompilecachedir
[`module`]: #the-module-object
[`os.tmpdir()`]: os.md#ostmpdir
2 changes: 2 additions & 0 deletions lib/internal/modules/helpers.js
@@ -40,6 +40,7 @@ const {
enableCompileCache: _enableCompileCache,
getCompileCacheDir: _getCompileCacheDir,
compileCacheStatus: _compileCacheStatus,
flushCompileCache,
} = internalBinding('modules');

let debug = require('internal/util/debuglog').debuglog('module', (fn) => {
@@ -485,6 +486,7 @@ module.exports = {
assertBufferSource,
constants,
enableCompileCache,
flushCompileCache,
getBuiltinModule,
getCjsConditions,
getCompileCacheDir,
3 changes: 3 additions & 0 deletions lib/module.js
@@ -7,6 +7,7 @@ const { SourceMap } = require('internal/source_map/source_map');
const {
constants,
enableCompileCache,
flushCompileCache,
getCompileCacheDir,
} = require('internal/modules/helpers');

@@ -15,5 +16,7 @@ Module.register = register;
Module.SourceMap = SourceMap;
Module.constants = constants;
Module.enableCompileCache = enableCompileCache;
Module.flushCompileCache = flushCompileCache;

Module.getCompileCacheDir = getCompileCacheDir;
module.exports = Module;
122 changes: 103 additions & 19 deletions src/compile_cache.cc
@@ -215,6 +215,8 @@ CompileCacheEntry* CompileCacheHandler::GetOrInsert(
return loaded->second.get();
}

// If the code hash mismatches, the code has changed, discard the stale entry
// and create a new one.
auto emplaced =
compiler_cache_store_.emplace(key, std::make_unique<CompileCacheEntry>());
auto* result = emplaced.first->second.get();
@@ -283,23 +285,33 @@ void CompileCacheHandler::MaybeSave(CompileCacheEntry* entry,
MaybeSaveImpl(entry, func, rejected);
}

// Layout of a cache file:
// [uint32_t] magic number
// [uint32_t] code size
// [uint32_t] code hash
// [uint32_t] cache size
// [uint32_t] cache hash
// .... compile cache content ....
/**
* Persist the compile cache accumulated in memory to disk.
*
* To avoid race conditions, the cache file includes hashes of the original
* source code and the cache content. It's first written to a temporary file
* before being renamed to the target name.
*
* Layout of a cache file:
* [uint32_t] magic number
* [uint32_t] code size
* [uint32_t] code hash
* [uint32_t] cache size
* [uint32_t] cache hash
* .... compile cache content ....
*/
void CompileCacheHandler::Persist() {
DCHECK(!compile_cache_dir_.empty());

// NOTE(joyeecheung): in most circumstances the code caching reading
// writing logic is lenient enough that it's fine even if someone
// overwrites the cache (that leads to either size or hash mismatch
// in subsequent loads and the overwritten cache will be ignored).
// Also in most use cases users should not change the files on disk
// too rapidly. Therefore locking is not currently implemented to
// avoid the cost.
// TODO(joyeecheung): do this using a separate event loop to utilize the
// libuv thread pool and do the file system operations concurrently.
// TODO(joyeecheung): Currently flushing is triggered by either process
// shutdown or user requests. In the future we should simply start the
// writes right after module loading on a separate thread, and this method
// only blocks until all the pending writes (if any) on the other thread are
// finished. In that case, the off-thread writes should finish long
// before any attempt of flushing is made so the method would then only
// incur a negligible overhead from thread synchronization.
for (auto& pair : compiler_cache_store_) {
auto* entry = pair.second.get();
if (entry->cache == nullptr) {
@@ -312,6 +324,11 @@ void CompileCacheHandler::Persist() {
entry->source_filename);
continue;
}
if (entry->persisted == true) {
Debug("[compile cache] skip %s because cache was already persisted\n",
entry->source_filename);
continue;
}

DCHECK_EQ(entry->cache->buffer_policy,
v8::ScriptCompiler::CachedData::BufferOwned);
@@ -328,27 +345,94 @@ void CompileCacheHandler::Persist() {
headers[kCodeHashOffset] = entry->code_hash;
headers[kCacheHashOffset] = cache_hash;

// Generate the temporary filename.
// The temporary file should be placed in a location like:
//
// $NODE_COMPILE_CACHE_DIR/v23.0.0-pre-arm64-5fad6d45-501/e7f8ef7f.cache.tcqrsK
//
// 1. $NODE_COMPILE_CACHE_DIR either comes from the $NODE_COMPILE_CACHE
//    environment variable or `module.enableCompileCache()`.
// 2. v23.0.0-pre-arm64-5fad6d45-501 is the sub cache directory and
//    e7f8ef7f is the hash for the cache (see CompileCacheHandler::Enable()).
// 3. tcqrsK is generated by uv_fs_mkstemp() as a temporary identifier.
uv_fs_t mkstemp_req;
auto cleanup_mkstemp =
OnScopeLeave([&mkstemp_req]() { uv_fs_req_cleanup(&mkstemp_req); });
std::string cache_filename_tmp = entry->cache_filename + ".XXXXXX";
Debug("[compile cache] Creating temporary file for cache of %s...",
entry->source_filename);
int err = uv_fs_mkstemp(
nullptr, &mkstemp_req, cache_filename_tmp.c_str(), nullptr);
if (err < 0) {
Debug("failed. %s\n", uv_strerror(err));
continue;
}
Debug(" -> %s\n", mkstemp_req.path);
Debug("[compile cache] writing cache for %s to temporary file %s [%d %d %d "
"%d %d]...",
entry->source_filename,
mkstemp_req.path,
headers[kMagicNumberOffset],
headers[kCodeSizeOffset],
headers[kCacheSizeOffset],
headers[kCodeHashOffset],
headers[kCacheHashOffset]);

// Write to the temporary file.
uv_buf_t headers_buf = uv_buf_init(reinterpret_cast<char*>(headers.data()),
headers.size() * sizeof(uint32_t));
uv_buf_t data_buf = uv_buf_init(cache_ptr, entry->cache->length);
uv_buf_t bufs[] = {headers_buf, data_buf};

uv_fs_t write_req;
auto cleanup_write =
OnScopeLeave([&write_req]() { uv_fs_req_cleanup(&write_req); });
err = uv_fs_write(
nullptr, &write_req, mkstemp_req.result, bufs, 2, 0, nullptr);
if (err < 0) {
Debug("failed: %s\n", uv_strerror(err));
continue;
}

uv_fs_t close_req;
auto cleanup_close =
OnScopeLeave([&close_req]() { uv_fs_req_cleanup(&close_req); });
err = uv_fs_close(nullptr, &close_req, mkstemp_req.result, nullptr);

if (err < 0) {
Debug("failed: %s\n", uv_strerror(err));
continue;
}

Debug("success\n");

// Rename the temporary file to the actual cache file.
uv_fs_t rename_req;
auto cleanup_rename =
OnScopeLeave([&rename_req]() { uv_fs_req_cleanup(&rename_req); });
std::string cache_filename_final = entry->cache_filename;
Debug("[compile cache] Renaming %s to %s...",
mkstemp_req.path,
cache_filename_final);
err = uv_fs_rename(nullptr,
&rename_req,
mkstemp_req.path,
cache_filename_final.c_str(),
nullptr);
if (err < 0) {
Debug("failed: %s\n", uv_strerror(err));
continue;
}
Debug("success\n");
entry->persisted = true;
}

// Clear the map at the end in one go instead of during the iteration to
// avoid rehashing costs.
Debug("[compile cache] Clear deserialized cache.\n");
compiler_cache_store_.clear();
}

CompileCacheHandler::CompileCacheHandler(Environment* env)
2 changes: 2 additions & 0 deletions src/compile_cache.h
@@ -30,6 +30,8 @@ struct CompileCacheEntry {
std::string source_filename;
CachedCodeType type;
bool refreshed = false;
bool persisted = false;

// Copy the cache into a new store for V8 to consume. Caller takes
// ownership.
v8::ScriptCompiler::CachedData* CopyCache() const;
23 changes: 14 additions & 9 deletions src/env.cc
@@ -847,14 +847,12 @@ Environment::Environment(IsolateData* isolate_data,
}
}

// Compile builtins eagerly when building the snapshot so that inner functions
// of essential builtins that are loaded in the snapshot can have faster first
// invocation.
if (isolate_data->is_building_snapshot()) {
builtin_loader()->SetEagerCompile();
@joyeecheung (Member, Author) commented on Sep 20, 2024:
er, I meant to open a separate PR about this (this was why I opened #54633 which removed a coverage regression that made me not include this and left the TODO above in #51672), somehow this ended up in the checkout of this PR due to local git mess up 😅 but it was approved and landed anyway..

@joyeecheung (Member, Author) commented on Sep 20, 2024:

For posterity: the short summary of this small diff that accidentally landed in this unrelated PR is that it speeds up core startup by not having to compile any inner functions in essential builtins. Previously, only the top-level functions were cached in the snapshot, so during bootstrap, when the inner functions were executed to do the loading, they still needed to be compiled without cache. This diff caches the inner functions too, which adds ~630KB to the binary, in exchange for a slightly faster core startup (and also speeding up the first invocation of most functions provided by the essential builtins).

❯ hyperfine "./node_pr ./test/fixtures/semicolon.js" "./node_main ./test/fixtures/semicolon.js"
Benchmark 1: ./node_pr ./test/fixtures/semicolon.js
  Time (mean ± σ):      34.2 ms ±   1.5 ms    [User: 30.1 ms, System: 2.5 ms]
  Range (min … max):    33.1 ms …  43.7 ms    80 runs

Benchmark 2: ./node_main ./test/fixtures/semicolon.js
  Time (mean ± σ):      35.1 ms ±   1.2 ms    [User: 31.0 ms, System: 2.5 ms]
  Range (min … max):    34.1 ms …  42.6 ms    77 runs

Summary
  './node_pr ./test/fixtures/semicolon.js' ran
    1.03 ± 0.06 times faster than './node_main ./test/fixtures/semicolon.js'

}

// We'll be creating new objects so make sure we've entered the context.
HandleScope handle_scope(isolate);
@@ -1143,7 +1141,7 @@ CompileCacheEnableResult Environment::EnableCompileCache(
compile_cache_handler_ = std::move(handler);
AtExit(
[](void* env) {
static_cast<Environment*>(env)->FlushCompileCache();
},
this);
}
@@ -1160,6 +1158,13 @@
return result;
}

void Environment::FlushCompileCache() {
if (!compile_cache_handler_ || compile_cache_handler_->cache_dir().empty()) {
return;
}
compile_cache_handler_->Persist();
}

void Environment::ExitEnv(StopFlags::Flags flags) {
// Should not access non-thread-safe methods here.
set_stopping(true);
1 change: 1 addition & 0 deletions src/env.h
@@ -1041,6 +1041,7 @@ class Environment final : public MemoryRetainer {
// Enable built-in compile cache if it has not yet been enabled.
// The cache will be persisted to disk on exit.
CompileCacheEnableResult EnableCompileCache(const std::string& cache_dir);
void FlushCompileCache();

void RunAndClearNativeImmediates(bool only_refed = false);
void RunAndClearInterrupts();
26 changes: 25 additions & 1 deletion src/node_modules.cc
@@ -435,11 +435,33 @@ void BindingData::GetPackageScopeConfig(
.ToLocalChecked());
}

void FlushCompileCache(const FunctionCallbackInfo<Value>& args) {
Isolate* isolate = args.GetIsolate();
Local<Context> context = isolate->GetCurrentContext();
Environment* env = Environment::GetCurrent(context);

if (!args[0]->IsBoolean() && !args[0]->IsUndefined()) {
THROW_ERR_INVALID_ARG_TYPE(env,
"keepDeserializedCache should be a boolean");
return;
}
Debug(env,
DebugCategory::COMPILE_CACHE,
"[compile cache] module.flushCompileCache() requested.\n");
env->FlushCompileCache();
Debug(env,
DebugCategory::COMPILE_CACHE,
"[compile cache] module.flushCompileCache() finished.\n");
}

void EnableCompileCache(const FunctionCallbackInfo<Value>& args) {
Isolate* isolate = args.GetIsolate();
Local<Context> context = isolate->GetCurrentContext();
Environment* env = Environment::GetCurrent(context);
if (!args[0]->IsString()) {
THROW_ERR_INVALID_ARG_TYPE(env, "cacheDir should be a string");
return;
}
Utf8Value value(isolate, args[0]);
CompileCacheEnableResult result = env->EnableCompileCache(*value);
std::vector<Local<Value>> values = {
@@ -477,6 +499,7 @@ void BindingData::CreatePerIsolateProperties(IsolateData* isolate_data,
SetMethod(isolate, target, "getPackageScopeConfig", GetPackageScopeConfig);
SetMethod(isolate, target, "enableCompileCache", EnableCompileCache);
SetMethod(isolate, target, "getCompileCacheDir", GetCompileCacheDir);
SetMethod(isolate, target, "flushCompileCache", FlushCompileCache);
}

void BindingData::CreatePerContextProperties(Local<Object> target,
@@ -509,6 +532,7 @@ void BindingData::RegisterExternalReferences(
registry->Register(GetPackageScopeConfig);
registry->Register(EnableCompileCache);
registry->Register(GetCompileCacheDir);
registry->Register(FlushCompileCache);
}

} // namespace modules
21 changes: 21 additions & 0 deletions test/fixtures/compile-cache-flush.js
@@ -0,0 +1,21 @@
'use strict';

const { flushCompileCache, getCompileCacheDir } = require('module');
const { spawnSync } = require('child_process');
const assert = require('assert');

if (process.argv[2] !== 'child') {
// The test should be run with the compile cache already enabled and NODE_DEBUG_NATIVE=COMPILE_CACHE.
assert(getCompileCacheDir());
assert(process.env.NODE_DEBUG_NATIVE.includes('COMPILE_CACHE'));

flushCompileCache();

const child1 = spawnSync(process.execPath, [__filename, 'child']);
console.log(child1.stderr.toString().trim().split('\n').map(line => `[child1]${line}`).join('\n'));

flushCompileCache();

const child2 = spawnSync(process.execPath, [__filename, 'child']);
console.log(child2.stderr.toString().trim().split('\n').map(line => `[child2]${line}`).join('\n'));
}
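A companion test can verify behavior by scanning the `NODE_DEBUG_NATIVE=COMPILE_CACHE` output the fixture echoes. A sketch of that check follows; the sample log lines below are illustrative stand-ins, not captured Node.js output.

```javascript
'use strict';
// Sketch: classify compile-cache debug lines from child stderr.
// The sample lines are illustrative, not real captured output.
const sampleStderr = [
  '[compile cache] Creating temporary file for cache of /app/index.js...',
  '[compile cache] writing cache for /app/index.js to temporary file ' +
    '/tmp/e7f8ef7f.cache.abc123 [1 2 3 4 5]...success',
  '[compile cache] skip /app/index.js because cache was already persisted',
];

function countMatches(lines, re) {
  return lines.filter((line) => re.test(line)).length;
}

// After a flush, a child should read the cache rather than write it, and a
// second flush in the parent should skip entries already persisted.
const writes = countMatches(sampleStderr, /writing cache .* to temporary file/);
const skips = countMatches(sampleStderr, /skip .* already persisted/);
console.log({ writes, skips });
```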
11 changes: 11 additions & 0 deletions test/parallel/test-compile-cache-api-error.js
@@ -0,0 +1,11 @@
'use strict';

// This tests that module.enableCompileCache() throws when an invalid argument is passed.

require('../common');
const { enableCompileCache } = require('module');
const assert = require('assert');

for (const invalid of [0, null, false, () => {}, {}, []]) {
assert.throws(() => enableCompileCache(invalid), { code: 'ERR_INVALID_ARG_TYPE' });
}