Add support for mounting .tgz files with filesystem metadata appended (#477)

* Bump dev version

* Correct VFS image file extension in lzfs.R script

* Remove `.data` assumption in `webr::mount()`

* Add support for v2.0 VFS filesystem image format

* Avoid Emscripten WORKERFS `mount()` under Node

Instead, use our own `mountImageData()` function to create VFS nodes for
each file in the VFS metadata package.

TODO: This currently handles only metadata given in the form of the
`packages` property. Emscripten supports additional `files` and `blobs`
properties, and in the future we should also support those here.

Fixes #328.

* Update webr::mount() to default to v2.0 VFS images

* Update documentation for VFS v2.0

* Update NEWS.md

* Reorganise VFS mounting into TS module `mount.ts`

* Mount as URL under Node if source begins http[s]

* Add unit tests for mounting WORKERFS and NODEFS

* Export types from webr-chan.ts

* Fall back to mounting `.data` before using archive

Also improves warning messaging during fallback(s).

* Interpret metadata values as signed integers

* Read metadata from tar contents if hint is missing

* Add unit test for .tgz with no metadata hint
georgestagg authored Sep 11, 2024
1 parent d455321 commit 4655e96
Showing 21 changed files with 455 additions and 163 deletions.
12 changes: 12 additions & 0 deletions NEWS.md
@@ -1,5 +1,17 @@
# webR (development version)

## New features

* Added support for directly mounting (optionally compressed) `.tar` archives as filesystem images. Archives must be pre-processed using the `rwasm` R package to append filesystem image metadata to `.tar` archive data.

## Breaking changes

* When installing binary R packages, webR will now default to mounting the R package binary `.tgz` file as a filesystem image. If this fails (e.g. the `.tgz` has not been processed to add filesystem image metadata) webR will fall back to a traditional install by extracting the contents of the `.tgz` file.

## Bug Fixes

* Mounting filesystem images using the `WORKERFS` filesystem type now works correctly under Node.js (#328).

# webR 0.4.1

## New features
2 changes: 1 addition & 1 deletion flake.nix
@@ -26,7 +26,7 @@
# cd src; prefetch-npm-deps package-lock.json
srcNpmDeps = pkgs.fetchNpmDeps {
src = "${self}/src";
hash = "sha256-bENxHgVxA2G31l7NR66braWIEwybDe2qAf12x3V5JUY=";
hash = "sha256-KjG55UsbDIMxc5lRzSpqmmfc/tGKOwxXD6Gb+3lVLYU=";
};

inherit system;
4 changes: 2 additions & 2 deletions packages/webr/DESCRIPTION
@@ -1,7 +1,7 @@
Type: Package
Package: webr
Title: WebR Support Package
Version: 0.4.1
Version: 0.4.1.9000
Authors@R: c(
person("George", "Stagg", , "[email protected]", role = c("aut", "cre")),
person("Lionel", "Henry", , "[email protected]", role = "aut"),
@@ -17,4 +17,4 @@ Imports:
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
RoxygenNote: 7.3.2
34 changes: 18 additions & 16 deletions packages/webr/R/install.R
@@ -67,21 +67,23 @@ install <- function(packages,
if (!quiet) message(paste("Downloading webR package:", pkg))

if (mount) {
# Try package.data URL, fallback to .tgz download if unavailable
tryCatch(
{
install_vfs_image(repo, lib, pkg, pkg_ver)
},
error = function(cnd) {
if (!grepl("Unable to download", conditionMessage(cnd))) {
stop(cnd)
}
install_tgz(repo, lib, pkg, pkg_ver)
}
)
} else {
install_tgz(repo, lib, pkg, pkg_ver)
# Try mounting `.tgz` as v2.0 image, fallback to `.data` v1.0 image
tryCatch({
install_vfs_image(repo, lib, pkg, pkg_ver, ".tgz")
next
}, error = function(cnd) {
warning(paste(cnd$message, "Falling back to `.data` filesystem image."))
})

tryCatch({
install_vfs_image(repo, lib, pkg, pkg_ver, ".data")
next
}, error = function(cnd) {
warning(paste(cnd$message, "Falling back to copying archive contents."))
})
}

install_tgz(repo, lib, pkg, pkg_ver)
}
invisible(NULL)
}
@@ -100,8 +102,8 @@ install_tgz <- function(repo, lib, pkg, pkg_ver) {
)
}

install_vfs_image <- function(repo, lib, pkg, pkg_ver) {
data_url <- file.path(repo, paste0(pkg, "_", pkg_ver, ".data"))
install_vfs_image <- function(repo, lib, pkg, pkg_ver, ext) {
data_url <- file.path(repo, paste0(pkg, "_", pkg_ver, ext))
mountpoint <- file.path(lib, pkg)
mount(mountpoint, data_url)
}
14 changes: 8 additions & 6 deletions packages/webr/R/mount.R
@@ -5,10 +5,13 @@
#' directory in the virtual filesystem. The mountpoint will be created if it
#' does not already exist.
#'
#' When mounting an Emscripten "workerfs" type filesystem the `source` should
#' be the URL for a filesystem image with filename ending `.data`, as produced
#' by Emscripten's `file_packager` tool. The filesystem image and metadata will
#' be downloaded and mounted onto the directory `mountpoint`.
#' When mounting an Emscripten "workerfs" type filesystem the `source` should be
#' the URL or path to a filesystem image, as produced by Emscripten's
#' `file_packager` tool or as the result of appending filesystem metadata to a
#' `.tar` archive using [rwasm::add_tar_index()]. The filesystem image may be
#' gzip compressed, indicated by the property `gzip: true` in the associated
#' filesystem metadata. The filesystem metadata and contents will be loaded and
#' mounted onto the directory `mountpoint`.
#'
#' When mounting an Emscripten "nodefs" type filesystem, the `source` should be
#' the path to a physical directory on the host filesystem. The host directory
@@ -37,8 +40,7 @@ mount <- function(mountpoint, source, type = "workerfs") {

# Mount specified Emscripten filesystem type onto the given mountpoint
if (tolower(type) == "workerfs") {
base_url <- gsub(".data$", "", source)
invisible(.Call(ffi_mount_workerfs, base_url, mountpoint))
invisible(.Call(ffi_mount_workerfs, source, mountpoint))
} else if (tolower(type) == "nodefs") {
invisible(.Call(ffi_mount_nodefs, source, mountpoint))
} else if (tolower(type) == "idbfs") {
11 changes: 7 additions & 4 deletions packages/webr/man/mount.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 6 additions & 2 deletions packages/webr/src/mount.c
@@ -29,10 +29,14 @@ SEXP ffi_mount_workerfs(SEXP source, SEXP mountpoint) {
CHECK_STRING(mountpoint);

EM_ASM({
const baseUrl = UTF8ToString($0);
const source = UTF8ToString($0);
const mountpoint = UTF8ToString($1);
try {
Module.mountImageUrl(`${baseUrl}.data`, mountpoint);
if (ENVIRONMENT_IS_NODE && !/^https?:/.test(source)) {
Module.mountImagePath(source, mountpoint);
} else {
Module.mountImageUrl(source, mountpoint);
}
} catch (e) {
let msg = e.message;
if (e.name === "ErrnoError" && e.errno === 10) {
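The updated branch in this hunk picks a mount routine from the runtime and the form of `source`. The same decision can be sketched in TypeScript, assuming a hypothetical helper `chooseMountRoutine()` that is not part of webR and only mirrors the `ENVIRONMENT_IS_NODE` check and the `/^https?:/` test shown in the hunk:

```typescript
// Hypothetical helper for illustration only; not part of webR's API.
type MountRoutine = 'mountImagePath' | 'mountImageUrl';

function chooseMountRoutine(source: string, isNode: boolean): MountRoutine {
  // Under Node, a source that does not start with "http:" or "https:" is
  // treated as a path on disk; every other case is handled as a URL.
  if (isNode && !/^https?:/.test(source)) {
    return 'mountImagePath';
  }
  return 'mountImageUrl';
}

console.log(chooseMountRoutine('tests/webR/data/test_image.tar.gz', true));
console.log(chooseMountRoutine('https://example.com/image.tgz', true));
console.log(chooseMountRoutine('image.tgz', false));
```

The pattern is case-sensitive, so under Node a source written as `HTTPS://example.com/image.tgz` would be treated as a path rather than a URL.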
38 changes: 25 additions & 13 deletions src/docs/mounting.qmd
@@ -14,18 +14,19 @@ Emscripten's API allows for several types of virtual filesystem, depending on th

| Filesystem | Description | Web Browser | Node.js |
|------|-----|------|------|
| `WORKERFS` | Mount Emscripten filesystem images. | &#x2705; | &#x2705;[^workerfs] |
| `WORKERFS` | Mount Emscripten filesystem images. | &#x2705; | &#x2705; |
| `NODEFS` | Mount existing host directories. | &#x274C; | &#x2705; |
| `IDBFS` | Browser-based persistent storage using the [IndexedDB API](https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API). | &#x2705;[^idbfs] | &#x274C; |

[^workerfs]: Be aware of the current GitHub issue [#328](https://github.com/r-wasm/webr/issues/328).
[^idbfs]: Using the `PostMessage` [communication channel](communication.qmd) only.

## Emscripten filesystem images
## Filesystem images

Emscripten filesystem images can be mounted using the `WORKERFS` filesystem type.
Filesystem images are pre-prepared files containing a collection of files and associated metadata. The `WORKERFS` filesystem type can be used to efficiently make the contents of a filesystem image available to the WebAssembly R process.

The [`file_packager`](https://emscripten.org/docs/porting/files/packaging_files.html#packaging-using-the-file-packager-tool) tool, provided by Emscripten, takes in a directory structure as input and produces webR compatible filesystem images as output. The [`file_packager`](https://emscripten.org/docs/porting/files/packaging_files.html#packaging-using-the-file-packager-tool) tool may be invoked from R using the [rwasm](https://r-wasm.github.io/rwasm/) R package:
### Emscripten's `file_packager` tool

The [`file_packager`](https://emscripten.org/docs/porting/files/packaging_files.html#packaging-using-the-file-packager-tool) tool, provided by Emscripten, takes in a directory structure as input and produces a webR compatible filesystem image as output. The [`file_packager`](https://emscripten.org/docs/porting/files/packaging_files.html#packaging-using-the-file-packager-tool) tool may be invoked from R using the [rwasm](https://r-wasm.github.io/rwasm/) R package:

```{r eval=FALSE}
> rwasm::file_packager("./input", out_dir = ".", out_name = "output")
@@ -40,12 +41,25 @@ $ file_packager output.data --preload ./input@/ \

In the above examples, the files in the directory `./input` are packaged and an output filesystem image is created^[When using the `file_packager` CLI, a third file named `output.js` will also be created. If you only plan to mount the image using webR, this file may be discarded.] consisting of a data file, `output.data`, and a metadata file, `output.js.metadata`.

To prepare for mounting the filesystem image with webR, ensure that both files have the same basename (in this example, `output`) and are deployed to static file hosting^[e.g. GitHub Pages, Netlify, AWS S3, etc.]. The resulting URLs for the two files should differ only by the file extension.
To prepare for mounting the filesystem image with webR, ensure that both files have the same basename (in this example, `output`). The resulting URLs or relative paths for the two files should differ only by the file extension.

#### Compression

Filesystem image `.data` files may optionally be `gzip` compressed prior to deployment. The file extension for compressed filesystem images should be `.data.gz`, and compression should be indicated by setting the property `gzip: true` on the metadata JSON stored in the `.js.metadata` file.
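As a rough illustration of the compression step, the following TypeScript sketch gzips image data with Node's `zlib` module and sets `gzip: true` on a copy of the metadata. The `ImageMetadata` interface is modelled on the `test_image.js.metadata` fixture added in this commit, and `compressImage()` is a hypothetical helper, not a webR or rwasm function. Whether any other metadata fields need adjusting after compression is not stated in the documentation, so they are left unchanged here.

```typescript
import { gzipSync, gunzipSync } from 'node:zlib';

// Shape assumed from the test fixture in this commit, plus the optional
// `gzip` flag described in the documentation.
interface ImageMetadata {
  files: { filename: string; start: number; end: number }[];
  remote_package_size: number;
  gzip?: boolean;
}

// Hypothetical helper: returns the bytes to write as `<name>.data.gz` and
// the object to serialise as `<name>.js.metadata`.
function compressImage(data: Uint8Array, metadata: ImageMetadata) {
  return {
    data: gzipSync(data),
    metadata: { ...metadata, gzip: true }, // copy, so the input is untouched
  };
}

const raw = new TextEncoder().encode('x, y, z\n1, 2, 3\n7, 8, 9\n');
const meta: ImageMetadata = {
  files: [{ filename: '/abc/foo.csv', start: 0, end: raw.length }],
  remote_package_size: raw.length,
};

const packed = compressImage(raw, meta);
console.log(JSON.stringify(packed.metadata));
// Round-trip check: decompressing gives back the original number of bytes.
console.log(gunzipSync(packed.data).length === raw.length);
```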

## Mount a filesystem image from URL
### Process archives with the `rwasm` package

By default, the [`webr::mount()`](api/r.qmd#mount) function downloads and mounts a filesystem image from a URL source, using the `WORKERFS` filesystem type.
Archives in `.tar` format, optionally gzip compressed as `.tar.gz` or `.tgz` files, can also be used as filesystem images by pre-processing the `.tar` archive using the [rwasm](https://r-wasm.github.io/rwasm/) R package. The `rwasm::add_tar_index()` function reads the archive contents and appends the required filesystem metadata to the end of the `.tar` archive data in a way that is understood by webR.

```{r eval=FALSE}
> rwasm::add_tar_index("./path/to/archive.tar.gz")
```

Once processed by the `rwasm` R package, the archive can be deployed and used directly as a filesystem image.

## Mounting a filesystem image

When running in a web browser, the [`webr::mount()`](api/r.qmd#mount) function downloads and mounts a filesystem image from a URL source, using the `WORKERFS` filesystem type.

```{r eval=FALSE}
webr::mount(
@@ -54,17 +68,15 @@ webr::mount(
)
```

A URL for the filesystem image `.data` file should be provided as the source argument, and the image will be mounted in the virtual filesystem under the path given by the `mountpoint` argument. If the `mountpoint` directory does not exist, it will be created prior to mounting.

### Compression
Filesystem images should be deployed to static file hosting^[e.g. GitHub Pages, Netlify, AWS S3, etc.] and the resulting URL provided as the source argument. The image will be mounted in the virtual filesystem under the path given by the `mountpoint` argument. If the `mountpoint` directory does not exist, it will be created prior to mounting.

Filesystem image `.data` files may optionally be `gzip` compressed prior to deployment. The file extension for compressed filesystem images should be `.data.gz`, and compression should be indicated by setting the property `gzip: true` on the metadata JSON stored in the `.js.metadata` file.
When running under Node.js, the source may also be provided as a relative path to a filesystem image on disk.

### JavaScript API

WebR's JavaScript API includes the [`WebR.FS.mount()`](api/js/classes/WebR.WebR.md#fs) function, a thin wrapper around Emscripten's own [`FS.mount()`](https://emscripten.org/docs/api_reference/Filesystem-API.html#FS.mount). The JavaScript API provides more flexibility but requires a little more set up, including creating the `mountpoint` directory if it does not already exist.

The filesystem type should be provided as a `string`, with the `options` argument a JavaScript object of type [`FSMountOptions`](api/js/modules/WebR.md#fsmountoptions). The filesystem image data should be provided as a JavaScript `Blob` and the metadata as a JavaScript object deserialised from the underlying JSON content.
The filesystem type should be provided as a `string`, with the `options` argument of type [`FSMountOptions`](api/js/modules/WebR.md#fsmountoptions). The filesystem image data should be provided either as a JavaScript `Blob` object or an `ArrayBuffer`-like object, and the metadata provided as a JavaScript object that has been deserialised from the underlying JSON content.

::: {.panel-tabset}
## JavaScript
12 changes: 6 additions & 6 deletions src/package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion src/package.json
@@ -1,6 +1,6 @@
{
"name": "webr",
"version": "0.4.1",
"version": "0.4.2-dev",
"description": "The statistical programming language R compiled into WASM for use in a web browser and node.",
"keywords": [
"webR",
6 changes: 6 additions & 0 deletions src/tests/webR/data/test_image.data
@@ -0,0 +1,6 @@
a, b, c
9, 8, 7
4, 5, 6
x, y, z
1, 2, 3
7, 8, 9
1 change: 1 addition & 0 deletions src/tests/webR/data/test_image.js.metadata
@@ -0,0 +1 @@
{"files":[{"filename":"/abc/bar.csv","start":0,"end":24},{"filename":"/abc/foo.csv","start":24,"end":48}],"remote_package_size":48}
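The two fixtures above fit together as follows: each entry in `files` gives a byte range into `test_image.data`, with `start` inclusive and `end` exclusive, so the two 24-byte files account for the 48 bytes in `remote_package_size`. A minimal TypeScript sketch of reading the files back out of the image, assuming a hypothetical helper `sliceImage()` (webR's own mounting code creates filesystem nodes rather than strings and may differ in other ways):

```typescript
interface ImageMetadata {
  files: { filename: string; start: number; end: number }[];
  remote_package_size: number;
}

// Hypothetical helper for illustration only; not part of webR's API.
function sliceImage(data: Uint8Array, metadata: ImageMetadata): Map<string, string> {
  const decoder = new TextDecoder();
  const out = new Map<string, string>();
  for (const file of metadata.files) {
    // `end` is exclusive, so `end - start` is the file size in bytes.
    out.set(file.filename, decoder.decode(data.subarray(file.start, file.end)));
  }
  return out;
}

// The same contents as the test_image.data fixture.
const data = new TextEncoder().encode(
  'a, b, c\n9, 8, 7\n4, 5, 6\n' + 'x, y, z\n1, 2, 3\n7, 8, 9\n'
);
const metadata: ImageMetadata = {
  files: [
    { filename: '/abc/bar.csv', start: 0, end: 24 },
    { filename: '/abc/foo.csv', start: 24, end: 48 },
  ],
  remote_package_size: 48,
};

const files = sliceImage(data, metadata);
console.log(files.get('/abc/bar.csv'));
console.log(files.get('/abc/foo.csv'));
```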
Binary file added src/tests/webR/data/test_image.tar.gz
Binary file not shown.
Binary file added src/tests/webR/data/test_image_no_hint.tgz
Binary file not shown.
3 changes: 3 additions & 0 deletions src/tests/webR/data/testing/foo.csv
@@ -0,0 +1,3 @@
x, y, z
1, 2, 3
7, 8, 9
101 changes: 101 additions & 0 deletions src/tests/webR/mount.test.ts
@@ -0,0 +1,101 @@
import { FSMetaData, WebR } from '../../webR/webr-main';
import fs from 'fs';

const webR = new WebR({
baseUrl: '../dist/',
RArgs: ['--quiet'],
});

beforeAll(async () => {
await webR.init();
await webR.evalRVoid('dir.create("/mnt")');
});

async function cleanupMnt() {
try {
await webR.FS.unmount("/mnt");
} catch (e) {
const err = e as Error;
if (err.message !== "FS error") throw err;
}
}

describe('Mount filesystem using R API', () => {
test('Mount v1.0 filesystem image', async () => {
await expect(webR.evalRVoid(
'webr::mount("/mnt", "tests/webR/data/test_image.data", "workerfs")'
)).resolves.not.toThrow();
expect(await webR.evalRString("list.files('/mnt/abc')[2]")).toEqual("foo.csv");
expect(await webR.evalRString("readLines('/mnt/abc/bar.csv')[1]")).toEqual("a, b, c");
await cleanupMnt();
});

test('Mount v2.0 filesystem image', async () => {
await expect(webR.evalRVoid(
'webr::mount("/mnt", "tests/webR/data/test_image.tar.gz", "workerfs")'
)).resolves.not.toThrow();
expect(await webR.evalRString("list.files('/mnt/abc')[2]")).toEqual("foo.csv");
expect(await webR.evalRString("readLines('/mnt/abc/bar.csv')[1]")).toEqual("a, b, c");
await cleanupMnt();
});

test('Mount v2.0 filesystem image - no metadata hint', async () => {
await expect(webR.evalRVoid(
'webr::mount("/mnt", "tests/webR/data/test_image_no_hint.tgz", "workerfs")'
)).resolves.not.toThrow();
expect(await webR.evalRString("list.files('/mnt/abc')[2]")).toEqual("foo.csv");
expect(await webR.evalRString("readLines('/mnt/abc/bar.csv')[1]")).toEqual("a, b, c");
await cleanupMnt();
});

test('Mount filesystem image from URL', async () => {
const url = "https://repo.r-wasm.org/bin/emscripten/contrib/4.4/cli_3.6.3.js.metadata";
await expect(webR.evalRVoid(`
webr::mount("/mnt", "${url}", "workerfs")
`)).resolves.not.toThrow();
expect(await webR.evalRString("readLines('/mnt/DESCRIPTION')[1]")).toEqual("Package: cli");
await cleanupMnt();
});

test('Mount NODEFS filesystem type', async () => {
await expect(webR.evalRVoid(`
webr::mount("/mnt", "tests/webR/data/testing", "nodefs")
`)).resolves.not.toThrow();
expect(await webR.evalRString("readLines('/mnt/foo.csv')[2]")).toEqual("1, 2, 3");
await cleanupMnt();
});
});

describe('Mount filesystem using JS API', () => {
test('Mount filesystem image using Buffer', async () => {
const data = fs.readFileSync("tests/webR/data/test_image.data");
const buf = fs.readFileSync("tests/webR/data/test_image.js.metadata");
const metadata = JSON.parse(new TextDecoder().decode(buf)) as FSMetaData;
await expect(
webR.FS.mount("WORKERFS", { packages: [{ blob: data, metadata: metadata }] }, '/mnt')
).resolves.not.toThrow();
expect(await webR.evalRString("list.files('/mnt/abc')[2]")).toEqual("foo.csv");
expect(await webR.evalRString("readLines('/mnt/abc/bar.csv')[1]")).toEqual("a, b, c");
await cleanupMnt();
});

test('Mount filesystem image using Blob', async () => {
const data = new Blob([fs.readFileSync("tests/webR/data/test_image.data")]);
const buf = fs.readFileSync("tests/webR/data/test_image.js.metadata");
const metadata = JSON.parse(new TextDecoder().decode(buf)) as FSMetaData;
await expect(
webR.FS.mount("WORKERFS", { packages: [{ blob: data, metadata: metadata }] }, '/mnt')
).resolves.not.toThrow();
expect(await webR.evalRString("list.files('/mnt/abc')[2]")).toEqual("foo.csv");
expect(await webR.evalRString("readLines('/mnt/abc/bar.csv')[1]")).toEqual("a, b, c");
await cleanupMnt();
});

test('Mount NODEFS filesystem type', async () => {
await expect(
webR.FS.mount("NODEFS", { root: 'tests/webR/data/testing' }, '/mnt')
).resolves.not.toThrow();
expect(await webR.evalRString("readLines('/mnt/foo.csv')[2]")).toEqual("1, 2, 3");
await cleanupMnt();
});
});