Explorations: Stream API #851
Conversation
…them into PHP(). Also, explore using file iterators as a basic abstraction for passing the file trees around. Work in progress
Very nice to see the elimination of the intermediate unpacked folder. This should bring a nice improvement in the latency of installing ZIP files, with installs and boot finishing quite a bit faster.
```ts
entry['uncompressedSize'] = await stream.readUint32();
entry['fileNameLength'] = await stream.readUint16();
entry['extraLength'] = await stream.readUint16();
entry['fileName'] = new TextDecoder().decode(
```
It appears that Emscripten is the real culprit here, but filenames are not encoded; they are raw bytes separated by ASCII `/`. Decoding them as UTF-8 will end up breaking filenames and inserting � in places.
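One hedged way around the replacement characters, sketched here since the right fix depends on what Emscripten actually hands back: honor the ZIP format's UTF-8 flag (general-purpose bit 11) and otherwise use a byte-preserving decode instead of a lossy UTF-8 one. `decodeFileName` is a hypothetical helper, not code from this PR:

```ts
// Sketch only: ZIP filenames are CP437 unless general-purpose flag bit 11
// marks them as UTF-8. TextDecoder cannot decode CP437, but 'latin1'
// (an alias for windows-1252) maps every byte to some character, so it
// never inserts U+FFFD the way a lossy UTF-8 decode does.
function decodeFileName(bytes: Uint8Array, generalPurposeFlags: number): string {
	const isUtf8 = (generalPurposeFlags & 0x0800) !== 0;
	return new TextDecoder(isUtf8 ? 'utf-8' : 'latin1').decode(bytes);
}
```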
```ts
});
}

const data = await stream.read(header.compressedSize);
```
Curious: what happens if we attempt to read more bytes than are available?
I removed the custom `Stream` class implementation. Right now we read *n* bytes using the native `ReadableStreamBYOBReader` class, which sends a "read request" to the data source. Every data source is free to process it how it wants, but typically it will only fill the buffer up to `Math.min(Buffer.size, numberOfAvailableDataBytes)`.
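For illustration, reading exactly *n* bytes with a BYOB reader looks roughly like this (a minimal sketch, not the implementation in this PR):

```ts
// Read up to `n` bytes from a byte stream using a BYOB reader. Each read()
// may fill fewer bytes than requested, so we loop. The view's ArrayBuffer
// is transferred on every read, hence re-wrapping value.buffer.
async function readBytes(
	source: ReadableStream<Uint8Array>,
	n: number
): Promise<Uint8Array> {
	const reader = source.getReader({ mode: 'byob' });
	let buffer = new Uint8Array(n);
	let offset = 0;
	while (offset < n) {
		const { value, done } = await reader.read(
			new Uint8Array(buffer.buffer, offset, n - offset)
		);
		if (done || !value) break;
		buffer = new Uint8Array(value.buffer);
		offset += value.byteLength;
	}
	reader.releaseLock();
	return buffer.subarray(0, offset);
}
```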
…end of the central directory
…nfig, and the sqlite-database-integration plugin (#872)

## Description

Includes the `wp-config.php` file and the default WordPress theme (like `twentytwentythree`) in the zip file exported via the "Download as .zip" button. Without those files, the exported bundle isn't self-contained. It cannot be hosted, and any importer needs to provide the missing files on its own. The theme and the plugins are easy to backfill, but the data stored in `wp-config.php` is lost.

## How is the problem addressed?

This PR adds a temporary private `selfContained` option to `importWpContent`. It is `false` by default to ensure those files are not exported with GitHub PRs (the export flow relies on the same logic). The zip download button sets it to `true`. This is a temporary workaround, as we currently don't have any better tools to deal with this problem. Once the streaming/iterators API ships in #851, we'll be able to get rid of this hack and just filter the stream of files.

## Testing Instructions

Unfortunately, this PR ships no unit tests, as without #895 there isn't an easy way to test the `zipWpContent` function. Here are the manual testing steps:

1. Open Playground.
2. Make a change in the site content.
3. Export Playground into a zip file.
4. Confirm the zip file contains the `wp-config.php` file as well as the `twentytwentyfour` theme and the `sqlite-database-integration` plugin.
5. Refresh the Playground tab and import that zip.
6. Confirm it worked and the website is functional and has the content update from before.
7. Export it to GitHub, checking the "include zip file" checkbox.
8. Confirm the GitHub PR has no `twentytwentyfour` theme, `wp-config.php` file, or `sqlite-database-integration` plugin.
9. Do the same for the zip bundled with the GitHub PR.
10. Import that PR and confirm it imports cleanly.
Adds a new `@php-wasm/node-polyfills` package to polyfill the features missing in Node 18 and/or JSDOM environments. The goal is to make wp-now and other Playground-based Node.js packages work in Node 18, which is the current LTS release. The polyfilled JavaScript features are:

* `CustomEvent` class
* `File` class
* `Blob.text()`, `Blob.arrayBuffer()`, and `File.text()` methods
* `Blob.stream()` and `File.stream()` methods
* Ensures `File.stream().getReader({ mode: 'byob' })` is supported – this is relevant for #851

I adapted the Blob methods from https://github.com/bjornstar/blob-polyfill/blob/master/Blob.js as they seemed to provide just the logic needed here, and they also worked right away.

This PR is a part of #851, split out into a separate PR to make it easier to review and reason about. Supersedes #865.

## Testing instructions

Confirm the unit tests pass. This PR ships a set of vite tests to confirm the polyfills work both in vanilla Node.js and in jsdom runtime environments.
Stream Compression introduced in #851 has no dependencies on WordPress and can be used in any JavaScript project. It also makes sense as a dependency for some `@php-wasm` packages. This commit, therefore, moves it from the `wp-playground` to the `php-wasm` npm namespace, making it reusable across the entire project.

In addition, this adds a new `iterateFiles` function to the `@php-wasm/universal` package, which allows iterating over the files in the PHP filesystem. It uses the `stream-compression` package, which was some of the motivation for the move.

This PR also ships eslint rules to keep the `stream-compression` package independent from the heavy `@php-wasm/web` and `@php-wasm/node` packages. This should enable using it in other projects with a minimal dependency overhead of just `@php-wasm/util` and `@php-wasm/node-polyfills`.

## Testing instructions

Since the package isn't used anywhere yet, only confirm that the CI checks pass.
This small commit brings a part of #851 into trunk for easier review.

## Testing instructions

Confirm the CI tests pass.
## What is this PR doing?

#851 explores migrating file handling in Playground from buffering to JS-native streams, and this PR brings over small, administrative changes from the main branch to unclutter it. This is mostly about updating `package.json` files, configs, and imports.

## Testing Instructions

Confirm the CI checks pass.
Safari doesn't support BYOB streams, which is a blocker here. Perhaps there's a way to polyfill them? I experimented with that at 2b0acd0#diff-888d17d202646c67c560411a8e150946f30fec742ff592a7915d6c8370ab030e
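For what it's worth, a feature test (not a full polyfill) could at least keep Safari working at the cost of an extra copy; a sketch:

```ts
// Safari throws a TypeError for getReader({ mode: 'byob' }), so fall back
// to a default reader there and accept copying chunks into our own buffer.
function getByteReader(stream: ReadableStream<Uint8Array>) {
	try {
		return { byob: true, reader: stream.getReader({ mode: 'byob' }) } as const;
	} catch {
		return { byob: false, reader: stream.getReader() } as const;
	}
}
```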
Surfacing the performance concerns from #919 (comment). Buffering the entire .zip file and passing it to PHP.wasm in one go takes around 300ms on my machine. However, using browser streams and writing each file as it becomes available takes around 2s, where:
There's potentially a lot of room for improvement, but I'm not too keen on tweaking this. Tweaking the stream-based implementation would take a lot of time, and I'm not convinced about the benefits. The lower bound for the execution time seems to be set by the native versions of libzip and libz. My intuition is that:
All of this is hand-wavy and based on my intuition. I don't have any actual measurements, and perhaps I could spend a dozen or more hours here and either prove or disprove those assumptions, but I think there's a more promising avenue to explore instead. I wonder if we could stream all the downloaded bytes directly to WASM memory and stream-handle them there. JavaScript would become an API layer over WASM operations, much like:

```ts
php.acceptDataStream(fetch(pluginFileURL).body)
	.unzip()
	.writeTo('/wordpress/wp-content/plugins/gutenberg');
```
Blueprints v2 make integrating JavaScript ZIP streaming into Blueprint steps unnecessary, as the data is processed using PHP streams instead of JavaScript ones. Let's still keep the stream-processing package around, though, as it's independent and useful to have.
common.ts file into smaller files #898

Description
Implements a stream-based ZIP encoder and decoder using the CompressionStream and DecompressionStream classes.
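For context, the heavy lifting can stay native: a ZIP entry's data is raw deflate, which the built-in DecompressionStream can inflate directly. A minimal sketch ('deflate-raw' requires a reasonably recent browser):

```ts
// Inflate one ZIP entry's compressed bytes with the native
// DecompressionStream. ZIP entries use raw deflate, hence 'deflate-raw'.
async function inflate(compressedBytes: Uint8Array): Promise<Uint8Array> {
	const inflated = new Blob([compressedBytes])
		.stream()
		.pipeThrough(new DecompressionStream('deflate-raw'));
	return new Uint8Array(await new Response(inflated).arrayBuffer());
}
```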
Here's what we get:

* `ZipArchive`

To that last point:
ZIP as a remote, virtual filesystem
This change enables fast previewing of even 10GB-large zipped exports via partial downloads.
Imagine previewing a large site export with many photos and videos. The `decodeRemoteZip` function knows how to request just the list of files first, filter out the large ones, and then issue multiple fetch() requests to download the rest. Effectively, we would only download ~5MB to 10MB of data for the initial preview, and then only download the larger assets once they're needed.
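The trick that makes this possible is HTTP Range requests. Here's a hypothetical sketch of the first step, fetching just the tail of the archive where the ZIP central directory lives (it assumes the server supports the Range header; the 64 KB window is a guess that covers the End Of Central Directory record plus a typical directory):

```ts
async function fetchZipTail(url: string): Promise<Uint8Array> {
	// Learn the total size first, then request only the last 64 KB.
	const head = await fetch(url, { method: 'HEAD' });
	const total = Number(head.headers.get('Content-Length'));
	const start = Math.max(0, total - 64 * 1024);
	const tail = await fetch(url, {
		headers: { Range: `bytes=${start}-${total - 1}` },
	});
	return new Uint8Array(await tail.arrayBuffer());
}
```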
Technical details
Here are a few interesting functions shipped by this PR. Note that the links point to a specific commit and may get outdated:
Remaining work
There are a few more things to do here, but I still wanted to get some reviews in before spending time on them, just in case the API substantially changes:

* Use `ReadableStream.tee()` instead of the workaround we use now, once https://bugs.chromium.org/p/chromium/issues/detail?id=1512548 is fixed in Chromium.

API changes
Breaking changes
This PR isn't a breaking change yet. One of the follow-up PRs will very likely propose some breaking changes, but this one only extends the available API.
Without this PR
Without this PR, unzipping a file requires writing it to Playground, calling PHP's `unzip`, and removing the temporary `zip` file:

With this PR
With this PR, unzipping works like this:
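A rough sketch of the flow, where `decodeZip` stands in for the decoder shipped here and is assumed to yield `File` objects; the exact names and signatures may differ:

```ts
// Stream a remote ZIP straight into the PHP filesystem, one file at a
// time, with no intermediate archive written to disk.
const response = await fetch(pluginFileURL);
for await (const file of decodeZip(response.body!)) {
	await php.writeFile(
		`/wordpress/wp-content/plugins/gutenberg/${file.name}`,
		new Uint8Array(await file.arrayBuffer())
	);
}
```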
More examples
Here's what else the streaming API unlocks. Not all of these functions are shipped here, but they are quite easy to implement:
Open questions
How can streams be expressed in Blueprints?
Solving this is out of scope for this PR, but it's one of the next problems to explore so we might as well use this PR as a space to contemplate. I'll move this into a separate issue once we solidify the shape of the API here and ship this PR.
URI notation
Perhaps the Blueprint steps could accept a `protocol://URI` notation? This is the option I'm leaning towards, unless a better alternative comes up; see the sketch after the lists below.

Upsides:
* `git+https://git@...`
Downsides:
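For illustration, a Blueprint step using URI notation might look like this; the `zip+https` scheme and the exact step shape are hypothetical:

```ts
// The scheme prefix would select the processing pipeline: fetch, then unzip.
const step = {
	step: 'installPlugin',
	pluginZipFile:
		'zip+https://downloads.wordpress.org/plugin/gutenberg.latest.zip',
};
```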
Object-based DSL
We can have unlimited flexibility with a custom DSL, like the sketch below:
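A hypothetical shape, since the original example didn't pin one down (all field names are illustrative):

```ts
// Object-based resource declaration: each node names an operation and
// its input, composing fetch and unzip explicitly.
const pluginResource = {
	type: 'unzip',
	source: {
		type: 'fetch',
		url: 'https://downloads.wordpress.org/plugin/gutenberg.latest.zip',
	},
};
```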
Upsides:
Downsides:
Piping DSL
How about we mimic the JavaScript API using the JSON notation?
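A hypothetical sketch of what that could look like, expressed here as an array of pipeline stages (stage and field names are illustrative):

```ts
// Each entry names a stage and its arguments, mirroring the chained
// JavaScript API: fetch, unzip, then write to the PHP filesystem.
const pipeline = [
	{ pipe: 'fetch', url: 'https://downloads.wordpress.org/plugin/gutenberg.latest.zip' },
	{ pipe: 'unzip' },
	{ pipe: 'writeTo', path: '/wordpress/wp-content/plugins/gutenberg' },
];
```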
Upsides:
Downsides:
cc @dmsnell