Rewrite AsyncHttp\Client for cleaner API and Transfer-Encoding support (#113)

Refactors the `AsyncHttp\Client` to simplify the usage and the internal
implementation. This will be helpful for [rewriting URLs in WordPress
posts and downloading the related
assets](WordPress/data-liberation#74).

## Changes

* Handle errors at each step of the HTTP request lifecycle.
* Drop support for PHP 7.0 and 7.1 since WordPress is dropping that
support, too.
* Provide `await_next_event()` as a single, filterable interface for
consuming all the HTTP activity. Remove the `onProgress` callback and
various other ways of waiting for information on specific requests.
* Introduce an internal `event_loop_tick()` function that runs all the
available non-blocking operations.
* Move all the logic from functions into the `Client` class. It is now
less generic, but I'd argue it already wasn't that generic and at least
now we can avoid going back and forth between functions and that class.
* Support `Transfer-Encoding: chunked`, `Transfer-Encoding: gzip`, and
`Content-Encoding: gzip` via stream wrappers.
* Remove most of the complexity associated with making PHP streams
central to how the library works. In this version, the focus is on the
`Client` object, so we no longer have to go out of our way to store data
in the stream context, struggle with stream filters, pass data through
between stream wrapper layers, etc.
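
To illustrate what `Transfer-Encoding: chunked` support involves, here is a minimal standalone sketch of chunked decoding. It is not the stream wrapper shipped in this PR: `decode_chunked_body()` is a hypothetical helper that assumes the whole body is already buffered in memory, and it ignores trailers.

```php
<?php
/**
 * Hypothetical helper, not part of AsyncHttp\Client: decodes a complete
 * `Transfer-Encoding: chunked` body that is already buffered in memory.
 */
function decode_chunked_body( string $raw ): string {
	$decoded = '';
	$offset  = 0;
	$length  = strlen( $raw );

	while ( $offset < $length ) {
		// Each chunk starts with "<hex size>[;extensions]\r\n".
		$line_end = strpos( $raw, "\r\n", $offset );
		if ( false === $line_end ) {
			throw new InvalidArgumentException( 'Missing CRLF after chunk size' );
		}
		$size_line = substr( $raw, $offset, $line_end - $offset );
		$size_hex  = trim( explode( ';', $size_line, 2 )[0] );
		if ( '' === $size_hex || ! ctype_xdigit( $size_hex ) ) {
			throw new InvalidArgumentException( 'Invalid chunk size: ' . $size_line );
		}
		$size   = (int) hexdec( $size_hex );
		$offset = $line_end + 2;

		// A zero-sized chunk terminates the body; trailers are ignored here.
		if ( 0 === $size ) {
			return $decoded;
		}
		if ( $offset + $size + 2 > $length ) {
			throw new InvalidArgumentException( 'Truncated chunk data' );
		}
		$decoded .= substr( $raw, $offset, $size );
		// Skip the chunk data and the CRLF that follows it.
		$offset += $size + 2;
	}

	throw new InvalidArgumentException( 'Missing terminating zero-sized chunk' );
}

echo decode_chunked_body( "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n" ), "\n"; // prints "Wikipedia"
```

A real client has to do the same bookkeeping incrementally, since a chunk boundary can fall anywhere in the bytes read from the socket.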

This PR also ships an implementation of an HTTP proxy built with this
client library – it could come in handy for running an [in-browser Git
client](https://adamadam.blog/2024/06/21/cloning-a-git-repository-from-a-web-browser-using-fetch/):


https://github.com/WordPress/blueprints-library/blob/http-client-api-refactir/http_proxy.php

## Usage example

```php
$requests = [
	new Request( "https://wordpress.org/latest.zip" ),
	new Request( "https://raw.githubusercontent.com/wpaccessibility/a11y-theme-unit-test/master/a11y-theme-unit-test-data.xml" ),
];

$client = new Client();
$client->enqueue( $requests );

while ( $client->await_next_event() ) {
	$request = $client->get_request();
	echo "Request " . $request->id . ": " . $client->get_event() . " ";
	switch ( $client->get_event() ) {
		case Client::EVENT_BODY_CHUNK_AVAILABLE:
			echo $request->response->received_bytes . "/". $request->response->total_bytes ." bytes received";
			file_put_contents( 'downloads/' . $request->id, $client->get_response_body_chunk(), FILE_APPEND);
			break;
		case Client::EVENT_REDIRECT:
		case Client::EVENT_GOT_HEADERS:
		case Client::EVENT_FINISHED:
			break;
		case Client::EVENT_FAILED:
			echo "– ❌ Failed request to " . $request->url . " – " . $request->error;
			break;
	}
	echo "\n";
}
```

## HTTP Proxy example

```php
// Encode the current request details in a Request object
$requests = [
	new Request(
		$target_url,
		[
			'method' => $_SERVER['REQUEST_METHOD'],
			// array_merge() is used instead of spreading getallheaders(),
			// because unpacking string-keyed arrays requires PHP 8.1+.
			'headers' => array_merge(
				getallheaders(),
				[
					// Ensure we won't receive an unsupported content encoding
					// just because the client browser supports it.
					'Accept-Encoding' => 'gzip, deflate',
					'Host' => parse_url($target_url, PHP_URL_HOST),
				]
			),
			// Naively assume only POST requests have a body
			'body_stream' => $_SERVER['REQUEST_METHOD'] === 'POST' ? fopen('php://input', 'r') : null,
		]
	),
];

$client = new Client();
$client->enqueue( $requests );

$headers_sent = false;
while ( $client->await_next_event() ) {
	// Pass the response headers and body to the client.
	// Consult the previous example for the details.
}
```
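
When relaying the response, `http_proxy.php` skips the `transfer-encoding`, `content-encoding`, and `set-cookie` headers. The first two presumably no longer describe the body once the client has decoded it. That filtering step can be sketched in isolation as below; `filter_relayed_headers()` is a hypothetical helper, not part of the library, and it assumes headers arrive as a name-to-value array.

```php
<?php
/**
 * Hypothetical helper: returns the upstream response headers that are
 * safe to relay, skipping the same names that http_proxy.php skips.
 */
function filter_relayed_headers( array $headers ): array {
	$skipped = [ 'transfer-encoding', 'content-encoding', 'set-cookie' ];
	$relayed = [];
	foreach ( $headers as $name => $value ) {
		// Compare case-insensitively in case names are not lowercased.
		if ( in_array( strtolower( (string) $name ), $skipped, true ) ) {
			continue;
		}
		$relayed[ $name ] = $value;
	}
	return $relayed;
}

$upstream = [
	'content-type'      => 'text/plain',
	'transfer-encoding' => 'chunked',
	'content-encoding'  => 'gzip',
];
print_r( filter_relayed_headers( $upstream ) );
```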

## Future work

* Unit tests.
* Abundant inline documentation with examples and explanation of
technical decisions.
* Standard way of piping HTTP responses into ZIP processor, XML
processor, HTML tag processor etc.
* Find a useful way of treating HTTP error codes such as 404 or 501.
Currently these requests are marked as "finished", not "failed", because
the connection was successfully created and the server replied with a
valid HTTP response. Perhaps it's fine to keep it that way: this could be a
lower-level library, and treating error codes as failures could belong to
a higher-level client.
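
One way a higher-level client could layer that behavior on top is sketched below. `is_http_error_status()` is a hypothetical helper, not part of this PR; the idea is that a caller would check `$request->response->status_code` (the property used in `http_proxy.php`) once a request is reported as finished.

```php
<?php
/**
 * Hypothetical higher-level helper: decides whether a finished request
 * should count as an application-level failure based on its status code.
 */
function is_http_error_status( int $status_code ): bool {
	// 4xx are client errors and 5xx are server errors.
	return $status_code >= 400 && $status_code <= 599;
}

foreach ( [ 200, 301, 404, 501 ] as $code ) {
	echo $code, ': ', is_http_error_status( $code ) ? 'error' : 'ok', "\n";
}
```

Keeping this check outside the `Client` leaves the low-level event semantics unchanged: "failed" still means the HTTP exchange itself could not be completed.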

cc @dmsnell @MayPaw @reimic
adamziel authored Jul 15, 2024
1 parent 239f43e commit 9a26c5e
Showing 20 changed files with 1,459 additions and 804 deletions.
100 changes: 100 additions & 0 deletions chunked_encoding_server.js
@@ -0,0 +1,100 @@
/**
 * Use with `http_api.php` to test chunked transfer encoding:
 *
 * ```php
 * $requests = [
 *     new Request( "http://127.0.0.1:3000/", [
 *         'http_version' => '1.1'
 *     ] ),
 *     new Request( "http://127.0.0.1:3000/", [
 *         'http_version' => '1.0',
 *         'headers' => [
 *             'please-redirect' => 'yes',
 *         ],
 *     ] ),
 * ];
 * ```
 */

const http = require('http');
const zlib = require('zlib');

const server = http.createServer((req, res) => {
    // Check if the client is using HTTP/1.1
    const isHttp11 = req.httpVersion === '1.1';
    res.useChunkedEncodingByDefault = false;

    // Check if the client accepts gzip encoding
    const acceptEncoding = req.headers['accept-encoding'];
    const useGzip = acceptEncoding && acceptEncoding.includes('gzip');

    if (req.headers['please-redirect']) {
        res.writeHead(301, { Location: req.url });
        res.end();
        return;
    }

    // Set headers for chunked transfer encoding if HTTP/1.1
    if (isHttp11) {
        res.setHeader('Transfer-Encoding', 'chunked');
    }

    res.setHeader('Content-Type', 'text/plain');

    // Create a function to write chunks
    const writeChunks = (stream) => {
        stream.write(`<!DOCTYPE html>
<html lang=en>
<head>
<meta charset='utf-8'>
<title>Chunked transfer encoding test</title>
</head>\r\n`);

        stream.write('<body><h1>Chunked transfer encoding test</h1>\r\n');

        setTimeout(() => {
            stream.write('<h5>This is a chunked response after 100 ms.</h5>\n');

            setTimeout(() => {
                stream.write('<h5>This is a chunked response after 1 second. The server should not close the stream before all chunks are sent to a client.</h5></body></html>\n');
                stream.end();
            }, 1000);
        }, 100);
    };

    if (useGzip) {
        res.setHeader('Content-Encoding', 'gzip');
        const gzip = zlib.createGzip();
        gzip.pipe(res);

        if (isHttp11) {
            writeChunks({
                write(data) {
                    gzip.write(data);
                    gzip.flush();
                },
                end() {
                    gzip.end();
                }
            });
        } else {
            gzip.write('Chunked transfer encoding test\n');
            gzip.write('This is a chunked response after 100 ms.\n');
            gzip.write('This is a chunked response after 1 second.\n');
            gzip.end();
        }
    } else {
        if (isHttp11) {
            writeChunks(res);
        } else {
            res.write('Chunked transfer encoding test\n');
            res.write('This is a chunked response after 100 ms.\n');
            res.write('This is a chunked response after 1 second.\n');
            res.end();
        }
    }
});

const port = 3000;
server.listen(port, () => {
console.log(`Server is listening on http://127.0.0.1:${port}`);
});
3 changes: 1 addition & 2 deletions composer.json
@@ -30,8 +30,7 @@
 "files": [
 "src/WordPress/Blueprints/functions.php",
 "src/WordPress/Zip/functions.php",
-"src/WordPress/Streams/stream_str_replace.php",
-"src/WordPress/AsyncHttp/async_http_streams.php"
+"src/WordPress/Streams/stream_str_replace.php"
 ]
 },
 "autoload-dev": {
72 changes: 25 additions & 47 deletions http_api.php
@@ -1,57 +1,35 @@
<?php

use WordPress\AsyncHttp\Client;
use WordPress\AsyncHttp\ClientEvent;
use WordPress\AsyncHttp\Request;

require __DIR__ . '/vendor/autoload.php';

$client = new Client();
$client->set_progress_callback( function ( Request $request, $downloaded, $total ) {
echo "$request->url – Downloaded: $downloaded / $total\n";
} );

$streams1 = $client->enqueue( [
new Request( "https://downloads.wordpress.org/plugin/gutenberg.17.7.0.zip" ),
new Request( "https://downloads.wordpress.org/theme/pendant.zip" ),
] );
// Enqueuing another request here is instant and won't start the download yet.
//$streams2 = $client->enqueue( [
// new Request( "https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip" ),
//] );
$requests = [
new Request( "https://wordpress.org/latest.zip" ),
new Request( "https://raw.githubusercontent.com/wpaccessibility/a11y-theme-unit-test/master/a11y-theme-unit-test-data.xml" ),
];

// Stream a single file, while streaming all the files
file_put_contents( 'output-round1-0.zip', stream_get_contents( $streams1[0] ) );
//file_put_contents( 'output-round1-1.zip', stream_get_contents( $streams1[1] ) );
die();
// Initiate more HTTPS requests
$streams3 = $client->enqueue( [
new Request( "https://downloads.wordpress.org/plugin/akismet.4.1.12.zip" ),
new Request( "https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip" ),
new Request( "https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip" ),
] );

// Download the rest of the files. Foreach() seems like downloading things
// sequentially, but we're actually streaming all the files in parallel.
$streams = array_merge( $streams2, $streams3 );
foreach ( $streams as $k => $stream ) {
file_put_contents( 'output-round2-' . $k . '.zip', stream_get_contents( $stream ) );
$client = new Client();
$client->enqueue( $requests );

while ( $client->await_next_event() ) {
	$request = $client->get_request();
	echo "Request " . $request->id . ": " . $client->get_event() . " ";
	switch ( $client->get_event() ) {
		case Client::EVENT_BODY_CHUNK_AVAILABLE:
			echo $request->response->received_bytes . "/". $request->response->total_bytes ." bytes received";
			file_put_contents( 'downloads/' . $request->id, $client->get_response_body_chunk(), FILE_APPEND);
			break;
		case Client::EVENT_REDIRECT:
		case Client::EVENT_GOT_HEADERS:
		case Client::EVENT_FINISHED:
			break;
		case Client::EVENT_FAILED:
			echo "– ❌ Failed request to " . $request->url . "" . $request->error;
			break;
	}
	echo "\n";
}

echo "Done! :)";

// ----------------------------
//
// Previous explorations:

// Non-blocking parallel processing – the fastest method.
//while ( $results = sockets_http_response_await_bytes( $streams, 8096 ) ) {
// foreach ( $results as $k => $chunk ) {
// file_put_contents( 'output' . $k . '.zip', $chunk, FILE_APPEND );
// }
//}

// Blocking sequential processing – the slowest method.
//foreach ( $streams as $k => $stream ) {
// stream_set_blocking( $stream, 1 );
// file_put_contents( 'output' . $k . '.zip', stream_get_contents( $stream ) );
//}
87 changes: 87 additions & 0 deletions http_proxy.php
@@ -0,0 +1,87 @@
<?php
/**
* HTTP Proxy implemented using AsyncHttp\Client
*
* This could be a replacement for the curl-based PHPProxy shipped
* in https://github.com/WordPress/wordpress-playground/pull/1546.
*/

use WordPress\AsyncHttp\Client;
use WordPress\AsyncHttp\ClientEvent;
use WordPress\AsyncHttp\Request;

require __DIR__ . '/vendor/autoload.php';

function get_target_url($server_data=null) {
	if ($server_data === null) {
		$server_data = $_SERVER;
	}
	$requestUri = $server_data['REQUEST_URI'];
	$targetUrl = $requestUri;

	// Remove the current script name from the beginning of $targetUrl
	if (strpos($targetUrl, $server_data['SCRIPT_NAME']) === 0) {
		$targetUrl = substr($targetUrl, strlen($server_data['SCRIPT_NAME']));
	}

	// Remove the leading slash or question mark
	if ($targetUrl[0] === '/' || $targetUrl[0] === '?') {
		$targetUrl = substr($targetUrl, 1);
	}

	return $targetUrl;
}
$target_url = get_target_url();
$host = parse_url($target_url, PHP_URL_HOST);
$requests = [
	new Request(
		$target_url,
		[
			'method' => $_SERVER['REQUEST_METHOD'],
			'headers' => [
				...getallheaders(),
				'Accept-Encoding' => 'gzip, deflate',
				'Host' => $host,
			],
			'body_stream' => $_SERVER['REQUEST_METHOD'] === 'POST' ? fopen('php://input', 'r') : null,
		]
	),
];

$client = new Client();
$client->enqueue( $requests );

$headers_sent = false;
while ( $client->await_next_event() ) {
	$request = $client->get_request();
	switch ( $client->get_event() ) {
		case Client::EVENT_GOT_HEADERS:
			http_response_code($request->response->status_code);
			foreach ( $request->response->get_headers() as $name => $value ) {
				if(
					$name === 'transfer-encoding' ||
					$name === 'set-cookie' ||
					$name === 'content-encoding'
				) {
					continue;
				}
				header("$name: $value");
			}
			$headers_sent = true;
			break;
		case Client::EVENT_BODY_CHUNK_AVAILABLE:
			echo $client->get_response_body_chunk();
			break;
		case Client::EVENT_FAILED:
			if(!$headers_sent) {
				http_response_code(500);
				echo "Failed request to " . $request->url . "" . $request->error;
			}
			break;
		case Client::EVENT_REDIRECT:
		case Client::EVENT_FINISHED:
			break;
	}
	echo "\n";
}
