diff --git a/README.md b/README.md
index 881a188..3ee88da 100644
--- a/README.md
+++ b/README.md
@@ -20,3 +20,9 @@
 # Apache Arrow Experiments
 
 This repository is for collaborative prototyping and research in the Apache Arrow project.
+
+| Directory | Contents |
+| --------- | -------- |
+| **[data](./data)** | Various datasets that are used by the experiments in this repository or intended to be used in future Arrow experiments |
+| **[dissociated-ipc](./dissociated-ipc)** | Reference example implementation of the experimental [Arrow Dissociated IPC Protocol](https://arrow.apache.org/docs/dev/format/DissociatedIPC.html) |
+| **[http](./http)** | Examples demonstrating ways of sending and receiving data in Arrow IPC stream format (IANA media type `application/vnd.apache.arrow.stream`) over HTTP APIs |
diff --git a/data/README.md b/data/README.md
index f598c4a..3b6229b 100644
--- a/data/README.md
+++ b/data/README.md
@@ -19,13 +19,13 @@
 
 # Apache Arrow Data Experiments
 
-This subdirectory contains experimental Arrow data whose purpose has not
-yet become clear but may be useful in the future. This currently includes
-data used to generate compelling examples that is more realistic than
-generated data or the testing data found in
+This directory contains various datasets that are used by the experiments
+in this repository or intended to be used in future Arrow experiments.
+This currently includes data used to generate compelling examples that is
+more realistic than generated data or the testing data found in
 [apache/arrow-testing](http://github.com/apache/arrow-testing). This
-subdirectory is intended as a semi-temporary staging area: eventually,
-data here should find a permanent home elsewhere or be removed.
+directory is intended as a semi-temporary staging area; eventually, much
+of the data here should find a permanent home elsewhere.
 
 > [!IMPORTANT]
 > Please install and use [Git LFS](https://git-lfs.com) when contributing to this subdirectory. Add any new large file extensions to [`.gitattributes`](https://github.com/apache/arrow-experiments/blob/main/.gitattributes).
diff --git a/http/README.md b/http/README.md
index 164c54e..2e8231b 100644
--- a/http/README.md
+++ b/http/README.md
@@ -19,7 +19,20 @@
 
 # Apache Arrow HTTP Data Transport
 
-This area of the Apache Arrow Experiments repository is for collaborative prototyping and research on the subject of sending and receiving Arrow-formatted data over HTTP APIs.
+This area of the Apache Arrow Experiments repository is for collaborative prototyping and research on the subject of sending and receiving data in Arrow IPC stream format (IANA media type `application/vnd.apache.arrow.stream`) over HTTP APIs.
+
+The subdirectories beginning with **get** demonstrate clients receiving data from servers (HTTP GET request). Those beginning with **post** demonstrate clients sending data to servers (HTTP POST request).
+
+| Subdirectory | Purpose |
+| ------------ | ------- |
+| **[get_compressed](get_compressed)** | Demonstrates various ways of using compression when sending and receiving Arrow IPC stream data over HTTP |
+| **[get_indirect](get_indirect)** | Demonstrates a two-step sequence for fetching Arrow data from a server, in which a JSON document provides the URIs for the Arrow data |
+| **[get_multipart](get_multipart)** | Demonstrates how to send and receive a multipart HTTP response body (`multipart/mixed`) containing Arrow IPC stream data and other data |
+| **[get_range](get_range)** | Demonstrates how to use HTTP range requests to download Arrow IPC stream data of known length in multiple requests |
+| **[get_simple](get_simple)** | Contains a large set of examples demonstrating the basics of fetching an Arrow IPC stream from a server to a client in 12+ languages |
+| **[post_multipart](post_multipart)** | Demonstrates how to send and receive a multipart HTTP request body (`multipart/form-data`) containing Arrow IPC stream data and other data |
+| **[post_simple](post_simple)** | Demonstrates the basics of sending Arrow IPC stream data from a client to a server |
+
 
 The intent of this work is to:
 - Ensure excellent interoperability across languages.
diff --git a/http/get_simple/README.md b/http/get_simple/README.md
index 5f9c552..e6be795 100644
--- a/http/get_simple/README.md
+++ b/http/get_simple/README.md
@@ -25,7 +25,7 @@ This directory contains a set of minimal examples of HTTP clients and servers im
 
 The examples here assume that the server cannot determine the exact length in bytes of the full Arrow IPC stream before sending it, so they cannot set the `Content-Length` header or serve Range requests.
 
-The client examples here assume that the client needs to hold the full received data in memory in an Arrow data structure for further in-memory processing. (The case in which the client simply writes the result directly to a file is much simpler and can be achieved trivially by using [curl](https://curl.se) or similar.)
+Most of the client examples here assume that the client needs to hold the full received data in memory in an Arrow data structure for further in-memory processing. The case in which the client simply writes the result directly to a file is much simpler and is demonstrated by the [curl client example](curl/client).
 
 To enable performance comparisons to Arrow Flight RPC, the server examples generate the data in exactly the same way as in [`flight_benchmark.cc`](https://github.com/apache/arrow/blob/7346bdffbdca36492089f6160534bfa2b81bad90/cpp/src/arrow/flight/flight_benchmark.cc#L194-L245) as cited in the [original blog post introducing Flight RPC](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/). But note that Flight example sends four concurrent streams.
 
diff --git a/http/get_simple/matlab/README.md b/http/get_simple/matlab/client/README.md
similarity index 100%
rename from http/get_simple/matlab/README.md
rename to http/get_simple/matlab/client/README.md
diff --git a/http/get_simple/matlab/client.m b/http/get_simple/matlab/client/client.m
similarity index 100%
rename from http/get_simple/matlab/client.m
rename to http/get_simple/matlab/client/client.m