Improve READMEs and organization (#46)
ianmcook authored Jan 24, 2025
1 parent 7c03915 commit dfd45f9
Showing 6 changed files with 27 additions and 8 deletions.
6 changes: 6 additions & 0 deletions README.md
@@ -20,3 +20,9 @@
 # Apache Arrow Experiments
 
 This repository is for collaborative prototyping and research in the Apache Arrow project.
+
+| Directory | Contents |
+| --------- | -------- |
+| **[data](./data)** | Various datasets that are used by the experiments in this repository or intended to be used in future Arrow experiments |
+| **[dissociated-ipc](./dissociated-ipc)** | Reference example implementation of the experimental [Arrow Dissociated IPC Protocol](https://arrow.apache.org/docs/dev/format/DissociatedIPC.html) |
+| **[http](./http)** | Examples demonstrating ways of sending and receiving data in Arrow IPC stream format (IANA media type `application/vnd.apache.arrow.stream`) over HTTP APIs |
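The `http` row above names the IANA media type for Arrow IPC streams. A server labels such a response with `Content-Type: application/vnd.apache.arrow.stream`, and a client can request it with an `Accept` header. Here is a minimal Python standard-library sketch of that exchange against a local test server. The handler class, the `fetch` and `main` helpers, and the payload are hypothetical and do not come from the repository. The payload is placeholder bytes, not a valid Arrow IPC stream. A real client would pass the received bytes to an Arrow IPC stream reader (for example `pyarrow.ipc.open_stream`) instead of returning them.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# IANA media type for the Arrow IPC stream format.
ARROW_STREAM_TYPE = "application/vnd.apache.arrow.stream"

# Hypothetical placeholder payload: NOT a valid Arrow IPC stream.
PLACEHOLDER_BODY = b"placeholder bytes standing in for an Arrow IPC stream"


class ArrowStreamHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves the placeholder body with the Arrow media type."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", ARROW_STREAM_TYPE)
        # No Content-Length: with the default HTTP/1.0 protocol, closing the
        # connection marks the end of the body.
        self.end_headers()
        self.wfile.write(PLACEHOLDER_BODY)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(url):
    """Request the resource, preferring the Arrow media type, and buffer the body in memory."""
    request = urllib.request.Request(url, headers={"Accept": ARROW_STREAM_TYPE})
    with urllib.request.urlopen(request) as response:
        content_type = response.headers.get_content_type()
        body = response.read()  # reads until the server closes the connection
    return content_type, body


def main():
    # Port 0 lets the OS pick a free port for this local demo.
    server = HTTPServer(("127.0.0.1", 0), ArrowStreamHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address[:2]
        return fetch(f"http://{host}:{port}/")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    content_type, body = main()
    print(content_type, len(body))
```

The handler uses the default HTTP/1.0 protocol and sends no `Content-Length`, so the server signals the end of the body by closing the connection. This is one of the simplest options when the stream length is not known ahead of time.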
12 changes: 6 additions & 6 deletions data/README.md
@@ -19,13 +19,13 @@
 
 # Apache Arrow Data Experiments
 
-This subdirectory contains experimental Arrow data whose purpose has not
-yet become clear but may be useful in the future. This currently includes
-data used to generate compelling examples that is more realistic than
-generated data or the testing data found in
+This directory contains various datasets that are used by the experiments
+in this repository or intended to be used in future Arrow experiments.
+This currently includes data used to generate compelling examples that is
+more realistic than generated data or the testing data found in
 [apache/arrow-testing](http://github.com/apache/arrow-testing). This
-subdirectory is intended as a semi-temporary staging area: eventually,
-data here should find a permanent home elsewhere or be removed.
+directory is intended as a semi-temporary staging area; eventually, much
+of the data here should find a permanent home elsewhere.
 
 > [!IMPORTANT]
 > Please install and use [Git LFS](https://git-lfs.com) when contributing to this subdirectory. Add any new large file extensions to [`.gitattributes`](https://github.com/apache/arrow-experiments/blob/main/.gitattributes).
15 changes: 14 additions & 1 deletion http/README.md
@@ -19,7 +19,20 @@
 
 # Apache Arrow HTTP Data Transport
 
-This area of the Apache Arrow Experiments repository is for collaborative prototyping and research on the subject of sending and receiving Arrow-formatted data over HTTP APIs.
+This area of the Apache Arrow Experiments repository is for collaborative prototyping and research on the subject of sending and receiving data in Arrow IPC stream format (IANA media type `application/vnd.apache.arrow.stream`) over HTTP APIs.
 
+The subdirectories beginning with **get** demonstrate clients receiving data from servers (HTTP GET request). Those beginning with **post** demonstrate clients sending data to servers (HTTP POST request).
+
+| Subdirectory | Purpose |
+| ------------ | ------- |
+| **[get_compressed](get_compressed)** | Demonstrates various ways of using compression when sending and receiving Arrow IPC stream data over HTTP |
+| **[get_indirect](get_indirect)** | Demonstrates a two-step sequence for fetching Arrow data from a server, in which a JSON document provides the URIs for the Arrow data |
+| **[get_multipart](get_multipart)** | Demonstrates how to send and receive a multipart HTTP response body (`multipart/mixed`) containing Arrow IPC stream data and other data |
+| **[get_range](get_range)** | Demonstrates how to use HTTP range requests to download Arrow IPC stream data of known length in multiple requests |
+| **[get_simple](get_simple)** | Contains a large set of examples demonstrating the basics of fetching an Arrow IPC stream from a server to a client in 12+ languages |
+| **[post_multipart](post_multipart)** | Demonstrates how to send and receive a multipart HTTP request body (`multipart/form-data`) containing Arrow IPC stream data and other data |
+| **[post_simple](post_simple)** | Demonstrates the basics of sending Arrow IPC stream data from a client to a server |
+
+
 The intent of this work is to:
 - Ensure excellent interoperability across languages.
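The `get_range` row above relies on HTTP range requests. These only make sense when the total length of the Arrow IPC stream is known in advance. Here is a minimal Python sketch that assumes the length is already known, for instance from a `Content-Length` header. It splits the download into fixed-size `Range: bytes=start-end` requests and reassembles the pieces in order. `range_headers` and `apply_range` are hypothetical helpers, not the repository's implementation. `apply_range` only simulates, in memory, a server answering a single range.

```python
def range_headers(total_length: int, chunk_size: int) -> list[str]:
    """Return Range header values that together cover bytes 0..total_length-1."""
    if total_length <= 0 or chunk_size <= 0:
        raise ValueError("total_length and chunk_size must be positive")
    headers = []
    for start in range(0, total_length, chunk_size):
        # Byte-range end positions are inclusive, hence the -1.
        end = min(start + chunk_size, total_length) - 1
        headers.append(f"bytes={start}-{end}")
    return headers


def apply_range(data: bytes, header: str) -> bytes:
    """Hypothetical stand-in for a server answering one 'bytes=start-end' range."""
    unit, _, spec = header.partition("=")
    if unit != "bytes":
        raise ValueError(f"unsupported range unit: {unit!r}")
    start_text, _, end_text = spec.partition("-")
    start, end = int(start_text), int(end_text)
    return data[start:end + 1]


if __name__ == "__main__":
    payload = bytes(range(10))  # placeholder bytes, not an Arrow IPC stream
    headers = range_headers(len(payload), 4)
    print(headers)  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
    # Concatenating the partial responses in order restores the original bytes.
    assert b"".join(apply_range(payload, h) for h in headers) == payload
```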
2 changes: 1 addition & 1 deletion http/get_simple/README.md
@@ -25,7 +25,7 @@ This directory contains a set of minimal examples of HTTP clients and servers im
 
 The examples here assume that the server cannot determine the exact length in bytes of the full Arrow IPC stream before sending it, so they cannot set the `Content-Length` header or serve Range requests.
 
-The client examples here assume that the client needs to hold the full received data in memory in an Arrow data structure for further in-memory processing. (The case in which the client simply writes the result directly to a file is much simpler and can be achieved trivially by using [curl](https://curl.se) or similar.)
+Most of the client examples here assume that the client needs to hold the full received data in memory in an Arrow data structure for further in-memory processing. The case in which the client simply writes the result directly to a file is much simpler and is demonstrated by the [curl client example](curl/client).
 
 To enable performance comparisons to Arrow Flight RPC, the server examples generate the data in exactly the same way as in [`flight_benchmark.cc`](https://github.com/apache/arrow/blob/7346bdffbdca36492089f6160534bfa2b81bad90/cpp/src/arrow/flight/flight_benchmark.cc#L194-L245) as cited in the [original blog post introducing Flight RPC](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/). But note that Flight example sends four concurrent streams.
 
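The get_simple servers are assumed not to know the stream's total size, so they cannot send `Content-Length`. In HTTP/1.1, one standard way to delimit such a body is chunked transfer coding. Each piece is prefixed with its size in hexadecimal, and a zero-size chunk ends the body. Below is a minimal Python sketch of that framing, done by hand for illustration; HTTP libraries normally apply it for you. The helper names and the placeholder byte strings standing in for Arrow IPC messages are hypothetical. This is not a claim about how the repository's servers are written.

```python
def encode_chunked(parts):
    """Frame byte strings using HTTP/1.1 chunked transfer coding."""
    out = bytearray()
    for part in parts:
        if not part:
            continue  # a zero-size chunk would signal the end of the body
        out += f"{len(part):X}\r\n".encode("ascii")  # chunk size in hex
        out += part + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk followed by an empty trailer section
    return bytes(out)


def decode_chunked(body):
    """Reassemble a chunked body into one bytes object (trailers are ignored)."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # Drop any chunk extensions after ';' before parsing the hex size.
        size = int(body[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            break
        out += body[pos:pos + size]
        pos += size + 2  # skip the data and its trailing CRLF
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical placeholders for serialized Arrow IPC messages.
    messages = [b"schema-message", b"record-batch-1", b"record-batch-2"]
    body = encode_chunked(messages)
    print(body)
    # Like most get_simple clients, this decoder buffers the whole body in memory.
    assert decode_chunked(body) == b"".join(messages)
```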
File renamed without changes.
File renamed without changes.
