Abstract internal design of the proxy away from the S3 API #37

vagaerg · 2024-05-28T13:44:13Z

This is a follow up to #20 , originated in #36 (comment)

While we may not yet want to support non-S3-like endpoints for the underlying storage, we should make sure the internal design of the proxy is not tied to the design of the S3 API.

The proxy should receive requests in S3 format, parse them into some standard record format that conveys enough information as to what operation is being performed and upon what object(s)/bucket(s)/prefix(es) (if applicable), and pass this to a "proxy" interface that deals with sending it to the real storage system and parse the response back into a standard format (if applicable). The proxy should then use this information to reconstruct an S3-like response.

For now, there will probably only be implementations for S3-like systems, but the interface should be flexible enough that there's no leakage of S3 implementation details.

For instance, the TrinoS3ProxyResource (external facing interface) is OK to assume that all requests are coming in S3 format (as the proxy will always expose an S3 API externally), but it should not assume that it will be using an S3-like endpoint to forward requests to:

https://github.com/trinodb/s3-proxy/blob/77fa27e659bdbb0cd5e1cf18bcace3e1a4012bac/trino-s3-proxy/src/main/java/io/trino/s3/proxy/server/rest/TrinoS3ProxyResource.java#L40

TrinoS3ProxyClient also implements some logic to parse the incoming request: https://github.com/trinodb/s3-proxy/blob/77fa27e659bdbb0cd5e1cf18bcace3e1a4012bac/trino-s3-proxy/src/main/java/io/trino/s3/proxy/server/rest/TrinoS3ProxyClient.java#L76

This would probably be better centralised as mentioned above, such that TrinoS3ProxyClient gets a standard Java record and doesn't need to deal with URLs, HTTP methods, etc...

The text was updated successfully, but these errors were encountered:

vagaerg · 2024-09-20T16:16:05Z

Some additional context: this would also be desirable for permissioning / authorisation decisions.

The current request record allows the authorisation system to make decisions based on the bucket & key a request is targeting quite easily. However, many requests may target the same bucket, key & HTTP method and yet have completely diferent meanings based on query parameters and headers.

For instance: GET with a bucket but without a key could be (just to name a few, there's more options):

While some of these are similar enough that they may not make too much of a difference from an authorisation standpoint (e.g. ListObjects and ListObjectsV2) others are fundamentally different operations (e.g., getting a bucket owner or its ACLs)

This was referenced May 28, 2024

Replace endpoint builder with ObjectStore abstraction #36

Merged

Support serving of Virtual Host requests #51

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Abstract internal design of the proxy away from the S3 API #37

Abstract internal design of the proxy away from the S3 API #37

vagaerg commented May 28, 2024

vagaerg commented Sep 20, 2024

Abstract internal design of the proxy away from the S3 API #37

Abstract internal design of the proxy away from the S3 API #37

Comments

vagaerg commented May 28, 2024

vagaerg commented Sep 20, 2024