Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abstract internal design of the proxy away from the S3 API #37

Open
vagaerg opened this issue May 28, 2024 · 1 comment
Open

Abstract internal design of the proxy away from the S3 API #37

vagaerg opened this issue May 28, 2024 · 1 comment

Comments

@vagaerg
Copy link
Member

vagaerg commented May 28, 2024

This is a follow up to #20 , originated in #36 (comment)

While we may not yet want to support non-S3-like endpoints for the underlying storage, we should make sure the internal design of the proxy is not tied to the design of the S3 API.

The proxy should receive requests in S3 format, parse them into some standard record format that conveys enough information as to what operation is being performed and upon what object(s)/bucket(s)/prefix(es) (if applicable), and pass this to a "proxy" interface that deals with sending it to the real storage system and parse the response back into a standard format (if applicable). The proxy should then use this information to reconstruct an S3-like response.

For now, there will probably only be implementations for S3-like systems, but the interface should be flexible enough that there's no leakage of S3 implementation details.

For instance, the TrinoS3ProxyResource (external facing interface) is OK to assume that all requests are coming in S3 format (as the proxy will always expose an S3 API externally), but it should not assume that it will be using an S3-like endpoint to forward requests to:

https://github.com/trinodb/s3-proxy/blob/77fa27e659bdbb0cd5e1cf18bcace3e1a4012bac/trino-s3-proxy/src/main/java/io/trino/s3/proxy/server/rest/TrinoS3ProxyResource.java#L40

TrinoS3ProxyClient also implements some logic to parse the incoming request: https://github.com/trinodb/s3-proxy/blob/77fa27e659bdbb0cd5e1cf18bcace3e1a4012bac/trino-s3-proxy/src/main/java/io/trino/s3/proxy/server/rest/TrinoS3ProxyClient.java#L76

This would probably be better centralised as mentioned above, such that TrinoS3ProxyClient gets a standard Java record and doesn't need to deal with URLs, HTTP methods, etc...

@vagaerg
Copy link
Member Author

vagaerg commented Sep 20, 2024

Some additional context: this would also be desirable for permissioning / authorisation decisions.

The current request record allows the authorisation system to make decisions based on the bucket & key a request is targeting quite easily. However, many requests may target the same bucket, key & HTTP method and yet have completely diferent meanings based on query parameters and headers.

For instance: GET with a bucket but without a key could be (just to name a few, there's more options):

While some of these are similar enough that they may not make too much of a difference from an authorisation standpoint (e.g. ListObjects and ListObjectsV2) others are fundamentally different operations (e.g., getting a bucket owner or its ACLs)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

1 participant