Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proof of concept] Expr serialize #916

Closed
wants to merge 5 commits into from
Closed

Conversation

billylanchantin
Copy link
Contributor

🚧 Do not merge

This PR is only a proof of concept.

Description

Adds expr_to_json and expr_from_json.

The idea is to provide an escape hatch for Polars functionality we don't yet support. Included is a test case demonstrating that it's possible.

Discussion

Explorer can lag behind what's available in Polars. This functionality would allow users to opt into behavior not otherwise available. My hope is that we can turn this idea into something like Ecto.Query.API.fragment/1.

This approach is messy for a few reasons:

  • You have to somehow get your hands on the JSON
  • The escape hatch is Polars-specific
  • We'd have to weave it into many functions (though I have thoughts here)

But for that price, we instantly open up all expression-based operations. Please let me know if you think!

@josevalim
Copy link
Member

I am not sure we should go down this route because this is serializing an internal representation, right? So there is no guarantee that, once a new Polars version lands, the JSON representation (keys and values) will remain relevant? I think relying on their SQL API should be slightly more stable, because that's a public API?

@billylanchantin
Copy link
Contributor Author

So there is no guarantee that, once a new Polars version lands, the JSON representation (keys and values) will remain relevant?

The API is public:

And they don't say you can't rely on it. But I think you're likely correct regardless. Looking at the example JSON from my test, it seems they're including a lot of what looks like internal stuff.

In theory if a particular representation breaks, a user could just get the new JSON from the new version of Polars. I'd be ok with that since this is supposed to be a backdoor. But I take your point!

I think relying on their SQL API should be slightly more stable, because that's a public API?

That's true. However their docs say:

As the DataFrame interface is primary, new features are typically added to the expression API first.

So I'm not sure exposing the SQL API would also expose the expressions we want, especially since:

There is no separate SQL engine because Polars translates SQL queries into expressions, which are then executed using its own engine.

But we should definitely pursue the SQL API regardless. How cool would it be for Explorer.DataFrame to implement Ecto.Queryable?

Still, I don't think the SQL API gets us the expressiveness I'm going for. But not exposing an API because it's internal/unreliable is potentially a deal breaker, so great feedback.

@josevalim
Copy link
Member

The API is public but I am not sure if its output are guaranteed to be stable across versions. For example, it could be used to serialize expressions across nodes for the same Polars versions, but not guarantees across different ones.

If Polars had an API for building an expression from a SQL fragment, that would be ideal.

@billylanchantin
Copy link
Contributor Author

If Polars had an API for building an expression from a SQL fragment, that would be ideal.

Seems like they do!

@josevalim
Copy link
Member

@billylanchantin YES!!!

@billylanchantin
Copy link
Contributor Author

@josevalim Ok I can close this and start working on exposing a SQL-to-exprs pipeline :)

FYI you made a good point here: #818 (comment)

But maybe viewing the SQL as somewhat of an escape hatch changes things a bit?

@josevalim
Copy link
Member

But maybe viewing the SQL as somewhat of an escape hatch changes things a bit?

Exactly. To me, the main question is if we can provide a consistent behaviour across backends. For example, if we have a Postgres backend, would this feature also make sense there? And I think it would, so let's go for it.

@billylanchantin
Copy link
Contributor Author

Closing in favor of: #918

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants