[Proof of concept] Expr serialize #916

billylanchantin · 2024-06-01T22:46:17Z

🚧 Do not merge

This PR is only a proof of concept.

Description

Adds expr_to_json and expr_from_json.

The idea is to provide an escape hatch for Polars functionality we don't yet support. Included is a test case demonstrating that it's possible.

Discussion

Explorer can lag behind what's available in Polars. This functionality would allow users to opt into behavior not otherwise available. My hope is that we can turn this idea into something like Ecto.Query.API.fragment/1.

This approach is messy for a few reasons:

- You have to somehow get your hands on the JSON
- The escape hatch is Polars-specific
- We'd have to weave it into many functions (though I have thoughts here)

But for that price, we instantly open up all expression-based operations. Please let me know what you think!

josevalim · 2024-06-02T13:12:14Z

I am not sure we should go down this route because this is serializing an internal representation, right? So there is no guarantee that, once a new Polars version lands, the JSON representation (keys and values) will remain relevant? I think relying on their SQL API should be slightly more stable, because that's a public API?

billylanchantin · 2024-06-02T14:08:23Z

> So there is no guarantee that, once a new Polars version lands, the JSON representation (keys and values) will remain relevant?

The API is public:

And they don't say you can't rely on it. But I think you're likely correct regardless. Looking at the example JSON from my test, it seems they're including a lot of what looks like internal stuff.

In theory if a particular representation breaks, a user could just get the new JSON from the new version of Polars. I'd be ok with that since this is supposed to be a backdoor. But I take your point!

> I think relying on their SQL API should be slightly more stable, because that's a public API?

That's true. However, their docs say:

> As the DataFrame interface is primary, new features are typically added to the expression API first.

So I'm not sure exposing the SQL API would also expose the expressions we want, especially since:

> There is no separate SQL engine because Polars translates SQL queries into expressions, which are then executed using its own engine.

But we should definitely pursue the SQL API regardless. How cool would it be for Explorer.DataFrame to implement Ecto.Queryable?

Still, I don't think the SQL API gets us the expressiveness I'm going for. But not exposing an API because it's internal/unreliable is potentially a deal breaker, so great feedback.

josevalim · 2024-06-02T14:19:42Z

The API is public, but I am not sure its output is guaranteed to be stable across versions. For example, it could be used to serialize expressions across nodes running the same Polars version, but there are no guarantees across different versions.

If Polars had an API for building an expression from a SQL fragment, that would be ideal.

billylanchantin · 2024-06-02T14:25:47Z

> If Polars had an API for building an expression from a SQL fragment, that would be ideal.

Seems like they do!

https://docs.rs/polars-sql/latest/polars_sql/fn.sql_expr.html

josevalim · 2024-06-02T14:27:18Z

@billylanchantin YES!!!

billylanchantin · 2024-06-02T14:48:23Z

@josevalim Ok I can close this and start working on exposing a SQL-to-exprs pipeline :)

FYI you made a good point here: #818 (comment)

But maybe viewing the SQL as somewhat of an escape hatch changes things a bit?

josevalim · 2024-06-02T17:54:45Z

> But maybe viewing the SQL as somewhat of an escape hatch changes things a bit?

Exactly. To me, the main question is if we can provide a consistent behaviour across backends. For example, if we have a Postgres backend, would this feature also make sense there? And I think it would, so let's go for it.

billylanchantin · 2024-06-02T20:18:42Z

Closing in favor of: #918

billylanchantin added 5 commits June 1, 2024 18:07

- add to/from json for exprs (34f55f4)
- add tests including proof of concept (cb040ae)
- refactor test (d88184e)
- make it so i can copy/paste into the repl (572398a)
- turns out jason isn't an actual dep (7d92810)

billylanchantin closed this Jun 2, 2024
