Ibis backend #53

Closed
ion-elgreco opened this issue Apr 19, 2024 · 8 comments

@ion-elgreco

Could be interesting, since you could then run any of their 20 backends with a Polars-like API 👀

@MarcoGorelli
Member

😄 thanks for the suggestion, but I'd rather look into converting to Substrait directly. Ibis is too heavy; I'd rather avoid it even as an optional dependency

@MarcoGorelli
Member

closing then, but I appreciate your interest!

@lostmygithubaccount

hello! just to add a bit more flavor on this from the Ibis perspective:

since you could then run any of their 20 backends with a Polars-like API 👀

we would be interested in what we've called "API skins" on top of Ibis for pandas, PySpark, and Polars. the main issue here is time and effort. with pandas, you're always going to end up with an operation support matrix, but there's already a pretty good start with BigQuery DataFrames, a pandas clone built on top of Ibis. one could take that project and make it work for any generic backend

for PySpark/Polars, the approach would be similar. the Polars API has been relatively unstable and this probably wouldn't make sense to do until 1.0, though we'd welcome any contributions in this direction! it's just not requested frequently enough for us to put the effort into any of these "skins". you could also probably get a long way by simply aliasing a few things, like mutate -> with_columns and ibis._ -> pl.col 🤷
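The aliasing idea could look roughly like a thin wrapper that exposes a few Polars method names over an Ibis table. This is a hypothetical sketch, not an Ibis or Polars API: `PolarsSkin` and `col` are made-up names, only keyword-style `with_columns` is handled, and `col` assumes Ibis' deferred `_` accepts item access by column name.

```python
from typing import Any


class PolarsSkin:
    """Hypothetical Polars-flavoured wrapper around an Ibis table (duck-typed)."""

    def __init__(self, table: Any) -> None:
        self._table = table

    def with_columns(self, **named_exprs: Any) -> "PolarsSkin":
        # Polars: df.with_columns(b=...)  ->  Ibis: t.mutate(b=...)
        return PolarsSkin(self._table.mutate(**named_exprs))

    def filter(self, *predicates: Any) -> "PolarsSkin":
        # Same method name in both libraries, so the call is forwarded unchanged
        return PolarsSkin(self._table.filter(*predicates))

    def select(self, *columns: Any) -> "PolarsSkin":
        return PolarsSkin(self._table.select(*columns))

    def unwrap(self) -> Any:
        """Return the underlying Ibis table expression."""
        return self._table


def col(name: str) -> Any:
    """Polars-style pl.col(name) mapped onto Ibis' deferred `_` (needs ibis installed)."""
    from ibis import _  # imported lazily, so the wrapper itself has no hard dependency

    return _[name]
```

With Ibis installed, something like `PolarsSkin(ibis.memtable({"a": [1, 2, 3]})).with_columns(b=col("a") * 2).filter(col("b") > 2).unwrap()` should hand back an ordinary Ibis table expression. Polars features with no direct Ibis counterpart would still need real translation work, which is where the "skin" effort goes.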

Ibis is too heavy

the "heavy" installation size of Ibis is nearly all from a few packages:

  1. pandas
  2. numpy (required dependency of pandas)
  3. PyArrow (optional but encouraged for performance; a dependency of pandas)

Narwhals has (or would have) the same issues today: if you're running it on pandas, you must install pandas and NumPy (and probably should install PyArrow), so your size is about the same. for any other backend, you have to install that backend and its dependencies, so sizes may vary
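To see where installation size actually comes from in a given environment, one option is to sum the files each installed distribution lists in its metadata. A minimal sketch using only the standard library's `importlib.metadata`; it counts one distribution's own files and not its transitive dependencies, so pandas and NumPy are reported separately. The names in the demo loop are the usual PyPI distribution names (Ibis is published as `ibis-framework`).

```python
from importlib import metadata
from typing import Optional


def installed_size(dist_name: str) -> Optional[int]:
    """Approximate on-disk size in bytes of one installed distribution.

    Sums the files listed in the distribution's RECORD metadata and
    returns None if the distribution is not installed. Transitive
    dependencies are NOT included.
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    total = 0
    for record in dist.files or []:
        path = record.locate()
        try:
            if path.is_file():
                total += path.stat().st_size
        except OSError:
            # Files listed in RECORD can be missing or unreadable; skip them
            continue
    return total


if __name__ == "__main__":
    for name in ("ibis-framework", "pandas", "numpy", "pyarrow", "polars", "narwhals"):
        size = installed_size(name)
        label = "not installed" if size is None else f"{size / 1e6:.1f} MB"
        print(f"{name:<16}{label}")
```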

Ibis has been working to make pandas/pyarrow optional dependencies, and that work is pretty close to done. if you or anyone else is eager to see that work get over the finish line, we welcome contributions! but "heavy" installation isn't a very frequent complaint. pandas/pyarrow dependencies are all over the place and it's not a big issue for most

look into converting to Substrait directly

we can already go from Ibis to Substrait; the main issue is that Substrait is still fairly nascent and unsupported by most backends. it does look promising that in a few years we will simply need dataframe API -> Substrait -> backend, but for now, if Narwhals wants to support 20+ backends, I'm afraid it'll end up duplicating most of the work done in Ibis
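The "dataframe API -> Substrait -> backend" layering can be illustrated with a toy. The sketch below is not Substrait (real Substrait plans are protobuf messages with far richer relation types); it uses a hypothetical three-node plan format, a Polars-flavoured front end that only records plan nodes, and two independent backends: one compiles the plan to SQL text, the other interprets it over in-memory rows. All names are made up for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple, Union


# --- A toy, backend-neutral plan format (stand-in for something like Substrait) ---
@dataclass(frozen=True)
class Read:
    table: str


@dataclass(frozen=True)
class Filter:
    input: "Plan"
    column: str
    op: str  # one of ">", "<", "=="
    value: Any


@dataclass(frozen=True)
class Project:
    input: "Plan"
    columns: Tuple[str, ...]


Plan = Union[Read, Filter, Project]


# --- Front end: a Polars-flavoured API that only records plan nodes ---
class LazyFrame:
    def __init__(self, plan: Plan) -> None:
        self.plan = plan

    def filter(self, column: str, op: str, value: Any) -> "LazyFrame":
        return LazyFrame(Filter(self.plan, column, op, value))

    def select(self, *columns: str) -> "LazyFrame":
        return LazyFrame(Project(self.plan, tuple(columns)))


def scan(table: str) -> LazyFrame:
    return LazyFrame(Read(table))


# --- Backend 1: compile the plan to SQL text ---
_SQL_OPS = {">": ">", "<": "<", "==": "="}


def to_sql(plan: Plan) -> str:
    if isinstance(plan, Read):
        return f"SELECT * FROM {plan.table}"
    if isinstance(plan, Filter):
        # repr() is a shortcut for literals; a real compiler would escape or bind them
        return (f"SELECT * FROM ({to_sql(plan.input)}) AS t "
                f"WHERE {plan.column} {_SQL_OPS[plan.op]} {plan.value!r}")
    if isinstance(plan, Project):
        return f"SELECT {', '.join(plan.columns)} FROM ({to_sql(plan.input)}) AS t"
    raise TypeError(f"unknown plan node: {plan!r}")


# --- Backend 2: interpret the same plan over in-memory rows ---
_PY_OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def execute(plan: Plan, tables: Dict[str, List[dict]]) -> List[dict]:
    if isinstance(plan, Read):
        return list(tables[plan.table])
    if isinstance(plan, Filter):
        pred = _PY_OPS[plan.op]
        return [r for r in execute(plan.input, tables) if pred(r[plan.column], plan.value)]
    if isinstance(plan, Project):
        return [{c: r[c] for c in plan.columns} for r in execute(plan.input, tables)]
    raise TypeError(f"unknown plan node: {plan!r}")


if __name__ == "__main__":
    q = scan("trips").filter("fare", ">", 10).select("id", "fare")
    print(to_sql(q.plan))
    # SELECT id, fare FROM (SELECT * FROM (SELECT * FROM trips) AS t WHERE fare > 10) AS t
    print(execute(q.plan, {"trips": [{"id": 1, "fare": 5.0}, {"id": 2, "fare": 12.5}]}))
    # [{'id': 2, 'fare': 12.5}]
```

The front end never knows which backend will run the plan; supporting a new engine means writing one more consumer of the plan format. The catch described above is that, today, few engines consume Substrait, so each backend consumer still has to be written by someone.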

@MarcoGorelli
Member

MarcoGorelli commented Apr 20, 2024

Thanks for your input!

I have a client for whom I'm running the following on AWS Lambda:

It all fits, and it all works wonderfully. Including a package that uses Ibis would be a non-starter.

Narwhals doesn't aim to support 20+ backends. My only objective here is to provide a compatibility layer for writing dataframe-agnostic code which:

  • allows for writing efficient Polars code
  • is lightweight
  • avoids pandas footguns
  • is easy to use and readable
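Staying lightweight in this sense largely means never importing a dataframe library yourself. A minimal sketch of one common trick, assuming type checks are all you need: only inspect libraries the caller has already imported, via `sys.modules`. The helper names are hypothetical and not Narwhals' API.

```python
import sys
from typing import Any


def is_instance_of_loaded(obj: Any, module_name: str, class_name: str) -> bool:
    """True if obj is an instance of module_name.class_name, without importing it.

    If the library was never imported, obj cannot be one of its objects,
    so there is nothing to check and no import cost is paid.
    """
    module = sys.modules.get(module_name)
    if module is None:
        return False
    cls = getattr(module, class_name, None)
    return isinstance(cls, type) and isinstance(obj, cls)


def is_pandas_dataframe(obj: Any) -> bool:
    return is_instance_of_loaded(obj, "pandas", "DataFrame")


def is_polars_dataframe(obj: Any) -> bool:
    return is_instance_of_loaded(obj, "polars", "DataFrame")
```

A library built this way can list pandas and Polars as neither required nor optional dependencies: whatever the user passes in, they have already installed and imported.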

The target audience is library developers, not end users. In that sense, I think its goals differ from Ibis', which is why I don't consider them competitors

I don't even know if anyone's going to use Narwhals; at the moment it's just a fun experiment, and it's working out better than I was expecting it to

@MarcoGorelli
Member

converting to Substrait directly

this is also out of scope for now; see https://github.com/MarcoGorelli/narwhals/issues/60 for more explanation

@MarcoGorelli
Member

MarcoGorelli commented Jul 8, 2024

Quick update: I'd like to support Ibis, but not for the full Narwhals API; see #566

For anyone interested in running DuckDB with a Python API, I'd suggest sticking with Ibis; realistically, that will always be out of scope for Narwhals

@NickCrews

NickCrews commented Oct 25, 2024

I have a little library that provides an ipywidget for better Ibis table exploration in Jupyter. I am interested in modifying it to work with all dataframe libraries, e.g. Polars and pandas, and I thought Narwhals would be a good fit for this. But I want it to still work with Ibis. Am I correct in understanding that I am blocked on that until Narwhals supports Ibis as a backend?

@MarcoGorelli
Member

hey @NickCrews

nice library!

thanks for your question; for now I'd suggest having separate codepaths: a narwhals/dataframe one, and an ibis/sql one

i'll let you know when we make progress with the lazy-only layer of support (possibly some time in 2025; right now the priority is helping some integrations go from 80% to 100% of the way)
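The "separate codepaths" suggestion could look roughly like this: dispatch on the package a table's class comes from, send Ibis tables down an Ibis/SQL path, and route everything else through Narwhals. A sketch under stated assumptions: `backend_family` and `summarize` are hypothetical names, the set of dataframe packages is illustrative, and the Narwhals branch uses `nw.from_native(..., eager_only=True)`, so it only handles eager dataframes.

```python
from typing import Any

# Illustrative set of eager dataframe packages routed through Narwhals (assumption)
_DATAFRAME_PACKAGES = {"pandas", "polars", "pyarrow", "modin", "cudf"}


def backend_family(obj: Any) -> str:
    """Classify a table by the top-level package its class is defined in."""
    root = type(obj).__module__.split(".")[0]
    if root == "ibis":
        return "ibis"
    if root in _DATAFRAME_PACKAGES:
        return "dataframe"
    raise TypeError(f"unsupported table type: {type(obj).__qualname__}")


def summarize(table: Any) -> dict:
    """Row count and column names, computed via one codepath per family."""
    if backend_family(table) == "ibis":
        # Ibis/SQL path: build an expression and let the backend execute it
        return {"rows": int(table.count().execute()), "columns": list(table.columns)}
    # Dataframe path via Narwhals, imported lazily so Ibis-only users don't need it
    import narwhals as nw

    df = nw.from_native(table, eager_only=True)
    return {"rows": df.shape[0], "columns": list(df.columns)}
```

Keeping the two paths behind one function means the widget code above it only ever sees the small dict, and the Ibis branch can later be swapped for a Narwhals lazy-only path without touching callers.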

4 participants