Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support for range-based OVER window semantics #1130

Open
zhenzhongxu opened this issue Sep 5, 2024 · 3 comments
Open

feat: Add support for range-based OVER window semantics #1130

zhenzhongxu opened this issue Sep 5, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@zhenzhongxu
Copy link

What happened?

Ibis supports both row-based and range-based OVER window semantics, showing below:

import ibis
from ibis import _

bid_schema = ibis.schema(
    {
        "auction": "int64",
        "bidder": "int64",
        "price": "float64",
        "datetime": "timestamp(3)"
    }
)
bid_table = ibis.table(name="Bid", schema=bid_schema)

# range-based OVER window
range_window = ibis.window(group_by=_.auction, range=(-ibis.interval(seconds=10), 0))
bid_table.filter(_ is not None)[_.price.mean().over(range_window, order_by=_.datetime).name("avg_price")]

# row-based OVER window
row_window =ibis.window(group_by=_.auction, preceding=5, following=5, order_by=_.datetime)
bid_table.filter(_ is not None)[_.price.mean().over(row_window, order_by=_.datetime).name("avg_price")]

It looks like ibis-substrate errors out generating the corresponding substrait plan for the range window.

What version of ibis-substrait are you using?

4.0.1

What substrait consumer(s) are you using, if any?

N/A

Relevant log output

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[4], line 1
----> 1 plan = compiler.compile(window_avg_price)
      2 with open("window_avg_price.proto", "wb") as f:
      3     f.write(plan.SerializeToString())

File /opt/conda/envs/composable-data-arch/lib/python3.12/site-packages/ibis_substrait/compiler/core.py:222, in SubstraitCompiler.compile(self, expr, **kwargs)
    217 from .translate import translate
    219 expr_schema = expr.schema()
    220 rel = stp.PlanRel(
    221     root=stalg.RelRoot(
--> 222         input=translate(expr.op(), compiler=self, **kwargs),
    223         names=translate(expr_schema).names,
    224     )
    225 )
    226 ver = vparse(__substrait_version__)
    227 return stp.Plan(
    228     version=stp.Version(
    229         major_number=ver.major,
   (...)
    256     relations=[rel],
    257 )

File /opt/conda/envs/composable-data-arch/lib/python3.12/functools.py:909, in singledispatch.<locals>.wrapper(*args, **kw)
...
    442     return translate_preceding(boundary.value.value)  # type: ignore
    443 else:
--> 444     return translate_following(boundary.value.value)

AttributeError: 'Cast' object has no attribute 'value'
@gforsyth
Copy link
Member

gforsyth commented Sep 5, 2024

This turned up a few issues in how we were translating window functions that I can address.

But there's one thing that I cannot address -- as far as I can tell, there is no way to specify an interval for a window boundary in Substrait. I could be wrong about this, but reading over the spec, it appears to only consider integer number of rows:

      message Preceding {
        // A strictly positive integer specifying the number of records that
        // the window extends back from the current record. Required. Use
        // CurrentRow for offset zero and Following for negative offsets.
        int64 offset = 1;
      }

      // Defines that the bound extends this far ahead of the current record.
      message Following {
        // A strictly positive integer specifying the number of records that
        // the window extends ahead of the current record. Required. Use
        // CurrentRow for offset zero and Preceding for negative offsets.
        int64 offset = 1;
      }

@zhenzhongxu
Copy link
Author

@EpsilonPrime, do you have thoughts on @gforsyth's question regarding supporting interval for a window boundary in Substrait?

@EpsilonPrime
Copy link

EpsilonPrime commented Sep 7, 2024

Would specifying the BoundsType as BOUNDS_TYPE_RANGE work here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: backlog
Development

No branches or pull requests

3 participants