Lazy append in groupby.apply
#231
Comments
Why not create a new method, `apply_lazy`? Wouldn't that make maintenance
simpler as we'd only have to deal with one expected return type per method?
…On Thu, Apr 27, 2023 at 10:33 PM Jared Lewis ***@***.***> wrote:
*Is your feature request related to a problem? Please describe.*
We often have apply functions that look like the following (the grouping
isn't important here) and end with an appending of a set of S
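import scmdata
from scmdata import ScmRun

def f(run) -> ScmRun:
    # Each call performs its own append of the two modified copies
    return scmdata.run_append(
        [
            run.set_meta("col", True),
            run.set_meta("col", False),
        ]
    )

df.groupby("variable").apply(f)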
Rather than performing n + 1 appends (one for each call of f and 1 to
combine), a single append could be performed if a list of ScmRun objects
is returned and the results are lazily appended together at the end of the
groupby operation.
f would become:
def f(run) -> list[ScmRun]:
    return [
        run.set_meta("col", True),
        run.set_meta("col", False),
    ]
This should result in a small performance improvement for the case where
there are lots of groups.
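The append counts described above can be illustrated with a minimal sketch. It does not use scmdata: `FakeRun`, the counting `run_append` stand-in, and the helpers `groupby_apply_eager` and `groupby_apply_lazy` are all hypothetical names invented for this illustration, and the real groupby code may be structured differently.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRun:
    """Hypothetical stand-in for ScmRun: a list of metadata rows."""

    rows: list = field(default_factory=list)

    def set_meta(self, name, value):
        # Return a copy with one extra metadata column set on every row
        return FakeRun([{**row, name: value} for row in self.rows])


append_calls = 0


def run_append(runs):
    """Stand-in for scmdata.run_append that counts how often it is called."""
    global append_calls
    append_calls += 1
    return FakeRun([row for run in runs for row in run.rows])


def groupby_apply_eager(groups, func):
    # func appends internally; one more append combines the groups
    return run_append([func(group) for group in groups])


def groupby_apply_lazy(groups, func):
    # func returns a list; all pieces are appended once at the end
    pieces = [run for group in groups for run in func(group)]
    return run_append(pieces)


def f_eager(run):
    return run_append([run.set_meta("col", True), run.set_meta("col", False)])


def f_lazy(run):
    return [run.set_meta("col", True), run.set_meta("col", False)]


groups = [FakeRun([{"variable": name}]) for name in ("a", "b", "c")]

append_calls = 0
eager = groupby_apply_eager(groups, f_eager)
eager_calls = append_calls  # n + 1 appends for n groups

append_calls = 0
lazy = groupby_apply_lazy(groups, f_lazy)
lazy_calls = append_calls  # a single append
```

With three groups the eager path calls the stand-in four times and the lazy path once, while both produce the same rows. How much time this saves in practice depends on the cost of a real append, which this sketch does not measure.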
*Describe the solution you'd like*
Update run_append to handle appending runs of type list[BaseScmRun |
list[BaseScmRun]], i.e. a list containing a mix of ScmRuns and lists of ScmRuns.
This wouldn't require much, if any, change to the groupby code other than
updating documentation.
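One way the mixed input could be handled is to flatten it by one level before the existing append logic runs. The sketch below is an assumption about how that might look, not the scmdata implementation; `flatten_runs` is a hypothetical helper and plain strings stand in for runs.

```python
from typing import Iterable, List, TypeVar, Union

T = TypeVar("T")


def flatten_runs(items: Iterable[Union[T, List[T]]]) -> List[T]:
    """Flatten one level: each item is either a run or a list of runs."""
    flat: List[T] = []
    for item in items:
        if isinstance(item, list):
            # A list returned by an apply function: take its members
            flat.extend(item)
        else:
            # A single run: keep it as is
            flat.append(item)
    return flat


# Strings are placeholders for ScmRun objects in this illustration
mixed = ["run_a", ["run_b1", "run_b2"], "run_c", []]
flat = flatten_runs(mixed)
```

Flattening only one level keeps the accepted input type the same as the one proposed here; deeper nesting would be treated as a single item rather than silently unpacked.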
*Describe alternatives you've considered*
Handling the apply function's return value differently depending on whether it
is a ScmRun or a list of ScmRuns. The proposed solution is more flexible, as
similar functionality may be used in other places.
I don't think that adding another function would make maintenance any easier when the behaviour can easily be added to
The real value will come from adding a
I think that the (proposed) interface is something like the following when also accepting a list of ScmRuns
The problem is that we often instead use the following, which is subtly different in that we don't handle arbitrary kwargs
So I think that the type that will make mypy happy is more likely
True, if we get the typing below right that does simplify things.
Missing bracket somewhere? Did you mean:
class ApplyFunc(Protocol):
    def __call__(self, run: T, *args: Any, **kwargs: Any) -> T | list[T]:
        ...
Which case does that miss? Isn't having no args and kwargs a subset of any args and kwargs?
(Feel free to also just make a PR and ignore this question, I can try it out once there's something to play with)
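For reference, a self-contained version of the callback protocol under discussion might look like the sketch below. It only shows the runtime side: the helper `call_and_normalise` and the example functions `single` and `pair` are hypothetical, and the sketch does not settle whether a type checker accepts apply functions that take no extra args or kwargs, which is the open question in this thread.

```python
from typing import Any, List, Protocol, TypeVar, Union

T = TypeVar("T")


class ApplyFunc(Protocol[T]):
    """Callable taking a run plus arbitrary args, returning a run or a list of runs."""

    def __call__(self, run: T, *args: Any, **kwargs: Any) -> Union[T, List[T]]:
        ...


def call_and_normalise(
    func: ApplyFunc[T], run: T, *args: Any, **kwargs: Any
) -> List[T]:
    # Always hand back a list so the caller can append everything once
    result = func(run, *args, **kwargs)
    return result if isinstance(result, list) else [result]


# Strings stand in for runs in this illustration
def single(run: str, suffix: str = "") -> str:
    return run + suffix


def pair(run: str, *args: Any, **kwargs: Any) -> List[str]:
    return [run + "_true", run + "_false"]


single_result = call_and_normalise(single, "run", suffix="_x")
pair_result = call_and_normalise(pair, "run")
```

Normalising to a list at the call site means the combining step only ever sees one shape of result, whichever return type the apply function chose.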
Oops, I meant
In my preliminary testing, the apply function needed to accept these kwargs if only the
def invalid_func(run: ScmRun) -> ScmRun:
    ...
Maybe something fancy with a
OK, nice. Sounds like you've found the right path forward then.