Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy append in groupby.apply #231

Open
lewisjared opened this issue Apr 27, 2023 · 5 comments
Open

Lazy append in groupby.apply #231

lewisjared opened this issue Apr 27, 2023 · 5 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@lewisjared
Copy link
Collaborator

Is your feature request related to a problem? Please describe.

We often have apply functions that look like the following (the grouping isn't important here) and end with an appending of a set of S

def f(run) -> ScmRun:
    return scmdata.run_append(
        [
            run.set_meta("col", True),
            run.set_meta("col", False),
        ]
    )


df.groupby("variable").apply(f)

Rather than performing n + 1 appends (one for each call of f and 1 to combine), a single append could be performed if a list of ScmRun objects is returned and the results are lazily appended together at the end of the groupby operation.

f would become:

def f(run) -> list[ScmRun]:
    return [
            run.set_meta("col", True),
            run.set_meta("col", False),
        ]

This should result in a small performance improvement for the case where there are lots of groups.

Describe the solution you'd like

Update run_append to handle appending runs of type list[BaseScmRun | list[BaseScmRun]. aka a list of a mix of ScmRuns or lists of ScmRuns.

This wouldn't require much, if any, change to the groupby code other than updating documentation.

Describe alternatives you've considered

Handling apply functions return values differently if it is a ScmRun or a list of ScmRun. The proposed soln is more flexible as similar functionality may be used in other places.

@lewisjared lewisjared added enhancement New feature or request good first issue Good for newcomers labels Apr 27, 2023
@znicholls
Copy link
Collaborator

znicholls commented Apr 27, 2023 via email

@lewisjared
Copy link
Collaborator Author

I don't think that adding another function would make maintenance any easier when the behaviour can easily be added to run_appends parameters which has clearer typing now. I think this will be a one-line fix to run_append to flatten a list of lists using itertools.

The real value will come from adding a Protocol type for the func used in apply. It isn't that clear what that function is expected to do.

I think that the (proposed) interface is something like the following when also accepting a list of ScmRuns

class ApplyFunc(Protocol):
    __call__(run: T, *args: Any, **kwargs) -> T |  list[T]:
        ....

The problem is that we often instead use the following, which is subtly different in that we don't handle arbitrary kwargs

class ApplyFunc(Protocol):
    __call__(run: T) -> T |  list[T]:
        ....

So I think that the type that will make mypy happy is more likely Callable[[T], T | list[t] | ApplyFunc

@znicholls
Copy link
Collaborator

I don't think that adding another function would make maintenance any easier when the behaviour can easily be added to run_appends parameters which has clearer typing now. I think this will be a one-line fix to run_append to flatten a list of lists using itertools.

True, if we get the typing below right that does simplify things.

So I think that the type that will make mypy happy is more likely Callable[[T], T | list[t] | ApplyFunc

Missing bracket somewhere? Did you mean: Callable[[T], T | list[T]] ? Or Callable[[T], T | list[T]] | ApplyFunc. Could we not just do func: ApplyFunc and have

class ApplyFunc(Protocol):
    __call__(run: T, *args: Any, **kwargs: Any) -> T |  list[T]:
        ...

Which case does that miss? Isn't have no args and kwargs a subset of any args and kwargs? (Feel free to also just make a PR and ignore this question, I can try it out once there's something to play with)

@lewisjared
Copy link
Collaborator Author

Did you mean...

Opps, I meant Callable[[T], T | list[T]] | ApplyFunc. Either a function which takes a single argument or one that conforms with this protocol. 99% of cases where I've used this functionality, it has been using the single argument form.

In my preliminary testing, the apply function needed to accept these kwargs if only the ApplyFunc was used as a typehint. The following function doesn't conform with ApplyFunc:

def invalid_func(run: ScmRun) -> ScmRun:
    ...

Maybe something fancy with a typing.ParamSpec could be done to use the types of the Args from apply, but that isn't as understandable and would require some fiddling to know if it would even be suitable.

@znicholls
Copy link
Collaborator

znicholls commented May 6, 2023

Ok nice sounds like you've found the right path forward then

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants