Vectorized sample #63

Open
erwald opened this issue Mar 13, 2024 · 3 comments

@erwald
Contributor

erwald commented Mar 13, 2024

It would be nice to have a vectorized version of sq.sample. I often have data frames or series that contain distributions, and when I want to expand those into medians and percentiles, I have to use apply, which is slow and a bit verbose. It would be nicer to sample the entire series at once. (This is a feature request.)

@peterhurford
Collaborator

@erwald can you give me a code sample?

Basically I want to see two things to understand:

1.) Write the code that uses apply in the way that works today

2.) Write pretend code that uses sq.sample the way you would ideally want it to work if this feature were implemented

@peterhurford peterhurford added the enhancement New feature or request label Mar 13, 2024
@peterhurford peterhurford added this to the v0.28 milestone Mar 13, 2024
@erwald
Contributor Author

erwald commented Mar 17, 2024

Here's some code:

import pandas as pd
import squigglepy as sq

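N = 1000  # number of samples drawn per distribution
series = pd.Series(range(1, 5)) * sq.norm(mean=0, sd=1)
# Works today: sample and summarize each row separately via apply.
print(series.apply(lambda row: sq.get_percentiles(row @ N, percentiles=[50])))
# Desired: sample the whole series at once (does not work currently).
print(sq.get_percentiles(series @ N, percentiles=[50]))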

The first print statement will output a series of medians:

: 0    0.046183
: 1   -0.003956
: 2   -0.016223
: 3   -0.025846

The second print statement does not work because you currently can't sample a series or data frame. I mostly want this as a convenience, but it might also improve performance, since I believe sampling a whole series could use vectorized operations (multiply, etc.) instead of handling one row at a time.

(Of course the above example would also require get_percentiles to be vectorized in this way.)
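To make the possible performance benefit concrete, here is a minimal sketch that does not use squigglepy at all. It assumes each row is a normal distribution with mean 0 and a standard deviation equal to the row's scale factor (1 through 4, as in the example above), and it draws every sample in a single NumPy call. The name `sample_scaled_normals` is hypothetical, not a squigglepy API, and a real implementation would have to handle arbitrary distributions.

```python
import numpy as np
import pandas as pd


def sample_scaled_normals(scales: pd.Series, n: int, seed: int = 0) -> np.ndarray:
    """Draw n samples from Normal(0, scale) for every row at once.

    Hypothetical helper for illustration only; not part of squigglepy.
    Returns an array of shape (len(scales), n).
    """
    rng = np.random.default_rng(seed)
    # One draw of standard normals, broadcast against the per-row scale.
    z = rng.standard_normal((len(scales), n))
    return z * scales.to_numpy(dtype=float)[:, None]


scales = pd.Series(range(1, 5))
samples = sample_scaled_normals(scales, n=1000)
# A single vectorized percentile call replaces the per-row apply.
medians = pd.Series(np.percentile(samples, 50, axis=1), index=scales.index)
print(medians)
```

The result is a series of medians with the same index as the input, which is the shape the second print statement above is meant to produce.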

@erwald
Contributor Author

erwald commented Mar 17, 2024

A common use case here is that I have a time series of estimates represented as squigglepy distributions, and I want to get the median and the 5th and 95th percentiles (or whichever percentiles) for each row for plotting.
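As a rough illustration of that use case, the sketch below turns a block of samples into a table with one row per date and one column per percentile, ready for plotting. It uses NumPy-generated stand-in samples rather than squigglepy distributions; `summarize_rows`, the dates, and the means are all made up for the example.

```python
import numpy as np
import pandas as pd


def summarize_rows(samples: np.ndarray, index, percentiles=(5, 50, 95)) -> pd.DataFrame:
    """Return one row per estimate and one column per percentile.

    Hypothetical helper; `samples` has shape (n_rows, n_samples).
    """
    # np.percentile returns shape (len(percentiles), n_rows), so transpose it.
    values = np.percentile(samples, percentiles, axis=1).T
    return pd.DataFrame(values, index=index, columns=[f"p{p}" for p in percentiles])


rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=4, freq="MS")
# Stand-in samples; in practice these would come from sampling each distribution.
samples = rng.normal(loc=[[10], [12], [15], [19]], scale=2.0, size=(4, 1000))
summary = summarize_rows(samples, index=dates)
print(summary)
```

The `p5` and `p95` columns can then be passed to something like matplotlib's `fill_between` to draw a band around the `p50` line.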

@peterhurford peterhurford modified the milestones: v0.28, v0.29 Aug 20, 2024