Implement missing Expanding and Rolling APIs in Series, Frame, SeriesGroupBy, FrameGroupBy #977

Open
HyukjinKwon opened this issue Oct 30, 2019 · 3 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@HyukjinKwon
Member

See https://github.com/databricks/koalas/blob/master/databricks/koalas/missing/window.py

It mimics the class hierarchy from pandas internally. Ideally, we will be able to share implementation across APIs (like cumsum).
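
To illustrate the idea of sharing an implementation across a pandas-like class hierarchy, here is a minimal pure-Python sketch. It is not the Koalas code: the class `RollingAndExpanding`, the hook `_window_start`, and the helper `_is_missing` are hypothetical names chosen for illustration, and plain lists stand in for Spark columns. The subclasses only decide where a window starts; the aggregation logic lives in the base class.

```python
import math
from typing import Callable, List, Optional, Sequence


def _is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as missing (hypothetical helper)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


class RollingAndExpanding:
    """Hypothetical shared base class: subclasses only define the window start."""

    def __init__(self, values: Sequence[Optional[float]]) -> None:
        self._values = list(values)

    def _window_start(self, i: int) -> int:
        raise NotImplementedError

    def _apply(self, func: Callable[[List[float]], float]) -> List[float]:
        # Apply func to the non-missing values of each row's window.
        out = []
        for i in range(len(self._values)):
            window = self._values[self._window_start(i): i + 1]
            out.append(func([v for v in window if not _is_missing(v)]))
        return out

    def count(self) -> List[float]:
        # Written once here, inherited by both Expanding and Rolling.
        return [float(n) for n in self._apply(len)]


class Expanding(RollingAndExpanding):
    def _window_start(self, i: int) -> int:
        return 0  # the window always begins at the first row


class Rolling(RollingAndExpanding):
    def __init__(self, values: Sequence[Optional[float]], window: int) -> None:
        super().__init__(values)
        self._window = window

    def _window_start(self, i: int) -> int:
        return max(0, i - self._window + 1)  # fixed-size trailing window


data = [2, 3, float("nan"), 10]
print(Expanding(data).count())
print(Rolling(data, 1).count())
```

With this layout, adding another aggregation means adding one method to the base class rather than one per window type.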

HyukjinKwon added the enhancement and help wanted labels Oct 30, 2019
@HyukjinKwon
Member Author

I am starting with Expanding in Series and Frame.

HyukjinKwon pinned this issue Oct 30, 2019
HyukjinKwon added a commit that referenced this issue Oct 31, 2019
This PR implements `expanding().count()`:

```python
>>> import databricks.koalas as ks
>>> ks.Series([2, 3, float("nan"), 10]).expanding().count()
0    1.0
1    2.0
2    2.0
3    3.0
Name: 0, dtype: float64
>>> ks.DataFrame({'a': [1, float('nan'), 3], 'b': [1.0, 2.0, 3.0]}).expanding().count()
     a    b
0  1.0  1.0
1  1.0  2.0
2  2.0  3.0
```

Relates to #977
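
As a rough reference for the semantics shown above, an expanding count can be read as a cumulative sum over a "value is present" indicator, which is why it is a natural candidate for sharing logic with `cumsum`. The sketch below is plain Python under that assumption; `expanding_count` is a hypothetical helper, not a Koalas function.

```python
import math
from itertools import accumulate
from typing import List, Optional, Sequence


def expanding_count(values: Sequence[Optional[float]]) -> List[float]:
    """Running count of non-missing values (hypothetical helper, not Koalas API)."""
    # 1 where a value is present, 0 where it is None or NaN.
    indicators = (0 if v is None or math.isnan(v) else 1 for v in values)
    # The cumulative sum of the indicators is the expanding count.
    return [float(c) for c in accumulate(indicators)]


print(expanding_count([2, 3, float("nan"), 10]))
```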
@itholic
Contributor

itholic commented Nov 1, 2019

I am starting with Rolling in Series and Frame.

@HyukjinKwon
Member Author

Please go ahead!

HyukjinKwon pushed a commit that referenced this issue Nov 6, 2019
This PR implements `rolling().count()`:

```python
>>> import databricks.koalas as ks
>>> s = ks.Series([2, 3, float("nan"), 10])
>>> s.rolling(1).count()
0    1.0
1    1.0
2    0.0
3    1.0
Name: 0, dtype: float64

>>> s.to_frame().rolling(1).count()
     0
0  1.0
1  1.0
2  0.0
3  1.0
```

Relates to #977
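
One way to think about a rolling count is as the difference of two cumulative counts: the number of present values up to the current row, minus the number up to the row just before the window starts. The sketch below is plain Python under that assumption; `rolling_count` is a hypothetical helper and says nothing about how the PR computes it on Spark.

```python
import math
from itertools import accumulate
from typing import List, Optional, Sequence


def rolling_count(values: Sequence[Optional[float]], window: int) -> List[float]:
    """Count non-missing values in each trailing window (hypothetical helper)."""
    indicators = [0 if v is None or math.isnan(v) else 1 for v in values]
    # prefix[k] is the number of non-missing values among the first k entries.
    prefix = [0] + list(accumulate(indicators))
    return [
        float(prefix[i + 1] - prefix[max(0, i + 1 - window)])
        for i in range(len(values))
    ]


print(rolling_count([2, 3, float("nan"), 10], 1))
```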
HyukjinKwon pushed a commit that referenced this issue Nov 7, 2019
…996)

This PR implements `min`, `max`, `sum`, and `mean` for Rolling in Series and DataFrame.

For Series:

```python
>>> import databricks.koalas as ks
>>> kser = ks.Series([2, 3, float("nan"), 10])
>>> kser
0     2.0
1     3.0
2     NaN
3    10.0
Name: 0, dtype: float64

>>> kser.rolling(2).sum()
0    NaN
1    5.0
2    3.0
3    NaN
Name: 0, dtype: float64

>>> kser.rolling(2).min()
0    NaN
1    2.0
2    3.0
3    NaN
Name: 0, dtype: float64

>>> kser.rolling(2).max()
0    NaN
1    3.0
2    3.0
3    NaN
Name: 0, dtype: float64

>>> kser.rolling(2).mean()
0    NaN
1    2.5
2    3.0
3    NaN
Name: 0, dtype: float64
```

For DataFrame:

```python
>>> kdf = ks.DataFrame({'a': [1, float('nan'), 3], 'b': [1.0, 2.0, 3.0]})
>>> kdf
     a    b
0  1.0  1.0
1  NaN  2.0
2  3.0  3.0

>>> kdf.rolling(2).sum()
     a    b
0  NaN  NaN
1  1.0  3.0
2  NaN  5.0

>>> kdf.rolling(2).min()
     a    b
0  NaN  NaN
1  1.0  1.0
2  NaN  2.0

>>> kdf.rolling(2).max()
     a    b
0  NaN  NaN
1  1.0  2.0
2  NaN  3.0

>>> kdf.rolling(2).mean()
     a    b
0  NaN  NaN
1  1.0  1.5
2  NaN  2.5
```

Relates to #977