Expose options to Series' cut and qcut #1007

Merged: 2 commits, Oct 23, 2024

@philss philss commented Oct 23, 2024

Closes: #1006

allow_duplicates: false,
left_close: false,
include_breaks: true
)
Member Author

I don't know how to explain these two new options, left_close and include_breaks, so I left them out of the docs. But if you have something in mind, feel free to suggest :)

Contributor

From the Python docs:

  • left_closed
    • Set the intervals to be left-closed instead of right-closed.
  • include_breaks
    • Include a column with the right endpoint of the bin each observation falls in. This will change the data type of the output from a Categorical to a Struct.

It's probably fine if we just forward their documentation. Here's a more in-depth explanation:

left_closed

The cuts, which are the main argument to this function, divide the real line into discrete bins. E.g. if we passed cuts [-1, 1], we'd break up the real line into 3 bins like this (call them A, B, and C):

   A       B       C 
-------|-------|-------
      -1       1

This is a bit ambiguous though. We need to decide which bins the cuts themselves belong to. Is -1 in A or B?

By default, the bins are "right closed", meaning the bins contain their right cut but not their left. So -1 $\in$ A and 1 $\in$ B.

Setting the left_closed argument flips this. Now the bins contain their left cut but not their right. So -1 $\in$ B and 1 $\in$ C.
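
To make the boundary rule concrete, here is a minimal sketch in plain Python. It is not the library's implementation: `bin_index` is a hypothetical helper, and the standard `bisect` module stands in for whatever the real code does. It only shows which bin a value equal to a cut lands in under each setting.

```python
from bisect import bisect_left, bisect_right


def bin_index(value, cuts, left_closed=False):
    """Return the 0-based index of the bin that `value` falls into.

    `cuts` must be sorted ascending; n cuts produce n + 1 bins.
    This is a hypothetical helper for illustration only.
    """
    if left_closed:
        # Bins look like [lo, hi): a value equal to a cut moves to the bin on its right.
        return bisect_right(cuts, value)
    # Bins look like (lo, hi]: a value equal to a cut stays in the bin on its left.
    return bisect_left(cuts, value)


cuts = [-1, 1]
names = "ABC"
for v in (-1, 1):
    right_closed_bin = names[bin_index(v, cuts)]
    left_closed_bin = names[bin_index(v, cuts, left_closed=True)]
    print(v, right_closed_bin, left_closed_bin)
```

With the default, -1 maps to A and 1 maps to B; with `left_closed=True`, -1 maps to B and 1 maps to C, matching the description above.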

include_breaks

This argument changes the return of the overall function. Instead of just mapping each value to the bin it's in, it also returns metadata about the bin it was mapped to. So you get a series of structs back instead of a categorical series.

See their docs for an example of the structs returned.
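
As a rough sketch of what "a series of structs" means here, the plain-Python function below returns one dict per value, holding the right endpoint of the bin plus a label for the bin. The function name `cut_with_breaks`, the field names `break_point` and `category`, and the label format are all assumptions made for illustration; the real field names and labels are in the library docs.

```python
import math
from bisect import bisect_left, bisect_right


def cut_with_breaks(values, cuts, left_closed=False):
    """Map each value to a struct-like dict describing its bin.

    Hypothetical helper: field names and label format are illustrative only.
    `cuts` must be sorted ascending.
    """
    edges = [-math.inf, *cuts, math.inf]
    find = bisect_right if left_closed else bisect_left
    result = []
    for v in values:
        i = find(cuts, v)
        lo, hi = edges[i], edges[i + 1]
        label = f"[{lo}, {hi})" if left_closed else f"({lo}, {hi}]"
        # The "break" reported for each value is the right endpoint of its bin.
        result.append({"break_point": hi, "category": label})
    return result


for row in cut_with_breaks([-2, -1, 0.5, 3], [-1, 1]):
    print(row)
```

Without the breaks, only the `category` part would come back, which is why the output type changes when the option is enabled.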

Member Author

oh, thank you @billylanchantin!! :D
I will add these short descriptions :)

@philss philss merged commit 9845751 into main Oct 23, 2024
4 checks passed
@philss philss deleted the ps-add-more-opts-cut-qcut branch October 23, 2024 14:31
philss added a commit that referenced this pull request Oct 23, 2024
Also add docs regarding the `:left_close` and `:include_breaks` options.

Thanks @billylanchantin :D
Reference: #1007 (comment)
philss added a commit that referenced this pull request Oct 23, 2024
…1009)

* Fix `cut/3` and `qcut/3` when `:include_breaks` is false

Also add docs regarding the `:left_close` and `:include_breaks` options.

Thanks @billylanchantin :D
Reference: #1007 (comment)

* Change default of `cut/qcut` to not include breaks
Development

Successfully merging this pull request may close these issues.

qcut error 'quantiles are not unique while allow_duplicates=False'
3 participants