Numerical column stats update #1089
e4be13f to 5aaf2b6
not df_series_clean.empty
and df_series_clean.apply(pd.to_numeric, errors="coerce").dtype == "O"
)
if self._greater_than_64_bit:
Good design choice here; I like it.
if isinstance(values, (np.ndarray, list)):
unique_value = values[0]
else:
unique_value = values.iloc[0]
unique_value = values[0]
Currently there is no reason for the if/else statement, because both branches do the same thing regardless of the condition.
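The redundancy described in this comment can be sketched as follows. This is only an illustration, not the project's code: the function names are hypothetical, and a plain list stands in for the values handled in the PR.

```python
def pick_first_branching(values):
    # Hypothetical "before" shape: both branches run the same expression,
    # so the isinstance check has no effect on the result.
    if isinstance(values, (list, tuple)):
        return values[0]
    else:
        return values[0]


def pick_first(values):
    # Hypothetical "after" shape: the branch is collapsed into one statement.
    return values[0]


sample = [7, 8, 9]
print(pick_first_branching(sample) == pick_first(sample))
```

Collapsing the branch keeps the behavior identical while removing a condition that readers would otherwise have to reason about.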
df_series = pl.from_pandas(df_series)
min_value = df_series.min()
if self.min is not None:
min_value = type(self.min)(min_value)
?
It is not intuitive what is happening here: type(self.min)(min_value)
This casts min_value to the current type of self.min.
This is actually not needed; updating it to remove this line.
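For readers unfamiliar with the idiom discussed in this thread, here is a minimal sketch of what an expression of the form type(self.min)(min_value) does. The variable names and values are hypothetical stand-ins, and built-in Python numbers are used instead of the profiler's own attributes.

```python
current_min = 3.5  # hypothetical stand-in for self.min (a float)
batch_min = 2      # hypothetical stand-in for the new batch minimum (an int)

# type(current_min) evaluates to the class `float`; calling that class
# converts batch_min to the same type as current_min.
cast_min = type(current_min)(batch_min)

print(type(cast_min).__name__, cast_min)
```

The expression is compact but hides the target type from the reader, which is likely why it was flagged as unintuitive.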
batch_count,
batch_biased_variance,
batch_mean,
self._biased_variance = np.float64(
Why add np.float64(...) here?
)

@BaseColumnProfiler._timeit(name="skewness")
def _get_skewness(
self,
df_series: pd.Series,
df_series: pd.Series | np.ndarray,
Was this (and the following changes) added to account for values larger than 64 bits?
yup
* partial update to numerical_column_stats
* update with full polars replacement
* reduce redundant if statement
* fix histogram warning
* remove unneeded casting
* Staging into `main` from `dev` (#1106)
* add downloads tile (#1085)
* Hot fix json bug (#1105)
* update
* update
* update version (#1107)
* add polars to requirements (#1087)
* add polars to requirements
* Update requirements.txt
Co-authored-by: Taylor Turner <[email protected]>
---------
Co-authored-by: Taylor Turner <[email protected]>
* update precommit env (#1088)
* Numerical column stats update (#1089)
* partial update to numerical_column_stats
* update with full polars replacement
* reduce redundant if statement
* fix histogram warning
* remove unneeded casting
* Profiler utils update (#1092)
* update profiler utils
* finish updates
---------
Co-authored-by: Andrew <[email protected]>
No description provided.