Commit

Merge branch 'develop' into renovate/mkdocs-table-reader-plugin-2.x
fabclmnt authored May 6, 2024
2 parents dd37133 + ddcb388 commit bda34bd
Showing 3 changed files with 9 additions and 3 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -220,7 +220,7 @@ pip install -e .

The profiling report is written in HTML and CSS, which means a modern browser is required.

-You need [Python 3](https://python3statement.org/) to run the package. Other dependencies can be found in the requirements files:
+You need [Python 3](https://python3statement.github.io/) to run the package. Other dependencies can be found in the requirements files:

| Filename | Requirements|
|----------|-------------|
8 changes: 7 additions & 1 deletion src/ydata_profiling/model/spark/summary_spark.py
@@ -87,19 +87,25 @@ def multiprocess_1d(args: tuple) -> Tuple[str, dict]:
         column, df = args
         return column, describe_1d(config, df.select(column), summarizer, typeset)

+    # Rename the df column names to prevent potential conflicts
+    for col in df.columns:
+        df = df.withColumnRenamed(col, f"{col}_customer")
+
     args = [(name, df) for name in df.columns]
     with multiprocessing.pool.ThreadPool(12) as executor:
         for i, (column, description) in enumerate(
             executor.imap_unordered(multiprocess_1d, args)
         ):
+            if column.endswith("_customer"):
+                column = column[:-9]
             pbar.set_postfix_str(f"Describe variable:{column}")

             # summary clean up for spark
             description.pop("value_counts")

             series_description[column] = description
             pbar.update()
-        series_description = {k: series_description[k] for k in df.columns}
+        series_description = {k[:-9]: series_description[k[:-9]] for k in df.columns}

     # Mapping from column name to variable type
     series_description = sort_column_names(series_description, config.sort)
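The rename/strip round trip in this hunk can be illustrated without Spark. What follows is a minimal pure-Python sketch, not the ydata-profiling implementation: a plain dict of lists stands in for the Spark DataFrame, and `describe_column`, `strip_suffix`, and `describe_all` are hypothetical helpers (`describe_column` plays the role of `describe_1d`). It shows where the hard-coded `[:-9]` comes from (`len("_customer") == 9`) and why the final dict comprehension matters: `imap_unordered` yields results in completion order, so the dict has to be rebuilt in the original column order.

```python
import multiprocessing.pool
from typing import Dict, List, Tuple

# The commit appends this suffix to every column; len("_customer") == 9,
# which is where the hard-coded [:-9] slices in the diff come from.
SUFFIX = "_customer"

Table = Dict[str, List[float]]  # hypothetical stand-in for a Spark DataFrame


def strip_suffix(name: str) -> str:
    """Remove one trailing SUFFIX; same result as name[:-9] when it is present."""
    return name[: -len(SUFFIX)] if name.endswith(SUFFIX) else name


def describe_column(args: Tuple[str, Table]) -> Tuple[str, dict]:
    """Hypothetical stand-in for describe_1d: a tiny per-column summary."""
    column, table = args
    values = table[column]
    return column, {"n": len(values), "min": min(values), "max": max(values)}


def describe_all(table: Table) -> Dict[str, dict]:
    # Rename every column first, like the withColumnRenamed loop in the diff.
    renamed = {f"{name}{SUFFIX}": values for name, values in table.items()}

    results: Dict[str, dict] = {}
    args = [(name, renamed) for name in renamed]
    # 4 threads instead of the 12 used in the diff; the count is arbitrary here.
    with multiprocessing.pool.ThreadPool(4) as executor:
        # imap_unordered yields results in completion order, not input order.
        for column, description in executor.imap_unordered(describe_column, args):
            results[strip_suffix(column)] = description

    # Rebuild the dict in the original column order, mirroring the final
    # comprehension in the diff.
    return {strip_suffix(k): results[strip_suffix(k)] for k in renamed}
```

Because exactly one suffix is stripped, a column that was already named `c_customer` becomes `c_customer_customer` and then round-trips back to `c_customer`. Using `len(SUFFIX)` instead of a literal `9` is a choice of this sketch, so the slice stays correct if the suffix ever changes.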
@@ -1 +1 @@
-<a href="#pp_var_{{ alert.anchor_id }}"><code>{{ alert.column_name }}</code></a> has constant value "{{ alert.values['mode'] }}"
+<a href="#pp_var_{{ alert.anchor_id }}"><code>{{ alert.column_name }}</code></a> has constant value "{{ alert.values['value_counts_without_nan'].index[0] }}"
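The changed template line takes the constant from `value_counts_without_nan` rather than `mode`. The stdlib sketch below approximates what `alert.values['value_counts_without_nan'].index[0]` evaluates to. It assumes that entry holds the non-NaN value counts with the values as the index, ordered most frequent first as pandas `Series.value_counts` returns by default. `is_nan`, `counts_without_nan`, and `constant_value` are hypothetical helpers, not part of ydata-profiling.

```python
import math
from collections import Counter
from typing import Hashable, List, Tuple


def is_nan(value: Hashable) -> bool:
    """True for float NaN, which dropping missing values would exclude."""
    return isinstance(value, float) and math.isnan(value)


def counts_without_nan(values: List[Hashable]) -> List[Tuple[Hashable, int]]:
    """(value, count) pairs with NaN dropped, most frequent first."""
    return Counter(v for v in values if not is_nan(v)).most_common()


def constant_value(values: List[Hashable]) -> Hashable:
    """Analogue of value_counts_without_nan.index[0]: the first label."""
    counts = counts_without_nan(values)
    # Hypothetical guard that makes the "constant column" assumption explicit.
    if len(counts) != 1:
        raise ValueError("column is not constant")
    return counts[0][0]
```

For a column flagged as constant there is exactly one distinct non-NaN value, so the first label is that value regardless of sort order.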
