list index out of range when comparing more than 2 dataframes #1652

Open
3 tasks done
alikleit opened this issue Sep 26, 2024 · 1 comment


alikleit commented Sep 26, 2024

Current Behaviour

The following example is from the docs (modified a bit for testing against my files):

import pandas as pd
from ydata_profiling import ProfileReport, compare

train_df = pd.read_excel("datasets/Train.xlsx")
train_report = ProfileReport(train_df, title="Train")

test_df = pd.read_excel("datasets/Test.xlsx")
test_report = ProfileReport(test_df, title="Test")

validation_df = pd.read_excel("datasets/Test.xlsx")
validation_report = ProfileReport(validation_df, title="Valid")


comparison_report = compare([train_report, validation_report, test_report])

# Obtain merged statistics
statistics = comparison_report.get_description()

# Save report to file
comparison_report.to_notebook_iframe()
# comparison_report.to_file("reports/comparison.html")

Expected Behaviour

A comparison report should be generated.

Data Description

Files:
Test.xlsx
Train.xlsx

Code that reproduces the bug

Tried both:

comparison_report.to_notebook_iframe()
# comparison_report.to_file("reports/comparison.html")


Throws:
```py
IndexError                                Traceback (most recent call last)
Cell In[42], line 19
     16 statistics = comparison_report.get_description()
     18 # Save report to file
---> 19 comparison_report.to_notebook_iframe()
     20 # comparison_report.to_file("reports/comparison.html")

.venv/lib/python3.11/site-packages/ydata_profiling/profile_report.py:526, in ProfileReport.to_notebook_iframe(self)
    524 with warnings.catch_warnings():
    525     warnings.simplefilter("ignore")
--> 526     display(get_notebook_iframe(self.config, self))

.venv/lib/python3.11/site-packages/ydata_profiling/report/presentation/flavours/widget/notebook.py:75, in get_notebook_iframe(config, profile)
     73     output = get_notebook_iframe_src(config, profile)
     74 elif attribute == IframeAttribute.srcdoc:
---> 75     output = get_notebook_iframe_srcdoc(config, profile)
     76 else:
     77     raise ValueError(
     78         f'Iframe Attribute can be "src" or "srcdoc" (current: {attribute}).'
     79     )

.venv/lib/python3.11/site-packages/ydata_profiling/report/presentation/flavours/widget/notebook.py:29, in get_notebook_iframe_srcdoc(config, profile)
     27 width = config.notebook.iframe.width
     28 height = config.notebook.iframe.height
...
     89             alpha=0.6,
     90         )
     92     if date:



IndexError: list index out of range
```

pandas-profiling version --> (ydata-profiling version)

v4.10.0

Dependencies

python = "~3.11"
blackcellmagic = "^0.0.3"
ruff = "^0.6.1"
ydata-profiling = "^4.10.0"
numpy = "2.0.1"
openpyxl = "^3.1.5"

OS

WSL Ubuntu

Checklist

  • There is not yet another bug report for this issue in the issue tracker
  • The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
  • The issue has not been resolved by the entries listed under Common Issues.
@fabclmnt
Contributor

Hi @alikleit,

As mentioned in the note at the end of the docs page, the compare functionality only guarantees a report comparison for 2 datasets.

This means that when 3 datasets are provided, only the statistics are calculated. The HTML output might work, but it is not guaranteed. The iframe output won't work.

My suggestion is to either use the calculated statistics or change the code to compare 2 datasets at a time.
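A minimal sketch of the "2 datasets at a time" approach, for illustration only. The helper `pairwise_comparisons` is hypothetical (it is not part of ydata-profiling); it simply calls whatever comparison function it is given on every unordered pair of named reports.

```python
from itertools import combinations
from typing import Any, Callable, Dict, List, Tuple


def pairwise_comparisons(
    reports: Dict[str, Any],
    compare_fn: Callable[[List[Any]], Any],
) -> Dict[Tuple[str, str], Any]:
    """Hypothetical helper: run compare_fn on every unordered pair of reports.

    `reports` maps a label (e.g. "Train") to a report object, and
    `compare_fn` receives a two-element list, which is the shape that
    ydata_profiling.compare accepts for a two-dataset comparison.
    """
    return {
        (first, second): compare_fn([reports[first], reports[second]])
        for first, second in combinations(reports, 2)
    }
```

With the reports from the issue, this could be called as `pairwise_comparisons({"Train": train_report, "Valid": validation_report, "Test": test_report}, compare)`, and each resulting two-way report could then be written out with `to_file`. Fresh `ProfileReport` objects per pair may be needed if reusing one report across several comparisons causes problems; that has not been verified here.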

If you have any other questions, let us know.

@fabclmnt fabclmnt added question/discussion ❓ Open dicussion and removed needs-triage labels Oct 15, 2024