feat(python): Write data at table level in `write_excel` #17757

mcrumiller · 2024-07-20T19:18:42Z

Resolves #17756.

Previously, write_column was used to write data to the table, which writes formatting to each cell individually. This update writes the data in the add_table step instead, which is much simpler and also supplies formatting to the table, which in turn formats the columns. The result is a simpler data write, simpler formatting, and a (very minor) reduction in file size.

alexander-beedie · 2024-07-20T19:44:15Z

Ahh... the contention is that write_column actually writes formats at the cell-level? I'm fine with the change if you can make it work - the failing test implies that those formats are not being applied properly for datetime columns (I eyeballed the sheet output and the datetime col "f" looks like a date 🤔)

py-polars/polars/dataframe/frame.py

Co-authored-by: Alexander Beedie <[email protected]>

mcrumiller · 2024-07-20T19:56:03Z

write_column actually writes formats at the cell-level

Yeah it does (which is a bit weird, no?). I'm looking into the date formatting issue, it's coming back as YYYY-MM-DD regardless of output format. I'm off to see a movie now but I'll look into this later.

alexander-beedie · 2024-07-20T20:13:33Z

Yeah it does (which is a bit weird, no?).

It's coming back to me now; write_column decomposes to cell-level writes, therefore also cell-level formatting: https://xlsxwriter.readthedocs.io/worksheet.html#write_column

I'm looking into the date formatting issue, it's coming back as YYYY-MM-DD regardless of output format. I'm off to see a movie now but I'll look into this later.

Have fun ;) If you can find a way to address this issue then it looks like a sensible update to me.
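As a rough illustration of the cell-level decomposition mentioned above, here is a toy model. RecordingSheet is a hypothetical stand-in, not xlsxwriter's actual class; it only shows why a column write with a format ends up attaching that format to every cell individually.

```python
class RecordingSheet:
    """Toy stand-in for a worksheet (hypothetical, not xlsxwriter)."""

    def __init__(self):
        # (row, col) -> (value, cell_format)
        self.cells = {}

    def write(self, row, col, value, cell_format=None):
        # A single cell write carries its own format reference.
        self.cells[(row, col)] = (value, cell_format)

    def write_column(self, row, col, data, cell_format=None):
        # One write per item: the format is repeated on every cell.
        for offset, value in enumerate(data):
            self.write(row + offset, col, value, cell_format)


sheet = RecordingSheet()
sheet.write_column(0, 0, [10, 20, 30], cell_format="0.00")
print(len(sheet.cells))  # 3
```

Each of the three recorded cells holds its own copy of the format reference, which is the behaviour being discussed for write_column.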

mcrumiller · 2024-07-21T18:13:53Z

@alexander-beedie I figured it out. This might be useful to you in the future, and it doesn't really make sense, which is why it was confusing: when the workbook's default_date_format is supplied, it overrides the column-level date format unless the data is supplied with the table. See the three attempts below; the first two fail.

(failure) Write data (with no table), set column format

from datetime import date
import xlsxwriter

# specify default workbook date format
wb = xlsxwriter.Workbook("date_format.xlsx", {"default_date_format": "yyyy.dd.mm"})
ws = wb.add_worksheet("Date")

data = [date(2024, 1, 1), date(2024, 1, 2)]
ws.write_column(0, 0, data)

# assign column format (has no effect)
date_format = wb.add_format({"num_format": "mm-dd-yyyy"})
ws.set_column(0, 0, 10, date_format)

wb.close()

Result: column format completely ignored.

(failure) Write empty table, then write data, then set column format

Next, if we set the default date format, and use add_table but add the data later, it still fails, even if we supply a format:

from datetime import date
import xlsxwriter

wb = xlsxwriter.Workbook("date_format.xlsx", {"default_date_format": "yyyy.dd.mm"})
ws = wb.add_worksheet("Date")

data = [date(2024, 1, 1), date(2024, 1, 2)]
date_format = wb.add_format({"num_format": "mm-dd-yyyy"})
ws.add_table("A1:A2", options={
    "header_row": False,
    "columns": [{"format": date_format}]
})

# write the data afterwards
ws.write_column(0, 0, data)
ws.set_column(0, 0, 10, date_format)

wb.close()

Result: same thing; the column format is ignored.

(success) Supply the data with the table.

If we supply the data to the add_table function, our formatting is finally applied:

from datetime import date
import xlsxwriter

wb = xlsxwriter.Workbook("date_format.xlsx", {"default_date_format": "yyyy.dd.mm"})
ws = wb.add_worksheet("Date")

data = [[date(2024, 1, 1)], [date(2024, 1, 2)]]
date_format = wb.add_format({"num_format": "mm-dd-yyyy"})
ws.add_table(
    "A1:A2",
    options={
        "data": data,  # !!! SEE HERE !!
        "header_row": False,
        "columns": [{"format": date_format}],
    },
)

wb.close()

Moving the data write to the add_table function simplifies everything here (removes a loop) and solves the issue.
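The "data" option in the successful example is a list of rows, so column-oriented data has to be transposed before it is handed to add_table. A minimal sketch of that step follows; the columns dict and the columns_to_rows helper are hypothetical names used for illustration, and this is not necessarily how write_excel does it internally.

```python
from datetime import date

# Hypothetical column-oriented data, e.g. as it might come out of a dataframe.
columns = {
    "id": ["a123", "b345"],
    "when": [date(2024, 1, 1), date(2024, 1, 2)],
}


def columns_to_rows(columns):
    """Transpose {name: [values, ...]} into a list of rows."""
    return [list(row) for row in zip(*columns.values())]


rows = columns_to_rows(columns)

# Options dict in the shape used by the add_table example above.
table_options = {
    "data": rows,
    "columns": [{"header": name} for name in columns],
}

# With a header row, the table would span one header row plus len(rows) data rows:
# ws.add_table(0, 0, len(rows), len(columns) - 1, table_options)
```

The whole dataset then goes to the worksheet in a single call, with no per-column write loop.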

codecov · 2024-07-21T18:40:36Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.53%. Comparing base (1df3b0b) to head (038c66d).
Report is 48 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #17757      +/-   ##
==========================================
+ Coverage   80.40%   80.53%   +0.12%     
==========================================
  Files        1502     1503       +1     
  Lines      197041   197026      -15     
  Branches     2794     2800       +6     
==========================================
+ Hits       158439   158676     +237     
+ Misses      38088    37830     -258     
- Partials      514      520       +6


alexander-beedie · 2024-07-22T12:19:44Z

Moving the data write to the add_table function simplifies everything here (removes a loop) and solves the issue.

Nice discovery - any potential downsides to this approach? The tests can't really validate formatting, so do we see the expected formatting when eyeballing some of the more sophisticated examples manually? (I'll start double-checking some ;)
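One possible way to check number formats without opening Excel is to read them straight out of the .xlsx archive, which is a zip of XML parts. The sketch below uses only the standard library. The cell_number_formats helper is hypothetical, and the default sheet path assumes the first worksheet is stored as xl/worksheets/sheet1.xml. It only reads the style index stored on each cell, so formatting applied through other mechanisms (for instance table styles) would need separate inspection.

```python
import zipfile
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def cell_number_formats(xlsx_file, sheet_path="xl/worksheets/sheet1.xml"):
    """Map cell references (e.g. "A1") to their custom number-format code.

    Cells that use a built-in format (not listed in <numFmts>) map to None.
    """
    with zipfile.ZipFile(xlsx_file) as zf:
        styles = ET.fromstring(zf.read("xl/styles.xml"))
        sheet = ET.fromstring(zf.read(sheet_path))

    # Custom formats: numFmtId -> formatCode
    codes = {
        fmt.get("numFmtId"): fmt.get("formatCode")
        for fmt in styles.findall("m:numFmts/m:numFmt", NS)
    }
    # A cell's "s" attribute is an index into the <cellXfs> list.
    xfs = [xf.get("numFmtId") for xf in styles.findall("m:cellXfs/m:xf", NS)]

    result = {}
    for cell in sheet.findall("m:sheetData/m:row/m:c", NS):
        style_index = int(cell.get("s", "0"))
        result[cell.get("r")] = codes.get(xfs[style_index])
    return result
```

For the date examples earlier in the thread, calling cell_number_formats("date_format.xlsx") would show which format code each written cell actually references.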

mcrumiller · 2024-07-22T12:29:24Z

I don't think so. The "heavily customized formatting/definition" parameter set in test_spreadsheet.test_excel_round_trip() still produces the same result, which includes the top/bottom formatting, column-specific formatting, and dtype-specific formatting.

Also, I left in the column-formatting section, which is probably not strictly necessary any more, but I feel it can't hurt if someone decides to manually add data below the table for whatever reason.

alexander-beedie

Hmm, I've tested on some other cases and it looks like the new code is not behaving quite the same in several instances - specifically, issues with the table header.

For example:

pl.DataFrame({
    "id": ["a123", "b345", "c567", "d789"],
    "values": [99, 45, 50, 85],
    "misc": [1.2, 3.4, 5.6, 7.8],
}).write_excel(
    "~/output.xlsx",
    table_style={"style": "Table Style Medium 15"},
)

With this patch the second and third column header names end up black; they are present, but do not conform to the given table_style, which defines them as bold/white, so they look like they aren't there 🤔

Before:

After:

Will need some further tinkering :)

(Interestingly there is also a 2-pixel per column increase in column width with the new code, though that's not important - just an odd observation!)

mcrumiller · 2024-07-23T13:27:46Z

Thanks, I'll take a look at this tonight and do some more thorough testing. I wonder if applying the column-level formatting overrides the table formatting, since the numerical columns apply a num_format (which defaults to black) whereas the id column does not. If that's the case, the fix is probably to not apply the column-level formatting and leave it to the table formatter.

alexander-beedie · 2024-07-23T14:25:20Z

Excel is a dark art 😆

mcrumiller · 2024-07-23T22:18:25Z

@alexander-beedie I reproduced, and indeed removing the column formatting fixed it.

I think that we should simply leave the formatting to add_table, which seems to be sufficient and simplest. I'll rename the PR since now we're not actually touching the column-level format.

alexander-beedie · 2024-07-24T03:59:53Z

@alexander-beedie I reproduced, and indeed removing the column formatting fixed it.

Nice; will take another run through it today 👍

alexander-beedie · 2024-07-25T11:14:14Z

Looks good :)

Write format at column level

f1e3232

mcrumiller requested review from ritchie46, c-peters, alexander-beedie, MarcoGorelli and reswqa as code owners July 20, 2024 19:18

github-actions bot added the enhancement and python labels Jul 20, 2024

mcrumiller marked this pull request as draft July 20, 2024 19:35

alexander-beedie reviewed Jul 20, 2024

Ensure fmt applied

788e80e

Co-authored-by: Alexander Beedie <[email protected]>

Move data write to add_table

522610a

mcrumiller marked this pull request as ready for review July 21, 2024 18:22

mcrumiller changed the title ~~feat(python): Write format at column level~~ feat(python): Write format at column level in write_excel Jul 21, 2024

alexander-beedie requested changes Jul 23, 2024

Don't set column format

038c66d

mcrumiller changed the title ~~feat(python): Write format at column level in write_excel~~ feat(python): Write format at table level in write_excel Jul 23, 2024

mcrumiller changed the title ~~feat(python): Write format at table level in write_excel~~ feat(python): Write data at table level in write_excel Jul 24, 2024

alexander-beedie added the A-io-spreadsheet Area: reading/writing Excel/ODS files label Jul 25, 2024

alexander-beedie approved these changes Jul 25, 2024

alexander-beedie merged commit 3016c07 into pola-rs:main Jul 25, 2024
17 checks passed
