Bug: Larger files cannot be read by loadWorkbook() #227

Open
joe-roy opened this issue Oct 21, 2024 · 1 comment
joe-roy commented Oct 21, 2024

sessionInfo() output

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Pop!_OS 22.04 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8 LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8

time zone: America/Chicago
tzcode source: system (glibc)

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] magrittr_2.0.3 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
[10] ggplot2_3.5.1 tidyverse_2.0.0 readxl_1.4.3 XLConnect_1.0.10

loaded via a namespace (and not attached):
[1] vctrs_0.6.5 cli_3.6.3 rlang_1.1.4 stringi_1.8.4 generics_0.1.3 rJava_1.0-11 glue_1.7.0 colorspace_2.1-0 hms_1.1.3
[10] scales_1.3.0 fansi_1.0.6 grid_4.4.1 cellranger_1.1.0 munsell_0.5.1 tzdb_0.4.0 lifecycle_1.0.4 compiler_4.4.1 timechange_0.3.0
[19] pkgconfig_2.0.3 rstudioapi_0.16.0 R6_2.5.1 tidyselect_1.2.1 utf8_1.2.4 pillar_1.9.0 tools_4.4.1 withr_3.0.0 gtable_0.3.5

Additional environment information

No response

Description

I'm reading a large-ish Excel file and get an error from loadWorkbook(). I suspect the cause is upstream of XLConnect, but I'm unable to resolve it. As a workaround, I use rig to roll back to an earlier R version (and library) that doesn't produce the error. I'm posting here because I suspect other packages in Python have run into the same thing; see, for example, this thread: nightscape/spark-excel#231

Expected behavior

Error: IOException (Java): The file appears to be potentially malicious. This file embeds more internal file entries than expected.
This may indicates that the file could pose a security risk.
You can adjust this limit via ZipSecureFile.setMaxFileCount() if you need to work with files which are very large.
Limits: MAX_FILE_COUNT: 1000
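
An .xlsx file is a ZIP container, and the message above says the limit is on the number of internal entries. One way to check whether a given workbook is over that limit is to count its entries directly. The following is a minimal sketch using only the Python standard library; the helper names `count_zip_entries` and `exceeds_limit` are hypothetical, the value 1000 is taken from the message above, and it is an assumption that the library counts entries the same way (for example, whether directory entries are included and whether the comparison is strict).

```python
import zipfile

# Limit quoted in the error message above (MAX_FILE_COUNT: 1000).
MAX_FILE_COUNT = 1000


def count_zip_entries(path):
    """Return the number of internal entries in a ZIP container such as .xlsx."""
    with zipfile.ZipFile(path) as archive:
        return len(archive.infolist())


def exceeds_limit(path, limit=MAX_FILE_COUNT):
    """Return True if the archive holds more entries than the limit.

    Assumption: a strict comparison; the check that raises the
    IOException may differ in detail.
    """
    return count_zip_entries(path) > limit
```

Calling `count_zip_entries()` on a local copy of the workbook gives a number to compare against 1000, which helps separate a file that really has many internal parts from some other cause.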

How to Reproduce

The linked file is an example that produces this error when I call loadWorkbook() on it under the most recent version of R: https://www.dropbox.com/scl/fi/9snzbt6sjrxz9noz3y1z2/University-of-Wisconsin-Madison.xlsx?rlkey=ron57jqpzcvhd9d7w9wg7aqdj&st=wfqohp76&dl=0 (This is public data about their programs that colleges of engineering submit to us annually.)

joe-roy added the bug label Oct 21, 2024
joe-roy changed the title from "Bug:" to "Bug: Larger files cannot be read by loadWorkbook()" Oct 21, 2024
spoltier (Member) commented

Thanks for reporting. We will consider adding an option to set the ZipSecureFile options in XLConnect prior to or when loading workbooks.
