Reading data produced using a11ytables #90

jack-davison · 2022-12-07T15:46:16Z

Hi Matt,

Thanks very much for this package. I saw your talk at EARL2022 on {a11ytables} but didn't get the opportunity to ask the following.

We often use UK government statistics in the format described and, while the format is accessible in a spreadsheet, it can be painful to read into R for analysis (data doesn't always start on the same row, there can be multiple independent tables per sheet, multiple sheets per file, etc.). Often a more code-friendly/tidy format (e.g., a simple CSV) just isn't available, so we've had to write DIY solutions to iterate over sheets, detect where the data is when there are multiple tables per sheet, and so on.

Is it on the roadmap to write a read_a11ytable() function to do the inverse of the current package functionality, i.e., take an accessible spreadsheet saved locally and turn it back into a list of tidy tibbles in R?

Cheers,
Jack

matt-dray · 2022-12-09T13:23:28Z

Thanks for getting in touch, Jack. This is a great question and I like the idea.

We'd need to detect whether a given spreadsheet:

Was created with {a11ytables} or gptables (the Python analogue), or meets best-practice guidance without these packages (definitely in scope)
Was created with these packages, but the output has been adjusted slightly (might be tricky, depending on how off-piste the changes are)
Was created without these packages and doesn't meet the guidance (out of scope)

At its simplest, we could detect the tabs containing tables (or provide an argument for users to specify them) and then extract the table(s) alone. However, it might be more useful to output a list object where each element represents a tab, with sub-elements for the title, presence of notes, the table(s), etc.
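To make the list-per-tab idea concrete, here is a minimal, language-agnostic sketch written in Python with only the standard library. It is not part of {a11ytables} or gptables. It assumes each worksheet has already been read into a list of rows (with None for empty cells), that a row with two or more filled cells belongs to a table, and that rows with a single filled cell are meta-information such as the title or a notes statement. The function name `summarise_tab` and the "contains the word note" check are hypothetical choices for illustration.

```python
from typing import Any, Optional

Row = list[Optional[Any]]


def _filled(row: Row) -> int:
    """Count the non-empty cells in a row."""
    return sum(cell not in (None, "") for cell in row)


def summarise_tab(rows: list[Row]) -> dict[str, Any]:
    """Split one worksheet into a title, a notes flag and its table(s).

    Assumed heuristic: a run of consecutive rows with >= 2 filled cells is
    one table whose first row is the header; rows with exactly one filled
    cell are meta-information; anything else separates tables.
    """
    meta: list[Any] = []
    tables: list[list[dict[Any, Any]]] = []
    block: list[Row] = []

    def flush() -> None:
        # Turn the rows collected so far into one table of records.
        if block:
            header, *body = block
            tables.append([dict(zip(header, r)) for r in body])
            block.clear()

    for row in rows:
        if _filled(row) >= 2:
            block.append(row)
        else:
            flush()
            if _filled(row) == 1:
                meta.append(next(c for c in row if c not in (None, "")))
    flush()  # the sheet may end while still inside a table

    return {
        "title": meta[0] if meta else None,
        # Hypothetical check: any meta line after the title mentioning notes.
        "has_notes": any("note" in str(m).lower() for m in meta[1:]),
        "tables": tables,
    }


def summarise_workbook(workbook: dict[str, list[Row]]) -> dict[str, dict[str, Any]]:
    """Apply summarise_tab() to every tab, keyed by tab name."""
    return {name: summarise_tab(rows) for name, rows in workbook.items()}
```

The obvious weakness is sparse data: a genuine table row with only one filled cell would be mistaken for meta-information and split the table, which is one reason an argument letting users specify the tables may still be needed.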

Related: in case you haven't seen them before, our colleague Duncan (@nacnudus) has written some great packages, {tidyxl} and {unpivotr}, for general-purpose spreadsheet parsing and wrangling. There's an associated online book too.

matt-dray · 2024-03-20T22:54:42Z

Another approach: use Fran's package {odsTableReadr} to identify each of the tables in such a workbook and then extract meta-information from cell A1 to the row just above the start of a table.
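A rough sketch of the second half of that idea, again in plain Python rather than R. It does not call {odsTableReadr}; it simply assumes some table-detection step has already returned the row range of each table, and then collects the column A text from cell A1 (or from the end of the previous table) down to the row just above each table. The function name `meta_above_tables` and the `(start, end)` zero-based inclusive row ranges are hypothetical.

```python
from typing import Any, Optional

Row = list[Optional[Any]]


def meta_above_tables(
    rows: list[Row], table_ranges: list[tuple[int, int]]
) -> list[list[Any]]:
    """Return, per table, the non-empty column A cells sitting above it.

    table_ranges holds (start, end) row indices (zero-based, inclusive) for
    each table, assumed to come from a separate table-detection step.
    """
    metas: list[list[Any]] = []
    previous_end = -1  # so the first gap starts at row 0, i.e. cell A1
    for start, end in sorted(table_ranges):
        gap = rows[previous_end + 1 : start]
        metas.append([row[0] for row in gap if row and row[0] not in (None, "")])
        previous_end = end
    return metas
```

For a sheet with one table starting on the fifth row, `meta_above_tables(rows, [(4, last_row)])` would return a single list holding whatever text sits in A1 to A4.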

matt-dray added the enhancement (New feature or request), discuss (Point for discussion) and code labels Dec 9, 2022

matt-dray removed the code label Dec 27, 2023

matt-dray added this to the Backlog milestone May 27, 2024

matt-dray added the could MoSCoW priority ('on ice') label May 27, 2024
