Reading data produced using a11ytables #90

Open
jack-davison opened this issue Dec 7, 2022 · 2 comments
Labels
could: MoSCoW priority ('on ice'); discuss: Point for discussion; enhancement: New feature or request
Comments

@jack-davison

Hi Matt,

Thanks very much for this package. I saw your talk at EARL2022 on {a11ytables} but didn't get the opportunity to ask the following.

We often use UK government statistics in the format described and, while the format is accessible in a spreadsheet, it can be painful to read it into R for analysis (data doesn't always start on the same line, there can be multiple independent data tables per sheet, multiple sheets per file, etc.). Often a more code-friendly/tidy format (e.g., a simple csv) just isn't available, so we've had to DIY solutions to iterate over different sheets, detect where the data is when there are multiple tables per sheet, and so on.

Is it on the roadmap to write a read_allytable() function to do the inverse of the current package functionality, i.e., take an accessible spreadsheet saved locally and turn it back into a list of tidy tibbles in R?

Cheers,
Jack

@matt-dray
Collaborator

matt-dray commented Dec 9, 2022

Thanks for getting in touch, Jack. This is a great question and I like the idea.

We'd need to detect whether a given spreadsheet was created:

  1. With {a11ytables} or gptables (the analogous Python package), or without these packages but in line with best-practice guidance (definitely in scope)
  2. With these packages, but the output has been adjusted slightly (might be tricky, depending on how off-piste the changes are)
  3. Without these packages, and doesn't meet the guidance (out of scope)

At its simplest, we could detect the tabs containing tables (or provide an argument for users to specify them) and then extract the table(s) alone. However, it might be more useful to output a list object where each element represents a tab, with sub-elements for the title, the presence of notes, the table(s), etc.
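
A minimal sketch of that list-shaped output, to make the idea concrete. It is written in Python (the language of gptables) on plain in-memory lists rather than a real workbook, and everything in it is an assumption for illustration: the names `read_workbook`, `find_tables` and `is_table_row` are hypothetical, a "table row" is crudely taken to be any row with two or more non-empty cells, and the notes check is just a search for the word "note" in the lines above the first table.

```python
def is_table_row(row):
    """Assumed heuristic: a row is tabular if it has 2+ non-empty cells."""
    return sum(cell not in (None, "") for cell in row) >= 2


def find_tables(rows):
    """Return (start_index, block) pairs for contiguous runs of tabular rows."""
    tables, start = [], None
    for i, row in enumerate(rows):
        if is_table_row(row):
            if start is None:
                start = i
        elif start is not None:
            tables.append((start, rows[start:i]))
            start = None
    if start is not None:
        tables.append((start, rows[start:]))
    return tables


def read_workbook(workbook, tabs=None):
    """workbook: dict of tab name -> list of rows. tabs: optional names to keep."""
    out = {}
    for name, rows in workbook.items():
        if tabs is not None and name not in tabs:
            continue
        tables = find_tables(rows)
        if not tables:
            continue  # no multi-column rows, e.g. a single-column cover sheet
        first_start = tables[0][0]
        # Column-A text above the first table: title first, then other lines.
        preamble = [r[0] for r in rows[:first_start] if r and r[0] not in (None, "")]
        out[name] = {
            "title": preamble[0] if preamble else None,
            "has_notes": any("note" in str(line).lower() for line in preamble),
            # First row of each block is treated as the header row.
            "tables": [
                [dict(zip(block[0], r)) for r in block[1:]] for _, block in tables
            ],
        }
    return out


# Made-up example data, not taken from a real publication.
workbook = {
    "Cover": [["My publication"], ["Published 2022"]],
    "Table_1": [
        ["Table 1: Counts by area"],
        ["This worksheet contains one table. Some cells refer to notes."],
        ["Area", "Count"],
        ["North", 10],
        ["South", 12],
    ],
}
result = read_workbook(workbook)
```

The row heuristic would split a table at any row with fewer than two filled cells, so a real implementation would need something sturdier for the "adjusted slightly" case in point 2.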

Related: in case you haven't seen them before, our colleague Duncan (@nacnudus) has written some great packages, {tidyxl} and {unpivotr}, for general-purpose spreadsheet parsing and wrangling. There's an associated online book too.

@matt-dray added the enhancement (New feature or request), discuss (Point for discussion), and code labels Dec 9, 2022
@matt-dray removed the code label Dec 27, 2023
@matt-dray
Collaborator

Another approach: use Fran's package {odsTableReadr} to identify each of the tables in such a workbook and then extract meta-information from cell A1 to the row just above the start of a table.
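
A rough sketch of the second half of that idea, in Python on a plain list of rows. It assumes the table detection has already happened elsewhere and simply takes the 0-based row index where a table's header starts; it does not call {odsTableReadr}, and the function name `meta_above` is hypothetical.

```python
def meta_above(rows, table_start):
    """Collect non-empty column-A values from row 0 up to (not including) table_start."""
    meta = []
    for row in rows[:table_start]:
        first = row[0] if row else None  # blank rows have no column-A value
        if first not in (None, ""):
            meta.append(first)
    return meta


# Made-up example sheet: two lines of meta-information, a blank row, then a table.
sheet = [
    ["Table 2: Counts by year"],
    ["Source: example data"],
    [],
    ["Year", "Count"],
    [2021, 5],
]
meta = meta_above(sheet, table_start=3)
```

For a sheet with several tables, reading from A1 each time would also sweep up the earlier tables, so the start of the range would probably need to move to the row after the previous table.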

@matt-dray added this to the Backlog milestone May 27, 2024
@matt-dray added the "could" label (MoSCoW priority: 'on ice') May 27, 2024