-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove_empty only to last rows of last columns of data.frame #419
Comments
Thanks for the suggestion. My first impression is that this is rather special-purpose. (That is to day, I have not had a need for this in my work flow.)
If @sfirke finds it to be a good feature, I would suggest the implementation to expand the use of the |
Ok. Maybe this is too special-purpose. Let me describe the case. Let's say a spreadsheet (tab) has several tables, A, B, C. It is easy to identify the starting line of each table because each table starts with the column names. However, there is no consistent marker of the end of the tables. So I can define the end of table A as the row number for the start of table B minus 1. This will work, but table A will be imported with a bunch of empty rows at the end. These empty rows at the end are conceptually different because I know they do not contain information. Empty rows at the middle may indicate some parsing error, typing error, etc. So, in my application, I want to differentiate between them. Feel fee to disregard this suggestion and close the issue. |
I would have a related, but slightly different suggestion... my use case is an extract from a database where the first few variables are identifiers of some sort and the rest are actual data. suppose that some variables are conditional on something, in which case you would end up with something like
I need to keep the id column for merging with other data sets, so throwing that away isn't possible. But if remove_empty could be conditioned to use regex (via tidyselect perhaps or anything else) then I could do something like Admittedly, I'm not sure how relevant it would be to removing columns (maybe there are particular columns that should be retained regardless of them always being empty?) |
I have this same need, there are some fully NULL columns that I want to retain and some I want to drop (two different types of columns - one is student responses, which could possibly be fully null, while the other is a column that we don't want if it is fully null). Adding something along the lines of |
This is related to #530, and that is where future discussion is planned to occur. Thanks to all contributors to this issue. These issues seem difficult to generalize, overall, except for the specific column definition (so, switching to #530 for that). Each of these seems useful, but the context for the use would require code that seems specialized for the use case. And, with that specialization, it seems that the code would add to the code base and API complexity. If you have ideas for generalized use cases, please open a new, specific issue for those. |
tkx for the amazing package.
In my application, I extract several tables that are in the same spreadsheet (for hundreds of spreadsheets with a similar format, but some tables have more rows than others, as users add rows to tables as needed). I extract the starting points of each table by regular expressions on the in-sheet column names of each table and the end from the start of the subsequent table -1.
After extracting, several of the resulting tables (data.frames) have empty rows and at the end (or empty columns on the right). Those are conceptually different than empty rows and columns in the middle of the table.
So I would like to be able to remove just the empty cells in the bottom and right margins of the table.
Perhaps the syntax could be something like:
remove_empty(remove_bottom=TRUE)
Edit 1: This is the solution I came up with to remove rows.
It requires a data.table but you could adapt to dplyr based. The function should also not leave the
row_empty
column in its output.EDIT2: I cross-posted on this SO quesiton
The text was updated successfully, but these errors were encountered: