Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow columns to be selected using regular expressions #157

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

sd2k
Copy link

@sd2k sd2k commented Oct 29, 2018

This allows a selector such as r'a[bc]?' to be used, which would
select columns 'a', 'ab', 'ac' but not 'ad'. This can be used in
the 'xsv select' command as well as anywhere that the '--select' flag
appears.

The interface is different to the one discussed in #155 because I couldn't think of an easy way to include the regex flag when deserializeing the SelectColumns. It matches the regular expression string syntax from Python at the minute (i.e. prefixing a string with r), but it's a little ugly on the shell due to the need for two sets of strings :(

Here's some examples:

 ➜  xsv git:(155-select-column...) cat test.csv
a,ab,ac,ad,be,bf,bg
0,1,2,3,4,5,6
 ➜  xsv git:(155-select-column...) target/debug/xsv select r"'a[bc]'" test.csv
ab,ac
1,2
 ➜  xsv git:(155-select-column...) target/debug/xsv frequency -s "r'a[bc]?$'"  test.csv
field,value,count
a,0,1
ab,1,1
ac,2,1

This allows a selector such as r'a[bc]?' to be used, which would
select columns 'a', 'ab', 'ac' but not 'ad'. This can be used in
the 'xsv select' command as well as anywhere that the '--select' flag
appears.
@sd2k
Copy link
Author

sd2k commented Nov 7, 2018

BTW I'm not super happy with the user interface here - would happily receive tips on how to implement regex matching as a flag.

@mintyplanet
Copy link
Contributor

I think detecting the regex syntax instead of a flag is a reasonable approach, rather than introducing a new flag for each of the subcommands that use --select.

May I suggest the regex syntax used in awk,sed? i.e. xsv frequency -s '/a[bc]?$/' test.csv
I think that would avoid the need for nested quotes.

@sd2k
Copy link
Author

sd2k commented Nov 19, 2018

@mintyplanet Yes, I think I prefer that syntax too. I've pushed a new commit switching to that instead.

@CMCDragonkai
Copy link

I'd like to extend this idea to having replacement capability. Basically a sort of sed but for CSVs. I currently require the ability to select a particular column and apply a regex replace to each column value. I'm sure this could be extended to this.

Copy link
Contributor

@mintyplanet mintyplanet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll want to update the usage text in src/cmd/select.rs to describe regex selectors.

src/select.rs Outdated Show resolved Hide resolved
src/select.rs Outdated Show resolved Hide resolved
tests/test_select.rs Outdated Show resolved Hide resolved
@sd2k
Copy link
Author

sd2k commented Nov 24, 2018

Thanks for the review!

Copy link
Contributor

@mintyplanet mintyplanet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me!

@mintyplanet
Copy link
Contributor

Please update the usage documentation in src/cmd/select.rs to describe regex selectors.

@mintyplanet
Copy link
Contributor

@CMCDragonkai Your idea seems sufficiently different from this PR that I would recommend you create a new Issue. If you could provide an example with inputs, outputs and hypothetical xsv command/flag, that would be helpful.

@sd2k
Copy link
Author

sd2k commented Nov 25, 2018

Let me know if you'd like this rebasing 👍

@sd2k
Copy link
Author

sd2k commented Dec 18, 2018

@mintyplanet @BurntSushi is this mergeable, in your eyes? If not, what changes would you like to see first? Thanks!

@mintyplanet
Copy link
Contributor

Looks good to me, but I don't have commit rights.
The PR will need to be approved by @BurntSushi

@matt-morris
Copy link

This is fantastic -- just what I was looking for. Is there anything we can do to help get this merged in?

@trueter

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants