test/data submodule, -c/-C (colorized output), /dev/null handling, GHA tests
ryan-williams committed Dec 30, 2024
1 parent f9b80e4 commit 210bd36
Showing 7 changed files with 220 additions and 33 deletions.
60 changes: 54 additions & 6 deletions .github/workflows/ci.yml
@@ -1,14 +1,62 @@
name: Release to PyPI
name: Verify README examples, release to PyPI
on:
push:
# branches: [ "main" ]
branches: [ "main" ]
tags: [ "v**" ]
# pull_request:
# branches:
# - "**"
pull_request:
branches: [ "main" ]
workflow_dispatch:
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
jobs:
test:
name: Verify README examples
runs-on: ubuntu-latest
steps:
- run: |
echo $PATH
which which
- env:
PATH: foo:${{ github.env.PATH }}
run: |
echo $PATH
which which
- uses: actions/checkout@v4
with:
fetch-depth: 0
submodules: true
- uses: actions/setup-python@v5
with:
python-version: "3.11"
cache: pip
cache-dependency-path: 'requirements*.txt'
- uses: dtolnay/rust-toolchain@stable
- uses: Swatinem/rust-cache@v2
- run: cargo install parquet2json
- name: Install dvc-utils
run: pip install -e . -r requirements-ci.txt
- name: '`dvc pull` test/data'
working-directory: test/data
run: dvc pull -r s3 -R -A
- name: Set up parquet-helpers
uses: actions/checkout@v4
with:
repository: ryan-williams/parquet-helpers
path: pqt
- name: Verify README examples
env:
# Evaluate README examples from within the `test/data` submodule
BMDF_WORKDIR: test/data
PATH: pqt:${{ env.PATH }}
run: |
. pqt/.pqt-rc
export SHELL
echo "PATH=$PATH"
# mdcmd
# git diff --exit-code || true
release:
name: Release
name: Release to PyPI
if: startsWith(github.ref, 'refs/tags/')
runs-on: ubuntu-latest
steps:
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "test/data"]
path = test/data
url = https://github.com/ryan-williams/dvc-helpers
165 changes: 149 additions & 16 deletions README.md
@@ -55,25 +55,146 @@ dvc-diff --help
# optional) at HEAD (last committed value) vs. the current worktree content.
#
# Options:
# -c, --color Colorize the output
# -r, --refspec TEXT <commit 1>..<commit 2> (compare two commits) or
# <commit> (compare <commit> to the worktree)
# -s, --shell-executable TEXT Shell to use for executing commands; defaults
# to $SHELL (/bin/bash)
# -S, --no-shell Don't pass `shell=True` to Python
# `subprocess`es
# -U, --unified INTEGER Number of lines of context to show (passes
# through to `diff`)
# -v, --verbose Log intermediate commands to stderr
# -w, --ignore-whitespace Ignore whitespace differences (pass `-w` to
# `diff`)
# -x, --exec-cmd TEXT Command(s) to execute before diffing; alternate
# syntax to passing commands as positional
# arguments
# --help Show this message and exit.
# -c, --color / -C, --no-color Force or prevent colorized output
# -r, --refspec TEXT <commit 1>..<commit 2> (compare two commits)
# or <commit> (compare <commit> to the worktree)
# -R, --ref TEXT Shorthand for `-r <ref>^..<ref>`, i.e. inspect
# a specific commit (vs. its parent)
# -s, --shell-executable TEXT Shell to use for executing commands; defaults
# to $SHELL
# -S, --no-shell Don't pass `shell=True` to Python
# `subprocess`es
# -U, --unified INTEGER Number of lines of context to show (passes
# through to `diff`)
# -v, --verbose Log intermediate commands to stderr
# -w, --ignore-whitespace Ignore whitespace differences (pass `-w` to
# `diff`)
# -x, --exec-cmd TEXT Command(s) to execute before diffing;
# alternate syntax to passing commands as
# positional arguments
# --help Show this message and exit.
```
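The `-r`/`-R` options above boil down to choosing a "before" and an "after" side for the diff. A minimal Python sketch of that resolution step, assuming `-r` and `-R` are mutually exclusive and using `None` to stand for "the current worktree"; `resolve_refs` is a hypothetical name, not part of dvc-utils:

```python
from typing import Optional, Tuple


def resolve_refs(
    refspec: Optional[str] = None,
    ref: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return (before, after); after=None stands for the current worktree.

    Hypothetical helper illustrating the -r/-R semantics from the help text.
    """
    if refspec and ref:
        # Assumption: the two options are mutually exclusive
        raise ValueError("pass at most one of -r/--refspec and -R/--ref")
    if ref:
        # -R <ref> is shorthand for -r <ref>^..<ref> (the commit vs. its parent)
        refspec = f"{ref}^..{ref}"
    if not refspec:
        # Default: HEAD (last committed value) vs. the worktree
        return "HEAD", None
    if ".." in refspec:
        before, after = refspec.split("..", 1)
        # As in git, an omitted side of `a..b` means HEAD
        return before or "HEAD", after or "HEAD"
    # A single commit is compared against the worktree
    return refspec, None
```

For example, `resolve_refs(ref="8ec2060")` gives `("8ec2060^", "8ec2060")`, i.e. the parent/child pair that `dvc-diff -R 8ec2060 test.txt` compares.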

## Examples <a id="examples"></a>
These examples are verified with [`mdcmd`] and `$BMDF_WORKDIR=test/data`.

([`test/data`] is a clone of [ryan-williams/dvc-helpers@test], which contains simple DVC-tracked files used for testing [`git-diff-dvc.sh`])

[`8ec2060`] added a DVC-tracked text file, `test.txt`:

<!-- `bmdf -- dvc-diff -R 8ec2060 test.txt` -->
```bash
dvc-diff -R 8ec2060 test.txt
# 0a1,10
# > 1
# > 2
# > 3
# > 4
# > 5
# > 6
# > 7
# > 8
# > 9
# > 10
```

[`0455b50`] appended some lines to `test.txt`:

<!-- `bmdf -- dvc-diff -R 0455b50 test.txt` -->
```bash
dvc-diff -R 0455b50 test.txt
# 10a11,15
# > 11
# > 12
# > 13
# > 14
# > 15
```
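The `0a1,10` and `10a11,15` headers above are `diff`'s default ("normal") output format: `0a1,10` means "after line 0 of the old file, add lines 1 through 10 of the new one", and each added line is prefixed with `> `. dvc-diff passes this through from `diff`; the leading `# ` on each line is how the README renders command output. The sketch below rebuilds the format with Python's `difflib` purely for illustration. It is not how dvc-diff works, and `difflib`'s matching can differ from GNU `diff` on less trivial inputs, but it reproduces both `test.txt` examples above. `normal_diff` is a hypothetical name.

```python
import difflib
from typing import List


def _rng(lo: int, hi: int) -> str:
    """1-based range for 0-based lines [lo, hi); a single line prints as one number."""
    return str(lo + 1) if hi - lo == 1 else f"{lo + 1},{hi}"


def normal_diff(old: List[str], new: List[str]) -> List[str]:
    """Render the differences between two line lists in normal diff format."""
    out: List[str] = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "insert":
            out.append(f"{i1}a{_rng(j1, j2)}")      # append after old line i1
        elif tag == "delete":
            out.append(f"{_rng(i1, i2)}d{j1}")      # delete; new file resumes after line j1
        else:  # "replace"
            out.append(f"{_rng(i1, i2)}c{_rng(j1, j2)}")
        out += [f"< {line}" for line in old[i1:i2]]  # removed lines
        if tag == "replace":
            out.append("---")
        out += [f"> {line}" for line in new[j1:j2]]  # added lines
    return out
```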

[`f92c1d2`] added `test.parquet`:

<!-- `bmdf -- dvc-diff -R f92c1d2 pqa test.parquet` -->
```bash
dvc-diff -R f92c1d2 pqa test.parquet
# 0a1,27
# > MD5: 4379600b26647a50dfcd0daa824e8219
# > 1635 bytes
# > 5 rows
# > message schema {
# > OPTIONAL INT64 num;
# > OPTIONAL BYTE_ARRAY str (STRING);
# > }
# > {
# > "num": 111,
# > "str": "aaa"
# > }
# > {
# > "num": 222,
# > "str": "bbb"
# > }
# > {
# > "num": 333,
# > "str": "ccc"
# > }
# > {
# > "num": 444,
# > "str": "ddd"
# > }
# > {
# > "num": 555,
# > "str": "eee"
# > }
```

[`f29e52a`] updated `test.parquet`:

<!-- `bmdf -- dvc-diff -R f29e52a pqa test.parquet` -->
```bash
dvc-diff -R f29e52a pqa test.parquet
# 1,3c1,3
# < MD5: 4379600b26647a50dfcd0daa824e8219
# < 1635 bytes
# < 5 rows
# ---
# > MD5: be082c87786f3364ca9efec061a3cc21
# > 1622 bytes
# > 8 rows
# 5c5
# < OPTIONAL INT64 num;
# ---
# > OPTIONAL INT32 num;
# 26a27,38
# > }
# > {
# > "num": 666,
# > "str": "fff"
# > }
# > {
# > "num": 777,
# > "str": "ggg"
# > }
# > {
# > "num": 888,
# > "str": "hhh"
```

[`3257258`] added a DVC-tracked directory, `data/` (including `test.{txt,parquet}`), and removed the top-level `test.{txt,parquet}`:

<!-- `bmdf -- dvc-diff -R 3257258 data` -->
```bash
dvc-diff -R 3257258 data
# test.parquet: None -> c07bba3fae2b64207aa92f422506e4a2
# test.txt: None -> e20b902b49a98b1a05ed62804c757f94
```

[`ae8638a`] changed values in `data/test.parquet`, and added rows to `data/test.txt`:

<!-- `bmdf -- dvc-diff -R ae8638a data` -->
```bash
dvc-diff -R ae8638a data
# test.parquet: c07bba3fae2b64207aa92f422506e4a2 -> f46dd86f608b1dc00993056c9fc55e6e
# test.txt: e20b902b49a98b1a05ed62804c757f94 -> 9306ec0709cc72558045559ada26573b
```
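For a DVC-tracked directory, the two examples above print one `relpath: old -> new` line per changed file instead of a content diff. A minimal sketch of producing that format, assuming each side is available as DVC's `.dir` manifest (a JSON list of `{"md5": ..., "relpath": ...}` entries) and that a missing side counts as an empty directory. How dvc-diff itself obtains the hashes isn't shown in this commit, and `load_dir_manifest` / `dir_diff_lines` are hypothetical names.

```python
import json
from typing import Dict, List, Optional


def load_dir_manifest(text: Optional[str]) -> Dict[str, str]:
    """Parse a DVC `.dir` manifest (JSON list of {"md5", "relpath"} entries).

    None (the directory doesn't exist on that side) maps to an empty dict.
    """
    if text is None:
        return {}
    return {entry["relpath"]: entry["md5"] for entry in json.loads(text)}


def dir_diff_lines(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """One "relpath: old -> new" line per added, removed, or changed file."""
    lines = []
    for relpath in sorted(before.keys() | after.keys()):
        old, new = before.get(relpath), after.get(relpath)
        if old != new:
            # A missing side formats as None, matching the `None -> ...` lines above
            lines.append(f"{relpath}: {old} -> {new}")
    return lines
```

Fed an empty "before" side and the `3257258` hashes, this yields exactly the two `None -> …` lines shown for that commit.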

### Parquet <a id="parquet-diff"></a>
See sample commands and output below for inspecting changes to [a DVC-tracked Parquet file][commit path] in [a given commit][commit].
@@ -323,3 +444,15 @@ This helped me see that the data update in question (`c0..c1`) dropped some fiel
[`kcr`]: https://github.com/ryan-williams/arg-helpers/blob/a8c60809f8878fa38b3c03614778fcf29132538e/.arg-rc#L118
[`snc`]: https://github.com/ryan-williams/case-helpers/blob/c40a62a9656f0d52d68fb3a108ae6bb3eed3c7bd/.case-rc#L9
[`sdf`]: https://github.com/ryan-williams/arg-helpers/blob/a8c60809f8878fa38b3c03614778fcf29132538e/.arg-rc#L138

[`mdcmd`]: https://github.com/runsascoded/bash-markdown-fence?tab=readme-ov-file#bmdf
[`test/data`]: test/data
[ryan-williams/dvc-helpers@test]: https://github.com/ryan-williams/dvc-helpers/tree/test
[`git-diff-dvc.sh`]: https://github.com/ryan-williams/dvc-helpers/blob/main/git-diff-dvc.sh

[`8ec2060`]: https://github.com/ryan-williams/dvc-helpers/commit/8ec2060
[`0455b50`]: https://github.com/ryan-williams/dvc-helpers/commit/0455b50
[`f92c1d2`]: https://github.com/ryan-williams/dvc-helpers/commit/f92c1d2
[`f29e52a`]: https://github.com/ryan-williams/dvc-helpers/commit/f29e52a
[`3257258`]: https://github.com/ryan-williams/dvc-helpers/commit/3257258
[`ae8638a`]: https://github.com/ryan-williams/dvc-helpers/commit/ae8638a
2 changes: 2 additions & 0 deletions requirements-ci.txt
@@ -0,0 +1,2 @@
bmdf>=0.4.0
dvc-s3
4 changes: 2 additions & 2 deletions requirements.txt
@@ -1,4 +1,4 @@
click
pyyaml
qmdx
utz>=0.11.5
qmdx>=0.0.5
utz>=0.13.0
18 changes: 9 additions & 9 deletions src/dvc_utils/cli.py
@@ -1,6 +1,6 @@
import json
import shlex
from os import environ as env, listdir
from os import listdir
from os.path import isdir, join
from typing import Tuple

@@ -18,18 +18,18 @@ def cli():


@cli.command('diff', short_help='Diff a DVC-tracked file at two commits (or one commit vs. current worktree), optionally passing both through another command first')
@option('-c', '--color', is_flag=True, help='Colorize the output')
@option('-c/-C', '--color/--no-color', default=None, help='Force or prevent colorized output')
@option('-r', '--refspec', help='<commit 1>..<commit 2> (compare two commits) or <commit> (compare <commit> to the worktree)')
@option('-R', '--ref', help='Shorthand for `-r <ref>^..<ref>`, i.e. inspect a specific commit (vs. its parent)')
@option('-s', '--shell-executable', help=f'Shell to use for executing commands; defaults to $SHELL ({env.get("SHELL")})')
@option('-s', '--shell-executable', help=f'Shell to use for executing commands; defaults to $SHELL')
@option('-S', '--no-shell', is_flag=True, help="Don't pass `shell=True` to Python `subprocess`es")
@option('-U', '--unified', type=int, help='Number of lines of context to show (passes through to `diff`)')
@option('-v', '--verbose', is_flag=True, help="Log intermediate commands to stderr")
@option('-w', '--ignore-whitespace', is_flag=True, help="Ignore whitespace differences (pass `-w` to `diff`)")
@option('-x', '--exec-cmd', 'exec_cmds', multiple=True, help='Command(s) to execute before diffing; alternate syntax to passing commands as positional arguments')
@argument('args', metavar='[exec_cmd...] <path>', nargs=-1)
def dvc_utils_diff(
color: bool,
color: bool | None,
refspec: str | None,
ref: str | None,
shell_executable: str | None,
@@ -101,12 +101,12 @@ def dvc_utils_diff(
diff_args = [
*(['-w'] if ignore_whitespace else []),
*(['-U', str(unified)] if unified is not None else []),
*(['--color=always'] if color else []),
*(['--color=always'] if color is True else ['--color=never'] if color is False else []),
]
if cmds:
cmd, *sub_cmds = cmds
cmds1 = [ f'{cmd} {path1 or "/dev/null"}', *sub_cmds ]
cmds2 = [ f'{cmd} {path2 or "/dev/null"}', *sub_cmds ]
cmds1 = [ 'cat /dev/null' ] if path1 is None else [ f'{cmd} {path1 or "/dev/null"}', *sub_cmds ]
cmds2 = [ 'cat /dev/null' ] if path2 is None else [ f'{cmd} {path2 or "/dev/null"}', *sub_cmds ]
if not shell:
cmds1 = [ shlex.split(cmd) for cmd in cmds1 ]
cmds2 = [ shlex.split(cmd) for cmd in cmds2 ]
@@ -117,10 +117,10 @@ def dvc_utils_diff(
cmds2=cmds2,
verbose=verbose,
shell=shell,
shell_executable=shell_executable,
executable=shell_executable,
)
else:
res = process.run('diff', *diff_args, path1, path2, log=log, check=False)
res = process.run('diff', *diff_args, path1 or '/dev/null', path2 or '/dev/null', log=log, check=False)
exit(res.returncode)
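The `cli.py` changes above amount to two small rules. First, `-c/-C` is now tri-state (`default=None`), so the `is True` / `is False` checks can tell "force on", "force off", and "not specified" apart, which the old `if color` could not. Second, a side with no file (`path1` or `path2` is `None`, e.g. when the file doesn't exist at the "before" commit) is diffed as empty input: `cat /dev/null` replaces that side's whole command pipeline, and the plain-`diff` case substitutes `/dev/null`. A minimal sketch restating those rules as standalone Python; the function names are hypothetical, and the reason for skipping `cmd` entirely (a command such as `pqa` would presumably fail on, or print spurious output for, an empty file) is an inference rather than something the commit states.

```python
from typing import List, Optional


def color_args(color: Optional[bool]) -> List[str]:
    """Map the tri-state -c/-C option onto `diff`'s --color flag."""
    if color is True:
        return ["--color=always"]  # -c: color even when output isn't a terminal
    if color is False:
        return ["--color=never"]   # -C: force plain output
    return []                      # neither: pass nothing, keep diff's default


def side_cmds(path: Optional[str], cmd: str, sub_cmds: List[str]) -> List[str]:
    """Command pipeline for one side of the diff; a missing file yields empty input."""
    if path is None:
        # Don't run `cmd` (or any sub-commands) at all; just emit nothing
        return ["cat /dev/null"]
    return [f"{cmd} {path}", *sub_cmds]


def plain_diff_argv(diff_args: List[str], path1: Optional[str], path2: Optional[str]) -> List[str]:
    """argv for the no-command case: a missing side is diffed as /dev/null."""
    return ["diff", *diff_args, path1 or "/dev/null", path2 or "/dev/null"]
```

Under these rules a file that is new in a commit shows up as all additions, which is consistent with the `0a1,27` output for `dvc-diff -R f92c1d2 pqa test.parquet` in the README.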


1 change: 1 addition & 0 deletions test/data
Submodule data added at f2b654
