Commit
Found two workarounds to get rid of re.DOTALL:

    (?:.|\n)
    [\s\S]

But I also want to get rid of the nongreedy operator *?

So we will probably add another primitive: find substring. I think our re2c code gen can be modified to handle that too.
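The two workarounds named in the commit message can be checked against `.` under `re.DOTALL` with a short Python script. This is only a sketch: the sample string and the pattern names are made up for illustration and are not taken from lazylex/html.py. It also shows why `[.\n]` is not a substitute, since `.` inside a character class is a literal dot.

```python
import re

# Hypothetical sample: an HTML comment that spans two lines.
SAMPLE = 'a <!-- one\ntwo --> b'

# Baseline: '.' only matches a newline when re.DOTALL is set.
dotall = re.compile(r'<!--.*?-->', re.DOTALL)

# Workaround 1: alternation of "any char except newline" and newline.
alt = re.compile(r'<!--(?:.|\n)*?-->')

# Workaround 2: a class that is the union of whitespace and non-whitespace.
cls = re.compile(r'<!--[\s\S]*?-->')

# Without DOTALL and without a workaround, the two-line comment is missed.
plain = re.compile(r'<!--.*?-->')

# Inside a character class '.' is a literal dot, so [.\n] does not work.
literal_dot = re.compile(r'<!--[.\n]*?-->')

for name, pat in [('dotall', dotall), ('alt', alt), ('cls', cls),
                  ('plain', plain), ('literal_dot', literal_dot)]:
    m = pat.search(SAMPLE)
    print(name, repr(m.group(0)) if m else None)
```

The first three patterns should report the same match, while the last two should report no match at all.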
Andy C committed Jan 5, 2025
1 parent a40c0eb, commit 10ec7ae
Showing 4 changed files with 99 additions and 10 deletions.
    @@ -0,0 +1,38 @@
    #!/usr/bin/env bash
    #
    # Usage:
    #   data_lang/htm8-test.sh

    : ${LIB_OSH=stdlib/osh}
    source $LIB_OSH/bash-strict.sh
    source $LIB_OSH/task-five.sh

    # parse with lazylex/html.py, or data_lang/htm8.py

    site-files() {
      find ../../oilshell/oilshell.org__deploy -name '*.html'
    }

    # Issues with lazylex/html.py
    #
    # - Token ID is annoying to express in Python
    # - re.DOTALL for newlines
    #   - can we change that with [.\n]*?
    # - nongreedy match for --> and ?>


    test-site() {
      # 1.5 M lines of HTML - takes 3 xargs invocations!
      #
      # TODO:
      # - test that it lexes
      # - test that tags are balanced

      site-files | xargs wc -l
    }

    test-wwz() {
      echo 'TODO: download .wwz from CI'
    }

    task-five "$@"
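The "nongreedy match for --> and ?>" issue listed in the script comments, together with the "find substring" primitive proposed in the commit message, can be sketched in Python as below. This is an illustration under stated assumptions, not the code in data_lang/htm8.py or the re2c code gen: `find_delimited` and the sample input are hypothetical names introduced here.

```python
import re


def find_delimited(s, pos, start, end):
    """Return the position just past `end`, or -1 if there is no match.

    Hypothetical helper: requires `start` at `pos`, then scans for the first
    occurrence of `end` with str.find() instead of a non-greedy regex.
    """
    if not s.startswith(start, pos):
        return -1
    i = s.find(end, pos + len(start))
    if i == -1:
        return -1  # unterminated construct
    return i + len(end)


# Made-up input with a multi-line comment and a processing instruction.
doc = 'x <!-- a\nb --> y <?xml v ?> z'

comment_end = find_delimited(doc, 2, '<!--', '-->')
print(repr(doc[2:comment_end]))

pi_start = doc.index('<?')
pi_end = find_delimited(doc, pi_start, '<?', '?>')
print(repr(doc[pi_start:pi_end]))

# The non-greedy regex ends at the same position for this input.
m = re.compile(r'<!--[\s\S]*?-->').match(doc, 2)
print(m.end() == comment_end)
```

Because the search is for a fixed terminator, the lexer needs neither `re.DOTALL` nor `*?` for these constructs; newlines inside the comment are just more characters to skip.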