Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
For XML mode too. [doc] Start HTM8 definition
Andy C
committed
Jan 11, 2025
1 parent
7f7bd39
commit 71c791e
Showing
5 changed files
with
260 additions
and
30 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -101,6 +101,7 @@ readonly MARKDOWN_DOCS=( | |
qsn | ||
qtt | ||
j8-notation | ||
htm8 | ||
# Protocol | ||
pretty-printing | ||
stream-table-process | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,164 @@ | ||
--- | ||
in_progress: yes | ||
default_highlighter: oils-sh | ||
--- | ||
|
||
HTM8 - Efficient HTML with Errors | ||
================================= | ||
|
||
- Syntax Errors: It's a Subset | ||
- Efficient | ||
- Easy to Remember | ||
- Easy to Implement | ||
- Runs Efficiently - you don't have to materialize a big DOM tree, which | ||
causes many allocations | ||
|
||
<div id="toc"> | ||
</div> | ||
|
||
## Basic Structure | ||
|
||
### Text Content | ||
|
||
Anything except `&` and `<`. | ||
|
||
These must be written as `&amp;` and `&lt;`. | ||
|
||
`>` is allowed, or you can escape it with `&gt;`. | ||
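
The text rule above can be sketched as a small escaping function. This is only an illustration in Python; the name `escape_text` is hypothetical and not part of HTM8.

```python
def escape_text(s):
    """Escape a string so it can be used as text content.

    Only & and < need escaping; > is left alone because it is allowed.
    """
    # Replace & first, so the & introduced by &lt; is not escaped again.
    return s.replace('&', '&amp;').replace('<', '&lt;')
```

For example, `escape_text('a < b')` gives `a &lt; b`, and a literal `>` passes through unchanged.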
|
||
### 3 Kinds of Character Code | ||
|
||
1. `&amp;` - named | ||
1. `&#999;` - decimal | ||
1. `&#xff;` - hex | ||
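
As a rough sketch, the three kinds of character code can be recognized with one regex and decoded to a single character. The function name `decode_char_code` is hypothetical, and using Python's `html.entities.name2codepoint` table for named codes is an assumption; HTM8 may accept a different set of names.

```python
import re
from html.entities import name2codepoint

# One alternative per kind: named, decimal, hex.
CHAR_CODE = re.compile(
    r'&(?:([a-zA-Z][a-zA-Z0-9]*)|#([0-9]+)|#[xX]([0-9a-fA-F]+));')


def decode_char_code(s):
    """Decode a single character code like &amp; or &#999; or &#xff;."""
    m = CHAR_CODE.fullmatch(s)
    if m is None:
        raise ValueError('not a character code: %r' % s)
    name, decimal, hexadecimal = m.groups()
    if name is not None:
        if name not in name2codepoint:
            raise ValueError('unknown name: %r' % name)
        return chr(name2codepoint[name])
    if decimal is not None:
        return chr(int(decimal))
    return chr(int(hexadecimal, 16))
```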
|
||
### 3 Kinds of Tag | ||
|
||
1. Start | ||
1. End | ||
1. StartEnd | ||
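
A minimal sketch of telling the three kinds of tag apart, assuming that a leading `/` marks an End tag and a trailing `/` marks a StartEnd tag. The regex is deliberately loose (it does not look inside attributes), and `classify_tag` is a hypothetical name.

```python
import re

# Leading slash, tag name, attribute text (not examined here), trailing slash.
TAG = re.compile(r'<(/?)([a-zA-Z][a-zA-Z0-9-]*)([^<>]*?)(/?)>')


def classify_tag(tag):
    """Return (kind, name), where kind is 'Start', 'End', or 'StartEnd'."""
    m = TAG.fullmatch(tag)
    if m is None:
        raise ValueError('not a tag: %r' % tag)
    leading, name, _attrs, trailing = m.groups()
    if leading and trailing:
        raise ValueError('tag has both a leading and trailing slash: %r' % tag)
    if leading:
        return ('End', name)
    if trailing:
        return ('StartEnd', name)
    return ('Start', name)
```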
|
||
### 2 Kinds of Attribute | ||
|
||
1. Unquoted | ||
1. Quoted | ||
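
The two kinds of attribute could be lexed roughly like this. It is a sketch under assumptions: quoted means double-quoted, unquoted values stop at whitespace and at the characters HTML5 forbids in unquoted values, and attributes without a value are outside the sketch. `parse_attrs` is a hypothetical helper.

```python
import re

# name="quoted value"  or  name=unquoted
ATTR = re.compile(
    r'\s+([a-zA-Z][a-zA-Z0-9-]*)=(?:"([^"]*)"|([^\s"\'=<>`]+))')


def parse_attrs(s):
    """Parse the attribute part of a tag into a list of (name, value)."""
    s = s.rstrip()
    pos = 0
    attrs = []
    while pos < len(s):
        m = ATTR.match(s, pos)
        if m is None:
            raise ValueError('bad attribute at position %d' % pos)
        name, quoted, unquoted = m.groups()
        attrs.append((name, quoted if quoted is not None else unquoted))
        pos = m.end()
    return attrs
```

Anything that fits neither kind, such as a single-quoted value, is reported as an error rather than guessed at.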
|
||
### 2 Kinds of Comment | ||
|
||
1. `<!-- -->` | ||
1. `<? ?>` (XML processing instruction) | ||
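
Both kinds of comment have a fixed opener and closer, so a naive comment stripper fits in one regex. This is only a sketch: it does not know about `<script>` or `<style>` content, where these character sequences may not be comments at all. `strip_comments` is a hypothetical name.

```python
import re

# <!-- ... --> or <? ... ?>, shortest match, possibly spanning lines.
COMMENT = re.compile(r'<!--.*?-->|<\?.*?\?>', re.DOTALL)


def strip_comments(html):
    """Remove both kinds of comment from a string."""
    return COMMENT.sub('', html)
```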
|
||
|
||
## Special Rules, From HTML | ||
|
||
### 2 Tags Cause Special Lexing | ||
|
||
- `<script> <style>` | ||
|
||
Note: we still have CDATA for compatibility. | ||
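
One way to picture the special lexing: after a `<script>` or `<style>` start tag, the lexer stops looking for tags and character codes and scans ahead to the matching end tag. The sketch below assumes the end tag is matched case-insensitively, which may differ from what HTM8 does; `read_raw_text` is a hypothetical helper.

```python
import re


def read_raw_text(s, pos, tag_name):
    """Return (content, end_pos) for raw content starting at s[pos].

    The content runs up to the next </tag_name>; end_pos is just past it.
    """
    end_tag = re.compile(r'</' + re.escape(tag_name) + r'\s*>', re.IGNORECASE)
    m = end_tag.search(s, pos)
    if m is None:
        raise ValueError('missing </%s>' % tag_name)
    return s[pos:m.start()], m.end()
```

Characters like `<` and `&` inside the content are returned as-is, not treated as markup.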
|
||
|
||
### 16 VOID Tags Change Parsing | ||
|
||
- `<source> ...` | ||
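
To show why VOID tags change parsing, here is a sketch of a tag balance checker. A void tag has no end tag, so it must not be pushed on the stack of open tags. The set below is a small illustrative subset, not the full list of 16, and `check_balance` is a hypothetical name.

```python
import re

TAG = re.compile(r'<(/?)([a-zA-Z][a-zA-Z0-9-]*)[^<>]*?(/?)>')

# Illustrative subset only; the real list has 16 entries.
ILLUSTRATIVE_VOID_TAGS = frozenset(['source', 'br', 'img'])


def check_balance(html, void_tags=ILLUSTRATIVE_VOID_TAGS):
    """Raise ValueError if start and end tags don't nest properly."""
    stack = []
    for m in TAG.finditer(html):
        is_end = m.group(1) == '/'
        name = m.group(2)
        is_start_end = m.group(3) == '/'
        if is_end:
            if not stack or stack[-1] != name:
                raise ValueError('unexpected </%s>' % name)
            stack.pop()
        elif is_start_end or name in void_tags:
            pass  # nothing to close later
        else:
            stack.append(name)
    if stack:
        raise ValueError('unclosed <%s>' % stack[-1])
    return True
```

Passing `void_tags=frozenset()` turns the rule off, so a bare `<br>` is then treated as an unclosed tag.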
|
||
### Bonus: XML Mode | ||
|
||
- Get rid of the 2 special lexing tags, and 16 VOID tags | ||
|
||
Then you can query HTML | ||
|
||
|
||
## Under the Hood | ||
|
||
### 3 Layers of Lexing | ||
|
||
1. Tag | ||
1. Attributes within a Tag | ||
1. Quoted Value for Attributes | ||
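
The three layers can be pictured as three small lexers, each applied to a piece of the previous layer's output. This is a simplified sketch, not the actual lexers: it handles only double-quoted attributes, and all names are hypothetical.

```python
import re

# Layer 1: the document is a sequence of tags and text.
DOC_TOKEN = re.compile(r'(?P<Tag><[^<>]+>)|(?P<Text>[^<]+)')
# Layer 2: inside a tag, find name="value" pairs.
ATTR_TOKEN = re.compile(r'\s+([a-zA-Z][a-zA-Z0-9-]*)="([^"]*)"')
# Layer 3: inside a quoted value, separate character codes from raw text.
VALUE_TOKEN = re.compile(r'(?P<CharCode>&[#a-zA-Z0-9]+;)|(?P<Raw>[^&]+)')


def lex_document(s):
    return [(m.lastgroup, m.group()) for m in DOC_TOKEN.finditer(s)]


def lex_attrs(tag):
    return [(m.group(1), m.group(2)) for m in ATTR_TOKEN.finditer(tag)]


def lex_value(value):
    return [(m.lastgroup, m.group()) for m in VALUE_TOKEN.finditer(value)]
```

A caller that only needs tag names never has to run layers 2 and 3, which is one way this design can avoid unnecessary work.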
|
||
## What Do You Use This for? | ||
|
||
- Stripping comments | ||
- Adding TOC | ||
- Syntax highlighting code | ||
- Adding link shortcuts | ||
- ul-table | ||
|
||
TODO: | ||
|
||
- DOM API on top of it | ||
- node.elementsByTag('p') | ||
- node.elementsByClassName('left') | ||
- node.elementByID('foo') | ||
- innerHTML() outerHTML() | ||
- tag attrs | ||
- low level: | ||
- outerLeft, outerRight, innerLeft, innerRight | ||
- CSS Selectors - `querySelectorAll()` | ||
- sed-like model | ||
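
One possible reading of the low-level positions in this TODO list, which is an assumption since the API does not exist yet: they are four offsets into the original string, so that `innerHTML()` and `outerHTML()` are plain slices with no tree. The sketch finds the first element with a given tag and ignores nesting of the same tag; `first_element_spans` is a hypothetical name.

```python
import re


def first_element_spans(html, tag_name):
    """Return (outer_left, inner_left, inner_right, outer_right), or None."""
    name = re.escape(tag_name)
    start = re.search(r'<%s(?:\s[^<>]*)?>' % name, html)
    if start is None:
        return None
    end = re.compile(r'</%s>' % name).search(html, start.end())
    if end is None:
        return None
    return (start.start(), start.end(), end.start(), end.end())
```

With these offsets, the inner HTML is `html[inner_left:inner_right]` and the outer HTML is `html[outer_left:outer_right]`.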
|
||
## Algorithms | ||
|
||
### Emitting HTM8 as HTML5 | ||
|
||
Just emit it! This always works, by design. | ||
|
||
### Parsing XML | ||
|
||
- Set `NO_SPECIAL_TAGS` | ||
|
||
### Converting to XML? | ||
|
||
- Always quote all attributes | ||
- Always quote `>` - are we allowing this in HX8? | ||
- Do something with `<script>` and `<style>` | ||
- I guess turn them into normal tags, with escaping? | ||
- Or maybe just disallow them? | ||
- Maybe validate any other declarations, like `<!DOCTYPE foo>` | ||
- Add XML header `<?xml version=>`, remove `<!DOCTYPE html>` | ||
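
The first item, always quoting attributes, might look like the sketch below. It rewrites a single start tag and assumes unquoted values follow the HTML5 restrictions (no whitespace, quotes, `=`, `<`, `>`, or backtick). `quote_attrs` is a hypothetical name, and the other items in the list are not addressed.

```python
import re

ATTR = re.compile(
    r'(\s+[a-zA-Z][a-zA-Z0-9-]*=)(?:"([^"]*)"|([^\s"\'=<>`]+))')


def quote_attrs(tag):
    """Return the tag with every unquoted attribute value double-quoted."""
    def repl(m):
        prefix, quoted, unquoted = m.groups()
        if quoted is not None:
            return m.group(0)  # already quoted, keep as-is
        # An unquoted value can't contain a double quote, so wrapping is safe.
        return '%s"%s"' % (prefix, unquoted)
    return ATTR.sub(repl, tag)
```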
|
||
## Related | ||
|
||
- [ysh-doc-processing.html](ysh-doc-processing.html) | ||
- [table-object-doc.html](table-object-doc.html) | ||
|
||
## FAQ | ||
|
||
### What Doesn't This Cover? | ||
|
||
- single-quoted attributes? | ||
- We should probably add those, it shouldn't be hard? | ||
|
||
- Encodings other than UTF-8. HTM8 is always UTF-8. | ||
- Unicode Tag names and attribute names. | ||
- This is allowed in HTML5 and XML. | ||
- We leave those out for simpler lexing. Text and attribute values may be unicode. | ||
|
||
There are 5 kinds of tags: | ||
|
||
- Normal HTML tags | ||
- RCDATA for `<title> <textarea>` | ||
- RAWTEXT `<style> <xmp> <iframe>` ? | ||
|
||
and we have | ||
|
||
- CDATA `<script>` | ||
- TODO: we need a test case for `</script>` in a string literal? | ||
- Foreign `<math> <svg>` - XML rules | ||
|
||
## TODO | ||
|
||
- `<svg>` and `<math>` are foreign XML content? Doh | ||
- So I can just switch to XML mode in that case | ||
- TODO: we need a test corpus for this! | ||
- maybe look for wikipedia content | ||
- can we also just disallow these? Can you make these into external XML files? | ||
|
||
This is one way: | ||
|
||
<object data="math.xml" type="application/mathml+xml"></object> | ||
<object data="drawing.xml" type="image/svg+xml"></object> | ||
|
||
Then we don't need special parsing? | ||
|