Improve coverage testing of UTF-8 routines.
mity committed Jan 21, 2024
1 parent cd7c326 commit a3c510a
Showing 1 changed file with 89 additions and 4 deletions.
93 changes: 89 additions & 4 deletions test/coverage.txt
@@ -41,12 +41,97 @@ Ditto for Unicode punctuation (here U+00A1).

## `md_decode_utf8__()` and `md_decode_utf8_before__()`

### Alphanumerical Character (i.e. not whitespace, not punctuation)

Characters that are neither whitespace nor punctuation prevent the adjacent
`_` from being recognized as an emphasis delimiter, because `_` is then
treated as an in-word character:

Example of 1-byte UTF-8 sequence (U+0058):
```````````````````````````````` example
X__foo__X
.
<p>X__foo__X</p>
````````````````````````````````

Example of 2-byte UTF-8 sequence (U+0158):
```````````````````````````````` example
Ř__foo__Ř
.
<p>Ř__foo__Ř</p>
````````````````````````````````

Example of 3-byte UTF-8 sequence (U+0BA3):
```````````````````````````````` example
ண__foo__ண
.
<p>ண__foo__ண</p>
````````````````````````````````

Example of 4-byte UTF-8 sequence (U+13142):
```````````````````````````````` example
𓅂__foo__𓅂
.
<p>𓅂__foo__𓅂</p>
````````````````````````````````
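
The byte lengths quoted for the four examples above follow directly from the UTF-8 code point ranges. The snippet below is only an illustrative check, not part of the test suite; `utf8_length` and `SAMPLES` are hypothetical names introduced here, and the result is cross-checked against Python's built-in encoder.

```python
def utf8_length(codepoint: int) -> int:
    """Return how many bytes UTF-8 needs to encode the given code point."""
    if codepoint < 0x80:        # U+0000..U+007F
        return 1
    if codepoint < 0x800:       # U+0080..U+07FF
        return 2
    if codepoint < 0x10000:     # U+0800..U+FFFF
        return 3
    return 4                    # U+10000..U+10FFFF


# The four code points used in the examples above.
SAMPLES = [0x0058, 0x0158, 0x0BA3, 0x13142]

for cp in SAMPLES:
    # Cross-check the range arithmetic against Python's own UTF-8 encoder.
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
    print(f"U+{cp:04X} -> {utf8_length(cp)} byte(s)")
```

Each sample lands in a different range, so together they reach every length branch of a UTF-8 decoder.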

### Whitespace character

Whitespace, on the other hand, should not suppress `_`:

Example of 1-byte UTF-8 sequence (U+0009):
```````````````````````````````` example
x→__foo__→
.
<p>x <strong>foo</strong></p>
````````````````````````````````
(The initial `x` prevents the line from being parsed as an indented code block.)

Example of 2-byte UTF-8 sequence (U+00A0):
```````````````````````````````` example
__foo__
.
<p> <strong>foo</strong> </p>
````````````````````````````````

Example of 3-byte UTF-8 sequence (U+2000):
```````````````````````````````` example
 __foo__
.
<p> <strong>foo</strong> </p>
````````````````````````````````

(As far as I know, no Unicode whitespace character needs a 4-byte UTF-8 sequence.)
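
The remark above can be checked against the Unicode character database. The sketch below assumes the CommonMark definition of Unicode whitespace (general category `Zs` plus tab, line feed, form feed and carriage return); the parser's own whitespace table may differ, and the names `ASCII_WHITESPACE` and `unicode_whitespace` are hypothetical.

```python
import unicodedata

# ASCII control characters that CommonMark also counts as whitespace.
ASCII_WHITESPACE = [0x09, 0x0A, 0x0C, 0x0D]


def unicode_whitespace() -> list:
    """Collect every code point in general category Zs plus the ASCII extras."""
    zs = [cp for cp in range(0x110000)
          if unicodedata.category(chr(cp)) == "Zs"]
    return sorted(set(zs + ASCII_WHITESPACE))


highest = max(unicode_whitespace())
# Code points below U+10000 never need more than three UTF-8 bytes.
print(f"highest whitespace code point: U+{highest:04X}")
print("fits in 3 bytes:", highest < 0x10000)
```

Under that assumption every whitespace code point lies in the Basic Multilingual Plane, so the 4-byte decoding branch cannot be reached through a whitespace test.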

### Punctuation character

Punctuation also should not suppress `_`:

Example of 1-byte UTF-8 sequence (U+002E):
```````````````````````````````` example
.__foo__.
.
<p>.<strong>foo</strong>.</p>
````````````````````````````````

Example of 2-byte UTF-8 sequence (U+00B7):
```````````````````````````````` example
·__foo__·
.
<p>·<strong>foo</strong>·</p>
````````````````````````````````

Example of 3-byte UTF-8 sequence (U+0C84):
```````````````````````````````` example
಄__foo__಄
.
<p>಄<strong>foo</strong>಄</p>
````````````````````````````````

Example of 4-byte UTF-8 sequence (U+1039F):
```````````````````````````````` example
𐎟__foo__𐎟
.
<p>𐎟<strong>foo</strong>𐎟</p>
````````````````````````````````
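
For orientation, the examples in this section need the character on each side of `__` to be decoded: the one that starts at a given offset and the one that ends just before it. The following is a minimal sketch of that idea in Python, assuming well-formed input. It is not the actual implementation of `md_decode_utf8__()` or `md_decode_utf8_before__()`; the functions `decode_utf8` and `decode_utf8_before` are hypothetical stand-ins.

```python
def decode_utf8(data: bytes, offset: int = 0) -> tuple:
    """Decode the code point starting at data[offset].

    Returns (codepoint, size_in_bytes). Assumes well-formed UTF-8; overlong
    forms and bad continuation bytes are not detected.
    """
    lead = data[offset]
    if lead < 0x80:                      # 0xxxxxxx: single byte
        return lead, 1
    if (lead & 0xE0) == 0xC0:            # 110xxxxx: 2-byte sequence
        size, cp = 2, lead & 0x1F
    elif (lead & 0xF0) == 0xE0:          # 1110xxxx: 3-byte sequence
        size, cp = 3, lead & 0x0F
    elif (lead & 0xF8) == 0xF0:          # 11110xxx: 4-byte sequence
        size, cp = 4, lead & 0x07
    else:
        raise ValueError("not a UTF-8 lead byte")
    for byte in data[offset + 1:offset + size]:
        cp = (cp << 6) | (byte & 0x3F)   # append 6 payload bits per byte
    return cp, size


def decode_utf8_before(data: bytes, offset: int) -> tuple:
    """Decode the code point that ends immediately before data[offset]."""
    start = offset - 1
    # Step back over continuation bytes (10xxxxxx) to reach the lead byte.
    while start > 0 and (data[start] & 0xC0) == 0x80:
        start -= 1
    return decode_utf8(data, start)


text = "𐎟__foo__𐎟".encode("utf-8")
print(decode_utf8(text, 0))                  # character at the start
print(decode_utf8_before(text, len(text)))   # character at the end
```

Testing each sequence length on both sides of the delimiter, as the examples above do, covers the forward and the backward path.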

