From a3c510ac0b81320baa06bd64d9e08b99261c63f7 Mon Sep 17 00:00:00 2001
From: Martin Mitas
Date: Sun, 21 Jan 2024 14:11:47 +0100
Subject: [PATCH] Improve coverage testing of UTF-8 routines.

---
 test/coverage.txt | 93 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 89 insertions(+), 4 deletions(-)

diff --git a/test/coverage.txt b/test/coverage.txt
index 4b51ef6c..146210c0 100644
--- a/test/coverage.txt
+++ b/test/coverage.txt
@@ -41,12 +41,97 @@ Ditto for Unicode punctuation (here U+00A1).
 ## `md_decode_utf8__()` and `md_decode_utf8_before__()`
+### Alphanumerical Character (i.e. not whitespace, not punctuation)
+
+The non-whitespace, non-punctuation characters below prevent `_` from being
+recognized as emphasis, because `_` should be treated as an in-word character:
+
+Example of 1-byte UTF-8 sequence (U+0058):
+```````````````````````````````` example
+X__foo__X
+.
+X__foo__X
+````````````````````````````````
+
+Example of 2-byte UTF-8 sequence (U+0158):
+```````````````````````````````` example
+Ř__foo__Ř
+.
+Ř__foo__Ř
+````````````````````````````````
+
+Example of 3-byte UTF-8 sequence (U+0BA3):
+```````````````````````````````` example
+ண__foo__ண
+.
+ண__foo__ண
+````````````````````````````````
+
+Example of 4-byte UTF-8 sequence (U+13142):
+```````````````````````````````` example
+𓅂__foo__𓅂
+.
+𓅂__foo__𓅂
+````````````````````````````````
+
+### Whitespace character
+
+Whitespace, on the other hand, should not suppress `_`:
+
+Example of 1-byte UTF-8 sequence (U+0009):
+```````````````````````````````` example
+x→__foo__→
+.
+x foo
+````````````````````````````````
+(The initial `x` is there to suppress an indented code block.)
+
+Example of 2-byte UTF-8 sequence (U+00A0):
+```````````````````````````````` example
+ __foo__ 
+.
+foo
+````````````````````````````````
+
+Example of 3-byte UTF-8 sequence (U+2000):
+```````````````````````````````` example
+ __foo__ 
+.
+ foo 
+````````````````````````````````
+
+(AFAIK, there is no 4-byte UTF-8 whitespace.)
+
+### Punctuation character
+
+Punctuation should not suppress `_` either:
+
+Example of 1-byte UTF-8 sequence (U+002E):
+```````````````````````````````` example
+.__foo__.
+.
+.foo.
+````````````````````````````````
+
+Example of 2-byte UTF-8 sequence (U+00B7):
+```````````````````````````````` example
+·__foo__·
+.
+·foo·
+````````````````````````````````
+
+Example of 3-byte UTF-8 sequence (U+0C84):
+```````````````````````````````` example
+಄__foo__಄
+.
+foo
+````````````````````````````````
+
+Example of 4-byte UTF-8 sequence (U+1039F):
 ```````````````````````````````` example
-á*Á (U+00E1, i.e. two byte UTF-8 sequence)
- *  (U+2000, i.e. three byte UTF-8 sequence)
+𐎟__foo__𐎟
 .
-á*Á (U+00E1, i.e. two byte UTF-8 sequence)
- * (U+2000, i.e. three byte UTF-8 sequence)
+𐎟foo𐎟
 ````````````````````````````````
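
The byte lengths claimed in the example labels of this patch can be checked independently of MD4C. The following is only a minimal sketch: it relies on Python's built-in UTF-8 encoder rather than on `md_decode_utf8__()`, and the names `CODE_POINTS`, `utf8_length` and `sequence_lengths` are illustrative, not part of the project. It checks encoded lengths only; it says nothing about how MD4C classifies each code point.

```python
# Code points copied from the example labels in the patch, grouped by the
# section they appear in. The grouping mirrors the patch; it is not verified here.
CODE_POINTS = {
    "alphanumeric": [0x0058, 0x0158, 0x0BA3, 0x13142],
    "whitespace": [0x0009, 0x00A0, 0x2000],
    "punctuation": [0x002E, 0x00B7, 0x0C84, 0x1039F],
}


def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs to encode the given code point."""
    return len(chr(code_point).encode("utf-8"))


def sequence_lengths(code_points):
    """Return the UTF-8 byte length of every code point in the list."""
    return [utf8_length(cp) for cp in code_points]


if __name__ == "__main__":
    for category, cps in CODE_POINTS.items():
        for cp in cps:
            print(f"{category:12} U+{cp:04X}: {utf8_length(cp)} byte(s)")
```

Each section lists its code points in order of increasing sequence length, so the whitespace list stops at three bytes, matching the patch's note that no 4-byte example was found.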