From 6db87646905def847f3e8df4bc434529cf5d8110 Mon Sep 17 00:00:00 2001
From: Jordan Harband
Date: Sat, 23 Mar 2024 13:41:07 -0700
Subject: [PATCH] fixup: code points instead of code units

---
 spec.emu | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/spec.emu b/spec.emu
index 276d1d4..4d6e3c4 100644
--- a/spec.emu
+++ b/spec.emu
@@ -30,12 +30,12 @@ contributors: Jordan Harband
   1. Let _escapedList_ be a new empty List.
   1. For each code point _c_ in _cpList_, do
     1. If _escapedList_ is empty and _c_ is matched by |DecimalDigit|, then
-      1. Append code unit U+005C (REVERSE SOLIDUS) to _escapedList_.
-      1. Append code unit U+0078 (LATIN SMALL LETTER X) to _escapedList_.
-      1. Append code unit U+0033 (DIGIT THREE) to _escapedList_.
+      1. Append code point U+005C (REVERSE SOLIDUS) to _escapedList_.
+      1. Append code point U+0078 (LATIN SMALL LETTER X) to _escapedList_.
+      1. Append code point U+0033 (DIGIT THREE) to _escapedList_.
       1. Append _c_ to _escapedList_.
     1. Else,
-      1. Append the code units in EncodeForRegExpEscape(_c_) to _escapedList_.
+      1. Append the code points in EncodeForRegExpEscape(_c_) to _escapedList_.
   1. Return CodePointsToString(_escapedList_).
@@ -47,8 +47,8 @@ contributors: Jordan Harband

 EncodeForRegExpEscape (
-  _c_: a code unit,
-): a List of code units
+  _c_: a code point,
+): a List of code points

description
@@ -56,23 +56,29 @@ contributors: Jordan Harband
-  1. Let _codeUnits_ be a new empty List.
+  1. Let _codePoints_ be a new empty List.
   1. Let _punctuators_ be the following String, which consists of every ASCII punctuator except U+005F (LOW LINE): *"(){}[]|,.?\*+-^$=<>\/#&!%:;@~'"`"*.
   1. Let _toEscape_ be StringToCodePoints(_punctuators_).
   1. If _toEscape_ contains _c_ or _c_ is matched by |WhiteSpace|, then
-    1. Append code unit U+005C (REVERSE SOLIDUS) to _codeUnits_.
+    1. Append code point U+005C (REVERSE SOLIDUS) to _codePoints_.
     1. Let _hex_ be Number::toString(𝔽(_c_), 16).
     1. If the length of _hex_ is 1 or 2, then
       1. Set _hex_ to StringPad(_hex_, 2, *"0"*, ~start~).
-      1. Append code unit U+0078 (LATIN SMALL LETTER X) to _codeUnits_.
+      1. Append code point U+0078 (LATIN SMALL LETTER X) to _codePoints_.
+      1. Append the code points in StringToCodePoints(_hex_) to _codePoints_.
+    1. Else if the length of _hex_ is > 4, then
+      1. Append code point U+0075 (LATIN SMALL LETTER U) to _codePoints_.
+      1. Append code point U+007B (LEFT CURLY BRACKET) to _codePoints_.
+      1. Append the code points in StringToCodePoints(_hex_) to _codePoints_.
+      1. Append code point U+007D (RIGHT CURLY BRACKET) to _codePoints_.
     1. Else,
       1. Assert: The length of _hex_ is at most 4.
       1. Set _hex_ to StringPad(_hex_, 4, *"0"*, ~start~).
-      1. Append code unit U+0075 (LATIN SMALL LETTER U) to _codeUnits_.
-    1. Append the code units in _hex_ to _codeUnits_.
+      1. Append code point U+0075 (LATIN SMALL LETTER U) to _codePoints_.
+      1. Append the code points in StringToCodePoints(_hex_) to _codePoints_.
   1. Else,
-    1. Append _c_ to _codeUnits_.
-  1. Return _codeUnits_.
+    1. Append _c_ to _codePoints_.
+  1. Return _codePoints_.
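
Reviewer note: the patched steps can be sketched in TypeScript to sanity-check the three escape forms (`\xHH`, `\uHHHH`, `\u{H...}`). This is only an illustration under stated assumptions, not the proposal's reference implementation. The names `encodeForRegExpEscape`, `escapeForRegExp`, `PUNCTUATORS`, and `WHITE_SPACE` are hypothetical, the punctuator set is taken from the step's description ("every ASCII punctuator except U+005F"), and the `WHITE_SPACE` set is a hand-written approximation of the |WhiteSpace| production.

```typescript
// Assumption: every ASCII punctuator except U+005F (LOW LINE), per the step text.
const PUNCTUATORS = new Set<number>(
  Array.from("(){}[]|,.?*+-^$=<>\\/#&!%:;@~'`\"", (ch) => ch.codePointAt(0)!)
);

// Assumption: hand-listed code points approximating |WhiteSpace|
// (TAB, VT, FF, ZWNBSP, and the space separators); line terminators are excluded.
const WHITE_SPACE = new Set<number>([
  0x0009, 0x000b, 0x000c, 0xfeff, 0x0020, 0x00a0, 0x1680,
  0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006,
  0x2007, 0x2008, 0x2009, 0x200a, 0x202f, 0x205f, 0x3000,
]);

// Left-pad a hex string with zeros to the given width.
function padHex(hex: string, width: number): string {
  let out = hex;
  while (out.length < width) out = "0" + out;
  return out;
}

// Sketch of EncodeForRegExpEscape: takes one code point, returns a list of code points.
function encodeForRegExpEscape(c: number): number[] {
  if (!PUNCTUATORS.has(c) && !WHITE_SPACE.has(c)) {
    return [c]; // not escaped: passed through unchanged
  }
  const hex = c.toString(16);
  let escaped: string;
  if (hex.length <= 2) {
    escaped = "\\x" + padHex(hex, 2); // \xHH
  } else if (hex.length > 4) {
    escaped = "\\u{" + hex + "}"; // \u{H...} for code points above U+FFFF
  } else {
    escaped = "\\u" + padHex(hex, 4); // \uHHHH
  }
  return Array.from(escaped, (ch) => ch.codePointAt(0)!);
}

// Sketch of the caller: iterate code points, escaping a leading decimal digit as \x3N.
function escapeForRegExp(s: string): string {
  const escapedList: number[] = [];
  for (const ch of Array.from(s)) { // Array.from splits a string by code point
    const c = ch.codePointAt(0)!;
    if (escapedList.length === 0 && c >= 0x30 && c <= 0x39) {
      escapedList.push(0x5c, 0x78, 0x33, c); // "\", "x", "3", then the digit itself
    } else {
      escapedList.push(...encodeForRegExpEscape(c));
    }
  }
  return String.fromCodePoint(...escapedList);
}
```

For example, `escapeForRegExp("1a")` gives `\x31a` and `escapeForRegExp("a.b")` gives `a\x2eb`. With the sets assumed in this sketch, every escaped code point is at or below U+FFFF, so the `\u{...}` branch is not exercised by any current input; code points such as U+1F600 pass through unescaped.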