$DEFINE UnicodeRE off -> tests fail #343

Alexey-T · 2023-11-16T12:23:29Z

Martin, can you adjust the test project so it doesn't fail with UnicodeRE off?
You are really good at composing tests, I admit.
@User4martin

Alexey-T · 2023-11-16T12:34:52Z

The test project has its own copy of the UnicodeRE define, but even if I disable it, the tests fail! @User4martin

User4martin · 2023-11-16T20:24:14Z

Well, I haven't checked the background yet, but

    // 69
    ( // empty str
    expression: '^ *$';
    inputText: '';
    substitutionText: '';
    expectedResult: '';
    matchStart: 1
    ),

fails for me, and that looks like a bug in the regex engine.

The other failures are down to the define.

Ideally, the defines should move into their own include files.

Then (and I can check that) the failing test may need to be disabled.

The [-] range of Russian characters does not seem to be implemented for UTF-8 yet. That is possible, but an issue of its own (and not necessarily one I will have time for soon).

The #$85 line break: same thing. But it may be easy to fix for UTF-8.


User4martin · 2023-11-16T20:44:54Z

IsAnyLineBreak could be changed to take a pointer to ReChar.

Then it could return zero, or the length of the matched line break. That way it could handle UTF-8 encoded line breaks longer than one byte.

The test case would then need to be changed to have #$C2#$85 in the string.
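A minimal sketch of the "return zero or the length" idea, assuming a standalone helper: `LineBreakLenUtf8` is a hypothetical name, not TRegExpr's actual `IsAnyLineBreak`, and it takes an explicit end pointer so it never reads past the buffer. The set of recognized breaks (LF, VT, FF, CR, NEL, U+2028, U+2029) is an assumption about what "any line break" should cover; CR LF pairing is deliberately left out.

```pascal
program LineBreakLenDemo;

{$mode objfpc}{$H+}

// Hypothetical helper (not the TRegExpr API): returns 0 if no line break
// starts at P, otherwise the number of UTF-8 bytes the line break occupies.
// PEnd points one past the last byte of the buffer.
function LineBreakLenUtf8(P, PEnd: PAnsiChar): Integer;
begin
  Result := 0;
  if P >= PEnd then
    Exit;
  case P^ of
    #$0A..#$0D:
      // LF, VT, FF, CR: single-byte breaks (CR LF is not paired here)
      Result := 1;
    #$C2:
      // U+0085 NEXT LINE (NEL) is encoded as C2 85
      if (P + 1 < PEnd) and ((P + 1)^ = #$85) then
        Result := 2;
    #$E2:
      // U+2028 LINE SEPARATOR = E2 80 A8, U+2029 PARAGRAPH SEPARATOR = E2 80 A9
      if (P + 2 < PEnd) and ((P + 1)^ = #$80) and ((P + 2)^ in [#$A8, #$A9]) then
        Result := 3;
  end;
end;

var
  S: RawByteString;
  PStart, PEnd: PAnsiChar;
begin
  S := 'a'#$C2#$85'b';  // 'a', NEL, 'b' as raw UTF-8 bytes
  PStart := PAnsiChar(S);
  PEnd := PStart + Length(S);
  WriteLn(LineBreakLenUtf8(PStart, PEnd));      // 0: 'a' is not a break
  WriteLn(LineBreakLenUtf8(PStart + 1, PEnd));  // 2: NEL occupies two bytes
  WriteLn(LineBreakLenUtf8(PStart + 2, PEnd));  // 0: lone continuation byte
end.
```

A caller scanning the input would advance by the returned length instead of by one character, which is what lets multi-byte breaks such as #$C2#$85 be treated as a single line break.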

Alexey-T · 2023-11-16T22:30:48Z

> Then it could return zero, or the length of any matched line break.

Does the code really need this, if it already works well? It only makes the logic more complex.

User4martin · 2023-11-16T23:22:11Z

> Then it could return zero, or the length of any matched line break.

> do code need this really, if it works good already? only more complex logic.

Well, does "not implemented" mean "works well"?

At the moment, in the UTF-8 version, line breaks like NEXT LINE (NEL) (U+0085) are simply not detected. UTF-8 is Unicode, so those code points do exist.

Unless it is meant to be ASCII? In that case a UTF-8 version is really needed. (And AFAIK there is more to be fixed for proper UTF-8 support, but this would be a start.)

Alexey-T · 2023-11-17T07:17:21Z

So it is needed, okay.

Alexey-T · 2023-11-17T07:24:07Z

But do we really need to find a pure Unicode line break in non-Unicode mode? We can ignore chr($85) in non-Unicode mode; that's logical.

User4martin · 2023-11-17T10:16:21Z

> but is it needed that in non-Unicode mode we must find pure Unicode linebreak? we can ignore chr(85) in non-Unicode mode, logical.

IMHO, that's the wrong question.
UTF-8 is also a Unicode mode.

The question is: does the regex engine currently have an ASCII (non-Unicode) mode or a UTF-8 (Unicode) mode?

But IMHO the answer does not matter: a UTF-8 mode is what is needed.

So then the only question is:
UTF-8 mode: add or fix?

Alexey-T · 2023-11-17T10:18:59Z

> So then the only question is:
> Utf8 mode: Add or Fix?

Add.

Alexey-T closed this as completed Nov 16, 2023

Alexey-T reopened this Nov 16, 2023

User4martin pushed a commit to User4martin/TRegExpr that referenced this issue Nov 16, 2023

Fix matching empty string in ansi-char mode. Condition was wrongly IF…

e4fcc06

…DEFED Issue andgineer#343

User4martin mentioned this issue Nov 16, 2023

Fix matching empty string in ansi-char mode. Condition was wrongly IF… #344

Merged
