✨ Support for grapheme on fc.string (#5222)
**Description**

<!-- Please provide a short description and potentially linked issues
justifying the need for this PR -->

Strings are everywhere in programming: they always have been and always
will be. They are how we communicate with users in both directions, from
them and to them. As such, they are a key foundation of any piece of
code.

Unfortunately, strings are hard. Once we leave the peaceful (and biased)
world of Latin characters (ASCII), we quickly reach a new world that is
hard to grasp. But we have to deal with it, because strings are key to
any program. That is why fast-check wants to guide you through the
subtle issues that might come with strings by providing a comprehensive
and efficient set of tools for them.

For instance, as a user you may have used Twitter/X. If so, you have
probably heard of the 140-character limit (now bumped to 280 for
European characters and still limited to 140 for CJK and emojis). By
140, they don't actually mean `tweet.length === 140` (counting chars) or
`[...tweet].length === 140` (counting code points); they mean 140 visual
signs. And unfortunately, that is what our users expect most of the time.
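
The three ways of counting mentioned above can be compared on concrete
strings. The snippet below is only an illustrative sketch: `countUnits` is a
hypothetical helper (not part of fast-check), and it assumes a runtime that
ships `Intl.Segmenter`, such as a recent Node.js or browser.

```typescript
// Hypothetical helper: counts a string in chars (UTF-16 code units),
// code points, and graphemes (visual signs).
function countUnits(text: string): { chars: number; codePoints: number; graphemes: number } {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return {
    chars: text.length,
    codePoints: Array.from(text).length,
    graphemes: Array.from(segmenter.segment(text)).length,
  };
}

// Plain ASCII letter: the three counts agree.
console.log(countUnits('a'));
// Grinning face emoji: 2 chars, 1 code point, 1 grapheme.
console.log(countUnits('\u{1F600}'));
// Family emoji (three emojis glued by zero-width joiners): 8 chars, 5 code points, 1 grapheme.
console.log(countUnits('\u{1F468}\u200D\u{1F469}\u200D\u{1F467}'));
```

A length check written with `.length` would therefore reject a tweet made of
18 family emojis, even though a user sees only 18 signs.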

This PR introduces an extra constraint called `unit` on `fc.string`.
This constraint can be set to one of our 5 presets. Among these
presets, `'grapheme'` is planned to be the new default for version 4 and
behaves as users may expect: it counts visual entities, neither chars
nor code points. To help users find their ideal granularity, we added
other grapheme granularities: one for code points
(`grapheme-composite`) and one for chars (`grapheme-ascii`). The binary
variants are meant to replace `fullUnicode` and the other custom string
versions we offered up to version 3 and that we plan to drop with v4.
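
As a rough usage sketch, the constraint might be exercised as follows. It
assumes a fast-check release that includes this change. The names `binary`
and `binary-ascii` for the two binary variants are taken from the fast-check
documentation rather than from this description, so treat them as an
assumption here.

```typescript
import fc from 'fast-check';

// The five presets assumed for the `unit` constraint.
const units = ['grapheme', 'grapheme-composite', 'grapheme-ascii', 'binary', 'binary-ascii'] as const;

for (const unit of units) {
  // maxLength is expressed in units, so 'grapheme' yields at most 5 generated graphemes.
  const samples = fc.sample(fc.string({ unit, maxLength: 5 }), 3);
  console.log(unit, samples);
}
```

The printed values are random, so they differ from one run to the next.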

See #5221

<!-- * Your PR is fixing a bug or regression? Check for existing issues
related to this bug and link them -->
<!-- * Your PR is adding a new feature? Make sure there is a related
issue or discussion attached to it -->

<!-- You can provide any additional context to help us understand what
this PR is attempting to solve: reproduction of a bug, code
snippets... -->

**Checklist** — _Don't delete this checklist and make sure you do the
following before opening the PR_

- [x] The name of my PR follows [gitmoji](https://gitmoji.dev/)
specification
- [x] My PR references one or several related issues (if any)
- [x] New features or breaking changes must come with an associated
Issue or Discussion
- [x] My PR does not add any new dependency without an associated Issue
or Discussion
- [x] My PR includes bumps details, please run `yarn bump` and flag the
impacts properly
- [x] My PR adds relevant tests and they would have failed without my PR
(when applicable)

<!-- More about contributing at
https://github.com/dubzzz/fast-check/blob/main/CONTRIBUTING.md -->

**Advanced**

<!-- How to fill the advanced section is detailed below! -->

- [x] Category: ✨ Introduce new features
- [x] Impacts: No impact, we enriched `fc.string` and provided it with
new capabilities

<!-- [Category] Please use one of the categories below, it will help us
better understand the urgency of the PR -->
<!-- * ✨ Introduce new features -->
<!-- * 📝 Add or update documentation -->
<!-- * ✅ Add or update tests -->
<!-- * 🐛 Fix a bug -->
<!-- * 🏷️ Add or update types -->
<!-- * ⚡️ Improve performance -->
<!-- * _Other(s):_ ... -->

<!-- [Impacts] Please provide a comma separated list of the potential
impacts that might be introduced by this change -->
<!-- * Generated values: Can your change impact any of the existing
generators in terms of generated values, if so which ones? when? -->
<!-- * Shrink values: Can your change impact any of the existing
generators in terms of shrink values, if so which ones? when? -->
<!-- * Performance: Is there a potential performance impact? In which
cases? -->
<!-- * Typings: Can it require some typings changes on user side? Please
give more details -->
dubzzz authored Aug 22, 2024
1 parent 1c2450b commit 9f5ec86
Showing 9 changed files with 1,717 additions and 47 deletions.
8 changes: 8 additions & 0 deletions .yarn/versions/2afbfedf.yml
@@ -0,0 +1,8 @@
releases:
  fast-check: minor

declined:
  - "@fast-check/ava"
  - "@fast-check/jest"
  - "@fast-check/vitest"
  - "@fast-check/worker"
@@ -0,0 +1,67 @@
import type { Arbitrary } from '../../check/arbitrary/definition/Arbitrary';
import { mapToConstant } from '../mapToConstant';
import type { GraphemeRange } from './data/GraphemeRanges';
import {
  asciiAlphabetRanges,
  autonomousDecomposableGraphemeRanges,
  autonomousGraphemeRanges,
  fullAlphabetRanges,
} from './data/GraphemeRanges';
import type { GraphemeRangeEntry } from './helpers/GraphemeRangesHelpers';
import { convertGraphemeRangeToMapToConstantEntry, intersectGraphemeRanges } from './helpers/GraphemeRangesHelpers';

/** @internal */
type StringUnitType = 'grapheme' | 'composite' | 'binary';
/** @internal */
type StringUnitAlphabet = 'full' | 'ascii';
/** @internal */
type StringUnitMapKey = `${StringUnitType}:${StringUnitAlphabet}`;

/**
 * Caching all already instantiated variations of stringUnit
 * @internal
 */
const registeredStringUnitInstancesMap: Partial<Record<StringUnitMapKey, Arbitrary<string>>> = Object.create(null);

/** @internal */
function getAlphabetRanges(alphabet: StringUnitAlphabet): GraphemeRange[] {
  switch (alphabet) {
    case 'full':
      return fullAlphabetRanges;
    case 'ascii':
      return asciiAlphabetRanges;
  }
}

/** @internal */
function getOrCreateStringUnitInstance(type: StringUnitType, alphabet: StringUnitAlphabet): Arbitrary<string> {
  const key: StringUnitMapKey = `${type}:${alphabet}`;
  const registered = registeredStringUnitInstancesMap[key];
  if (registered !== undefined) {
    return registered;
  }
  const alphabetRanges = getAlphabetRanges(alphabet);
  // 'binary' keeps the whole alphabet; the other types keep only its autonomous graphemes
  const ranges = type === 'binary' ? alphabetRanges : intersectGraphemeRanges(alphabetRanges, autonomousGraphemeRanges);
  const entries: GraphemeRangeEntry[] = [];
  for (const range of ranges) {
    entries.push(convertGraphemeRangeToMapToConstantEntry(range));
  }
  if (type === 'grapheme') {
    // 'grapheme' additionally offers the NFD-decomposed form of decomposable graphemes
    const decomposedRanges = intersectGraphemeRanges(alphabetRanges, autonomousDecomposableGraphemeRanges);
    for (const range of decomposedRanges) {
      const rawEntry = convertGraphemeRangeToMapToConstantEntry(range);
      entries.push({
        num: rawEntry.num,
        build: (idInGroup) => rawEntry.build(idInGroup).normalize('NFD'),
      });
    }
  }
  const stringUnitInstance = mapToConstant(...entries);
  registeredStringUnitInstancesMap[key] = stringUnitInstance;
  return stringUnitInstance;
}

/** @internal */
export function stringUnit(type: StringUnitType, alphabet: StringUnitAlphabet): Arbitrary<string> {
  return getOrCreateStringUnitInstance(type, alphabet);
}
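
The file above combines two ideas: a cache keyed by `type:alphabet`, and, for
the `'grapheme'` type only, extra entries built by NFD-normalizing
decomposable graphemes. The sketch below illustrates both in isolation. It is
a simplified stand-in, not the fast-check implementation: `getOrCreate`,
`sampleComposed`, and `builds` are hypothetical names, and plain string
arrays replace the real range tables and arbitraries.

```typescript
type UnitType = 'grapheme' | 'composite' | 'binary';
type Alphabet = 'full' | 'ascii';
type CacheKey = `${UnitType}:${Alphabet}`;

// Null-prototype object: lookups never hit inherited properties.
const cache: Partial<Record<CacheKey, string[]>> = Object.create(null);
// Counts how many times entries were actually built (for illustration only).
let builds = 0;

// Hypothetical stand-in for the range tables: two precomposed letters (é, ñ).
const sampleComposed = ['\u00e9', '\u00f1'];

function getOrCreate(type: UnitType, alphabet: Alphabet): string[] {
  const key: CacheKey = `${type}:${alphabet}`;
  const cached = cache[key];
  if (cached !== undefined) {
    return cached; // Reuse the instance built by an earlier call.
  }
  builds += 1;
  const entries = [...sampleComposed];
  if (type === 'grapheme') {
    // Also offer the decomposed form: base letter followed by a combining mark.
    for (const composed of sampleComposed) {
      entries.push(composed.normalize('NFD'));
    }
  }
  cache[key] = entries;
  return entries;
}

const first = getOrCreate('grapheme', 'full');
const second = getOrCreate('grapheme', 'full');
console.log(first === second, builds); // true 1
console.log(first.map((entry) => entry.length)); // [ 1, 1, 2, 2 ]
```

The decomposed entries are two chars and two code points long but still
render as a single visual sign, which is why they only make sense for the
`'grapheme'` type.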