The Unicode Consortium dictionary (cldr.dict) really needs to be subdivided into at least 2 separate dictionaries: Emojis and Other #17659

britechguy · 2025-01-27T23:41:45Z

Is your feature request related to a problem? Please describe.

Over time, more and more users on the NVDA group have been complaining that, when the checkbox to include Unicode Consortium data (including emoji) is checked and the synth being used does not natively announce emojis, many characters from Unicode ranges outside the emoji ranges almost invariably end up being announced, because they're being used as "dingbats" or some other form of decorative content alongside text. As various content developers increase their use of these characters, announcing them, when doing so adds nothing of informational value, becomes increasingly frustrating.

Describe the solution you'd like

The ranges in the Unicode Consortium data that are emojis are well-known and well-documented, and the current CLDR dictionary already needs to be updated whenever new emojis are introduced. It would make sense to isolate the character ranges dedicated to emoji in a dictionary of their own. It may, or may not, make sense to further subdivide the remaining Unicode character ranges into separate dictionaries named after the functional class of Unicode characters they contain. The emoji dictionary would be named as such, and any other subdivisions beyond "not emoji" should be given meaningful dictionary names if they are actually created.

Describe alternatives you've considered

One could use configuration profiles to turn the inclusion of Unicode Consortium data on or off based upon context, but most NVDA users would not know where, when, or exactly how to do that. Yet all of them would recognize a checkbox labeled "emoji dictionary", along with other meaningful dictionary names for ranges of Unicode characters that share a functional class.

Additional context

I personally believe that more than emoji versus non-emoji dictionaries for Unicode consortium characters should be created. There are a number of ranges that can be named relatively clearly, e.g., mathematical symbols, international flags, etc., that make clear what including them would cause NVDA to announce. If they are not selected, those Unicode characters would simply be ignored.

zstanecic · 2025-01-27T23:51:50Z

I don't know how this is doable: country flags are also part of the UTS #51 data, effectively making them emoji characters, and they are used by people who use emojis.

britechguy · 2025-01-28T01:16:49Z

Decisions would have to be made about categories, like they always are.

No one is going to be happy with each and every "character class" division, but you can be mighty sure that they'll be happier than they are now.

I also don't see any problem with dividing country flags out, as that allows the end user to check a checkbox as to whether those are included, or not.

It comes down to allowing the end user to have far greater control over the classes of Unicode characters that would be announced by NVDA (when the synth does not do so natively) and which would not. I'm not thinking 100 subdivisions, but I am thinking more than 2, and "the common emojis" used worldwide as "pure" emojis are very easily identified.

Gene703122 · 2025-01-28T05:26:40Z

I support this proposal. As things stand now, to change what is announced, you have to manually change every individual item using the punctuation/symbol pronunciation dialog: set every item you don't want to hear to "all", then choose a symbol level below "all" in the speech settings.

There needs to be an efficient way to set verbosity levels for the kinds of items we are talking about where you can turn them on or off as a class or group, while leaving commonly used and useful ones on.

ABuffEr · 2025-01-28T09:43:29Z

subdivide the remaining Unicode character ranges into separate dictionaries that can be named based on the functional class of unicode characters.

Absolutely agree with this further proposal. Math, currency, punctuation, flags... would surely be a valuable improvement in speech customization.

britechguy · 2025-01-28T15:11:11Z

BTW, when I say "character classes" that's just what I mean. There are times where some of those classes will be formed by a contiguous range of Unicode characters and there will be others where the class will be composed of several non-contiguous blocks.

Even the "primary emoji range" has gaps in it, but those existing gaps are almost certain to be used as in-fill as more emojis are added.

@ABuffEr has given perfectly obvious examples of what I mean by "character classes" and almost everyone recognizes what the classes he named are made up of. There would definitely be one (or more, if subdivided) for Dingbats, decorative symbols that are virtually never used for any other purpose.

Someone will need to determine which ranges of characters fall into a functional character class and, as Unicode expands, particularly for emojis, those would need to be added to the emoji dictionary. If there were to be a new currency introduced and a Unicode symbol defined for it (the last I can remember is the Euro) then that would be added to the currency symbols dictionary. Etc., etc., etc.

Dictionary updating would be necessary on some cyclic basis or else triggered by specific announcements about additions to the Unicode characters that have been defined.
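The "character class" idea discussed above (a class built from one contiguous range or from several non-contiguous blocks, with unselected classes simply ignored) can be sketched in Python, the language NVDA is written in. This is a minimal illustration under stated assumptions, not NVDA's implementation: the `CHARACTER_CLASSES` table, the class names, and the `classify`/`should_announce` helpers are hypothetical, and the ranges are a small illustrative subset. Real emoji membership is defined per character in UTS #51, not by whole blocks, and some blocks (e.g. Dingbats) contain characters that are also emoji.

```python
from bisect import bisect_right

# Hypothetical character classes (illustrative subset, not NVDA data).
# Each class is a list of inclusive (start, end) code point ranges, so a
# class may be made of several non-contiguous blocks. This sketch assumes
# no two ranges overlap.
CHARACTER_CLASSES = {
    "flags": [(0x1F1E6, 0x1F1FF)],  # Regional Indicator Symbols
    "emoji": [
        (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
        (0x1F600, 0x1F64F),  # Emoticons
        (0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs
    ],
    "currency": [
        (0x0024, 0x0024),  # DOLLAR SIGN
        (0x00A2, 0x00A5),  # CENT SIGN .. YEN SIGN
        (0x20A0, 0x20CF),  # Currency Symbols block (includes EURO SIGN)
    ],
    "math": [(0x2200, 0x22FF)],  # Mathematical Operators
    "dingbats": [(0x2700, 0x27BF)],  # Dingbats
}


def build_index(classes):
    """Flatten every (start, end, class) range into one sorted list."""
    index = sorted(
        (start, end, name)
        for name, ranges in classes.items()
        for start, end in ranges
    )
    starts = [start for start, _, _ in index]
    return starts, index


_STARTS, _INDEX = build_index(CHARACTER_CLASSES)


def classify(char):
    """Return the class name of a single character, or None if unclassified."""
    cp = ord(char)
    # Find the last range whose start is <= cp, then check its end.
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0:
        start, end, name = _INDEX[i]
        if start <= cp <= end:
            return name
    return None


def should_announce(char, enabled_classes):
    """Skip characters whose class is disabled; unclassified ones pass through."""
    name = classify(char)
    return name is None or name in enabled_classes


if __name__ == "__main__":
    enabled = {"emoji", "currency"}
    for ch in ["\u20ac", "\U0001F600", "\u2702", "a"]:
        print(f"U+{ord(ch):04X}", classify(ch), should_announce(ch, enabled))
```

Adding a newly encoded emoji or currency symbol then only means extending the relevant range list, which matches the cyclic dictionary-update process described above.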
