Comments and aliases problem #78

jsbien · 2016-02-28T14:01:20Z

The character is also known as reversed Polish-hook o.

However, there is no such formal alias.

On the other hand, Fileformat.Info contains the following comments:

reversed Polish-hook o
archaic phonetic for labialized alveolar fricative
recommended spellings U+007A U+02B7 or U+007A U+032B

Looks like the first comment was converted into an alias while the other ones have been skipped.

I don't think this is correct :-( How are the comments imported, and how are they processed? Shouldn't they be displayed just as "Unicode comments"?

Boldewyn · 2016-02-28T19:50:36Z

It's a verbatim copy from this file in the Unicode standard:

http://www.unicode.org/Public/UCD/latest/ucd/NamesList.txt

I don't know where Unicode got the alias name for this one. When I google it, the most prominent results are your mails to the Unicode mailing list ;-)

In said file, lines prefixed with = are alias names according to the standard (like the "reversed Polish-hook o"), while other lines (like those prefixed with *) carry other informative data, comments, ... (I opened #49 in order to add this missing info, too.)
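A minimal sketch of how the = and * lines could be told apart when reading the file. This is not the actual codepoints.net importer; `parse_names_list` and the inlined `SAMPLE` are hypothetical, and the sample is modeled on the comments quoted above, so the exact text in the real NamesList.txt may differ. It assumes a character entry starts with a hex code point and a tab, and annotation lines start with a tab.

```python
import re

# Illustrative excerpt in the NamesList.txt layout (assumption: modeled on
# the comments quoted in this thread, not copied from the real file).
SAMPLE = (
    "018D\tLATIN SMALL LETTER TURNED DELTA\n"
    "\t= reversed Polish-hook o\n"
    "\t* archaic phonetic for labialized alveolar fricative\n"
    "\t* recommended spellings U+007A U+02B7 or U+007A U+032B\n"
)

# A character entry: hex code point, a tab, then the character name.
CHAR_LINE = re.compile(r"^([0-9A-F]{4,6})\t(.+)$")


def parse_names_list(text):
    """Collect '=' lines as aliases and '*' lines as comments per code point."""
    entries = {}
    current = None
    for line in text.splitlines():
        match = CHAR_LINE.match(line)
        if match:
            current = {"name": match.group(2), "aliases": [], "comments": []}
            entries[match.group(1)] = current
        elif line.startswith("@"):
            # Header and block lines end the current character entry.
            current = None
        elif current is not None and line.startswith("\t= "):
            current["aliases"].append(line[3:].strip())
        elif current is not None and line.startswith("\t* "):
            current["comments"].append(line[3:].strip())
    return entries


entries = parse_names_list(SAMPLE)
print(entries["018D"]["aliases"])
print(entries["018D"]["comments"])
```

Keeping aliases and comments in separate lists leaves the decision about how to word each of them to the display layer.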

I found many of those quite useful to get a general idea of the character (see, e.g., the low ASCII control characters, or the guillemets), so I embedded the aliases in the character description.

This works quite well for almost all characters. This is the first one where the alias seems off.

The best way to fix this would be to file an upstream issue with Unicode. A changed NamesList.txt would automatically lead to a fix here. Have you tried that already, by chance, after asking on the mailing list last year?

If that doesn't lead to a result, we could add additional info between the character description and the Wikipedia entry to describe why the alias is problematic. If you write one or two sentences, I'd add them as the file codepoints.net/data/U+018D.en.md.

As a last resort I could hotfix the database to remove the alias, but I'd rather stick to the standard as closely as possible. (Also, the alias might sneak in again in a later import.)

jsbien · 2016-02-28T20:24:19Z

Let's make haste slowly.

I checked only NameAliases.txt; I was not aware (or forgot) that aliases are also defined in NamesList.txt. I will file an issue with Unicode, perhaps after discussing the problem on the Unicode list (it was on my TODO list already).

I'm glad you also plan to handle other information from the file.

Some time in the future I would like to include the information from
https://bitbucket.org/jsbien/unicode4polish/wiki/codes/U+018D_LATIN_SMALL_LETTER_TURNED_DELTA
However, I don't have a clear idea of how to do it in an elegant and extensible way.

jsbien · 2016-03-13T07:09:52Z

Thanks to the thread about NamesList.txt on the Unicode list, I've come to the conclusion that we have to distinguish formal aliases from NameAliases.txt and informal aliases from NamesList.txt. So instead of

The character is also known as reversed Polish-hook o.

we should have something like

The Unicode version ??? mentions an informal alias: reversed Polish-hook o.

I've added the Unicode version because, as far as I understand, the annotations are not stable and may vanish.
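A rough sketch of how that distinction could drive the wording, assuming the three-field `code point;alias;type` layout of NameAliases.txt. The function names, the inlined sample line, and the exact sentence templates are hypothetical and only illustrate the idea; the informal aliases are assumed to come from a separate NamesList.txt parse.

```python
# Illustrative lines in the NameAliases.txt layout: code point;alias;type
NAME_ALIASES_SAMPLE = """\
# comment lines start with '#'
0709;SYRIAC SUBLINEAR COLON SKEWED LEFT;correction
"""


def parse_name_aliases(text):
    """Map each code point to its formal (alias, type) pairs."""
    formal = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        codepoint, alias, alias_type = line.split(";")
        formal.setdefault(codepoint, []).append((alias, alias_type))
    return formal


def alias_sentences(codepoint, formal, informal):
    """Word formal and informal aliases differently (hypothetical templates)."""
    sentences = []
    for alias, _alias_type in formal.get(codepoint, []):
        sentences.append(f"The character is also known as {alias}.")
    for alias in informal.get(codepoint, []):
        sentences.append(f"NamesList.txt mentions an informal alias: {alias}.")
    return sentences


formal = parse_name_aliases(NAME_ALIASES_SAMPLE)
informal = {"018D": ["reversed Polish-hook o"]}  # e.g. from the '=' lines
print(alias_sentences("0709", formal, informal))
print(alias_sentences("018D", formal, informal))
```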

Boldewyn · 2016-03-20T19:21:36Z

Since I parse the data anew with every new version, the informal alias would vanish here, too, if Unicode dropped it. So I guess we could leave the version out. Apart from that, I very much like the idea of rewording it like this.

jsbien · 2016-03-25T20:26:00Z

What about

The character is also called a reversed Polish-hook

versus e.g.

The character is also known as SYRIAC SUBLINEAR COLON SKEWED LEFT

The second example is an official alias. As for the first one, nobody knows the character as a reversed Polish-hook o, especially as there is no such thing as a Polish-hook; even in English the diacritic mark is called ogonek. The name is just the individual usage of an author of a perhaps obsolete book on phonology.

Moreover, I don't like vanishing information. I would very much appreciate a note stating in or since which versions the comment appeared. The proposed wording allows for it, e.g.

The character is also called a reversed Polish-hook (in Unicode 4.1.0 and later versions)
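A sketch of how such a version note could be derived, assuming the annotations from each imported Unicode version are kept instead of being overwritten. The helper names are hypothetical, and the version data below is a placeholder made up for illustration, not the real history of this annotation. Gaps between the first and last appearance are ignored for simplicity.

```python
def version_key(version):
    """Sort key so that '10.0.0' orders after '9.0.0'."""
    return tuple(int(part) for part in version.split("."))


def version_note(alias, aliases_by_version):
    """Describe in which imported versions an informal alias appears."""
    versions = sorted(aliases_by_version, key=version_key)
    present = [v for v in versions if alias in aliases_by_version[v]]
    if not present:
        return None
    first, last = present[0], present[-1]
    if last == versions[-1]:
        return f"in Unicode {first} and later versions"
    if first == last:
        return f"in Unicode {first} only"
    return f"in Unicode {first} through {last}"


# Placeholder data (assumption): which informal aliases each import contained.
PLACEHOLDER_HISTORY = {
    "4.0.0": set(),
    "4.1.0": {"reversed Polish-hook o"},
    "5.0.0": {"reversed Polish-hook o"},
}

print(version_note("reversed Polish-hook o", PLACEHOLDER_HISTORY))
```

With a history like this, an annotation that later disappears would still be shown, together with the range of versions that carried it.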

Boldewyn added upstream-unicode upstream-unicodeinfo labels Mar 21, 2016
