Comments and aliases problem #78
It's a verbatim copy from this file in the Unicode standard: http://www.unicode.org/Public/UCD/latest/ucd/NamesList.txt. I don't know where Unicode got the alias name of this one from. When I google it, the most prominent results are your mails to the Unicode mailing list ;-)

In said file, lines prefixed with […] give aliases. I found many of those quite useful for getting a general idea of the character (see, e.g., the low ASCII control characters or the guillemets), so I embedded the aliases in the character description. This works quite well for almost all characters; this is the first one where the alias seems off.

The best way to fix this would be to file an upstream issue with Unicode. A changed NamesList.txt would automatically lead to a fix here. Have you already tried that, by chance, after asking on the mailing list last year?

If that doesn't lead to a result, we could add additional info between the character description and the Wikipedia entry explaining why the alias is problematic. If you write one or two sentences, I'd add them as the file codepoints.net/data/U+018D.en.md. As a last resort I could hotfix the database to remove the alias, but I'd rather stick to the standard as closely as possible. (Also, the alias might sneak in again in a later import.)
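A minimal Python sketch of how alias lines and comment lines from NamesList.txt can be kept apart during an import. It assumes the NamesList.txt line conventions as I understand them: a character entry is a hex code point, a tab, and the name; annotation lines start with a tab followed by `= ` (informal alias) or `* ` (comment); lines starting with `@` are block headers or subheaders. The class and function names are hypothetical and are not taken from the codepoints.net importer.

```python
import re
from dataclasses import dataclass, field

# Assumed format: a character entry is "<HEX>\t<NAME>" at the start of a line.
ENTRY_RE = re.compile(r"^([0-9A-F]{4,6})\t(.+)$")


@dataclass
class NamesListEntry:  # hypothetical record type, for illustration only
    codepoint: int
    name: str
    informal_aliases: list = field(default_factory=list)  # from "\t= " lines
    comments: list = field(default_factory=list)          # from "\t* " lines


def parse_names_list(lines):
    """Collect '=' (informal alias) and '*' (comment) annotations per code point."""
    entries = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = ENTRY_RE.match(line)
        if match:
            current = NamesListEntry(int(match.group(1), 16), match.group(2))
            entries[current.codepoint] = current
        elif line.startswith("@"):
            # Block headers / subheaders: following annotations are not per-character.
            current = None
        elif current is not None and line.startswith("\t= "):
            current.informal_aliases.append(line[3:])
        elif current is not None and line.startswith("\t* "):
            current.comments.append(line[3:])
        # Everything else (cross-references "\tx ", file comments, ...) is ignored here.
    return entries
```

Keeping aliases and comments in separate lists means a comment can never end up being shown as an alias; how each list is rendered on the page is then a separate decision.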
Let's make haste slowly. I had only checked NameAliases.txt; I was not aware (or had forgotten) that aliases are also defined in NamesList.txt. I will file an issue with Unicode, perhaps after discussing the problem on the Unicode list (it was already on my TODO list). I'm glad you plan to handle other information from the file as well. Some time in the future I would like to include the information from […]
Thanks to the thread about NamesList.txt on the Unicode list, I've come to the conclusion that we have to distinguish formal aliases from NameAliases.txt and informal aliases from NamesList.txt. So instead
we should have something like
I've added the Unicode version because, as far as I understand, the annotations are not stable and may vanish.
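To illustrate the distinction between formal and informal aliases, here is a small Python sketch. It assumes NameAliases.txt uses `code;alias;type` lines with `#` comments, and it labels informal aliases with the Unicode version of the NamesList.txt they came from, as proposed above. The function names, the output wording, and the version string passed in are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_name_aliases(lines: List[str]) -> Dict[int, List[Tuple[str, str]]]:
    """Parse NameAliases.txt-style lines ("code;alias;type") into formal aliases."""
    formal: Dict[int, List[Tuple[str, str]]] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop '#' comments and surrounding whitespace
        if not line:
            continue
        code, alias, kind = (part.strip() for part in line.split(";"))
        formal.setdefault(int(code, 16), []).append((alias, kind))
    return formal


def describe_aliases(cp: int,
                     formal: Dict[int, List[Tuple[str, str]]],
                     informal: Dict[int, List[str]],
                     version: str) -> List[str]:
    """Label formal and informal aliases differently; informal ones carry the
    Unicode version of the NamesList.txt they were read from (hypothetical wording)."""
    out = [f"Formal alias: {alias} ({kind})" for alias, kind in formal.get(cp, [])]
    out += [
        f"Informal alias (NamesList.txt, Unicode {version}): {alias}"
        for alias in informal.get(cp, [])
    ]
    return out
```

For example, a sample line `01A2;LATIN CAPITAL LETTER GHA;correction` yields a formal alias of type `correction`, while an informal alias for U+018D would be printed with its NamesList.txt version and never as a formal alias.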
Since I parse the data anew with every new version, the informal alias would vanish here, too. So I guess we could leave that out. Apart from that, I very much like the idea of rewording it like this.
What about
versus e.g.
The second example is an official alias. As for the first one, nobody knows the character as a "reversed Polish-hook o", especially as there is no such thing as a Polish hook: even in English the diacritic is called an ogonek. The name is just the idiosyncratic usage of one author of a perhaps obsolete book on phonology. Moreover, I don't like information vanishing. I would very much appreciate a note stating in/since which versions the comment appeared. The proposed wording allows for that, e.g.
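Keeping a note of the versions in which an informal annotation appeared would require remembering earlier imports rather than only the latest NamesList.txt. A minimal Python sketch, assuming one parsed snapshot per Unicode version in release order; the function names, the wording of the note, and the version strings in the example are hypothetical.

```python
def annotation_history(snapshots):
    """snapshots: list of (version, {codepoint: [annotation, ...]}) in release order.
    Returns {(codepoint, text): [versions in which the annotation appeared]}."""
    history = {}
    for version, annotations in snapshots:
        for cp, texts in annotations.items():
            for text in texts:
                history.setdefault((cp, text), []).append(version)
    return history


def describe_with_history(cp, text, history, latest):
    """Render an annotation with the versions it appeared in.
    Note: gaps (dropped, then re-added) are summarized as one first..last range."""
    versions = history[(cp, text)]
    if versions[-1] == latest:
        return f"{text} (NamesList.txt since Unicode {versions[0]})"
    return f"{text} (NamesList.txt, Unicode {versions[0]} to {versions[-1]})"
```

With this, an annotation that was later removed upstream stays visible with its version range instead of silently disappearing on the next import.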
The page U+018D states
However, there is no such formal alias.
On the other hand, Fileformat.Info contains the following comments:
It looks like the first comment was converted into an alias while the others were skipped.
I don't think this is correct :-( How are the comments imported and processed? Shouldn't they simply be displayed as "Unicode comments"?