
Systematic language identifiers #3263

Open
fungs opened this issue Oct 11, 2017 · 17 comments

Comments

@fungs

fungs commented Oct 11, 2017

In the dropdown where you specify the document language, language names should start with the generic language and then the local variant, for instance English (British), English (American), etc. Otherwise, languages can be difficult to find.

@lpagliari
Contributor

We use this package to get languages and their native names, and it gets those values from https://translatewiki.net/. Here is a list of all supported languages, and there you can see that the names we're using on Etherpad are the same.

Although I agree that the systematic way of naming would make it easier to find the languages, it looks like a huge change to be made on Translate Wiki.

Do you want to give it a try and submit a PR with an alternative for Etherpad to use?

@lpagliari lpagliari reopened this Oct 11, 2017
@fungs
Author

fungs commented Oct 12, 2017

I did not know about this package. It seems easy to fork it and change the human-readable names to a more systematic form, but since the descriptions are given in native form, this requires native speakers for some of the languages. Also, I don't understand which language codes they are using; it seems to be a mixture of two- and three-letter ISO codes and others.

This official ISO language list has three-letter codes and a third column with systematic (English) language descriptions. How are the codes used for mapping in Etherpad?

@lpagliari
Contributor

This is the package that creates the dropdown. It won't be possible to use other language codes, because that would break compatibility with all plugins that have i18n, as the code is used as the file name for each translated language.

@fungs
Author

fungs commented Oct 16, 2017

Mmm, too bad that TranslateWiki has done such a bad job of structuring their data. I can only offer to modify the auxiliary JSON/JS to include a column with the descriptions from the three-letter ISO list, and perhaps the three-letter codes in another column as well. I would also keep the native language descriptions, so there would be four columns and a mapping for future conversions.
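A minimal sketch of the four-column table proposed above, as a plain JavaScript module. The field names (`code`, `iso639_3`, `englishName`, `nativeName`), the helper names, and the sample rows are illustrative assumptions, not the format of the package Etherpad actually uses. The point it shows: once the English names are systematic, a plain alphabetical sort already keeps the variants of one language together.

```javascript
// Hypothetical four-column language table: the code Etherpad already uses
// (also the locale file name), an ISO 639-3 code, a systematic English
// description, and the native name (autonym) kept from the existing data.
const LANGUAGE_TABLE = [
  { code: 'de', iso639_3: 'deu', englishName: 'German', nativeName: 'Deutsch' },
  { code: 'en-gb', iso639_3: 'eng', englishName: 'English (British)', nativeName: 'British English' },
  { code: 'en', iso639_3: 'eng', englishName: 'English', nativeName: 'English' },
  { code: 'en-ca', iso639_3: 'eng', englishName: 'English (Canadian)', nativeName: 'Canadian English' },
];

// Look up one row by the existing Etherpad code; returns undefined if absent.
function findLanguage(code) {
  return LANGUAGE_TABLE.find((row) => row.code === code.toLowerCase());
}

// Build dropdown entries sorted by the systematic English name, so that
// "English", "English (British)" and "English (Canadian)" end up adjacent.
function dropdownEntries() {
  return [...LANGUAGE_TABLE]
    .sort((a, b) => a.englishName.localeCompare(b.englishName, 'en'))
    .map((row) => ({ value: row.code, label: `${row.englishName} / ${row.nativeName}` }));
}
```

Keeping the existing code as the key means plugin locale files would not need to be renamed; only the display label changes.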

@JohnMcLear
Member

cc @Nikerabbit

@Nikerabbit
Contributor

First, there is no longer a separation between two letter and three letter codes. The modern standard is to use https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry (which basically is "use two letter code if available, otherwise three letter code" but that's irrelevant).

Secondly, can you use https://github.com/wikimedia/jquery.uls, or something similar, for the language selector? That would not require changing the language names.

Thirdly, you could get language autonyms from https://github.com/wikimedia/language-data which is maintained by the Wikimedia Language team if you can't use jquery.uls for the language selector.
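The "two letters if available, otherwise three" rule means the primary language subtag of a tag can have either length, and whatever follows it is a script, region, or variant subtag. A minimal sketch using the standard `Intl.Locale` parser (available in modern browsers and Node.js) to split such tags; `describeTag` is a hypothetical helper name. Caveat: some MediaWiki/translatewiki.net codes only look like BCP 47 tags, e.g. `sr-ec` (Serbian in Cyrillic script) parses with `EC` as a region subtag.

```javascript
// Split a language tag into its subtags with the built-in Intl.Locale parser.
// Intl.Locale also canonicalizes case ("en-gb" -> language "en", region "GB").
function describeTag(tag) {
  const locale = new Intl.Locale(tag);
  return {
    language: locale.language, // primary subtag: 2 or 3 letters
    script: locale.script,     // e.g. "Hans"; undefined when absent
    region: locale.region,     // e.g. "GB"; undefined when absent
    baseName: locale.baseName, // canonical tag without extensions
  };
}
```

For example, `describeTag('zh-hans')` reports the script `Hans` and no region, while `describeTag('gsw')` has a three-letter primary subtag and nothing else.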

@Wikinaut
Contributor

Just a brief note: I noticed a problem in hooks/i18n.js with the languages.isValid check, which fails for locales such as "de-de":

if ((ext == '.json') && languages.isValid(locale)) {
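A hedged sketch of a more tolerant check for the case above. It assumes the locale comes from a file name such as `de-de.json`; `isKnown` stands in for `languages.isValid` (whose exact matching rules are not assumed here), and `resolveLocale` is a hypothetical helper. It tries the lowercased code first and then falls back to the primary language subtag.

```javascript
const path = require('path');

// Hypothetical helper: map a locale file name to a code the language list
// recognizes, or return null if it should be skipped.
// `isKnown` is a predicate standing in for languages.isValid.
function resolveLocale(fileName, isKnown) {
  if (path.extname(fileName) !== '.json') return null;
  const locale = path.basename(fileName, '.json').toLowerCase();
  if (isKnown(locale)) return locale;       // exact match, e.g. "pt-br"
  const primary = locale.split('-')[0];     // "de-de" -> "de"
  return isKnown(primary) ? primary : null; // fall back to the base language
}
```

Whether "de-de" should silently map onto "de" is a policy decision: if both files exist, the fallback would merge a regional file into the base language, so logging a warning may be preferable.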

@muxator
Contributor

muxator commented Oct 6, 2018

Hi, I think this and #3404 are important, but I really do not have the time to dig into it. Any volunteer?

@JohnMcLear JohnMcLear self-assigned this Mar 31, 2020
@JohnMcLear
Member

@Nikerabbit did TW make any progress here? I'm keen to do the work on our side, but bear in mind that we need both client- and server-side rendering of translations, so we can't use the WM solution. What has changed with TW? Perhaps we could have a call to work out how best to modernize?

@Nikerabbit
Contributor

@JohnMcLear Can you please clearly define the issue you have. In my last comment I listed a few possible solutions already.

@JohnMcLear
Member

@Nikerabbit I looked at the available options, and even in the WMF-maintained data the descriptive names are not in the form this issue asks for: https://github.com/wikimedia/language-data/blob/master/data/langdb.yaml

Look for English; the entries are:

  en-ca: [Latn, [AM], Canadian English]
  en-gb: [Latn, [EU, AS, PA], British English]
  en-simple: [Latn, [WW], Simple English]
  en: [Latn, [EU, AM, AF, ME, AS, PA, WW], English]

jquery.uls also provides "British English" and "whatever English", which isn't suitable either; see also the note on this bug in their UX: wikimedia/jquery.uls#359

If the answer to this issue is "no, not possible", that's totally fine; if it's something TW is working on and has a solution in the pipeline, I'm all ears.

That's my understanding of this issue.
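A minimal sketch of how an override layer on the Etherpad side could produce the requested form without changing the upstream data. The upstream names are copied from the langdb.yaml lines quoted above; the `nameOverrides` table and the `displayName` helper are hypothetical and would have to be maintained locally.

```javascript
// Names as they appear in language-data's langdb.yaml (quoted above).
const upstreamNames = {
  en: 'English',
  'en-ca': 'Canadian English',
  'en-gb': 'British English',
  'en-simple': 'Simple English',
};

// Hypothetical, locally maintained overrides in "generic (variant)" form.
const nameOverrides = {
  'en-ca': 'English (Canadian)',
  'en-gb': 'English (British)',
  'en-simple': 'English (Simple)',
};

// Prefer a local override, then the upstream name, then the raw code.
function displayName(code) {
  return nameOverrides[code] ?? upstreamNames[code] ?? code;
}
```

The cost of this approach is that every new variant added upstream needs a matching override, otherwise it silently falls back to the upstream wording.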

@Nikerabbit
Contributor

So basically, instead of British English, you'd like to have English (British).

Translatewiki.net does not provide languages names. The two projects which I know that provide language names are CLDR and the aforementioned language-data library. Neither of those give the names in the format you'd like to have.

As far as I can see, this leaves two kinds of solutions:

  • Do the necessary work to fulfill the requirement:
    • Search for other language name sources
    • Build a list of language names (overrides) yourself
  • Change the requirements to allow other solutions:
    • Use a language selector with search so that the way the language name is displayed doesn't matter, or
    • Sort things by language code, or
    • Sort things alphabetically, but then "pull up" any codes that are more granular than the short code, so that British English comes after English.
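The last option above ("pull up" more granular codes) can be sketched as a comparator: entries are grouped under the display name of their base language, the base code comes first in each group, and its variants follow in name order. The `{ code, name }` entry shape and `sortWithVariants` are assumptions for illustration; a variant whose base language is missing from the list is sorted under its own name.

```javascript
// Sort language entries alphabetically by name, but keep each more granular
// code (e.g. "en-gb") right after its base code ("en").
function sortWithVariants(entries) {
  const nameByCode = new Map(entries.map((e) => [e.code, e.name]));
  const primary = (code) => code.split('-')[0];
  const isVariant = (e) => e.code !== primary(e.code);
  // Group key: the base language's name if present, else the entry's own name.
  const groupName = (e) => nameByCode.get(primary(e.code)) ?? e.name;
  return [...entries].sort((a, b) =>
    groupName(a).localeCompare(groupName(b)) ||
    primary(a.code).localeCompare(primary(b.code)) ||
    // Within a group, the base code comes before its variants...
    Number(isVariant(a)) - Number(isVariant(b)) ||
    // ...and the variants are ordered by their own names.
    a.name.localeCompare(b.name));
}
```

With autonyms such as "English", "British English" and "Canadian English", a plain name sort would scatter the English variants under B and C; this comparator places both directly after "English".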

@JohnMcLear
Member

@Nikerabbit I agree, #3822 addresses some of the work.

I feel like this is a heuristics / quirk / UX requirement that isn't really needed, so I'm going to file it as a feature request rather than a minor bug, as it IS the desired behavior from TW's point of view. Is that correct?

@Nikerabbit
Contributor

It's out of scope for translatewiki.net as we do not provide language names, so no desires from that point of view.

Speaking of the larger ecosystem of i18n I am involved with, the projects that I am aware of have gone with alternative solutions.

@JohnMcLear
Member

@Nikerabbit Now that is interesting. Can you point me at what some of those projects have opted to use instead? I'm not -1 on redoing our implementation; it shouldn't be rocket science to replace it with something well documented, as long as it works on both client and server, which I imagine most do nowadays.
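One candidate that works on both client and server is the `Intl.DisplayNames` API built into modern browsers and Node.js, which draws on CLDR data. By default it returns dialect-style names such as "British English", but its `languageDisplay: 'standard'` option asks for the generic-then-qualifier form, typically something like "English (United Kingdom)". Caveats: it names the region rather than using an adjective such as "British", output depends on the ICU data bundled with the runtime, and older runtimes may ignore the option. `systematicName` and `nativeSystematicName` are hypothetical helper names.

```javascript
// Name a language code in a given display language via Intl.DisplayNames.
// languageDisplay: 'standard' requests the "language (region)" form instead of
// the default 'dialect' form ("British English").
function systematicName(code, displayLocale = 'en') {
  const names = new Intl.DisplayNames([displayLocale], {
    type: 'language',
    languageDisplay: 'standard',
    fallback: 'code', // return the code itself if no name is known
  });
  return names.of(code);
}

// Autonym-style label: the name of the language in that language itself.
function nativeSystematicName(code) {
  return systematicName(code, code);
}
```

Codes that do not follow BCP 47 semantics would still need local overrides on top of this.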

@JohnMcLear
Member

cc @Nikerabbit again :) Bump!

@JohnMcLear JohnMcLear removed their assignment Apr 23, 2020
@Nikerabbit
Contributor

Quick survey:

  • MediaWiki: multilingual language name search – non-js fallback as dropdown sorted by language code - both code and autonym shown
  • FreeCol: dropdown sorted by autonym - autonym shown (not sure where they get those from though)
  • Etherpad Lite: dropdown sorted by language code - autonym shown
  • OpenStreetMap website: input for language code separated by spaces
  • Dissemin: dropdown sorted by language code - autonym shown
  • Phabricator: dropdown sorted by English language name - English language name shown
  • Oppia: searchable dropdown with no clear sorting pattern - autonym shown

Note that almost everything also auto-detects language from the browser or operating system.
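Server-side auto-detection usually starts from the HTTP `Accept-Language` header (in the browser, `navigator.languages` gives a similar ordered list). A minimal sketch that parses the header and picks the first supported code, trying the full tag and then its primary subtag, a simplified form of the "lookup" matching described in RFC 4647. The helper names and the `'en'` default are assumptions; this is not necessarily how Etherpad does it.

```javascript
// Parse an Accept-Language header into lowercase tags ordered by q-value
// (ties keep header order). Wildcards and q=0 entries are dropped.
function parseAcceptLanguage(header) {
  return header
    .split(',')
    .map((part, index) => {
      const [tag, ...params] = part.trim().split(';');
      const qParam = params.find((p) => p.trim().startsWith('q='));
      const q = qParam ? Number.parseFloat(qParam.trim().slice(2)) : 1;
      return { tag: tag.trim().toLowerCase(), q: Number.isNaN(q) ? 0 : q, index };
    })
    .filter((entry) => entry.tag && entry.tag !== '*' && entry.q > 0)
    .sort((a, b) => b.q - a.q || a.index - b.index);
}

// Pick the first supported code: exact tag first, then its primary subtag.
function negotiateLanguage(header, supported, fallback = 'en') {
  const available = new Set(supported.map((c) => c.toLowerCase()));
  for (const { tag } of parseAcceptLanguage(header)) {
    if (available.has(tag)) return tag;
    const primary = tag.split('-')[0];
    if (available.has(primary)) return primary;
  }
  return fallback;
}
```

For a header like `de-DE,de;q=0.9,en;q=0.8` and supported codes `['en', 'de']`, this returns `de`; the dropdown then only matters for users whose preference was not detected correctly.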

Projects
None yet
Development

No branches or pull requests

6 participants