-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial work on language-negotiation materials #581
Open
aphillips
wants to merge
17
commits into
w3c:gh-pages
Choose a base branch
from
aphillips:gh-pages
base: gh-pages
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
17 commits
Select commit
Hold shift + click to select a range
1c3574a
Various edits
aphillips 99ec8ba
Correct typos
aphillips 9ac6994
Add locale examples
aphillips b6c7b1f
Fix table
aphillips 817c32f
Improve locale example
aphillips 759536c
Improving the text
aphillips 4454690
Fix list of signals
aphillips aeef1cf
Move note under list to make it easier to read
aphillips 8a968d7
Fix formatting
aphillips 5950acd
Address consistency of locale vs. language negotiation
aphillips aeed058
Merge branch 'gh-pages' of https://github.com/w3c/i18n-drafts into wc…
aphillips 61cc40c
Merge branch 'wc3-gh-pages' into gh-pages
aphillips 354eb35
Remove stray apostrophes
aphillips 1168920
Clarify localization and example about language choice
aphillips dde0ea9
Update language-negotiation.md
aphillips 139f1dc
Merge branch 'w3c:gh-pages' into gh-pages
aphillips c1b11ae
Address some of @kleinsin's comments
aphillips File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,340 @@ | ||
# Language and Locale Negotiation on the Web | ||
|
||
One challenge faced by websites and services on the Web is how to decide on | ||
the most appropriate language and [locale](https://www.w3.org/TR/i18n-glossary#locale) to use when presenting | ||
information to users. | ||
|
||
This article outlines some of the considerations and best practices when | ||
deciding how to choose, set, and store a user's locale preference. | ||
It is not a complete recipe: not only will different users have different | ||
preferences but there are choices that developers need to make when | ||
implementing locale negotiation. | ||
|
||
## What are language and locale negotiation? How are they the same/different? | ||
|
||
**_Language negotiation_** is the process of using various inputs to select between different language | ||
experiences for a given request, session, or user experience. | ||
Web sites perform language negotiation in order to provide the user with an experience that they understand. | ||
|
||
Language negotiation allows software to select the best experience for the user's needs. | ||
Sometimes different languages will have different content or experiences, | ||
but most often a site or software will have a consistent experience | ||
that is translated ("localized") from a specific source language to support users | ||
with different linguistic or cultural needs. | ||
|
||
Localization of content can take different forms. | ||
For example, a site consisting of static files might just choose between different | ||
language versions of each page, while another site might load text into a blank template | ||
from resource bundles (such those employed by GNU gettext or Java's `java.util.ResourceBundle`). | ||
|
||
One aspect of language negotiation is that the result is usually a | ||
[language tag](https://www.w3.org/TR/i18n-glossary/#dfn-language-tag) | ||
that identifies the user's locale. | ||
The language tags used on the Web (and by most software) are defined by | ||
[BCP47](https://rfc-editor.org/bcp/bcp47). | ||
|
||
A [locale](https://www.w3.org/TR/i18n-glossary/#dfn-locale) is: | ||
> An identifier (such as a language tag) for a set of international preferences. | ||
> Usually this identifier indicates the preferred language of the user and | ||
> possibly includes other information, such as a geographic region (such as a country). | ||
> A locale is passed in APIs or set in the operating environment to obtain culturally-affected behavior within a system or process. | ||
|
||
Developers pass the locale (or set it in the operating environment) when calling | ||
internationalized APIs used for operations such as loading the localized resources mentioned above. | ||
But they are also use APIs that make the content or data in a system appear or | ||
work in a localized way, such as by formatting dates and numbers, sorting lists, | ||
and many other operations that vary by language or culture. | ||
|
||
As a result, language negotiation should really be called **_locale negotiation_**, | ||
because it includes the process that internationalized software uses to match a | ||
user's [international preferences](https://www.w3.org/TR/i18n-glossary/#dfn-international-preferences) | ||
to the internationalized functionality and localized resources available | ||
in a given piece of software (such as a website). | ||
|
||
The negotiated locale is used to select resources (static pages, resource files, | ||
etc.) and to set the locale for internationalization (I18N) APIs such as | ||
`Intl.DateFormat` or `Intl.Collator` in JavaScript. | ||
|
||
> **_Example_** | ||
> Consider a web site operated by a non-profit `example.org`. | ||
> They want to offer different language experiences to meet user needs. | ||
> This organization decides that their site experience will serve users | ||
> in North America. | ||
> While other languages are spoken in this region, initially they decide to offer three languages: | ||
> `en` (English), `es` (Spanish), and `fr` (French). | ||
> | ||
> Using `en`, `es`, or `fr` as the locale in software, however, does not | ||
> produce complete results | ||
> or these locales may produce unexpected results compared to more-specific | ||
> locales that include, for example, region subtags. | ||
> Such regional variations in formatting affect how the software | ||
> presents values such as dates, times, numbers, and so forth, | ||
> which, in turn, affects how users experience the localization: | ||
> | ||
> | Locale | Locale Description | Formatted Date | Number | Currency | | ||
> |--------|--------------------|-----------------|--------|----------| | ||
> | en | English | Jun 28, 2024, 1:24:45 PM | 1,234.56 |$1,234.56 | | ||
> | en-US | English (United States) | Jun 28, 2024, 1:24:45 PM | 1,234.56 |$1,234.56 | | ||
> | en-CA | English (Canada) | Jun 28, 2024, 1:24:45 p.m. | 1,234.56 |US$1,234.56 | | ||
> | en-GB | English (United Kingdom) | 28 Jun 2024, 13:24:45 | 1,234.56 |US$1,234.56 | | ||
> | fr | French | 28 juin 2024, 13:24:45 | 1 234,56 |1 234,56 $US | | ||
> | fr-CA | French (Canada) | 28 juin 2024, 13 h 24 min 45 s | 1 234,56 |1 234,56 $ US | | ||
> | fr-FR | French (France) | 28 juin 2024, 13:24:45 | 1 234,56 |1 234,56 $US | | ||
> | es-419 | Spanish (Latin America) | 28 jun 2024, 1:24:45 p.m. | 1,234.56 |USD 1,234.56 | | ||
> | es | Spanish | 28 jun 2024, 13:24:45 | 1234,56 |1.234,56 US$ | | ||
> | es-MX | Spanish (Mexico) | 28 jun 2024, 1:24:45 p.m. | 1,234.56 |USD 1,234.56 | | ||
> | es-ES | Spanish (Spain) | 28 jun 2024, 13:24:45 | 1234,56 |1.234,56 US$ | | ||
> | es-US | Spanish (United States) | 28 jun 2024, 1:24:45 p.m. | 1,234.56 |$1,234.56 | | ||
|
||
|
||
## Determining the User's International Preferences | ||
|
||
The first problem in locale negotiation is: | ||
how do we know what the user's preferences are? | ||
|
||
If an application knows exactly who a user is, | ||
it can often use stored data to know exactly what the user prefers. | ||
Otherwise, locale negotiation depends on whatever information is available | ||
in the session or request to guess at the user's intent. | ||
|
||
Guessing at the user's intention might vary depending on many different things. | ||
For example, users might expect a different default currency on a website | ||
intended for use in Germany than on one intended for use in Switzerland, | ||
even if both are localized into German. | ||
|
||
Determining the user's international preferences, thus, often | ||
depends on a hierarchy of "signals". | ||
Some signals are directly related to the user's international preferences | ||
(such as storing the specific locale to use!) | ||
while others are open to interpretation | ||
(such as the user's geographic location). | ||
These can include: | ||
- values directly indicated by the user, such as: | ||
- locale preferences stored in the user's profile on the site | ||
- locale preferences stored in a cookie | ||
- locale preferences stored in the browser, sesssion, or application | ||
- values from the user's runtime environment, such as: | ||
- the HTTP `Accept-Language` header (which often matches the user's | ||
operating system runtime locale) | ||
- the browser's localization or locale, such as `navigator.language` | ||
or `navigator.languages` | ||
- values that imply the user's locale preferences, such as: | ||
- the user's geographic location (perhaps determined from signals | ||
such as the user's IP address or other networking information) | ||
- the site's configuration | ||
- values in the URL; such as: | ||
- a subdomain (`en.example.com` vs. `fr.example.com`) | ||
- a path element (`example.com/en/index.html` vs. `example.com/fr/index.html`) | ||
- a part of the filename (`index.en.html` vs. `index.fr.html`) | ||
- a query parameter (`example.com?lang=en` vs. `example.com?lang=fr`) | ||
- or application specific information | ||
|
||
> [!NOTE] | ||
> Be careful about making assumptions based on regional or locale information. | ||
> | ||
> For example, knowing that someone's IP address might be located in a specific | ||
> country (via geoIP detection) does not mean that the person speaks the most | ||
> common languages of that country, or that they prefer local formatting | ||
> conventions of that country. | ||
> It's possible, too, that the location is being spoofed, such as via | ||
> a VPN. | ||
> | ||
> Similarly, a locale should not be used to imply unrelated values, | ||
> such as the currency of a transaction or the time zone of the user. | ||
|
||
|
||
## Types of User | ||
|
||
Locale negotiation is more reliable if the user has provided a strong signal | ||
(such as choosing their locale from a menu). | ||
Once the locale has been determined, it is usually a good idea to store the result | ||
for future use (including for offline interactions, such as push notifications). | ||
|
||
The simplest use case is a site or service that does not preserve the user's | ||
information between sessions, has no offline experience, and does not maintain | ||
a user profile. | ||
Such a site might need to negotiate the locale with every request. | ||
|
||
Towards the other end of the spectrum is a site or service that | ||
maintains a user profile (account), | ||
has an offline experience (such as sending emails to the user), | ||
and attempts to maintain the user's locale preference between sessions. | ||
|
||
In general, these use cases (and ones in-between these two extremes) | ||
have three types of user profile: | ||
- **Unrecognized User** - the user does not have a profile or the site | ||
does not maintain one on the users behalf. There are no cookies or other | ||
cross-session indicators of the user's preference. | ||
- **Recognized User** - the user has a profile on | ||
this site and there is some cross-session or other indication to | ||
associate the request with a specific user profile. However, the user | ||
has not authenticated themselves. Users in this state can have a measure | ||
of personalization done for them, but should not be given access to | ||
account secrets (such as the password, payment information, and the like). | ||
- **Authenticated User** - the user has a profile on this site | ||
and has authenticated themselves (by logging in or through some other | ||
mechanism) | ||
|
||
An important special case is that of recognized users for sites that don't maintain | ||
a server-side profile or account: user state, including language preference, | ||
can be stored in the cookies or browser local storage. | ||
The effect might be similar to having an "authenticated user", | ||
except that, of course, that the site can't generate offline interactions | ||
and might "forget" the user's preferences between sessions or browsers. | ||
|
||
## Site Configuration | ||
|
||
A site cannot support every language in the world nor can it support | ||
all of the potential variations of a given language/locale. | ||
To ensure a consistent experience, sites need these concepts: | ||
|
||
- **Available Locales** The list of locales that the site will support. | ||
Only locales appearing in this list can be selected by the locale | ||
negotiation process. | ||
- **Default Locale**. If none of the user's signals match the available | ||
list of locales on the site, there needs to be some locale chosen. | ||
This default is the "ultimate fallback" for the site. | ||
|
||
Sites sometimes have different configurations for different sets of users. | ||
For example, a site might be preparing to release a new localization (language) and need | ||
testers to have access to the locale prior to making the locale | ||
available to regular users. | ||
Or a site might make a locale available only to users from certain geographies. | ||
(As of 2024, an example of this is the [US Amazon website](https://www.amazon.com), | ||
which makes more languages available to users with a non-USA shipping address | ||
than to domestic users.) | ||
|
||
Sites also sometimes need to manage more detailed preferences. | ||
|
||
For example, suppose a site supports two variations of Spanish: | ||
`en-ES` (Spanish for Spain) and `es-419` (Spanish for Latin America). | ||
|
||
Users might provide signals that match an unsupported variety of Spanish. | ||
Suppose a user requests a generic form of Spanish using the tag `es`. | ||
The site might need to indicate which flavor of Spanish (`es-ES` or `es-419`) | ||
is the default Spanish. | ||
|
||
Suppose a different user requests `es-CR` (Spanish for Costa Rica). | ||
Then the site might need to maintain a mapping for different sorts of Spanish. | ||
Such a configuration might be a map of values or it might use an algorithm. | ||
For example, the site might use the UN M.49 regional containment data | ||
provided as | ||
[part of CLDR](https://www.unicode.org/cldr/charts/44/supplemental/territory_containment_un_m_49.html) | ||
to discover that `CR` (Costa Rica) is part of `013` (Central America) | ||
which is part of `419` (Latin America), making `es-419` the best match | ||
for a request for `es-CR`. | ||
|
||
Another example might be geographic defaults. | ||
For example, a site might make `es-419` the default for users whose | ||
geoIP location indicates Latin America but use the site default for users | ||
outside that region. | ||
|
||
Notice that there is a tension between providing a long list of supported locales | ||
(to give users the ability to tailor API-based formatting presentation) | ||
and providing a short list of available locales (to aid in selection). | ||
In our example above, where the site is available in English, French, and Spanish, | ||
there might only be three localizations, but twenty or more locales that the site _could_ | ||
make available. | ||
Deciding which combinations of language and locale to expose to users and how to represent these | ||
depends on many factors. | ||
|
||
## Constructing the Algorithm | ||
|
||
Hierarchical negotiation is the most common mechanism for performing locale negotiation. | ||
One way to implement this is to work from the least specific signal to the | ||
most specific one and then return the result. | ||
|
||
> For example: | ||
> | ||
> 1. Let the return value be the site default. | ||
> 2. If the user's geographic region has a default let return value be that locale. | ||
> 3. If the user has an Accept-Language header | ||
> i. For each language range in the A-L header | ||
> a. if the language range matches an available locale's language tag, let return value be that locale | ||
Comment on lines
+252
to
+253
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: Markdown doesn't seem to support list syntax like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes: the article will be converted to HTML before final publication. |
||
> 4. If the user has a cookie with a locale preference, let return value be that locale | ||
> 5. If the URL contains a (recognized, permitted) locale identifier, let return value be that locale | ||
> 6. If the user is recognized, let return value be the locale in the user's profile | ||
> | ||
> Return the return value. | ||
|
||
Notice how the above algorithm might be tailored to weight one item higher | ||
(or lower) in the hierarchy. | ||
|
||
The above short description leaves out checking if each language tag is in the | ||
list of supported locales | ||
and leaves out mapping of values, either of which might affect the outcome. | ||
|
||
### What happens when the user logs in? | ||
|
||
When a user authenticates (logs in), the user's preferences need to be checked. | ||
If the user's profile contains a locale preference different from the one | ||
currently negotiated with the user-agent, the stored preference may need to be updated | ||
or the user's preference changed: | ||
- optionally query the user if they want to change their locale | ||
(did the currently negotiated locale meet their need?) | ||
- update the session with the value in their profile | ||
(either the value currently negotiated or what they had previously) | ||
|
||
Updating the user's profile helps ensure, for example, that offline communications, | ||
such as push notifications, emails, or SMS messages, are in the user's preferred locale) | ||
|
||
### Why provide locale overrides via the URL? | ||
|
||
There are a number of reasons why the URL might need to encode the locale to use. | ||
|
||
One important use is to allow testing of the site in a given locale. | ||
This is useful in reproducing issues that only appear in specific locales | ||
without having to change the customer support account's preference. | ||
It is also useful when the locale is not yet available to end users | ||
but it being prepared for production. | ||
|
||
Another is that URLs can be shared or used for marketing materials. | ||
By encoding the locale into the URL, you can guarantee that the site | ||
shown is in the same language as some external resource. | ||
For example, if the user was brought to the site by clicking on a display ad in a specific language | ||
or clicked on a link in an email they received from the site. | ||
This can help users of a non-default locale who might also be unrecognized users to | ||
get the right experience. | ||
|
||
### What's not included in the locale? | ||
|
||
There are a number of values that might be inferred from a locale identifier | ||
but which should be separately maintained. | ||
These include the user's preferred currency (or the currency of a given transaction), | ||
the user's time zone, | ||
and the user's country/jurisdiction (where they actually live, for example). | ||
|
||
While locales often contain a country code, | ||
this code does not mean that the user is located in that country | ||
or has a residence in that country. | ||
|
||
A country sometimes implies a currency, but care must be taken not to | ||
apply a currency to a numeric value blindly. | ||
|
||
Most countries have a single time zone, but the user may not be in that | ||
country. | ||
|
||
There can also be local, user, or operating system preferences | ||
that are _related_ to locale. | ||
An example of this is the ability to choose between 12- and 24-hour | ||
time display. | ||
The locale always provides the default for this value. | ||
But user overrides of the value sometimes need to be propagated separately. | ||
|
||
### What if the user doesn't like the results? | ||
|
||
When a site offers multiple languages or offers multiple locales | ||
for a given language (or both), | ||
the result of locale negotiation might not be what the user would have chosen. | ||
When this happens, the user should be provided a convenient control in | ||
a predictable, visible location to choose the locale for herself. | ||
|
||
The result of choosing a locale should be sticky. | ||
Any offline hints (such as cookies) as well as any server-side | ||
user profile should be updated. | ||
In addition, if the site uses URL elements, the page should redirect with | ||
the locale preference added/substituted in the URL. | ||
|
||
Sites that "don't remember" a user's choice can be frustrating to use, | ||
as the user might need to navigate a foreign language experience to reach one | ||
that they understand. |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'they are also use' -> 'they also use'?