Experimental: Language tag canonicalization #159

facelessuser · 2019-09-08T14:08:08Z

There is talk about having the CSS level 4 :lang() pseudo-class canonicalize tags and ranges to help in situations such as :lang(yue, zh-yue, zh-HK). The idea is that you could then just write something like :lang(yue). For the best matches, it is recommended to canonicalize both the range used in the pseudo-class and the tag it is compared against. Canonicalization would also output tags in extlang form (for example, zh-yue rather than yue).
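As a rough illustration of the extlang form mentioned above, the sketch below prepends an extlang subtag's registered prefix to the primary language and then applies the RFC 5646 casing conventions (lowercase language, title-case script, uppercase region, lowercase after a singleton). EXTLANG_PREFIX is a tiny hypothetical excerpt of registry data, and to_extlang_form/format_case are illustrative names, not the code in this pull request. The sketch also skips the Preferred-Value replacement that full canonicalization performs first.

```python
# Hypothetical excerpt of IANA registry data: extlang subtag -> its "Prefix".
EXTLANG_PREFIX = {"yue": "zh", "cmn": "zh", "ase": "sgn"}


def format_case(subtags):
    """Apply RFC 5646 section 2.1.1 casing conventions (simplified sketch)."""
    out = []
    after_singleton = False
    for i, sub in enumerate(subtags):
        if i == 0 or after_singleton:
            out.append(sub.lower())
        elif len(sub) == 2 and sub.isalpha():
            out.append(sub.upper())  # region, e.g. HK
        elif len(sub) == 4 and sub.isalpha():
            out.append(sub.title())  # script, e.g. Hant
        else:
            out.append(sub.lower())
        if len(sub) == 1:
            # Extension and private-use subtags stay lowercase.
            after_singleton = True
    return "-".join(out)


def to_extlang_form(tag):
    """Convert a tag whose primary language is also an extlang to extlang form."""
    subtags = tag.lower().split("-")
    prefix = EXTLANG_PREFIX.get(subtags[0])
    if prefix is not None:
        subtags.insert(0, prefix)  # e.g. yue -> zh-yue
    return format_case(subtags)
```

With this data, to_extlang_form("YUE-hk") yields zh-yue-HK, so a range and a tag that were written as yue and zh-yue would compare equal after both are converted.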

Generally, * is ignored in ranges except at the start, as in *-yue. A range like en-*-US resolves to en-US, though implicit matching will still match the tag en-xxx-US against the range en-US.
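The wildcard handling described above can be sketched as follows. strip_wildcards shows the en-*-US to en-US reduction, and extended_filter is a sketch of the RFC 4647 (section 3.3.2) extended filtering algorithm, which is what lets en-US still match en-xxx-US implicitly. Both function names are hypothetical and not taken from this pull request.

```python
def strip_wildcards(lang_range):
    """Drop '*' subtags from a range, except a leading '*' (e.g. '*-yue')."""
    head, *rest = lang_range.split("-")
    return "-".join([head] + [s for s in rest if s != "*"])


def extended_filter(lang_range, tag):
    """RFC 4647 extended filtering, compared case-insensitively."""
    rng = lang_range.lower().split("-")
    sub = tag.lower().split("-")
    # The first subtag must match exactly unless the range starts with '*'.
    if rng[0] != "*" and rng[0] != sub[0]:
        return False
    r, t = 1, 1
    while r < len(rng):
        if rng[r] == "*":
            r += 1  # non-leading wildcards match nothing extra
        elif t >= len(sub):
            return False  # the tag ran out before the range did
        elif rng[r] == sub[t]:
            r += 1
            t += 1
        elif len(sub[t]) == 1:
            return False  # never skip past a singleton (extension/private use)
        else:
            t += 1  # implicitly skip an unmatched tag subtag
    return True
```

Because extended filtering already skips non-leading wildcards, stripping them before canonicalizing the range does not change the match result; for example, both en-*-US and en-US match en-xxx-US but not en-a-bbb-US.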

Currently, in this pull request, we have canonicalization implemented according to RFC 5646, but there are still some questions:

Should we abandon canonicalization when the tag is invalid, as we currently do? Or should we canonicalize the valid parts and ignore the failing parts?
As mentioned above, ranges can use *, so we strip out non-essential *s and then canonicalize the range. This seems like the only sane approach, but am I misunderstanding something?
It is only suggested that we MAY order variants to improve matching. We decided to go ahead and do this. Should we, though? We have also chosen not to fail when the required prefixes for a given variant are not found in the tag. This helps ensure that the tag's variant order matches the range's variant order, as a specified range may not explicitly include all required prefixes and may instead rely on implicit matching to pick them up. This seems reasonable, but should we abort canonicalization if the prefixes are not found? That is not a MUST requirement in the spec, only a SHOULD.
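One possible way to order variants, assuming the registry's Prefix fields are available, is to place each variant after any other variant in the same tag that is named in its prefixes. VARIANT_PREFIXES below is a hypothetical excerpt modeled on the registry's Resian (sl-rozaj) variants, and order_variants is an illustrative name, not this pull request's implementation. Missing prefixes are deliberately not treated as errors, mirroring the choice described above, and the sketch assumes the prefix data contains no cycles.

```python
# Hypothetical excerpt of registry "Prefix" fields for a few variant subtags.
VARIANT_PREFIXES = {
    "rozaj": ["sl"],
    "biske": ["sl-rozaj"],
    "1994": ["sl-rozaj", "sl-rozaj-biske", "sl-rozaj-njiva"],
}


def order_variants(variants):
    """Order variants so each follows the present variants its prefixes name.

    Variants whose prefixes are absent from the tag are kept without error.
    Assumes the prefix data is acyclic.
    """
    present = set(variants)

    def deps(variant):
        # Other variants in this tag that appear in any of this variant's prefixes.
        found = set()
        for prefix in VARIANT_PREFIXES.get(variant, []):
            found.update(s for s in prefix.split("-") if s in present and s != variant)
        return found

    def depth(variant):
        d = deps(variant)
        return 1 + max(depth(u) for u in d) if d else 0

    # sorted() is stable, so variants with equal depth keep their input order.
    return sorted(variants, key=depth)
```

With this data, order_variants(["1994", "biske", "rozaj"]) gives rozaj, biske, 1994, so sl-1994-biske-rozaj and sl-rozaj-biske-1994 would end up with the same variant order.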

Anyway, those are some things to think about. Technically, we could merge this as is with canonicalization disabled, and it should behave exactly as it did before. We could also enable this functionality under an experimental flag if we wanted. Right now, we are simply waiting to see what is decided for the official CSS level 4 spec.

This work may not be used, depending on what is decided in the level 4 selectors spec, but if canonicalization is adopted, we should be ready. Tests are still needed to ensure we are doing things as expected. Also, I am not entirely certain how ranges are supposed to be handled vs. tags, but I think I am handling them as expected.

Only include keys we actively care about.

Ignore coverage in registry.py: while it is covered, the information is not helpful. Ignore coverage in Canonicalization, as there are no real tests for it currently; we simply want to make sure we are covering all the pre-canonicalization logic.

facelessuser · 2021-02-16T17:25:28Z

This still hasn't formally made it into the spec, and it may never. We wanted to get a jump on this, but there seems to be very little interest from the CSS spec team in deciding what they want to do. We will continue to hold on this, as it is still not clear whether this is the direction they will take.

facelessuser added the S: work-in-progress label (A partial solution. More changes will be coming.) Sep 8, 2019

facelessuser force-pushed the lang-canonicalize branch 2 times, most recently from adb96af to db36b41 September 8, 2019 15:07

facelessuser mentioned this pull request Sep 26, 2019

Update language filter algorithm #160

Merged

facelessuser force-pushed the lang-canonicalize branch from cc131ff to 225f3dd September 26, 2019 03:49

facelessuser added 8 commits October 20, 2019 09:14

Spelling fixes

bf90d25

Enable canonicalization

247c02b

Spelling fix

e5d1ea5

Reduce size of registry database which we keep

47a0997


Don't allow * in language tags (okay in range) and remove dead code

fa7c1a2

Fix merge issue

7ad7f64

facelessuser force-pushed the lang-canonicalize branch from 60b3e54 to 7ad7f64 October 20, 2019 15:15

github-actions bot added the C: css-matching, C: docs, C: infrastructure, selectors, and C: tests labels Oct 20, 2019

gir-bot removed the selectors label Nov 1, 2019
