linter: Expect rules to be in NFKC (publicsuffix#725)
For more info read
https://docs.python.org/3/library/unicodedata.html
under 'unicodedata.normalize'.

See publicsuffix#715 (comment)
rockdaboot authored and weppos committed Nov 6, 2018
1 parent 77ef951 commit 00e7d2f
Showing 3 changed files with 19 additions and 1 deletion.
7 changes: 6 additions & 1 deletion linter/pslint.py
@@ -25,6 +25,7 @@

import sys
import codecs
import unicodedata

nline = 0
line = ""
@@ -104,7 +105,7 @@ def lint_psl(infile):
for line in lines:
nline += 1

-# check for leadind/trailing whitespace
+# check for leading/trailing whitespace
stripped = line.strip()
if stripped != line:
line = line.replace('\t','\\t')
Expand Down Expand Up @@ -168,6 +169,10 @@ def lint_psl(infile):
error('Invalid UTF-8 character')
continue

# rules must be in NFKC (Unicode Normalization Form KC: compatibility decomposition, followed by canonical composition)
if unicodedata.normalize("NFKC", line) != line:
error('Rule must be NFKC')

# each rule must be lowercase (or more exactly: not uppercase and not titlecase)
if line != line.lower():
error('Rule must be lowercase')
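
The NFKC check in this hunk can be tried in isolation. The following is a minimal sketch, not the linter's actual code: `is_nfkc` is a hypothetical helper name, the sample strings are illustrative, and Python 3 is assumed (where `str` is Unicode). It also shows one case where NFKC is stricter than NFC.

```python
import unicodedata


def is_nfkc(rule):
    """Return True if the rule is already in NFKC form (hypothetical helper)."""
    return unicodedata.normalize("NFKC", rule) == rule


# Precomposed u-umlaut (U+00FC): already in NFKC.
composed = "s\u00fcdtirol.it"
# 'u' followed by COMBINING DIAERESIS (U+0308): canonically equivalent,
# but NFKC would compose it, so the string is not in NFKC.
decomposed = "su\u0308dtirol.it"
# FULLWIDTH LATIN SMALL LETTER A (U+FF41): a compatibility character that
# NFC leaves unchanged but NFKC folds to plain 'a'. Illustrative label only.
fullwidth = "\uff41.example"

for rule in (composed, decomposed, fullwidth):
    # ascii() escapes non-ASCII characters so the difference is visible.
    print(ascii(rule), is_nfkc(rule))
```

Comparing the normalized string with the original, as the linter does, only reports the problem. Whether to rewrite the rule automatically is a separate decision that this sketch leaves out.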
2 changes: 2 additions & 0 deletions linter/test_NFKC.expected
@@ -0,0 +1,2 @@
9: error: Rule must be NFKC: 'südtirol.it'
11: warning: No PRIVATE section found
11 changes: 11 additions & 0 deletions linter/test_NFKC.input
@@ -0,0 +1,11 @@
// test:
// - label contains non-NFKC character(s)
//
// best viewed with 'LC_ALL=C.UTF-8 vi <filename>' (or any other UTF-8 locale)

// ===BEGIN ICANN DOMAINS===

südtirol.it
südtirol.it

// ===END ICANN DOMAINS===
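
The two rule lines in this test input look identical when rendered. Going by the expected output, which flags only line 9, the second one is presumably stored in decomposed form. The exact code points used below are an assumption, not read from the file. A sketch for making such differences visible, using a hypothetical helper `codepoints`:

```python
import unicodedata


def codepoints(text):
    """List each character as 'U+XXXX NAME' (hypothetical helper)."""
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>")) for ch in text]


line_8 = "s\u00fcd"   # assumed composed form: one code point for the umlaut
line_9 = "su\u0308d"  # assumed decomposed form: 'u' plus a combining mark

print(codepoints(line_8))
print(codepoints(line_9))
# NFKC composes the decomposed sequence back into the single code point.
print(unicodedata.normalize("NFKC", line_9) == line_8)
```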
