ISO-8859-2 , GB-18030 and ISO-2022 #7

Open
Aoi-hosizora opened this issue Jan 22, 2021 · 0 comments

  1. Each newRecognizer_8859_2_xx function should return newRecognizer_8859_2(xx), which uses ISO-8859-2, rather than newRecognizer_8859_1(xx), which uses ISO-8859-1.

chardet/single_byte.go

Lines 325 to 336 in 3af4cd4

func newRecognizer_8859_2_cs() *recognizerSingleByte {
	return newRecognizer_8859_1("cs", &ngrams_8859_2_cs)
}
func newRecognizer_8859_2_hu() *recognizerSingleByte {
	return newRecognizer_8859_1("hu", &ngrams_8859_2_hu)
}
func newRecognizer_8859_2_pl() *recognizerSingleByte {
	return newRecognizer_8859_1("pl", &ngrams_8859_2_pl)
}
func newRecognizer_8859_2_ro() *recognizerSingleByte {
	return newRecognizer_8859_1("ro", &ngrams_8859_2_ro)
}
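
A minimal sketch of what the corrected constructors might look like. The recognizerSingleByte struct and newRecognizer_8859_2 below are simplified stand-ins written for this illustration (the real definitions in single_byte.go have more fields), and the n-gram tables are zero-valued placeholders, not the library's data.

```go
package main

import "fmt"

// recognizerSingleByte is a simplified stand-in for the library's type;
// only the fields needed for this illustration are included.
type recognizerSingleByte struct {
	charset  string
	language string
	ngram    *[64]uint32
}

// newRecognizer_8859_2 is assumed to mirror newRecognizer_8859_1 but
// report ISO-8859-2 as the charset.
func newRecognizer_8859_2(language string, ngram *[64]uint32) *recognizerSingleByte {
	return &recognizerSingleByte{charset: "ISO-8859-2", language: language, ngram: ngram}
}

// Placeholder n-gram tables; the real tables live in single_byte.go.
var ngrams_8859_2_cs, ngrams_8859_2_hu, ngrams_8859_2_pl, ngrams_8859_2_ro [64]uint32

// Each language-specific constructor now delegates to the ISO-8859-2 builder.
func newRecognizer_8859_2_cs() *recognizerSingleByte {
	return newRecognizer_8859_2("cs", &ngrams_8859_2_cs)
}
func newRecognizer_8859_2_hu() *recognizerSingleByte {
	return newRecognizer_8859_2("hu", &ngrams_8859_2_hu)
}
func newRecognizer_8859_2_pl() *recognizerSingleByte {
	return newRecognizer_8859_2("pl", &ngrams_8859_2_pl)
}
func newRecognizer_8859_2_ro() *recognizerSingleByte {
	return newRecognizer_8859_2("ro", &ngrams_8859_2_ro)
}

func main() {
	recognizers := []*recognizerSingleByte{
		newRecognizer_8859_2_cs(),
		newRecognizer_8859_2_hu(),
		newRecognizer_8859_2_pl(),
		newRecognizer_8859_2_ro(),
	}
	for _, r := range recognizers {
		fmt.Printf("%s %s\n", r.charset, r.language)
	}
}
```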

  2. As issues #2 (charset "GB-18030" need to change "GB18030") and #3 (The correct iana name for GB18030 is without the dash) say, "GB-18030" should be "GB18030".

"GB-18030",
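
Until the name is corrected in the library, callers could normalize the reported charset themselves. The following is only a sketch: canonicalCharsetName and charsetAliases are hypothetical names invented for this example, not part of chardet, and the alias table contains only the one spelling discussed in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// charsetAliases maps non-standard spellings (upper-cased) to the name
// this issue asks for. Hypothetical table with a single entry.
var charsetAliases = map[string]string{
	"GB-18030": "GB18030",
}

// canonicalCharsetName returns the corrected name for a known alias and
// leaves every other name unchanged. The lookup is case-insensitive.
func canonicalCharsetName(name string) string {
	if canonical, ok := charsetAliases[strings.ToUpper(name)]; ok {
		return canonical
	}
	return name
}

func main() {
	fmt.Println(canonicalCharsetName("GB-18030"))
	fmt.Println(canonicalCharsetName("ISO-8859-2"))
}
```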

  3. The language for the ISO-2022-XX charsets should be ja, ko and cn, respectively.

chardet/2022.go

Lines 83 to 101 in 3af4cd4

func newRecognizer_2022JP() *recognizer2022 {
	return &recognizer2022{
		"ISO-2022-JP",
		escapeSequences_2022JP,
	}
}
func newRecognizer_2022KR() *recognizer2022 {
	return &recognizer2022{
		"ISO-2022-KR",
		escapeSequences_2022KR,
	}
}
func newRecognizer_2022CN() *recognizer2022 {
	return &recognizer2022{
		"ISO-2022-CN",
		escapeSequences_2022CN,
	}
}
