[WIP] Updating CLDR data #941
Many tests seem to be wrong; see, for example, tests/test_languages.py:805 for Zulu. Current data translates Additionally languages like This PR fixes those issues, but the tests are not updated yet. @noviluni, please suggest whether I should update the tests accordingly. A review would be helpful. Thanks.
Note: This PR breaks 39 tests.
Hi @gavishpoddar, my initial idea was to update version by version, but it's OK if we update directly to the latest version as you did. After that we will need to check file by file to see whether we are removing things that could cause "breaking changes" (and possibly add them to our own data). Before starting the review, though, I would like to understand why you removed the "version". It is really important to point to a specific version and not directly to master, so that we can easily see which version we are pointing to and update easily in the future (master could be "incomplete" or "wrong"). In the past we had no way of knowing which version we were using or how outdated we were, so I would like you to reconsider adding it back. Thanks! :)
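To make the pinning point concrete, here is a minimal sketch of keeping the CLDR version in a single constant so that every URL points at a tagged release rather than master. `CLDR_VERSION`, `BASE_URL`, `cldr_file_url`, and the path layout inside the cldr-json repository are assumptions for illustration, not the project's actual script; the layout should be checked against the repository before use.

```python
# Hypothetical sketch: pin the CLDR release in one place instead of using master.
CLDR_VERSION = "39.0.0"  # tag to fetch; bump this deliberately when updating

# Assumed raw-content base and directory layout of unicode-org/cldr-json.
BASE_URL = "https://raw.githubusercontent.com/unicode-org/cldr-json"


def cldr_file_url(locale, filename, version=CLDR_VERSION):
    """Build the raw URL of one dates file for a locale at a pinned version."""
    return f"{BASE_URL}/{version}/cldr-json/cldr-dates-full/main/{locale}/{filename}"


print(cldr_file_url("zu", "ca-gregorian.json"))
```

With the version in one constant, a later update is a one-line change and the data in use can always be traced back to a known release.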
Hi, I am trying to update the CLDR data, which breaks multiple tests, so I am highlighting a few of the changes I have made along with the reasoning.
Please check and suggest.
tests/test_languages.py
Outdated
@@ -802,7 +802,7 @@ def setUp(self): | |||
|
|||
# zu | |||
param('zu', "3 mashi 2007 ulwesibili 10:08", "3 march 2007 tuesday 10:08"), | |||
param('zu', "son 23 umasingana 1996", "sunday 23 january 1996"), | |||
param('zu', "isonto 23 Januwari 1996", "sunday 23 january 1996"), |
This was incorrectly translated; verified via Google Translate.
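For reviewers unfamiliar with these tests, a minimal sketch of why a data update forces the test input to change: each `param` pairs a localized string with the English string expected after token-by-token translation. `ZU_TO_EN` and `translate` below are hypothetical and much simpler than dateparser's real translation logic; the entries are taken from the updated Zulu test above.

```python
# Hypothetical word-for-word table; not dateparser's real data format.
ZU_TO_EN = {
    "isonto": "sunday",
    "januwari": "january",
}


def translate(date_string, table):
    """Lowercase each token and replace it if the table knows it."""
    tokens = date_string.lower().split()
    return " ".join(table.get(token, token) for token in tokens)


print(translate("isonto 23 Januwari 1996", ZU_TO_EN))  # sunday 23 january 1996
print(translate("son 23 umasingana 1996", ZU_TO_EN))   # tokens unknown to the table pass through
```

In this sketch, a test input whose tokens are no longer in the table is returned untranslated, so the old expectation fails.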
tests/test_languages.py
Outdated
@@ -573,7 +573,7 @@ def setUp(self): | |||
param('mn', "12 9-р сар 2019 пүрэв", "12 september 2019 thursday"), | |||
|
|||
# mr | |||
param('mr', "16 फेब्रुवारी 1908 गुरु 02:03 मउ", "16 february 1908 thursday 02:03 pm"), | |||
param('mr', "16 फेब्रुवारी 1908 गुरु 02:03", "16 february 1908 thursday 02:03"), |
CLDR 39 has removed the मउ (pm) -> pm mapping. Either way, both of them are wrong.
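A removal like this is the kind of potential "breaking change" mentioned earlier in the conversation. As a rough sketch, assuming the old and new data for a language can each be loaded as a flat token-to-English dictionary (an assumption; the real data files are nested and much larger), the removed tokens could be listed like this. `removed_tokens` and both sample dictionaries are hypothetical.

```python
def removed_tokens(old_data, new_data):
    """Return tokens present in the old data but missing from the new data."""
    return sorted(set(old_data) - set(new_data))


# Illustrative entries only, based on the Marathi test above.
old_mr = {"मउ": "pm", "गुरु": "thursday"}
new_mr = {"गुरु": "thursday"}

print(removed_tokens(old_mr, new_mr))
```

Tokens reported this way are candidates for being kept in the project's own supplementary data if dropping them would break existing users.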
tests/test_languages.py
Outdated
@@ -210,7 +210,7 @@ def setUp(self): | |||
|
|||
# as | |||
param('as', '17 জানুৱাৰী 1885', '17 january 1885'), | |||
param('as', 'বৃহষ্পতিবাৰ 1 জুলাই 2009', 'thursday 1 july 2009'), | |||
param('as', 'বৃহস্পতিবাৰ 1 জুলাই 2009', 'thursday 1 july 2009'), |
The previous test was incorrect; I know the language.
tests/test_languages.py
Outdated
@@ -270,7 +270,7 @@ def setUp(self): | |||
|
|||
# bs-Latn | |||
param('bs-Latn', "23 septembar 1879, petak", "23 september 1879 friday"), | |||
param('bs-Latn', "subota 1 avg 2009 02:27 popodne", "saturday 1 august 2009 02:27 pm"), | |||
param('bs-Latn', "subota 1 aug 2009 02:27 popodne", "saturday 1 august 2009 02:27 pm"), |
Can't be verified but new CLDR 39 data updates this.
tests/test_languages.py
Outdated
param('ce', "6 январь 1987 пӏераскан де", "6 january 1987 friday"), | ||
param('ce', "оршотан де 3 июль 1890", "monday 3 july 1890"), | ||
param('ce', "6 январь 1987 пӏераска", "6 january 1987 friday"), | ||
param('ce', "оршот де 3 июль 1890", "monday 3 july 1890"), |
Chechen Language: Can't be verified but new CLDR 39 data updates this.
tests/test_languages.py
Outdated
param('kl', "2 martsi 2001 ataasinngorneq", "2 march 2001 monday"), | ||
param('kl', "pin 1 oktoberi 1901", "wednesday 1 october 1901"), | ||
param('kl', "2 marsi 2001 ataasinngorneq", "2 march 2001 monday"), | ||
param('kl', "pin 1 oktobari 1901", "wednesday 1 october 1901"), |
Kalaallisut; Greenlandic: Can't be verified but new CLDR 39 data updates this.
tests/test_languages.py
Outdated
|
||
# kln | ||
param('kln', "3 ng'atyaato koang'wan 10:09 kooskoliny", "3 february thursday 10:09 pm"), | ||
param('kln', "kipsuunde nebo aeng' 14 2009 kos", "december 14 2009 wednesday"), | ||
|
||
# kok | ||
param('kok', "1 नोव्हेंबर 2000 आदित्यवार 01:19 मनं", "1 november 2000 sunday 01:19 pm"), | ||
param('kok', "1 नोव्हेंबर 2000 आयतार 01:19", "1 november 2000 sunday 01:19"), |
Konkani (Indian language): Can't be verified but new CLDR 39 data updates this.
tests/test_languages.py
Outdated
param('qu', "5 pauqar waray 1878 miércoles", "5 march 1878 wednesday"), | ||
param('qu', "6 int 2009 domingo", "6 june 2009 sunday"), | ||
param('qu', "5 marzo 1878 miércoles", "5 march 1878 wednesday"), | ||
param('qu', "6 jun 2009 domingo", "6 june 2009 sunday"), |
Quechua: Can't be verified but new CLDR 39 data updates this.
tests/test_languages.py
Outdated
param('so', "sab 5 bisha saddexaad 1765 11:08 gn", "saturday 5 march 1765 11:08 pm"), | ||
param('so', "16 lit 2008 axd", "16 december 2008 sunday"), | ||
param('so', "sabti 5 bisha saddexaad 1765 11:08 gd", "saturday 5 march 1765 11:08 pm"), | ||
param('so', "16 desembar 2008 axd", "16 december 2008 sunday"), |
Somali: Verified via Google Translate.
tests/test_languages.py
Outdated
@@ -741,7 +741,7 @@ def setUp(self): | |||
param('sv', "onsdag 16 mars 08:15 eftermiddag", "wednesday 16 march 08:15 pm"), | |||
|
|||
# sw | |||
param('sw', "5 mei 1994 jumapili 10:17 asubuhi", "5 may 1994 sunday 10:17 am"), | |||
param('sw', "5 mei 1994 jumapili 10:17", "5 may 1994 sunday 10:17"), |
Swahili: asubuhi means "in the morning". Verified via Google Translate.
Fixing tests
tests/test_languages.py
Outdated
@@ -1159,7 +1159,7 @@ def test_translation(self, shortname, datetime_string, expected_translation): | |||
param('dav', "15 juma", "15 week"), | |||
# de | |||
param('de', "nächstes jahr", "in 1 year"), | |||
param('de', "letzte woche 04:25 nachm", "1 week ago 04:25 pm"), | |||
param('de', "vor einer Woche 04:25 nachm", "1 week ago 04:25 pm"), |
German: Verified via Google Translate.
tests/test_languages.py
Outdated
@@ -1139,7 +1139,7 @@ def test_translation(self, shortname, datetime_string, expected_translation): | |||
param('cgg', "5 omwaka", "5 year"), | |||
# chr | |||
param('chr', "ᎯᎠ ᎢᏯᏔᏬᏍᏔᏅ", "0 minute ago"), | |||
param('chr', "ᎾᎿ 8 ᎧᎸᎢ ᏥᎨᏒ", "8 month ago"), | |||
param('chr', "8 ꭷꮈ ꮵꭸꮢ", "8 month ago"), |
Cherokee: Can't be verified but new CLDR 39 data updates this.
tests/test_languages.py
Outdated
@@ -1197,7 +1197,7 @@ def test_translation(self, shortname, datetime_string, expected_translation): | |||
param('et', "1 a pärast", "in 1 year"), | |||
param('et', "4 tunni eest", "4 hour ago"), | |||
# eu | |||
param('eu', "aurreko hilabetea", "1 month ago"), | |||
param('eu', "aurreko hilabetean", "1 month ago"), |
Basque: Verified via Google Translate.
tests/test_languages.py
Outdated
@@ -1266,7 +1266,7 @@ def test_translation(self, shortname, datetime_string, expected_translation): | |||
param('id', "dalam 43 menit", "in 43 minute"), | |||
param('id', "dlm 23 dtk", "in 23 second"), | |||
# ig | |||
param('ig', "nnyaafụ", "1 day ago"), | |||
param('ig', "ụnyaahụ", "1 day ago"), |
Igbo: Partially correct. Verified via Google Translate. Better than the previous value.
tests/test_languages.py
Outdated
@@ -1240,7 +1240,7 @@ def test_translation(self, shortname, datetime_string, expected_translation): | |||
param('gsw', "moorn", "in 1 day"), | |||
param('gsw', "geschter", "1 day ago"), | |||
# gu | |||
param('gu', "2 વર્ષ પહેલા", "2 year ago"), | |||
param('gu', "2 વર્ષ પહેલાં", "2 year ago"), |
Gujarati: Same output via Google Translate for પહેલા or પહેલાં.
tests/test_languages.py
Outdated
@@ -1420,7 +1420,7 @@ def test_translation(self, shortname, datetime_string, expected_translation): | |||
param('ms', "bulan depan", "in 1 month"), | |||
# mt | |||
param('mt', "ix-xahar li għadda", "1 month ago"), | |||
param('mt', "2 sena ilu", "2 year ago"), | |||
param('mt', "2 snin ilu", "2 year ago"), |
Maltese: Same output via Google Translate for sena or snin.
tests/test_languages.py
Outdated
param('nn', "for 5 minutter siden", "5 minute ago"), | ||
param('nn', "om 3 uker", "in 3 week"), | ||
param('nn', "for 5 min sidan", "5 minute ago"), | ||
param('nn', "om 3 veke", "in 3 week"), |
Norwegian Nynorsk: Can't be verified, but the new CLDR 39 data updates this. Google Translate has no support for Nynorsk.
tests/test_search.py
Outdated
@@ -608,8 +608,7 @@ def test_splitting_of_not_parsed(self, shortname, string, expected, settings=Non | |||
|
|||
# Hindi | |||
param('hi', | |||
'जुलाई 1937 में, मार्को-पोलो ब्रिज हादसे का बहाना लेकर जापान ने चीन पर हमला कर दिया और चीनी साम्राज्य ' | |||
'की राजधानी बीजिंग पर कब्जा कर लिया,'), | |||
'जुलाई 1937 में, मार्को-पोलो ब्रिज हादसे का बहाना की राजधानी बीजिंग पर कब्जा कर लिया. '), |
Hindi: Test failed because of incorrect language detection.
At this point, 7 tests are still failing and I am unable to fix them; please help.
@gavishpoddar the builds for this PR were not enabled (it's a newish GitHub feature). Sorry about that; I have just enabled them.
Fixing translation
…dateparser into pr/gavishpoddar/963
Fixing translation
Codecov Report
@@           Coverage Diff           @@
##           master     #941   +/-   ##
=======================================
  Coverage   98.29%   98.29%
=======================================
  Files         234      234
  Lines        2694     2700    +6
=======================================
+ Hits         2648     2654    +6
  Misses         46       46
39.0.0.
https://github.com/unicode-cldr/cldr-dates-full (archived) -> https://github.com/unicode-org/cldr-json.
TODO:
Fixes issue #940