CSV to YAML converter for UN/LOCODE data
sergeuz/locode
This project is an attempt to provide and maintain an internationalized database of the world's countries, their regions, cities and other localities.

It contains 'loc2yaml', a CSV to YAML converter for UN/LOCODE data, which was chosen as the foundation for the locations database. A helper tool called 'loctrans', which generates initial translations into different languages, is provided as well (it uses the Google Translate service).

UN/LOCODE homepage:

  http://www.unece.org/cefact/locode/welcome.html

Recent CSV files with locations data can be downloaded here:

  http://www.unece.org/cefact/codesfortrade/codes_index.html

Examples of loc2yaml usage:

  loc2yaml unlocode.csv

    Process all the data in the 'unlocode.csv' file and put the generated files into the default 'country' directory, creating it in the current directory if necessary.

  loc2yaml -c us,ca -o path/to/yaml/files unlocode.csv

    Process only the data related to the USA and Canada, putting the generated files into the specified directory.

  loc2yaml -o world.yaml *.csv

    Generate a single YAML file from all CSV files found in the current directory. Note that loc2yaml doesn't support wildcard file names natively; the example relies on the shell to expand '*.csv'.

  loc2yaml -m -r -o world.yaml *.yaml

    Merge all YAML files found in the current directory into a single file. All content of the destination file will be overwritten (see the -r flag).

Examples of loctrans usage:

  loctrans -k GOOGLE_API_KEY -t uk -o dest/ua.yaml src/ua.yaml

    Generate translations for the Ukrainian language and save the results as a separate file. Default translations will be used as source texts.

  loctrans -k GOOGLE_API_KEY -c ru -s en -t ru world.yaml

    Assuming 'world.yaml' contains location data for several countries, generate translations only for Russia (the target language is also Russian) and update the original YAML file. Existing English translations will be used as source texts when possible.
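Because loc2yaml does not expand wildcards itself, a script that calls it without a shell has to expand the patterns first. The following is a minimal Python sketch of that step. The helper name `build_loc2yaml_command` is hypothetical and not part of this project; only the `-o` flag and the argument order are taken from the usage examples above.

```python
import glob


def build_loc2yaml_command(patterns, output):
    """Expand wildcard patterns and build an argument list for loc2yaml.

    Hypothetical helper: loc2yaml itself only receives plain file names.
    """
    files = []
    for pattern in patterns:
        # sorted() keeps the file order stable between runs
        files.extend(sorted(glob.glob(pattern)))
    if not files:
        raise FileNotFoundError(f"no input files match {patterns!r}")
    # '-o' is the output option shown in the usage examples
    return ["loc2yaml", "-o", output, *files]
```

The resulting list can be passed to `subprocess.run(..., check=True)`, which runs the tool without involving a shell.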
Structure of generated documents:

  01 country:
  02   'NO':
  03     name:
  04       default: Norge
  05       en: Norway
  06     region:
  07       '01':
  08         name: Østfold
  09         city:
  10           FRK: Fredrikstad
  11           SPG: Sarpsborg
  12       '02':
  13         name: 02 (TODO)
  14       .NONE:
  15         city:
  16           .EMR:
  17             name: Emerald City
  18             flags: preserve, odd

Format notes:

  * Country, region and city codes are case-sensitive (lines 2, 7, 10, 11, 12, 16); loc2yaml expects them to be in upper case in order to update files correctly. Reserved words such as 'name', 'region', etc. should be in lower case.

  * Country codes should comply with the 2-letter codes defined in the ISO 3166-1 standard (line 2).

  * Custom region codes are formally allowed, but should be used only if absolutely necessary. Ensure they do not collide with the codes defined in ISO 3166-2 for country subdivisions (lines 7, 12, 14). Note that these standard subdivision codes are used without the redundant country prefix.

  * Custom city codes are allowed for entries missing from the source UN/LOCODE data. In this context the term 'city' may refer to any locality within the enclosing region (lines 10, 11, 16).

  * loc2yaml groups orphan localities under the special '.NONE' region code (line 14). In most if not all cases such entries should be moved to the proper region manually.

  * There are several ways to provide names for countries, regions and cities. In-place naming, where the entry name follows its code on the same line, is supported only for cities (lines 10, 11).

  * Translations into different languages can be provided by expanding the 'name' element (lines 3-5). It is recommended to use ISO 639-1 language codes and to always provide a "default" translation.

  * When loc2yaml is unable to provide a full name, e.g. for region names in newly generated files, it uses a placeholder name made of the locality code with a '(TODO)' suffix (line 13).

  * A comma-separated list of parser hints (described below) can be provided to alter loc2yaml's behavior when it updates certain entries (line 18).
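The naming rules above can be illustrated with a short Python sketch of how a consuming application might resolve names. It is not code from this project: the `DOC` dictionary is a hand-written copy of the sample document as a YAML parser would typically return it, and the helpers `entry_name` and `city_name` are hypothetical.

```python
# Hand-written equivalent of the sample document (assumed parser output).
DOC = {
    "country": {
        "NO": {
            "name": {"default": "Norge", "en": "Norway"},
            "region": {
                "01": {
                    "name": "Østfold",
                    "city": {"FRK": "Fredrikstad", "SPG": "Sarpsborg"},
                },
                "02": {"name": "02 (TODO)"},
                ".NONE": {
                    "city": {
                        ".EMR": {"name": "Emerald City", "flags": "preserve, odd"},
                    },
                },
            },
        },
    },
}


def entry_name(entry, lang=None):
    """Return the name of a country, region or city entry."""
    if isinstance(entry, str):
        # In-place naming: the value after the code is the name (cities only).
        return entry
    name = entry.get("name")
    if isinstance(name, dict):
        # Expanded 'name' element: requested language, else "default".
        return name.get(lang, name.get("default"))
    return name


def city_name(doc, country, region, city, lang=None):
    """Look up a city by its codes; codes are case-sensitive."""
    entry = doc["country"][country]["region"][region]["city"][city]
    return entry_name(entry, lang)
```

Falling back to "default" when a language is missing is one reason the notes recommend always providing a default translation.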
Parser hints:

  * preserve

    Protects an entry from deletion when loc2yaml is invoked to update an existing YAML file and the source UN/LOCODE data no longer contains that entry. Usually applied to custom entries when loc2yaml is expected to be used with the --no-obsolete argument.

  * odd

    Not recognized by loc2yaml directly, but can be useful to mark entries that target applications should ignore. If such entries were simply removed, they would likely be restored on the next updating invocation of loc2yaml.
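As a rough illustration of the two hints, the Python sketch below parses a 'flags' value and applies the rules as described above. It is an assumption about the behavior, not loc2yaml's actual implementation; the function names `parse_flags`, `survives_update` and `visible_to_app` are hypothetical.

```python
def parse_flags(entry):
    """Return the set of parser hints attached to an entry."""
    if not isinstance(entry, dict):
        # Cities named in place (plain strings) cannot carry flags.
        return set()
    raw = entry.get("flags", "")
    # Hints are comma-separated; surrounding whitespace is ignored here.
    return {flag.strip() for flag in raw.split(",") if flag.strip()}


def survives_update(entry, in_source):
    """Assumed update rule: an entry missing from the source data is
    dropped unless it carries the 'preserve' hint."""
    return in_source or "preserve" in parse_flags(entry)


def visible_to_app(entry):
    """A target application may choose to hide entries marked 'odd'."""
    return "odd" not in parse_flags(entry)
```

For the '.EMR' entry of the sample document, `parse_flags` yields both hints, so under these assumptions the entry survives an update but is hidden by the application.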