Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge data files before lookup #98

Merged
merged 2 commits into from
Dec 13, 2021

Conversation

movermeyer
Copy link
Collaborator

@movermeyer movermeyer commented Dec 7, 2021

What are you trying to accomplish?

Fixes #96.

Some parts (ldml, ldmlBCP47 amd supplementalData) of CLDR data require that you merge all the files with the same root element before doing lookups.

Ref: https://www.unicode.org/reports/tr35/tr35.html#XML_Format

That way, CLDR can move data between files, or split files, for purposes of organization without affecting the actual values.

However, ruby-cldr was hard-coding the paths where particular data should be read from.
This lead to data being no longer exported when it was moved in the upstream CLDR.

What approach did you choose and why?

I changed Cldr::Export::Data::Base#doc to return a doc that has all of the related XML files merged into one Nokogiri document.

Merging all of these files is expensive, so I use a class variable as a cache. This might not be the absolute ideal way to do things from a patterns perspective, but it saves us from having to do a much more intense refactor. I like it well enough. 🤷

I use @locale as a flag for whether or not the data is supplemental or locale dependent. This will not work if we ever want to use this for the other types of data in CLDR (keyboard, platform, or ldmlBCP47). Perhaps a future refactor would pull this out into subclasses of Base instead (Supplemental and LanguageDependent?)? I have this idea that lib/cldr/export.rb could be radically simpler if it didn't have to ask is_shared_component? and Transforms was less of a special snowflake. But that's a much larger refactor than what I'm attempting right now in this PR. Again, good enough for now. 🤷

What should reviewers focus on?

What do you think about the changes?

The impact of these changes

Fixes #96.

Testing

You could run thor cldr:export for master and this branch, then run diff -r to compare the outputs:

git checkout master
thor cldr:export --target=./data_from_master

git checkout movermeyer:movermeyer/merge_before_lookup
thor cldr:export --target=./data_from_pr

diff -r data_from_master data_from_pr

Which will show 3 classes of diffs:

  1. Only in data_from_master/en-001: rbnf.yml. See comment.
  2. The data/transforms/* files all change slightly since they no longer have a @doc instance variable. This should not be important to any consumer.
  3. Only in data_after: variables.yml, this is the result of Variables stopped being output #96 getting fixed.

Comment on lines +14 to +15
grouping_nodes = select("rbnf/rulesetGrouping")
return {} if grouping_nodes.empty?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this change since it didn't make sense to check for the existence of path anymore.

This had a side-effect of cleaning up a edge-case bug:

common/rbnf/en_001.xml exists, but is empty. So this code was returning [] instead of the expected {}. Now that it has been fixed to return {}, en-001/rbnf.yml is no longer output since we don't output empty hashes.

@movermeyer movermeyer force-pushed the movermeyer/merge_before_lookup branch 4 times, most recently from 53747d7 to ecb1dd3 Compare December 7, 2021 18:16
# and the <identity> elements are not important to us / make no sense when combined together.
return Nokogiri::XML('') if paths_to_merge.empty?

rest = paths_to_merge[1..paths_to_merge.size - 1]
Copy link
Collaborator Author

@movermeyer movermeyer Dec 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't just use paths_to_merge[1..] since that's not available in Ruby 2.3, and we haven't officially dropped support for the old versions of Ruby.

@movermeyer movermeyer marked this pull request as ready for review December 7, 2021 18:18
@movermeyer movermeyer requested a review from Korri December 9, 2021 17:46
Comment on lines +88 to +96
rest.inject(Nokogiri::XML(File.read(paths_to_merge.first))) do |result, path|
next_doc = Nokogiri::XML(File.read(path))

next_doc.root.children.each do |child|
result.root.add_child(child)
end

result
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there is a cleaner/clearer way to do this part, but quickly playing around couldn't quickly figure out something better 🤷‍♂️

Some parts (`ldml`, `ldmlBCP47` amd `supplementalData`) of CLDR data require that you merge all the files with the same root element before doing lookups.
This gives you a mechanism to lookup the paths you need to merge together based on the root element.

Ref: https://www.unicode.org/reports/tr35/tr35.html#XML_Format
@movermeyer movermeyer merged commit d1bdfef into ruby-i18n:main Dec 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Variables stopped being output
2 participants