Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split POT file depending on specified depth #77

Merged
merged 1 commit into from
Feb 29, 2024

Conversation

antoniolinhart
Copy link
Contributor

@antoniolinhart antoniolinhart commented Sep 13, 2023

Hey all! This is a PR in relation to #67 to split the pot file depending on
the specified depth. Another PR will be submitted to change gettext
to merge a directory of pot files.

  • Add an input param to xgettext to take in an integer for the depth to
    split the POT file. Remove the input-file param
  • Mimics the same structure as the SUMMARY file for the Book
  • Revise the helper fn to create a render ctx for tests. Now, it can
    create on-the-fly directories if the markdown file exists in a
    sub-directory
  • Test the new split catalog function with different possible inputs

@antoniolinhart
Copy link
Contributor Author

@mgeisler What do you think about the implementation of splitting the template file? In this version I chose to add messages that did not have a matching source to a catalog that was saved in the file name specified in the input params.

I'm open to changing how we use that pot-file input param, I thought that this made the most sense for now. Would you want to add another optional field for output directory if we have a depth > 0?

@antoniolinhart antoniolinhart marked this pull request as ready for review September 14, 2023 04:47
@mgeisler
Copy link
Collaborator

  • Mimics the directory structure for the book

I think this is a key decision. I was thinking we would use the Book (from RenderContext) as the driver of this — not the file names.

We currently iterate over the BookItems in depth-first order (via Book::iter), but we could instead iterate over the sections field. I think that would let you avoid having to think of where SUMMARY.md fits into the picture.

As for what to name the POT files, I guess this can be derived from Chapter::path? Perhaps we need a top-level output directory (po/) and then we create po/foo.pot for a Chapter { path: foo.md } chapter. We would put all sub_items into the same po/foo.pot file if we're splitting at depth = 1. If we use depth = 2, then we would create po/foo.pot with only the data from this chapter and create po/foo/bar.pot with everything from the sub_items.

There is no guarantee that nested chapters are stored in nested files, but I think we should be safe if we follow the logical structure of the Book in this manner.

@antoniolinhart
Copy link
Contributor Author

  • Mimics the directory structure for the book

I think this is a key decision. I was thinking we would use the Book (from RenderContext) as the driver of this — not the file names.

We currently iterate over the BookItems in depth-first order (via Book::iter), but we could instead iterate over the sections field. I think that would let you avoid having to think of where SUMMARY.md fits into the picture.

Oh! That's a good point for using the sections directly instead of that built in iterator. Until looking into the Book items struct it was hard to know exactly why that worked the way it did.

If I follow through all of the Chapters into their sub items and take the content from each Chapter and so on, that should include all the messages, right? I'm running into an interesting situation where there's some messages that aren't there that should be. Comparing between a current messages.pot and a messages.pot from my program with depth 0, it's missing about 10 messages that all appear to be the Rust scripts that get included as sample code. There are many scripts included in the messages that are sample code, there just happen to be some that don't come through?

As for what to name the POT files, I guess this can be derived from Chapter::path? Perhaps we need a top-level output directory (po/) and then we create po/foo.pot for a Chapter { path: foo.md } chapter. We would put all sub_items into the same po/foo.pot file if we're splitting at depth = 1. If we use depth = 2, then we would create po/foo.pot with only the data from this chapter and create po/foo/bar.pot with everything from the sub_items.

There is no guarantee that nested chapters are stored in nested files, but I think we should be safe if we follow the logical structure of the Book in this manner.

This makes more sense to follow the book instead. I think it will lead to the split depth input being a lot more logical too. Thank you!

Still working through the revised implementation!

@mgeisler
Copy link
Collaborator

mgeisler commented Sep 15, 2023

There are many scripts included in the messages that are sample code, there just happen to be some that don't come through?

Much of the Rust code is included via the {{#include foo.rs}} system. That syntax is a mdbook extension. This is done by the built-in links preprocessor.

Could the scripts be related to this?

Oh... another idea: are you perhaps seeing the effects of #75 by @dyoo? It was merged just a few days ago 🙂

@mgeisler
Copy link
Collaborator

There is no guarantee that nested chapters are stored in nested files, but I think we should be safe if we follow the logical structure of the Book in this manner.

This makes more sense to follow the book instead. I think it will lead to the split depth input being a lot more logical too. Thank you!

Still working through the revised implementation!

Sounds great, thank you so much for your help here!

@mgeisler
Copy link
Collaborator

Hi @antoniolinhart, just a small thing: the format check fails. You should configure your editor to call rustfmt automatically (or run dprint fmt on the code).

@antoniolinhart
Copy link
Contributor Author

There are many scripts included in the messages that are sample code, there just happen to be some that don't come through?

Much of the Rust code is included via the {{#include foo.rs}} system. That syntax is a [mdbook extension]https://rust-lang.github.io/mdBook/format/mdbook.html#including-files). This is done by the built-in links preprocessor.

Could the scripts be related to this?

Oh... another idea: are you perhaps seeing the effects of #75 by @dyoo? It was merged just a few days ago 🙂

Looks like I need to pull main more often 😆 This seems to be what my issue was. Thank you!

@mgeisler
Copy link
Collaborator

Hey @antoniolinhart, I skimmed through the updated code — from the test cases it looks like it works? Super cool!

You should rebase onto the latest main and resolve the merge conflict.

.vscode/settings.json Outdated Show resolved Hide resolved
i18n-helpers/Cargo.toml Outdated Show resolved Hide resolved
@antoniolinhart
Copy link
Contributor Author

Hi @antoniolinhart, thanks for the cleanup. I tried checking out the branch and ran

% MDBOOK_OUTPUT='{"xgettext": {"pot-file": "messages.pot", "depth": 1}}' mdbook build -d po1 
2023-09-29 10:38:28 [INFO] (mdbook::book): Book building has started
2023-09-29 10:38:28 [INFO] (mdbook::book): Running the xgettext backend
2023-09-29 10:38:28 [INFO] (mdbook::renderer): Invoking the "xgettext" renderer

This gives me

% tree po1                                                                  (~/src/comprehensive-rust)
po1
└── src
    ├── android
    │   ├── aidl.pot
    │   ├── build-rules.pot
    │   ├── interoperability.pot
    │   ├── logging.pot
    │   └── setup.pot
    ├── android.pot

I had hoped there would be a single POT file for each chapter in the top-level. Like this:

% tree expected1                                                            (~/src/comprehensive-rust)
expected1
├── android.pot
├── async.pot
├── bare-metal.pot
├── basic-syntax.pot
├── cargo.pot

I suppose the problem in an off-by-one error somewhere 🙂

The challenge with this is that Android isn't actually a top-level chapter, mdbook treats it as a part title and doesn't contain the sub-items as expected like "interoperability" "aidl", "build-rules". I think if we wanted something like this we'd have to leverage the listed directory in the code, or change the SUMMARY file to make it easier

@mgeisler
Copy link
Collaborator

mgeisler commented Oct 2, 2023

The challenge with this is that Android isn't actually a top-level chapter, mdbook treats it as a part title and doesn't contain the sub-items as expected like "interoperability" "aidl", "build-rules". I think if we wanted something like this we'd have to leverage the listed directory in the code, or change the SUMMARY file to make it easier

Oh! I had not realized this until now...

We can change our own SUMMARY.md file, but we won't be able to the change the upstream format easily.

I think the code right now splits the POT file into ~63 files, whereas we only have ~13 parts. Do you think you can add code for tracking the parts? I would suggest transforming the part title into a slug using something like this:

fn slug(title: &str) -> String {
    title
        .split_whitespace()
        .map(|word| {
            word.to_lowercase()
                .chars()
                .filter(|&ch| ch == '-' || !ch.is_ascii_punctuation())
                .collect::<String>()
        })
        .filter(|word| !word.is_empty())
        .collect::<Vec<_>>()
        .join("-")
}

I tested it a bit on our own titles plus some headings from the Rust Book and they look reasonable.

@antoniolinhart
Copy link
Contributor Author

Hi @antoniolinhart, thanks for the cleanup. I tried checking out the branch and ran

% MDBOOK_OUTPUT='{"xgettext": {"pot-file": "messages.pot", "depth": 1}}' mdbook build -d po1 
2023-09-29 10:38:28 [INFO] (mdbook::book): Book building has started
2023-09-29 10:38:28 [INFO] (mdbook::book): Running the xgettext backend
2023-09-29 10:38:28 [INFO] (mdbook::renderer): Invoking the "xgettext" renderer

This gives me

% tree po1                                                                  (~/src/comprehensive-rust)
po1
└── src
    ├── android
    │   ├── aidl.pot
    │   ├── build-rules.pot
    │   ├── interoperability.pot
    │   ├── logging.pot
    │   └── setup.pot
    ├── android.pot

I had hoped there would be a single POT file for each chapter in the top-level. Like this:

% tree expected1                                                            (~/src/comprehensive-rust)
expected1
├── android.pot
├── async.pot
├── bare-metal.pot
├── basic-syntax.pot
├── cargo.pot

I suppose the problem in an off-by-one error somewhere 🙂

After the most recent improvements I think it's closer to expectations - depth 1 splits each 'part' into a separate catalog.

Here's the output I get for the following depths (using the Android chapter as an example after depth = 1):

Depth 1 (note that summary.pot includes messages from any chapters prior to encountering a PartTitle):

$ MDBOOK_OUTPUT='{"xgettext": {"depth": 1}}'   mdbook build -d po/XX
2023-10-05 17:36:46 [INFO] (mdbook::book): Book building has started
2023-10-05 17:36:46 [INFO] (mdbook::book): Running the xgettext backend
2023-10-05 17:36:46 [INFO] (mdbook::renderer): Invoking the "xgettext" renderer
$ tree po/XX
po/XX
├── android.pot
├── bare-metal-afternoon.pot
├── bare-metal-morning.pot
├── concurrency-afternoon.pot
├── concurrency-morning.pot
├── day-1-afternoon.pot
├── day-1-morning.pot
├── day-2-afternoon.pot
├── day-2-morning.pot
├── day-3-afternoon.pot
├── day-3-morning.pot
├── final-words.pot
├── solutions.pot
└── summary.pot

Depth 2

$ MDBOOK_OUTPUT='{"xgettext": {"depth": 2}}'   mdbook build -d po/XX
2023-10-05 17:36:46 [INFO] (mdbook::book): Book building has started
2023-10-05 17:36:46 [INFO] (mdbook::book): Running the xgettext backend
2023-10-05 17:36:46 [INFO] (mdbook::renderer): Invoking the "xgettext" renderer
$ tree po/XX                                                                   
po/XX                                                                           
├── android                                                                     
│   ├── aidl.pot                                                                
│   ├── build-rules.pot                                                         
│   ├── exercises.pot                                                           
│   ├── interoperability.pot                                                    
│   ├── logging.pot                                                             
│   ├── setup.pot                                                               
│   └── welcome.pot

Depth 3

$ MDBOOK_OUTPUT='{"xgettext": {"depth": 3}}'   mdbook build -d po/XX
2023-10-05 17:36:46 [INFO] (mdbook::book): Book building has started
2023-10-05 17:36:46 [INFO] (mdbook::book): Running the xgettext backend
2023-10-05 17:36:46 [INFO] (mdbook::renderer): Invoking the "xgettext" renderer
$ tree po/XX                                                                   
po/XX                                                                           
├── android                                                                     
│   ├── aidl                                                                    
│   │   ├── changing-api.pot                                                    
│   │   ├── client.pot                                                          
│   │   ├── deploy.pot                                                          
│   │   ├── implementation.pot                                                  
│   │   ├── interface.pot                                                       
│   │   └── server.pot                                                          
│   ├── aidl.pot                                                                
│   ├── build-rules                                                             
│   │   ├── binary.pot                                                          
│   │   └── library.pot                                                         
│   ├── build-rules.pot                                                         
│   ├── exercises.pot                                                           
│   ├── interoperability                                                        
│   │   ├── with-c.pot                                                          
│   │   └── with-java.pot                                                       
│   ├── interoperability.pot                                                    
│   ├── logging.pot                                                             
│   ├── setup.pot                                                               
│   └── welcome.pot

Depth 4

$ MDBOOK_OUTPUT='{"xgettext": {"depth": 4}}'   mdbook build -d po/XX
2023-10-05 17:36:46 [INFO] (mdbook::book): Book building has started
2023-10-05 17:36:46 [INFO] (mdbook::book): Running the xgettext backend
2023-10-05 17:36:46 [INFO] (mdbook::renderer): Invoking the "xgettext" renderer
$ tree po/XX                                                                   
po/XX                                                                           
├── android                                                                     
│   ├── aidl                                                                    
│   │   ├── changing-api.pot                                                    
│   │   ├── client.pot                                                          
│   │   ├── deploy.pot                                                          
│   │   ├── implementation.pot                                                  
│   │   ├── interface.pot                                                       
│   │   └── server.pot                                                          
│   ├── aidl.pot                                                                
│   ├── build-rules                                                             
│   │   ├── binary.pot                                                          
│   │   └── library.pot                                                         
│   ├── build-rules.pot                                                         
│   ├── exercises.pot                                                           
│   ├── interoperability                                                        
│   │   ├── with-c                                                              
│   │   │   ├── calling-c-with-bindgen.pot                                      
│   │   │   └── calling-rust-from-c.pot                                         
│   │   ├── with-c.pot                                                          
│   │   └── with-java.pot                                                       
│   ├── interoperability.pot                                                    
│   ├── logging.pot                                                             
│   ├── setup.pot                                                               
│   └── welcome.pot

Comment on lines 67 to 68
/// Transform a Book into a HashMap of (file_name, catalog) pairs.
/// The depth will be used to crawl the chapters and sub-chapters
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny thing: the docstrings are Markdown and start with a one-line description of the function. You can see the results of this with cargo doc --open locally:

Suggested change
/// Transform a Book into a HashMap of (file_name, catalog) pairs.
/// The depth will be used to crawl the chapters and sub-chapters
/// Transform a Book into a HashMap of `(file_name, catalog)` pairs.
///
/// The depth will be used to crawl the chapters and sub-chapters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh~ I was not aware that this was how that was generated How do I rebuild this doc to reflect local changes? It seems to be for ToT and not based on my local branch, even when I cargo build

Copy link
Collaborator

@mgeisler mgeisler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @antoniolinhart, thanks for all the work put into this! I think the functionality is super now, I just have a bunch of small comments about naming and a few Rust tricks you can apply.

I hope you can incorporate it over the next few weeks.

i18n-helpers/src/bin/mdbook-xgettext.rs Outdated Show resolved Hide resolved
i18n-helpers/src/bin/mdbook-xgettext.rs Outdated Show resolved Hide resolved
i18n-helpers/src/bin/mdbook-xgettext.rs Outdated Show resolved Hide resolved
i18n-helpers/src/bin/mdbook-xgettext.rs Outdated Show resolved Hide resolved
i18n-helpers/src/bin/mdbook-xgettext.rs Outdated Show resolved Hide resolved
i18n-helpers/src/bin/mdbook-xgettext.rs Outdated Show resolved Hide resolved
i18n-helpers/src/bin/mdbook-xgettext.rs Outdated Show resolved Hide resolved
i18n-helpers/src/bin/mdbook-xgettext.rs Outdated Show resolved Hide resolved
i18n-helpers/src/bin/mdbook-xgettext.rs Outdated Show resolved Hide resolved
i18n-helpers/src/bin/mdbook-xgettext.rs Outdated Show resolved Hide resolved
@mgeisler
Copy link
Collaborator

Hi @antoniolinhart! Happy new year 🥳

Would you be able to rebase this PR on top of the latest main? We should focus on getting it merged to prevent it from bit-rotting!

Even if the functionality to merge the split PO files are not yet there, the splitting functionality could be useful to someone (they would then have to write a tiny script to msgcat everything back together before using the translation).

@antoniolinhart
Copy link
Contributor Author

Hi @antoniolinhart! Happy new year 🥳

Would you be able to rebase this PR on top of the latest main? We should focus on getting it merged to prevent it from bit-rotting!

Even if the functionality to merge the split PO files are not yet there, the splitting functionality could be useful to someone (they would then have to write a tiny script to msgcat everything back together before using the translation).

Happy new year! I got taken away with some high priority team projects so it's been a while since I've been able to look at this. That's a good point! My goal will be to (this week/next week) get the comments responded to / get it rebased on main since I know the longer I wait the harder it will be 😅

Thanks for your patience on this!

@mgeisler
Copy link
Collaborator

Happy new year! I got taken away with some high priority team projects so it's been a while since I've been able to look at this.

That's completely okay! The functionality here is luckily not blocking anything right now.

My goal will be to (this week/next week) get the comments responded to / get it rebased on main since I know the longer I wait the harder it will be 😅

Thanks for sticking with it! I think the conflicts should be small at this point, so it should be possible to revive this PR without too much trouble.

@antoniolinhart
Copy link
Contributor Author

Should be ready for another round of review @mgeisler ! Thank you for your extended patience and support!

@mgeisler
Copy link
Collaborator

Should be ready for another round of review @mgeisler ! Thank you for your extended patience and support!

I'm the one thanking! Let me look it through now.

Copy link
Collaborator

@mgeisler mgeisler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for this, let's merge this now and then see about the combination side of things later!

@mgeisler
Copy link
Collaborator

Hey @antoniolinhart, I hope the changes in #107 doesn't cause too big conflicts here! I think we can merge this straight after it gets rebased.

@antoniolinhart
Copy link
Contributor Author

@mgeisler

Should be good to go! It appears that the diff base is not against the correct source so it's showing that I changed a lot of files that I obtained from rebasing on the updated code. I tried looking around but it appears that this merge is into google:main which should have the updated files?

@mgeisler
Copy link
Collaborator

@mgeisler

Should be good to go!

Great!

It appears that the diff base is not against the correct source so it's showing that I changed a lot of files that I obtained from rebasing on the updated code. I tried looking around but it appears that this merge is into google:main which should have the updated files?

Yeah, something is off here — I'll try to untangle it and then merge the PR. It'll take me a day of two to get around to it 😄

@antoniolinhart
Copy link
Contributor Author

@mgeisler
Should be good to go!

Great!

It appears that the diff base is not against the correct source so it's showing that I changed a lot of files that I obtained from rebasing on the updated code. I tried looking around but it appears that this merge is into google:main which should have the updated files?

Yeah, something is off here — I'll try to untangle it and then merge the PR. It'll take me a day of two to get around to it 😄

Thanks!! Also the clippy failure was due to me leaving some debugging logs in by accident (after the last time I ran clippy) so that should be fixed now 👍

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 91.91176% with 66 lines in your changes are missing coverage. Please review.

Project coverage is 90.58%. Comparing base (a4ca49b) to head (440e352).
Report is 28 commits behind head on main.

Files Patch % Lines
i18n-helpers/src/lib.rs 83.66% 27 Missing and 6 partials ⚠️
i18n-helpers/src/xgettext.rs 95.78% 2 Missing and 23 partials ⚠️
i18n-helpers/src/bin/mdbook-xgettext.rs 0.00% 7 Missing ⚠️
i18n-helpers/src/normalize.rs 88.88% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #77      +/-   ##
==========================================
- Coverage   90.62%   90.58%   -0.05%     
==========================================
  Files          11       12       +1     
  Lines        2368     3132     +764     
  Branches     2368     3132     +764     
==========================================
+ Hits         2146     2837     +691     
- Misses        159      211      +52     
- Partials       63       84      +21     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

This is a PR in relation to google#67 to split the pot file depending on the
specified depth. Another PR will be submitted to change gettext to
merge a directory of pot files.
@mgeisler
Copy link
Collaborator

Thanks!! Also the clippy failure was due to me leaving some debugging logs in by accident (after the last time I ran clippy) so that should be fixed now 👍

Hey Antonio, I think I got your changes back into a simple branch. I don't actually know if you did anything wrong here. What I did on my side was to

  1. Create a temporary branch at the point where your split-pot-file branch was.
  2. Hard reset split-pot-file to main to throw away all commits and merges and what not.
  3. Soft reset split-pot-file back to the temporary branch. This added all the changes back, but as a delta to main.
  4. Commit this under your name and force push back to your repository.

If the tests pass, then I'll merge this next.

@mgeisler mgeisler merged commit d25c6ef into google:main Feb 29, 2024
7 checks passed
@antoniolinhart
Copy link
Contributor Author

Thanks!! Also the clippy failure was due to me leaving some debugging logs in by accident (after the last time I ran clippy) so that should be fixed now 👍

Hey Antonio, I think I got your changes back into a simple branch. I don't actually know if you did anything wrong here. What I did on my side was to

  1. Create a temporary branch at the point where your split-pot-file branch was.
  2. Hard reset split-pot-file to main to throw away all commits and merges and what not.
  3. Soft reset split-pot-file back to the temporary branch. This added all the changes back, but as a delta to main.
  4. Commit this under your name and force push back to your repository.

If the tests pass, then I'll merge this next.

Wonderful! Thank you I will try this out next time I run into the same issue.

cip999 added a commit to cip999/mdbook-i18n-helpers that referenced this pull request Dec 13, 2024
PR google#77 added the first half of a functionality requested in google#67, i.e. being
able to split the catalog across a tree of POT files. However, there are
edge cases where the generated files can have identical path relative to
the po/ directory, with the effect of collapsing messages for unrelated
parts or chapters in the same POT fie.

This commit handles deduplication through a "black box" struct
`UniquePathBuilder`, and adds a doctest and a unit test for collision
scenarios. The new version is backward-compatible for cases that
are collision-free in the old version.

Contextually, simplify the recursion logic in `get_subcontent_for_chapter`.
cip999 added a commit to cip999/mdbook-i18n-helpers that referenced this pull request Dec 13, 2024
PR google#77 added the first half of a functionality requested in google#67, i.e. being
able to split the catalog across a tree of POT files. However, there are
edge cases where the generated files can have identical path relative to
the po/ directory, with the effect of collapsing messages for unrelated
parts or chapters in the same POT fie.

This commit handles deduplication through a "black box" struct
`UniquePathBuilder`, and adds a doctest and a unit test for collision
scenarios. The new version is backward-compatible for cases that
are collision-free in the old version.

Contextually, simplify the recursion logic in `get_subcontent_for_chapter`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants