Split POT file depending on specified depth #77

antoniolinhart · 2023-09-13T06:21:08Z

Hey all! This is a PR in relation to #67 to split the pot file depending on
the specified depth. Another PR will be submitted to change gettext
to merge a directory of pot files.

Add an input param to xgettext to take in an integer for the depth to
split the POT file. Remove the input-file param
Mimics the same structure as the SUMMARY file for the Book
Revise the helper fn to create a render ctx for tests. Now, it can
create on-the-fly directories if the markdown file exists in a
sub-directory
Test the new split catalog function with different possible inputs

antoniolinhart · 2023-09-13T06:25:25Z

@mgeisler What do you think about the implementation of splitting the template file? In this version I chose to add messages that did not have a matching source to a catalog that was saved in the file name specified in the input params.

I'm open to changing how we use that pot-file input param, I thought that this made the most sense for now. Would you want to add another optional field for output directory if we have a depth > 0?

src/bin/mdbook-xgettext.rs

mgeisler · 2023-09-14T08:38:42Z

Mimics the directory structure for the book

I think this is a key decision. I was thinking we would use the Book (from RenderContext) as the driver of this — not the file names.

We currently iterate over the BookItems in depth-first order (via Book::iter), but we could instead iterate over the sections field. I think that would let you avoid having to think of where SUMMARY.md fits into the picture.

As for what to name the POT files, I guess this can be derived from Chapter::path? Perhaps we need a top-level output directory (po/) and then we create po/foo.pot for a Chapter { path: foo.md } chapter. We would put all sub_items into the same po/foo.pot file if we're splitting at depth = 1. If we use depth = 2, then we would create po/foo.pot with only the data from this chapter and create po/foo/bar.pot with everything from the sub_items.

There is no guarantee that nested chapters are stored in nested files, but I think we should be safe if we follow the logical structure of the Book in this manner.

antoniolinhart · 2023-09-15T06:07:57Z

Mimics the directory structure for the book

I think this is a key decision. I was thinking we would use the Book (from RenderContext) as the driver of this — not the file names.

We currently iterate over the BookItems in depth-first order (via Book::iter), but we could instead iterate over the sections field. I think that would let you avoid having to think of where SUMMARY.md fits into the picture.

Oh! That's a good point for using the sections directly instead of that built in iterator. Until looking into the Book items struct it was hard to know exactly why that worked the way it did.

If I follow through all of the Chapters into their sub items and take the content from each Chapter and so on, that should include all the messages, right? I'm running into an interesting situation where there's some messages that aren't there that should be. Comparing between a current messages.pot and a messages.pot from my program with depth 0, it's missing about 10 messages that all appear to be the Rust scripts that get included as sample code. There are many scripts included in the messages that are sample code, there just happen to be some that don't come through?

As for what to name the POT files, I guess this can be derived from Chapter::path? Perhaps we need a top-level output directory (po/) and then we create po/foo.pot for a Chapter { path: foo.md } chapter. We would put all sub_items into the same po/foo.pot file if we're splitting at depth = 1. If we use depth = 2, then we would create po/foo.pot with only the data from this chapter and create po/foo/bar.pot with everything from the sub_items.

There is no guarantee that nested chapters are stored in nested files, but I think we should be safe if we follow the logical structure of the Book in this manner.

This makes more sense to follow the book instead. I think it will lead to the split depth input being a lot more logical too. Thank you!

Still working through the revised implementation!

mgeisler · 2023-09-15T08:53:48Z

There are many scripts included in the messages that are sample code, there just happen to be some that don't come through?

Much of the Rust code is included via the {{#include foo.rs}} system. That syntax is a mdbook extension. This is done by the built-in links preprocessor.

Could the scripts be related to this?

Oh... another idea: are you perhaps seeing the effects of #75 by @dyoo? It was merged just a few days ago 🙂

mgeisler · 2023-09-15T08:55:01Z

There is no guarantee that nested chapters are stored in nested files, but I think we should be safe if we follow the logical structure of the Book in this manner.

This makes more sense to follow the book instead. I think it will lead to the split depth input being a lot more logical too. Thank you!

Still working through the revised implementation!

Sounds great, thank you so much for your help here!

mgeisler · 2023-09-15T12:44:59Z

Hi @antoniolinhart, just a small thing: the format check fails. You should configure your editor to call rustfmt automatically (or run dprint fmt on the code).

antoniolinhart · 2023-09-15T17:23:38Z

There are many scripts included in the messages that are sample code, there just happen to be some that don't come through?

Much of the Rust code is included via the {{#include foo.rs}} system. That syntax is a [mdbook extension]https://rust-lang.github.io/mdBook/format/mdbook.html#including-files). This is done by the built-in links preprocessor.

Could the scripts be related to this?

Oh... another idea: are you perhaps seeing the effects of #75 by @dyoo? It was merged just a few days ago 🙂

Looks like I need to pull main more often 😆 This seems to be what my issue was. Thank you!

src/bin/mdbook-xgettext.rs

mgeisler · 2023-09-20T13:40:10Z

Hey @antoniolinhart, I skimmed through the updated code — from the test cases it looks like it works? Super cool!

You should rebase onto the latest main and resolve the merge conflict.

.vscode/settings.json

i18n-helpers/src/bin/mdbook-xgettext.rs

i18n-helpers/Cargo.toml

i18n-helpers/src/bin/mdbook-xgettext.rs

antoniolinhart · 2023-09-29T14:19:09Z

Hi @antoniolinhart, thanks for the cleanup. I tried checking out the branch and ran

% MDBOOK_OUTPUT='{"xgettext": {"pot-file": "messages.pot", "depth": 1}}' mdbook build -d po1 
2023-09-29 10:38:28 [INFO] (mdbook::book): Book building has started
2023-09-29 10:38:28 [INFO] (mdbook::book): Running the xgettext backend
2023-09-29 10:38:28 [INFO] (mdbook::renderer): Invoking the "xgettext" renderer

This gives me

% tree po1                                                                  (~/src/comprehensive-rust)
po1
└── src
    ├── android
    │   ├── aidl.pot
    │   ├── build-rules.pot
    │   ├── interoperability.pot
    │   ├── logging.pot
    │   └── setup.pot
    ├── android.pot

I had hoped there would be a single POT file for each chapter in the top-level. Like this:

% tree expected1                                                            (~/src/comprehensive-rust)
expected1
├── android.pot
├── async.pot
├── bare-metal.pot
├── basic-syntax.pot
├── cargo.pot

I suppose the problem in an off-by-one error somewhere 🙂

The challenge with this is that Android isn't actually a top-level chapter, mdbook treats it as a part title and doesn't contain the sub-items as expected like "interoperability" "aidl", "build-rules". I think if we wanted something like this we'd have to leverage the listed directory in the code, or change the SUMMARY file to make it easier

mgeisler · 2023-10-02T14:07:11Z

The challenge with this is that Android isn't actually a top-level chapter, mdbook treats it as a part title and doesn't contain the sub-items as expected like "interoperability" "aidl", "build-rules". I think if we wanted something like this we'd have to leverage the listed directory in the code, or change the SUMMARY file to make it easier

Oh! I had not realized this until now...

We can change our own SUMMARY.md file, but we won't be able to the change the upstream format easily.

I think the code right now splits the POT file into ~63 files, whereas we only have ~13 parts. Do you think you can add code for tracking the parts? I would suggest transforming the part title into a slug using something like this:

fn slug(title: &str) -> String {
    title
        .split_whitespace()
        .map(|word| {
            word.to_lowercase()
                .chars()
                .filter(|&ch| ch == '-' || !ch.is_ascii_punctuation())
                .collect::<String>()
        })
        .filter(|word| !word.is_empty())
        .collect::<Vec<_>>()
        .join("-")
}

I tested it a bit on our own titles plus some headings from the Rust Book and they look reasonable.

antoniolinhart · 2023-10-06T00:54:38Z

Hi @antoniolinhart, thanks for the cleanup. I tried checking out the branch and ran

% MDBOOK_OUTPUT='{"xgettext": {"pot-file": "messages.pot", "depth": 1}}' mdbook build -d po1 
2023-09-29 10:38:28 [INFO] (mdbook::book): Book building has started
2023-09-29 10:38:28 [INFO] (mdbook::book): Running the xgettext backend
2023-09-29 10:38:28 [INFO] (mdbook::renderer): Invoking the "xgettext" renderer

This gives me

% tree po1                                                                  (~/src/comprehensive-rust)
po1
└── src
    ├── android
    │   ├── aidl.pot
    │   ├── build-rules.pot
    │   ├── interoperability.pot
    │   ├── logging.pot
    │   └── setup.pot
    ├── android.pot

I had hoped there would be a single POT file for each chapter in the top-level. Like this:

% tree expected1                                                            (~/src/comprehensive-rust)
expected1
├── android.pot
├── async.pot
├── bare-metal.pot
├── basic-syntax.pot
├── cargo.pot

I suppose the problem in an off-by-one error somewhere 🙂

After the most recent improvements I think it's closer to expectations - depth 1 splits each 'part' into a separate catalog.

Here's the output I get for the following depths (using the Android chapter as an example after depth = 1):

Depth 1 (note that summary.pot includes messages from any chapters prior to encountering a PartTitle):

$ MDBOOK_OUTPUT='{"xgettext": {"depth": 1}}'   mdbook build -d po/XX
2023-10-05 17:36:46 [INFO] (mdbook::book): Book building has started
2023-10-05 17:36:46 [INFO] (mdbook::book): Running the xgettext backend
2023-10-05 17:36:46 [INFO] (mdbook::renderer): Invoking the "xgettext" renderer

$ tree po/XX
po/XX
├── android.pot
├── bare-metal-afternoon.pot
├── bare-metal-morning.pot
├── concurrency-afternoon.pot
├── concurrency-morning.pot
├── day-1-afternoon.pot
├── day-1-morning.pot
├── day-2-afternoon.pot
├── day-2-morning.pot
├── day-3-afternoon.pot
├── day-3-morning.pot
├── final-words.pot
├── solutions.pot
└── summary.pot

Depth 2

$ MDBOOK_OUTPUT='{"xgettext": {"depth": 2}}'   mdbook build -d po/XX
2023-10-05 17:36:46 [INFO] (mdbook::book): Book building has started
2023-10-05 17:36:46 [INFO] (mdbook::book): Running the xgettext backend
2023-10-05 17:36:46 [INFO] (mdbook::renderer): Invoking the "xgettext" renderer

$ tree po/XX                                                                   
po/XX                                                                           
├── android                                                                     
│   ├── aidl.pot                                                                
│   ├── build-rules.pot                                                         
│   ├── exercises.pot                                                           
│   ├── interoperability.pot                                                    
│   ├── logging.pot                                                             
│   ├── setup.pot                                                               
│   └── welcome.pot

Depth 3

$ MDBOOK_OUTPUT='{"xgettext": {"depth": 3}}'   mdbook build -d po/XX
2023-10-05 17:36:46 [INFO] (mdbook::book): Book building has started
2023-10-05 17:36:46 [INFO] (mdbook::book): Running the xgettext backend
2023-10-05 17:36:46 [INFO] (mdbook::renderer): Invoking the "xgettext" renderer

$ tree po/XX                                                                   
po/XX                                                                           
├── android                                                                     
│   ├── aidl                                                                    
│   │   ├── changing-api.pot                                                    
│   │   ├── client.pot                                                          
│   │   ├── deploy.pot                                                          
│   │   ├── implementation.pot                                                  
│   │   ├── interface.pot                                                       
│   │   └── server.pot                                                          
│   ├── aidl.pot                                                                
│   ├── build-rules                                                             
│   │   ├── binary.pot                                                          
│   │   └── library.pot                                                         
│   ├── build-rules.pot                                                         
│   ├── exercises.pot                                                           
│   ├── interoperability                                                        
│   │   ├── with-c.pot                                                          
│   │   └── with-java.pot                                                       
│   ├── interoperability.pot                                                    
│   ├── logging.pot                                                             
│   ├── setup.pot                                                               
│   └── welcome.pot

Depth 4

$ MDBOOK_OUTPUT='{"xgettext": {"depth": 4}}'   mdbook build -d po/XX
2023-10-05 17:36:46 [INFO] (mdbook::book): Book building has started
2023-10-05 17:36:46 [INFO] (mdbook::book): Running the xgettext backend
2023-10-05 17:36:46 [INFO] (mdbook::renderer): Invoking the "xgettext" renderer

$ tree po/XX                                                                   
po/XX                                                                           
├── android                                                                     
│   ├── aidl                                                                    
│   │   ├── changing-api.pot                                                    
│   │   ├── client.pot                                                          
│   │   ├── deploy.pot                                                          
│   │   ├── implementation.pot                                                  
│   │   ├── interface.pot                                                       
│   │   └── server.pot                                                          
│   ├── aidl.pot                                                                
│   ├── build-rules                                                             
│   │   ├── binary.pot                                                          
│   │   └── library.pot                                                         
│   ├── build-rules.pot                                                         
│   ├── exercises.pot                                                           
│   ├── interoperability                                                        
│   │   ├── with-c                                                              
│   │   │   ├── calling-c-with-bindgen.pot                                      
│   │   │   └── calling-rust-from-c.pot                                         
│   │   ├── with-c.pot                                                          
│   │   └── with-java.pot                                                       
│   ├── interoperability.pot                                                    
│   ├── logging.pot                                                             
│   ├── setup.pot                                                               
│   └── welcome.pot

mgeisler · 2023-10-19T06:46:23Z

i18n-helpers/src/bin/mdbook-xgettext.rs

+/// Transform a Book into a HashMap of (file_name, catalog) pairs.
+/// The depth will be used to crawl the chapters and sub-chapters


Tiny thing: the docstrings are Markdown and start with a one-line description of the function. You can see the results of this with cargo doc --open locally:

Suggested change

/// Transform a Book into a HashMap of (file_name, catalog) pairs.

/// The depth will be used to crawl the chapters and sub-chapters

/// Transform a Book into a HashMap of `(file_name, catalog)` pairs.

///

/// The depth will be used to crawl the chapters and sub-chapters

Ooh~ I was not aware that this was how that was generated How do I rebuild this doc to reflect local changes? It seems to be for ToT and not based on my local branch, even when I cargo build

i18n-helpers/src/bin/mdbook-xgettext.rs

mgeisler

Hi @antoniolinhart, thanks for all the work put into this! I think the functionality is super now, I just have a bunch of small comments about naming and a few Rust tricks you can apply.

I hope you can incorporate it over the next few weeks.

i18n-helpers/src/bin/mdbook-xgettext.rs

mgeisler · 2024-01-15T13:00:54Z

Hi @antoniolinhart! Happy new year 🥳

Would you be able to rebase this PR on top of the latest main? We should focus on getting it merged to prevent it from bit-rotting!

Even if the functionality to merge the split PO files are not yet there, the splitting functionality could be useful to someone (they would then have to write a tiny script to msgcat everything back together before using the translation).

antoniolinhart · 2024-01-16T16:52:05Z

Hi @antoniolinhart! Happy new year 🥳

Would you be able to rebase this PR on top of the latest main? We should focus on getting it merged to prevent it from bit-rotting!

Even if the functionality to merge the split PO files are not yet there, the splitting functionality could be useful to someone (they would then have to write a tiny script to msgcat everything back together before using the translation).

Happy new year! I got taken away with some high priority team projects so it's been a while since I've been able to look at this. That's a good point! My goal will be to (this week/next week) get the comments responded to / get it rebased on main since I know the longer I wait the harder it will be 😅

Thanks for your patience on this!

mgeisler · 2024-01-17T08:47:57Z

Happy new year! I got taken away with some high priority team projects so it's been a while since I've been able to look at this.

That's completely okay! The functionality here is luckily not blocking anything right now.

My goal will be to (this week/next week) get the comments responded to / get it rebased on main since I know the longer I wait the harder it will be 😅

Thanks for sticking with it! I think the conflicts should be small at this point, so it should be possible to revive this PR without too much trouble.

antoniolinhart · 2024-01-30T01:51:13Z

Should be ready for another round of review @mgeisler ! Thank you for your extended patience and support!

mgeisler · 2024-02-12T17:34:22Z

Should be ready for another round of review @mgeisler ! Thank you for your extended patience and support!

I'm the one thanking! Let me look it through now.

i18n-helpers/src/xgettext.rs

mgeisler

Thank you so much for this, let's merge this now and then see about the combination side of things later!

mgeisler · 2024-02-20T13:17:47Z

Hey @antoniolinhart, I hope the changes in #107 doesn't cause too big conflicts here! I think we can merge this straight after it gets rebased.

antoniolinhart · 2024-02-21T19:23:05Z

@mgeisler

Should be good to go! It appears that the diff base is not against the correct source so it's showing that I changed a lot of files that I obtained from rebasing on the updated code. I tried looking around but it appears that this merge is into google:main which should have the updated files?

mgeisler · 2024-02-22T13:46:17Z

@mgeisler

Should be good to go!

Great!

It appears that the diff base is not against the correct source so it's showing that I changed a lot of files that I obtained from rebasing on the updated code. I tried looking around but it appears that this merge is into google:main which should have the updated files?

Yeah, something is off here — I'll try to untangle it and then merge the PR. It'll take me a day of two to get around to it 😄

antoniolinhart · 2024-02-22T17:18:29Z

@mgeisler
Should be good to go!

Great!

It appears that the diff base is not against the correct source so it's showing that I changed a lot of files that I obtained from rebasing on the updated code. I tried looking around but it appears that this merge is into google:main which should have the updated files?

Yeah, something is off here — I'll try to untangle it and then merge the PR. It'll take me a day of two to get around to it 😄

Thanks!! Also the clippy failure was due to me leaving some debugging logs in by accident (after the last time I ran clippy) so that should be fixed now 👍

codecov-commenter · 2024-02-29T09:54:56Z

Codecov Report

Attention: Patch coverage is 91.91176% with 66 lines in your changes are missing coverage. Please review.

Project coverage is 90.58%. Comparing base (a4ca49b) to head (440e352).
Report is 28 commits behind head on main.

Files	Patch %	Lines
i18n-helpers/src/lib.rs	83.66%	27 Missing and 6 partials ⚠️
i18n-helpers/src/xgettext.rs	95.78%	2 Missing and 23 partials ⚠️
i18n-helpers/src/bin/mdbook-xgettext.rs	0.00%	7 Missing ⚠️
i18n-helpers/src/normalize.rs	88.88%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #77      +/-   ##
==========================================
- Coverage   90.62%   90.58%   -0.05%     
==========================================
  Files          11       12       +1     
  Lines        2368     3132     +764     
  Branches     2368     3132     +764     
==========================================
+ Hits         2146     2837     +691     
- Misses        159      211      +52     
- Partials       63       84      +21

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

This is a PR in relation to google#67 to split the pot file depending on the specified depth. Another PR will be submitted to change gettext to merge a directory of pot files.

mgeisler · 2024-02-29T11:04:42Z

Thanks!! Also the clippy failure was due to me leaving some debugging logs in by accident (after the last time I ran clippy) so that should be fixed now 👍

Hey Antonio, I think I got your changes back into a simple branch. I don't actually know if you did anything wrong here. What I did on my side was to

Create a temporary branch at the point where your split-pot-file branch was.
Hard reset split-pot-file to main to throw away all commits and merges and what not.
Soft reset split-pot-file back to the temporary branch. This added all the changes back, but as a delta to main.
Commit this under your name and force push back to your repository.

If the tests pass, then I'll merge this next.

antoniolinhart · 2024-02-29T21:00:20Z

Thanks!! Also the clippy failure was due to me leaving some debugging logs in by accident (after the last time I ran clippy) so that should be fixed now 👍

Hey Antonio, I think I got your changes back into a simple branch. I don't actually know if you did anything wrong here. What I did on my side was to

Create a temporary branch at the point where your split-pot-file branch was.

Hard reset split-pot-file to main to throw away all commits and merges and what not.

Soft reset split-pot-file back to the temporary branch. This added all the changes back, but as a delta to main.

Commit this under your name and force push back to your repository.

If the tests pass, then I'll merge this next.

Wonderful! Thank you I will try this out next time I run into the same issue.

PR google#77 added the first half of a functionality requested in google#67, i.e. being able to split the catalog across a tree of POT files. However, there are edge cases where the generated files can have identical path relative to the po/ directory, with the effect of collapsing messages for unrelated parts or chapters in the same POT fie. This commit handles deduplication through a "black box" struct `UniquePathBuilder`, and adds a doctest and a unit test for collision scenarios. The new version is backward-compatible for cases that are collision-free in the old version. Contextually, simplify the recursion logic in `get_subcontent_for_chapter`.

antoniolinhart marked this pull request as ready for review September 14, 2023 04:47

mgeisler reviewed Sep 14, 2023

View reviewed changes