Split POT file depending on specified depth #77
Conversation
@mgeisler What do you think about the implementation of splitting the template file? In this version I chose to add messages that did not have a matching source to a catalog saved under the file name specified in the input params. I'm open to changing how we use that pot-file input param; I thought this made the most sense for now. Would you want to add another optional field for the output directory if we have a depth > 0?
I think this is a key decision. I was thinking we would use the […]. We currently iterate over the […]. As for what to name the POT files, I guess this can be derived from […]. There is no guarantee that nested chapters are stored in nested files, but I think we should be safe if we follow the logical structure of the […].
Oh! That's a good point for using the sections directly instead of that built-in iterator. Until looking into the Book items struct, it was hard to know exactly why it worked the way it did. If I follow all of the Chapters through into their sub-items and take the content from each Chapter and so on, that should include all the messages, right? I'm running into an interesting situation where some messages that should be there aren't: comparing a current messages.pot with a messages.pot from my program at depth 0, mine is missing about 10 messages, which all appear to be Rust scripts included as sample code. Many of the messages are sample-code scripts; there just happen to be some that don't come through.

Following the book instead makes more sense. I think it will make the split-depth input a lot more logical too. Thank you! Still working through the revised implementation!
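The recursive walk described above (taking each Chapter's content and then descending into its sub-items) can be sketched with simplified stand-ins for mdbook's types. The type shapes here are assumptions for the sketch, not mdbook's actual definitions:

```rust
// Simplified stand-ins for mdbook's BookItem/Chapter; the real types carry
// more fields (name, path, number, etc.).
#[allow(dead_code)]
enum BookItem {
    Chapter(Chapter),
    Separator,
    PartTitle(String),
}

struct Chapter {
    content: String,
    sub_items: Vec<BookItem>,
}

// Collect the content of a chapter and, recursively, of all its sub-chapters.
fn collect_content(chapter: &Chapter, out: &mut Vec<String>) {
    out.push(chapter.content.clone());
    for item in &chapter.sub_items {
        if let BookItem::Chapter(sub) = item {
            collect_content(sub, out);
        }
    }
}

fn main() {
    let chapter = Chapter {
        content: String::from("# Welcome"),
        sub_items: vec![BookItem::Chapter(Chapter {
            content: String::from("# Setup"),
            sub_items: vec![],
        })],
    };
    let mut messages = Vec::new();
    collect_content(&chapter, &mut messages);
    assert_eq!(messages, vec!["# Welcome", "# Setup"]);
}
```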
Much of the Rust code is included via the […]. Could the scripts be related to this? Oh... another idea: are you perhaps seeing the effects of #75 by @dyoo? It was merged just a few days ago 🙂
Sounds great, thank you so much for your help here!
Hi @antoniolinhart, just a small thing: the format check fails. You should configure your editor to call […].
Looks like I need to pull […].
Hey @antoniolinhart, I skimmed through the updated code — from the test cases it looks like it works? Super cool! You should rebase onto the latest […].
The challenge with this is that Android isn't actually a top-level chapter: mdbook treats it as a part title, which doesn't contain sub-items the way we'd expect for "interoperability", "aidl", "build-rules", etc. I think if we wanted something like this we'd have to leverage the listed directory in the code, or change the SUMMARY file to make it easier.
Oh! I had not realized this until now... We can change our own […]. I think the code right now splits the POT file into ~63 files, whereas we only have ~13 parts. Do you think you can add code for tracking the parts? I would suggest transforming the part title into a slug using something like this:

```rust
fn slug(title: &str) -> String {
    title
        .split_whitespace()
        .map(|word| {
            word.to_lowercase()
                .chars()
                .filter(|&ch| ch == '-' || !ch.is_ascii_punctuation())
                .collect::<String>()
        })
        .filter(|word| !word.is_empty())
        .collect::<Vec<_>>()
        .join("-")
}
```

I tested it a bit on our own titles plus some headings from the Rust Book and they look reasonable.
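For what it's worth, here is how the suggested function behaves on a few invented titles (the function is repeated so the snippet compiles on its own; the titles are not from the course's SUMMARY.md):

```rust
// The suggested slug function, repeated verbatim so this compiles standalone.
fn slug(title: &str) -> String {
    title
        .split_whitespace()
        .map(|word| {
            word.to_lowercase()
                .chars()
                .filter(|&ch| ch == '-' || !ch.is_ascii_punctuation())
                .collect::<String>()
        })
        .filter(|word| !word.is_empty())
        .collect::<Vec<_>>()
        .join("-")
}

fn main() {
    // Punctuation other than '-' is stripped; words are lowercased and joined.
    assert_eq!(slug("Day 1: Morning"), "day-1-morning");
    assert_eq!(slug("Error Handling"), "error-handling");
    assert_eq!(slug("Hello, World!"), "hello-world");
}
```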
After the most recent improvements I think it's closer to expectations: depth 1 splits each 'part' into a separate catalog. Here's the output I get for the following depths (using the Android chapter as an example after depth = 1):

- Depth 1 (note that summary.pot includes messages from any chapters prior to encountering a PartTitle): […]
- Depth 2: […]
- Depth 3: […]
- Depth 4: […]
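As a rough sketch of what the depth parameter does to output paths (a hypothetical mapping written for illustration; the PR's real naming logic may differ), the first `depth` components of a chapter's source path could name the catalog it lands in:

```rust
use std::path::{Path, PathBuf};

// Hypothetical illustration, not the PR's actual code: at a given split
// depth, the first `depth` components of a chapter's source path name the
// POT file its messages are grouped into; depth 0 keeps one messages.pot.
fn pot_file_for(source: &Path, depth: usize) -> PathBuf {
    if depth == 0 {
        return PathBuf::from("messages.pot");
    }
    let mut kept: PathBuf = source.iter().take(depth).collect();
    kept.set_extension("pot");
    kept
}

fn main() {
    let src = Path::new("android/aidl/server.md");
    assert_eq!(pot_file_for(src, 0), PathBuf::from("messages.pot"));
    assert_eq!(pot_file_for(src, 1), PathBuf::from("android.pot"));
    assert_eq!(pot_file_for(src, 2), PathBuf::from("android/aidl.pot"));
    // Deeper than the path itself: the file's own stem names the catalog.
    assert_eq!(pot_file_for(src, 9), PathBuf::from("android/aidl/server.pot"));
}
```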
```rust
/// Transform a Book into a HashMap of (file_name, catalog) pairs.
/// The depth will be used to crawl the chapters and sub-chapters
```
Tiny thing: the docstrings are Markdown and start with a one-line description of the function. You can see the results of this with `cargo doc --open` locally:
```diff
-/// Transform a Book into a HashMap of (file_name, catalog) pairs.
-/// The depth will be used to crawl the chapters and sub-chapters
+/// Transform a Book into a HashMap of `(file_name, catalog)` pairs.
+///
+/// The depth will be used to crawl the chapters and sub-chapters
```
Ooh~ I was not aware that this was how that was generated! How do I rebuild this doc to reflect local changes? It seems to be for ToT and not based on my local branch, even when I run `cargo build`.
Hi @antoniolinhart, thanks for all the work put into this! I think the functionality is super now; I just have a bunch of small comments about naming and a few Rust tricks you can apply. I hope you can incorporate them over the next few weeks.
Hi @antoniolinhart! Happy new year 🥳 Would you be able to rebase this PR on top of the latest […]? Even if the functionality to merge the split PO files is not yet there, the splitting functionality could be useful to someone (they would then have to write a tiny script to […]).
Happy new year! I got pulled away by some high-priority team projects, so it's been a while since I've been able to look at this. That's a good point! My goal (this week/next week) will be to respond to the comments and get it rebased on […]. Thanks for your patience on this!
That's completely okay! The functionality here is luckily not blocking anything right now.
Thanks for sticking with it! I think the conflicts should be small at this point, so it should be possible to revive this PR without too much trouble.
Should be ready for another round of review, @mgeisler! Thank you for your extended patience and support!

I'm the one who should be thanking you! Let me look through it now.
Thank you so much for this, let's merge this now and then see about the combination side of things later!
Hey @antoniolinhart, I hope the changes in #107 don't cause too big conflicts here! I think we can merge this straight after it gets rebased.
Should be good to go! It appears that the diff base is not against the correct source, so it's showing that I changed a lot of files that I obtained from rebasing on the updated code. I tried looking around, but it appears that this merge is into […].
Great!
Yeah, something is off here — I'll try to untangle it and then merge the PR. It'll take me a day or two to get around to it 😄
Thanks!! Also the clippy failure was due to me leaving some debugging logs in by accident (after the last time I ran clippy), so that should be fixed now 👍
Codecov Report

Attention: Patch coverage is […].

Additional details and impacted files:

```
@@            Coverage Diff            @@
##             main      #77     +/-  ##
=========================================
- Coverage   90.62%   90.58%   -0.05%
=========================================
  Files          11       12       +1
  Lines        2368     3132     +764
  Branches     2368     3132     +764
=========================================
+ Hits         2146     2837     +691
- Misses        159      211      +52
- Partials       63       84      +21
```

☔ View full report in Codecov by Sentry.
This is a PR in relation to google#67 to split the pot file depending on the specified depth. Another PR will be submitted to change gettext to merge a directory of pot files.
Force-pushed from 440e352 to ab7c48d.
Hey Antonio, I think I got your changes back into a simple branch. I don't actually know if you did anything wrong here. What I did on my side was to […].

If the tests pass, then I'll merge this next.
Wonderful! Thank you, I will try this out next time I run into the same issue.
PR google#77 added the first half of a functionality requested in google#67, i.e. being able to split the catalog across a tree of POT files. However, there are edge cases where the generated files can have identical paths relative to the po/ directory, with the effect of collapsing messages for unrelated parts or chapters into the same POT file. This commit handles deduplication through a "black box" struct `UniquePathBuilder`, and adds a doctest and a unit test for collision scenarios. The new version is backward-compatible for cases that are collision-free in the old version. Contextually, simplify the recursion logic in `get_subcontent_for_chapter`.
Hey all! This is a PR in relation to #67 to split the pot file depending on the specified depth. Another PR will be submitted to change `gettext` to merge a directory of `pot` files.

- Change `xgettext` to take in an integer for the depth to split the POT file. Remove the input-file param.
- Create on-the-fly directories if the markdown file exists in a sub-directory.
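The "create on-the-fly directories" step might look roughly like this hypothetical helper (`write_catalog` and the paths are invented for illustration; the PR's actual code may differ):

```rust
use std::fs;
use std::path::Path;

// Hypothetical helper: before writing a catalog whose markdown source lives
// in a sub-directory, create the matching output sub-directory on the fly.
fn write_catalog(output_root: &Path, relative_md: &Path, contents: &str) -> std::io::Result<()> {
    let dest = output_root.join(relative_md).with_extension("pot");
    if let Some(parent) = dest.parent() {
        fs::create_dir_all(parent)?; // no-op when the directory already exists
    }
    fs::write(&dest, contents)
}

fn main() -> std::io::Result<()> {
    let root = std::env::temp_dir().join("pot-split-demo");
    write_catalog(&root, Path::new("android/aidl.md"), "msgid \"\"\nmsgstr \"\"\n")?;
    assert!(root.join("android/aidl.pot").exists());
    Ok(())
}
```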