
Duplicate missing content from default language during build #612

Merged
23 commits merged into main from improve-language-support
Apr 15, 2024


lianmakesthings
Collaborator

@lianmakesthings lianmakesthings commented Apr 5, 2024

Closes #611

This PR introduces several changes to make working with multiple languages much easier.

The Problem

Hugo expects the entire content for a language to exist in a single folder, e.g. content/zh. Currently, we are asking everyone who introduces a new language or new content to duplicate all English content to all languages.

The Solution

Instead of manually duplicating content, I have introduced a script which iterates over all files in the default language folder (en) and duplicates every file that does not exist in the target language. By using the -L flag, the copy command also handles symlinks, i.e. it duplicates the linked content rather than the link itself.
This script will run before the preview and production builds.
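For illustration, here is a minimal Python sketch of that duplication step. It is not the script from this PR (which uses a copy command with the -L flag); the function name `duplicate_missing_content` and its parameters are hypothetical. `os.walk(..., followlinks=True)` descends into symlinked directories, and `shutil.copy2` follows symlinks by default, so the linked content is copied instead of the link, which mirrors the -L behaviour.

```python
import os
import shutil
from pathlib import Path


def duplicate_missing_content(content_dir, default_lang, target_langs):
    """Copy files from content/<default_lang>/ into each target language
    folder, but only where that language has no file at the same path.

    Hypothetical helper for illustration; returns the list of created paths.
    """
    content_dir = Path(content_dir)
    source_root = content_dir / default_lang
    created = []
    # followlinks=True also descends into symlinked directories
    for dirpath, _dirnames, filenames in os.walk(source_root, followlinks=True):
        for name in filenames:
            src = Path(dirpath) / name
            rel = src.relative_to(source_root)
            for lang in target_langs:
                dest = content_dir / lang / rel
                if dest.exists():
                    continue  # an existing (translated) file always wins
                dest.parent.mkdir(parents=True, exist_ok=True)
                # copy2 follows symlinks by default, so the link target's
                # content is copied instead of the link itself (like cp -L)
                shutil.copy2(src, dest)
                created.append(dest)
    return created
```

A build step could call it once before running Hugo, e.g. `duplicate_missing_content("website/content", "en", ["ja", "zh", "ko"])`; the language list here is only an example.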

What Changes

  • English content no longer needs to be duplicated to other languages
  • The (local) dev version won't show the duplicated English pages unless the duplicate_missing_content script is explicitly triggered


netlify bot commented Apr 5, 2024

Deploy Preview for tag-app-delivery ready!

🔨 Latest commit: a38e546
🔍 Latest deploy log: https://app.netlify.com/sites/tag-app-delivery/deploys/661d345ed088ec0008cb6466
😎 Deploy Preview: https://deploy-preview-612--tag-app-delivery.netlify.app


github-actions bot commented Apr 5, 2024

Action Required

You are adding or updating English content, so please take the following actions for other languages.

  • If you add new content under website/content/en or targets of the symbolic links in the same directory, please replicate it in the corresponding directories of all other languages. (e.g. If you create website/content/en/blog/new-post.md, you should copy it to website/content/ja/blog/new-post.md, etc.)
  • If you update the content in the same location, please perform the following actions for the corresponding content in other languages.
    • If the content has not been translated yet, replace the files with the updated English version.
    • If the content has already been translated, include a note suggesting that users check the English page for the most recent updates.

Signed-off-by: lianmakesthings <[email protected]>
@lianmakesthings
Collaborator Author

@cjyabraham could you also take a look at this please? 🙏

@lianmakesthings lianmakesthings changed the title from "Improve language support" to "Duplicate missing content from default language during build" on Apr 5, 2024
@cjyabraham
Contributor

This is a nice solution.

One concern, it will create multiple copies of the same content. Should we define the canonical url of copies so that we don't get flagged by Google for duplicate content or the wrong one gets prioritized in the search rankings? Is this a legitimate risk we should address?

@lianmakesthings lianmakesthings force-pushed the improve-language-support branch from ad1459a to 00dbdb4 Compare April 6, 2024 09:58
@lianmakesthings
Collaborator Author

> One concern, it will create multiple copies of the same content. Should we define the canonical url of copies so that we don't get flagged by Google for duplicate content or the wrong one gets prioritized in the search rankings? Is this a legitimate risk we should address?

This is for SEO purposes? And we already have this situation right now with the manually duplicated content, right?

I figure the easiest way to do that would be to add the canonical url to the front matter so it can be easily duplicated. I can look into that and if it's not too much work add this. On the other hand, our website is fairly small and I'm not sure how big of a problem this actually poses.

@lianmakesthings lianmakesthings force-pushed the improve-language-support branch from 00dbdb4 to 807b6b6 Compare April 6, 2024 10:06
@cjyabraham
Contributor

> > One concern, it will create multiple copies of the same content. Should we define the canonical url of copies so that we don't get flagged by Google for duplicate content or the wrong one gets prioritized in the search rankings? Is this a legitimate risk we should address?

> This is for SEO purposes? And we already have this situation right now with the manually duplicated content, right?

> I figure the easiest way to do that would be to add the canonical url to the front matter so it can be easily duplicated. I can look into that and if it's not too much work add this. On the other hand, our website is fairly small and I'm not sure how big of a problem this actually poses.

Yes, this is for SEO. Yes, we already have that situation. True, I'm not sure this is a real problem but just thought I'd raise it in case anyone else knew.

@cjyabraham
Contributor

Thinking a bit more about the UX of this solution, is it weird that a person who is browsing the site in French, for example, would suddenly come across a page in English that should be in French? How would they interpret that? Would they possibly conclude the site is broken for seeing an English page on a /fr/ url? Would they perhaps start hunting around for the French translation?

To improve this usability, it may help to add a message at the top of such a page saying something like this (but in French): "This page has not yet been translated into French, however, here is the English version of the page for now."
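One possible way to support such a notice, sketched in Python under assumptions: when an English page is copied, the copy step could also add a marker to the copy's front matter, which a Hugo template could then check via `.Params` to render a "not yet translated" message in the page's language. The parameter name `untranslated` and the helper `mark_untranslated` are hypothetical, and the sketch only handles YAML (`---`) and TOML (`+++`) front matter delimiters.

```python
def mark_untranslated(text: str, source_lang: str = "en") -> str:
    """Return page text with a hypothetical `untranslated` front matter key
    added, recording which language the copy came from."""
    if text.startswith("---\n"):
        # YAML front matter: insert the key right after the opening delimiter
        return "---\n" + f"untranslated: {source_lang}\n" + text[len("---\n"):]
    if text.startswith("+++\n"):
        # TOML front matter: same idea, using TOML syntax
        return "+++\n" + f'untranslated = "{source_lang}"\n' + text[len("+++\n"):]
    # No front matter at all: create a minimal YAML block
    return f"---\nuntranslated: {source_lang}\n---\n" + text
```

Keeping only a marker in the copied file leaves the wording of the notice to the template, so the message itself can be translated once per language rather than per page.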

Any thoughts?

@lianmakesthings
Collaborator Author

lianmakesthings commented Apr 8, 2024

@cjyabraham
yeah, that's another good question to think about.
For this PR, I suggest keeping the scope as is: only automating the current manual process of duplicating English content.

I will also open three additional issues to evaluate if we want to add these features:

@hhiroshell
Contributor

@lianmakesthings
It looks like a nice idea to me too. Thank you for your work on further improving the i18n workflows.

Collaborator

@abangser abangser left a comment


This will be an awesome addition and I can so see how it will continue to improve! Well done on tackling the big first push @lianmakesthings !

I made a few nit comments, but have two bigger questions...

  1. When I navigate the site locally I see languages appear and disappear per page based on if there is a translation. For example, on the Operator White paper I see Chinese, Korean and Japanese. But on the platforms maturity model I don't see Korean. And neither page has the Spanish and French additions that have come recently.
  2. With this new model do we have any need for .gitignore to manage things? I can't quite reason about it right now, since we definitely need people to be able to commit to these directories, but also a bunch of the files are sorta ignored now... 🤔

Comment on lines +39 to +40
- If you update content, that has corresponding files in other languages, include a note suggesting that users check the English page for the most recent updates in those translated pages.
- If you add new content under `website/content/en` there is nothing you need to do.
Collaborator


👏 This will be a much better experience! I have added a comment to #615 to possibly improve on this even more 💪

Co-authored-by: abangser <[email protected]>
Signed-off-by: Lian Li <[email protected]>
@lianmakesthings
Collaborator Author

lianmakesthings commented Apr 13, 2024

Thanks for the review @abangser 🙏

To address the comments:

  > 1. When I navigate the site locally I see languages appear and disappear per page based on if there is a translation. For example, on the Operator White paper I see Chinese, Korean and Japanese. But on the platforms maturity model I don't see Korean. And neither page has the Spanish and French additions that have come recently.

Yes, the behaviour will be different for dev and for staging (deploy preview) / prod, due to the flow. On dev, the content is served directly from content/* by a local HTTP server. For staging and prod:

  1. the translate_missing_content script iterates over the files in the content/en/ folder and duplicates every file into the target language folder if that folder does not have a corresponding (i.e. translated) file yet.
  2. from the original and duplicated content, the static site is built into the public/ folder
  3. the static site is served from public/
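This flow also explains why the language switcher differs between dev and the deployed site. Assuming the site uses one content directory per language, Hugo pairs a page with its translations when files share the same path relative to each language's directory (ignoring overrides such as `translationKey`). The hypothetical helper below only checks which language folders contain a given path: on dev that is just the genuinely translated pages, while after step 1 every language has at least a copy.

```python
from pathlib import Path


def available_languages(content_dir, rel_path, languages):
    """Return the languages whose content folder has a file at rel_path.

    Hypothetical helper: approximates which translations Hugo can pair up
    for a page when translations are matched by path.
    """
    root = Path(content_dir)
    return [lang for lang in languages if (root / lang / rel_path).is_file()]
```

For example, on dev `available_languages("website/content", "some/page.md", ["en", "ja", "ko"])` could return only the languages with a real translation, whereas on staging and prod the duplicated files make it return all three (the page path is illustrative).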

I figured that for dev purposes it's okay (or even better) to keep the duplicated content out of the way. If you really need a 1:1 mirror, you can manually execute the script and then run make serve or hugo serve.
What do you think?

  > 2. With this new model do we have any need for .gitignore to manage things? I can't quite reason about it right now, since we definitely need people to be able to commit to these directories, but also a bunch of the files are sorta ignored now... 🤔

I don't think .gitignore needs to come into play here. Which files do you think could/should be ignored?

@abangser
Collaborator

abangser commented Apr 13, 2024

> Yes, the behaviour will be different for dev and staging (deploy preview) / prod, due to the flow. On dev, the content is served directly from content/* to a local http server. For staging and prod

Ahhhh I read that comment but hadn't understood it. Thanks for the clarification. So long as it is working as expected in prod I think that is fine 👍

> I don't think .gitignore needs to come into play here. Which files do you think could/should be ignored?

As I said, it was me thinking out loud. I guess what I am concerned with is someone has all these files locally and commits them thereby overwriting the dynamic creation introduced in this PR (and possibly causing drift etc). But realistically we will spot that in a PR review so likely not a concern. Sorry for the noise, but thanks for the space to think it through!

I will add approval here, but I realise it is a bit ceremonial as I cannot merge. It might help with other people's reviews, though!

Collaborator

@abangser abangser left a comment


This is not a binding review, but I have read through this, asked some questions and feel good about this change. I am really excited to lower the cost of adding and managing translated content on the site!

@lianmakesthings
Collaborator Author

lianmakesthings commented Apr 13, 2024

> As I said, it was me thinking out loud. I guess what I am concerned with is someone has all these files locally and commits them thereby overwriting the dynamic creation introduced in this PR (and possibly causing drift etc). But realistically we will spot that in a PR review so likely not a concern. Sorry for the noise, but thanks for the space to think it through!

Yeah, the issue is that we cannot know which files were added due to the script and which ones are genuinely translated files, unless a human being looks at them. And in a normal dev workflow you shouldn't need to duplicate the files locally anyway. So my suggestion would be to catch these in PR reviews.
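If reviewers ever want a little help spotting accidentally committed copies, a check like the following Python sketch could flag files in other language folders that are byte-for-byte identical to their English source. It is a hypothetical helper, not part of this PR, and it has clear limits: a stale copy of an older English version is not flagged, and a page whose translation legitimately matches the English file would be, so a human still has to decide.

```python
import filecmp
import os
from pathlib import Path


def find_identical_copies(content_dir, default_lang, target_langs):
    """Return target-language files whose bytes match the default-language file
    at the same relative path (likely committed duplicates, not translations)."""
    root = Path(content_dir)
    source_root = root / default_lang
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(source_root, followlinks=True):
        for name in filenames:
            src = Path(dirpath) / name
            rel = src.relative_to(source_root)
            for lang in target_langs:
                candidate = root / lang / rel
                # shallow=False compares file contents, not just os.stat() data
                if candidate.is_file() and filecmp.cmp(src, candidate, shallow=False):
                    suspects.append(candidate)
    return suspects
```

A CI job could print the returned paths as a review hint rather than failing the build, since some matches may be intentional.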

Thanks for voicing and talking through your concerns!

lianmakesthings and others added 7 commits April 15, 2024 15:21
Signed-off-by: lianmakesthings <[email protected]>
Contributor

@angellk angellk left a comment


This is awesome - thanks @lianmakesthings! LGTM

@lianmakesthings lianmakesthings merged commit 571f986 into main Apr 15, 2024
6 checks passed
@lianmakesthings lianmakesthings deleted the improve-language-support branch April 15, 2024 14:07
Development

Successfully merging this pull request may close these issues.

Improve Language Support
5 participants