Pendo markdown conversion support (#3)
* Initial markdown conversion: basic syntax and strikethrough
* Reimplement micromark gfm strikethrough syntax plugin for pendo underline
* Add strikethrough mdast extension reimplementation for underline
* Update tests to use tree nodes instead of snapshots
* Add stringify tests for implemented extensions
* Bugfix for stringify strikethrough with pluses
* Add type for underline mdast node
* Fix const reference
* Make parser types local to the plugin
* Disallow single-tilde strikethrough
* Remove position info from parsed md
* Convert color syntax to XML nodes before parsing to AST
* Add ast conversion from mdast html nodes to custom color node
* Update color nodes test to use actual parser
* Fix invalid source string in test
* Disable markdown syntax unsupported by Pendo
* Fix type definitions placement
* Make markdown plugins stand out better
* Fix test for code indented disable
* Backconvert color nodes
* Enable color ast transform in main convert interface
* More tests
* Force delete position from ast to reduce diff noise
* Clean up tree visitor returns
* Fix tree visitors again
* Implement escaping markdown syntax with components
* Refactor component ast nodes to hold metadata internally
* Move component factories to where the definitions are
* Support self-closing components
* Clean up folder structure
* Assemble all steps for producing escaped strings
* Implement backconverting of escaped strings
* Add todo for color nodes
* Add test for backconverting a shuffled string
* Add tests for underline markdown extension
* Change default plugin setting to enable single-plus underline
* Fix underline plugin node type extension
* Move strikethrough plugin typedef
* Fix micromark plugin test names
* Clean up string transformer color exports
* Clean up ast transformer color tests and exports
* More tests for the component ast transformer (and some bugfixes)
* Fix private types
* Add test for backconverting from shuffled string
* Make tree type generic and ensure it does not mutate the original
* Sad path tests for escaped string backconversion
* Bugfix for double closing tag backconversion
* Create TS definitions for a loctool plugin interface
* Implement interface for pendo xliff filetype
* Add ilib-xliff dependency and define its types
* Add type definitions for Resource subclasses
* Update typedefs folder structure
* Infer concrete created resource type
* Constrain resource factory props based on resType
* Clean up translation unit typedef
* Fix optional overrides type in resource clone method
* Allow constraining resource type on a TranslationSet
* Implement parsing of the pendo xliff
* Relative import sugar
* Attach resource fields definitions directly to respective classes
* Add missing xliff serialize method declaration
* Implement writing out localized files
* Revert "Attach resource fields definitions directly to respective classes"

  This reverts commit 0449675. This currently deviates too much from the original loctool documentation and introduces additional confusion about other missing fields for which interfaces define getters.
* Add e2e test loctool project
* Fix plugin entrypoint export
* Add missing resource types mapping
* Fix relative path and buggy locale replacement in name
* Fix reference comments
* Add todos
* Separate plugin responsibilities like paths and locales from file processing
* Add support for path template mapping
* Add locale mapping
* Add test readme
* Eslint no-unused-vars ignore underscores
* Don't modify trans unit when there are no components
* Add filtering of trans units based on their datatype
* Example comment
* Create unit tests for pendo file processing
* Workaround for trans-unit ID vs resname conflicts
* Commit e2e translations
* Npmrc registry override
* Switch to directly using XML tree for transforming pendo xliff files
* Clean up repo setup
* Update lockfile
* Clean up repo setup
* Add missing Apache 2.0 headers
* Typo fix
* Skip backconversion entirely when there are no components in localized string
* Fix path mapping templates
* Add readme
* Fix E2E by adding a devdependency on itself
* Add test for xliff 2.0 parsing
53 changed files with 8,220 additions and 37 deletions.
@@ -0,0 +1 @@
registry=https://registry.npmjs.org/
@@ -1 +1,226 @@
# ilib-loctool-pendo-md

[Loctool](https://github.com/iLib-js/loctool) plugin to handle translation strings exported from [Pendo](https://www.pendo.io/).

This plugin accepts an XLIFF file exported from Pendo (`<xliff version="1.2"><file datatype="pendoguide">`) and extracts the existing translation units from it, mapping them 1:1 to loctool Resources with **escaped Markdown syntax**.

## Extraction

### Markdown Syntax

As per [Pendo documentation](https://support.pendo.io/hc/en-us/articles/360031866552-Use-markdown-syntax-for-guide-text-styling), the app supports a subset of classic Markdown syntax:

```md
*italics* or _italics_
**bold**
[links](example.com)

1. ordered lists

- unordered lists

* unordered lists

+ unordered lists
```

Pendo also supports some custom extensions:

```md
~~Strikethrough~~
++Underline++
{color: #000000}colored text{/color}
```

There is a high risk of translators breaking this syntax, so the main task of this plugin is to **escape** it using XML-like component tags such as `<c0></c0>`.

### Escaping

Given a Pendo Markdown string like

```markdown
String with _emphasis_, ++underline++, {color: #FF0000}colored text{/color} and [a link](https://example.com)
```

the plugin transforms it into an escaped string:

```text
String with <c0>emphasis</c0>, <c1>underline</c1>, <c2>colored text</c2> and <c3>a link</c3>
```

### Unescaping (backconversion)

After parsing a source string, the plugin keeps track of the escaped components:

```text
- c0: emphasis
- c1: underline
- c2: color #FF0000
- c3: link https://example.com
```
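
A rough sketch of the escaping and bookkeeping described above, assuming a simplified regex-based scanner instead of the micromark/mdast pipeline that the commit log describes. `escapePendoMarkdown`, `EscapedComponent` and the stored `open`/`close` delimiters are hypothetical names chosen for illustration; nested markup and CommonMark's emphasis edge cases (such as underscores inside words) are not handled.

```typescript
// Hypothetical record kept for each escaped component (not the plugin's real type).
interface EscapedComponent {
  description: string; // e.g. "emphasis" or "link https://example.com"
  open: string; // Markdown that preceded the component text in the source
  close: string; // Markdown that followed the component text in the source
}

interface EscapeResult {
  escaped: string;
  components: EscapedComponent[];
}

// One alternative per supported construct. At each position the alternatives
// are tried left to right, so "**" has to come before the single "*".
const PENDO_MARKDOWN =
  /\*\*(.+?)\*\*|~~(.+?)~~|\+\+(.+?)\+\+|_(.+?)_|\*(.+?)\*|\{color:\s*(#[0-9A-Fa-f]{3,6})\}(.+?)\{\/color\}|\[([^\]]+)\]\(([^)]+)\)/g;

function escapePendoMarkdown(source: string): EscapeResult {
  const components: EscapedComponent[] = [];
  const escaped = source.replace(
    PENDO_MARKDOWN,
    (match: string, strong, strike, underline, em1, em2, color, colored, linkText, url) => {
      let text: string;
      let component: EscapedComponent;
      if (strong !== undefined) {
        text = strong;
        component = { description: "strong", open: "**", close: "**" };
      } else if (strike !== undefined) {
        text = strike;
        component = { description: "strikethrough", open: "~~", close: "~~" };
      } else if (underline !== undefined) {
        text = underline;
        component = { description: "underline", open: "++", close: "++" };
      } else if (em1 !== undefined) {
        text = em1;
        component = { description: "emphasis", open: "_", close: "_" };
      } else if (em2 !== undefined) {
        text = em2;
        component = { description: "emphasis", open: "*", close: "*" };
      } else if (color !== undefined) {
        text = colored;
        // Keep the opening {color: ...} tag verbatim so it round-trips exactly.
        const open = match.slice(0, match.indexOf("}") + 1);
        component = { description: `color ${color}`, open, close: "{/color}" };
      } else {
        text = linkText;
        component = { description: `link ${url}`, open: "[", close: `](${url})` };
      }
      // Component indices follow the order in which constructs appear in the source.
      const index = components.length;
      components.push(component);
      return `<c${index}>${text}</c${index}>`;
    },
  );
  return { escaped, components };
}
```

Applied to the example string above, this produces the tags `<c0>` through `<c3>` with the descriptions listed above. Storing the literal delimiters alongside each description is one possible design that makes the reverse step a plain lookup.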

Thanks to that, during localization the plugin is able to **unescape** (backconvert) these components. Given a translated string such as

```text
Translated string, <c3>translated link</c3> <c1>translated underline</c1>, <c0>translated emphasis</c0> <c2>translated color</c2>
```

it transforms it back to the Markdown syntax:

```markdown
Translated string, [translated link](https://example.com) ++translated underline++, _translated emphasis_ {color: #FF0000}translated color{/color}
```

Note that the components may appear in a different order than in the source string, since word order often differs between languages.
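
Because every tag carries its component index, backconversion does not depend on tag order. A minimal sketch, assuming each component stores the Markdown delimiters that surrounded it in the source (`open`/`close`); `unescapePendoMarkdown` and `ComponentDelimiters` are hypothetical names, and unlike a production implementation this sketch does not verify that every tag is opened and closed exactly once.

```typescript
// Hypothetical: the Markdown that surrounded component N in the source string.
interface ComponentDelimiters {
  open: string;
  close: string;
}

// Replaces each <cN> / </cN> tag with the delimiters recorded for component N.
// Tags are looked up by index, so a translation may reorder them freely.
function unescapePendoMarkdown(translated: string, components: ComponentDelimiters[]): string {
  return translated.replace(/<(\/?)c(\d+)>/g, (tag: string, slash: string, index: string) => {
    const component = components[Number(index)];
    if (component === undefined) {
      throw new Error(`Unknown component tag ${tag}`);
    }
    return slash === "/" ? component.close : component.open;
  });
}

// Components c0..c3 from the example above.
const exampleComponents: ComponentDelimiters[] = [
  { open: "_", close: "_" },
  { open: "++", close: "++" },
  { open: "{color: #FF0000}", close: "{/color}" },
  { open: "[", close: "](https://example.com)" },
];

console.log(
  unescapePendoMarkdown(
    "Translated string, <c3>translated link</c3> <c1>translated underline</c1>, <c0>translated emphasis</c0> <c2>translated color</c2>",
    exampleComponents,
  ),
);
// prints: Translated string, [translated link](https://example.com) ++translated underline++, _translated emphasis_ {color: #FF0000}translated color{/color}
```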

## Translation

During the _localize_ step, this plugin outputs a copy of the original Pendo XLIFF for each locale defined in the loctool's `project.json` settings. For each source string that has a translation in loctool (i.e. provided via loctool's XLIFF files), the translation is unescaped as described above where necessary and inserted into the corresponding `<target>` element in the output file.

Additionally, this plugin supports output locale mapping (`localeMap`), which lets the output files use a different locale code than the one used for translations.

## Example localization process

The following step-by-step walkthrough shows how the plugin is intended to be used.

Given a source Pendo XLIFF file `$PROJECT/guides/A000A00Aaa0aaa-AaaaAaa00A0a_en.xliff`:

```xml
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="A000A00Aaa0aaa-AaaaAaa00A0a" datatype="pendoguide" source-language="en-US" target-language="">
    <body>
      <group id="Aaaaaaaa0aAaaAAA0AAA0A0aAaa">
        <trans-unit id="8de49842-c1fd-4536-905e-8817673b4c24|md">
          <source><![CDATA[**Callout!**]]></source>
          <target></target>
          <note>TextView</note>
        </trans-unit>
      </group>
    </body>
  </file>
</xliff>
```

and the following loctool configuration in `$PROJECT/project.json`:

```json
{
  "name": "ilib-loctool-pendo-md-test",
  "id": "ilib-loctool-pendo-md-test",
  "description": "translate strings exported from Pendo",
  "projectType": "custom",
  "sourceLocale": "en",
  "includes": ["guides/*.xliff"],
  "settings": {
    "xliffsDir": "translations",
    "locales": ["pl-PL"],
    "localeMap": {
      "pl-PL": "pl"
    },
    "pendo": {
      "mappings": {
        "guides/*.xliff": {
          "template": "[dir]/[basename]_[locale].[extension]"
        }
      }
    }
  },
  "plugins": ["ilib-loctool-pendo-md"]
}
```
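
The `localeMap` and `mappings` settings above are what produce the `_en_pl.xliff` output name later in this walkthrough. A minimal sketch of how such a template could be resolved, assuming the placeholders mean the source file's directory, name without extension, mapped locale and extension; `resolveOutputPath` is a hypothetical helper, not the plugin's or loctool's API, and loctool's own template handling may support more placeholders.

```typescript
// Hypothetical helper: fills a mapping template such as
// "[dir]/[basename]_[locale].[extension]" for one source file and one locale.
function resolveOutputPath(
  template: string,
  sourcePath: string,
  locale: string,
  localeMap: Record<string, string> = {},
): string {
  const slash = sourcePath.lastIndexOf("/");
  const dir = slash >= 0 ? sourcePath.slice(0, slash) : ".";
  const fileName = sourcePath.slice(slash + 1);
  const dot = fileName.lastIndexOf(".");
  const basename = dot > 0 ? fileName.slice(0, dot) : fileName;
  const extension = dot > 0 ? fileName.slice(dot + 1) : "";
  // Output files use the mapped locale (e.g. "pl") when one is configured.
  const outputLocale = localeMap[locale] ?? locale;
  const values: Record<string, string> = { dir, basename, locale: outputLocale, extension };
  // A replacer function keeps "$" in values from being read as a replacement pattern.
  return template.replace(/\[(dir|basename|locale|extension)\]/g, (_placeholder: string, key: string) => values[key]);
}

console.log(
  resolveOutputPath(
    "[dir]/[basename]_[locale].[extension]",
    "guides/A000A00Aaa0aaa-AaaaAaa00A0a_en.xliff",
    "pl-PL",
    { "pl-PL": "pl" },
  ),
);
// prints: guides/A000A00Aaa0aaa-AaaaAaa00A0a_en_pl.xliff
```

The same mapped value is what the walkthrough shows in the `target-language` attribute of the localized file.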

invoking

```sh
loctool localize "$PROJECT"
```

will first run the _extract_ step and produce a loctool XLIFF file with the extracted **escaped** strings, `$PROJECT/ilib-loctool-pendo-md-test-extracted.xliff`:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xliff version="1.2">
  <file original="" source-language="en" product-name="ilib-loctool-pendo-md-test">
    <body>
      <trans-unit id="1" resname="8de49842-c1fd-4536-905e-8817673b4c24|md" restype="string" datatype="plaintext">
        <source><c0>Callout!</c0></source>
        <note>TextView [c0: strong]</note>
      </trans-unit>
    </body>
  </file>
</xliff>
```

Notice that:

1. the Markdown strong syntax `** **` in the source string is now escaped as the components `<c0> </c0>`
2. the trans-unit comment is updated to include a description of the escaped components: _[c0: strong]_
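
A minimal sketch of the note annotation from item 2, assuming the component descriptions are simply appended in index order. `annotateNote` is a hypothetical helper, and the `", "` separator for several components is an assumption, since the example above shows only one component.

```typescript
// Hypothetical helper: appends component descriptions to a trans-unit note,
// e.g. "TextView" + ["strong"] -> "TextView [c0: strong]".
// The ", " separator used for several components is an assumption.
function annotateNote(note: string, descriptions: string[]): string {
  if (descriptions.length === 0) {
    return note; // nothing was escaped, so leave the note untouched
  }
  const summary = descriptions.map((description, i) => `c${i}: ${description}`).join(", ");
  return note.length > 0 ? `${note} [${summary}]` : `[${summary}]`;
}

console.log(annotateNote("TextView", ["strong"])); // prints: TextView [c0: strong]
```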

Then loctool immediately runs the _localize_ step and produces a (not yet really) localized copy of the source file, `$PROJECT/guides/A000A00Aaa0aaa-AaaaAaa00A0a_en_pl.xliff`:

```xml
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="A000A00Aaa0aaa-AaaaAaa00A0a" datatype="pendoguide" source-language="en-US" target-language="pl">
    <body>
      <group id="Aaaaaaaa0aAaaAAA0AAA0A0aAaa">
        <trans-unit id="8de49842-c1fd-4536-905e-8817673b4c24|md">
          <source><![CDATA[**Callout!**]]></source>
          <target/>
          <note>TextView</note>
        </trans-unit>
      </group>
    </body>
  </file>
</xliff>
```

Notice that:

1. the target tag stays empty because no translation is available yet
2. the file name includes the mapped output locale `pl` rather than the translation locale `pl-PL`
3. the `target-language` attribute is also filled in using the mapped output locale

Now you need to obtain translations. Assume you've sent the loctool XLIFF file `$PROJECT/ilib-loctool-pendo-md-test-extracted.xliff` to a linguist and received translations for locale `pl-PL`. Following your project's config, you put them in `$PROJECT/translations/ilib-loctool-pendo-md-test-pl-PL.xliff`:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xliff version="1.2">
  <file original="" source-language="en" target-language="pl-PL" product-name="ilib-loctool-pendo-md-test">
    <body>
      <trans-unit id="1" resname="8de49842-c1fd-4536-905e-8817673b4c24|md" restype="string" datatype="plaintext">
        <source><c0>Callout!</c0></source>
        <target><c0>Wywołanie!</c0></target>
        <note>TextView [c0: strong]</note>
      </trans-unit>
    </body>
  </file>
</xliff>
```

Note that the target also contains the `<c0> </c0>` tags, because the linguist handled the XML-like tags properly.

Running loctool again

```sh
loctool localize "$PROJECT"
```

will this time load the _pl-PL_ translations from your `xliffsDir` folder, and the _localize_ step will backconvert them and insert them into the regenerated (now actually) localized file `$PROJECT/guides/A000A00Aaa0aaa-AaaaAaa00A0a_en_pl.xliff`:

```xml
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="A000A00Aaa0aaa-AaaaAaa00A0a" datatype="pendoguide" source-language="en-US" target-language="pl">
    <body>
      <group id="Aaaaaaaa0aAaaAAA0AAA0A0aAaa">
        <trans-unit id="8de49842-c1fd-4536-905e-8817673b4c24|md">
          <source><![CDATA[**Callout!**]]></source>
          <target state="translated">**Wywołanie!**</target>
          <note>TextView</note>
        </trans-unit>
      </group>
    </body>
  </file>
</xliff>
```

which you can safely import back to Pendo.
@@ -0,0 +1,14 @@
# Note on unstable `micromark` versions

Currently, this plugin uses the following versions of `micromark`-related dependencies:

```json
"mdast-util-from-markdown": "^0",
"mdast-util-to-markdown": "^0",
"mdast-util-gfm-strikethrough": "^0",
"micromark-extension-gfm-strikethrough": "^0"
```

This is because all of these packages became ESM-only with their stable `1.0.0` release, while at the time of writing `loctool` is written in CommonJS (so it can't use a static `import`) and loads plugins synchronously (so it can't use a dynamic `import()` either); see the plugin loader source at https://github.com/iLib-js/loctool/blob/v2.25.1/lib/CustomProject.js#L116.

This should be _mostly fine_, since these versions were used publicly in `remark v13` (i.e. `remark-parse@9`). In addition, Pendo strings are expected to have low complexity, because they are pre-segmented before export from Pendo.
This file was deleted.
This file was deleted.
@@ -1 +1,24 @@
-export * from "./addNumbers";
/**
 * Copyright © 2024, Box, Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import type { Plugin } from "loctool";
import PendoXliffFileType from "./loctool/PendoXliffFileType";

// loctool plugin entrypoint
const plugin: Plugin = PendoXliffFileType;

export = plugin;