
Translation Workflow

cst edited this page Mar 16, 2016 · 13 revisions

How Panic Button is doing it

Git repositories:

1. panicbutton.io (jekyll site)

2. PanicButton (Android app)

Panicbutton.io Transifex Workflow

Markdown source

[Status: Existing in the panicbutton.io repository]

The markdown files in the panicbutton.io Jekyll site's _posts directory serve as the source files for the translation process. Authors use Prose to edit the markdown source files, in English, triggering a rebuild of the site on update of a markdown post.

Files that move from the panicbutton.io repository to the PanicButton Android app through the Transifex translation process are located in _posts/mobile and _posts/help. These relevant markdown files contain the yaml "categories" assignment "mobile" or "help".

Markdown -> JSON Key-Value Pair conversion

[Status: script completed in panicbutton.io/_locales/lib/tx_md2jsonKV.py]

Because the markdown files for panicbutton.io contain extensive YAML front matter, they must be converted to JSON before being sent to Transifex, which does not handle the mixed YAML/markdown source well. A push to the panicbutton.io repository triggers travis.yaml to launch the Python script tx_md2jsonKV.py, which converts all markdown files in _posts into flat JSON Key-Value pairs. tx_md2jsonKV.py builds these new JSON Key-Value files in the _locales directory.

In this conversion process, there is one JSON file for each language, with each master JSON file containing the YAML front-matter and content of that language's markdown files. Each item of YAML front-matter becomes its own key, and the markdown body is stored under a single key, "content". For example, if the folder _posts/mobile/ has ten markdown files, the resulting JSON file, mobile.json, will contain all ten of the markdown files, each broken down into its YAML and markdown key-value pairs.

tx_md2jsonKV.py documentation

To run tx_md2jsonKV.py, navigate into _locales/lib and run python tx_md2jsonKV.py. Dependencies: JQ.

tx_md2jsonKV.py follows several steps, progressing as follows:

All translated markdown files, when returned from Transifex, are stored in language-specific directories within the _posts directory. Because we only want to translate the English-language files within _posts, we must exclude the directories of translated files from the JSON KV generation process. tx_md2jsonKV.py first creates an array of the existing language directories, taken from config.py.

Note: The language array in config.py is, at present, manually populated. In the future, it should be populated using some type of link to the panicbutton.io Transifex interface, which lists all available languages for translation. This is to be addressed in future iterations of the script.
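The exclusion of translated directories can be sketched as follows. This is a minimal illustration rather than the code in tx_md2jsonKV.py: the function name source_markdown_files and the example language codes are assumptions, and the real language list lives in config.py.

```python
import os

# Hypothetical stand-in for the language array kept in config.py.
LANGUAGES = ["es", "fr"]


def source_markdown_files(posts_dir, languages):
    """Yield paths of markdown files under posts_dir, skipping every
    directory whose name matches a translated-language code."""
    for root, dirs, files in os.walk(posts_dir):
        # Pruning dirs in place stops os.walk from descending into them.
        dirs[:] = [d for d in dirs if d not in languages]
        for name in sorted(files):
            if name.endswith(".md"):
                yield os.path.join(root, name)
```

Because the pruning is done by directory name at every level, the sketch skips language directories whether they sit directly under _posts or inside a category folder such as _posts/mobile.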

#### Processing each markdown file

When it is run, tx_md2jsonKV.py walks through all directories in _posts not containing translated content, performing the following steps on each file:

  1. First, tx_md2jsonKV.py locates the second "---" YAML demarcation, splitting the markdown "content" of the file from the YAML front-matter. The script writes the markdown text to _locales/lib/temp/{FILENAME}.md.txt

  2. Next, the script opens the original markdown file again, this time to harvest the YAML front-matter. The script takes everything above the second "---" YAML demarcation. The script assigns each piece of YAML front matter a unique key, which will be used to retain the original order of the YAML front-matter at the close of the Transifex process. This key also serves as an ID that allows for easy replacement of the original, untranslated YAML front-matter by the returned, translated JSON content.

The key is attached at the beginning of each front-matter item, using the format K{NUM}-{FRONTMATTERITEM}.

Each file's YAML front matter, with the prepended key, is written to _locales/lib/temp/{FILENAME}.md.yml.

  3. Each YAML file in _locales/lib/temp/{FILENAME}.md.yml is opened. The YAML front-matter, complete with the keys assigned in step 2, is converted into JSON using the module yaml2json.py, which is imported into tx_md2jsonKV.py. The resulting JSON file is written to _locales/lib/temp/{FILENAME}.md.json.orig.

  4. Next, each .json.orig file in _locales/lib/temp/{FILENAME}.md.json.orig is fed into the script pre-tx-push.jq. This script uses JQ, a highly efficient command-line JSON processor.

First, the JQ file imports white-listed YAML keys, contained in the file jqconfig.json. These keys have been added because they are assigned to translatable content in the markdown files. The jqconfig.json file is manually generated, and it is a TODO to make the file accessible to content editors/site administrators without full backend access, and to automate the process of populating the jqconfig.json file accordingly.

jqconfig.json specifies keys and subkeys. keys are YAML front-matter items that are paired directly with translatable content. subkeys contain arrays of keys, some of which are linked to translatable content.

pre-tx-push.jq first cycles through each of the keys. If a key is linked to a top-level YAML item, it creates a new JSON KV pair. The key is {FILENAME}---{KEYNAME} and the value is the content of that key.

Next, pre-tx-push.jq cycles through each of the subkeys. If the subkey's arrays contain translatable keys, then the script performs a similar key processing step as above, except that this time all enclosing subkeys are included in the resulting JSON key. The key is {FILENAME}---{SUBKEY-1}---{SUBKEY-2}...---{SUBKEYN}---{KEYNAME} and the value is the content of the key nested inside the subkeys.

The resulting, flat JSON KV array for each .json.orig file is written to _locales/lib/temp/ as {FILENAME}.md.json.tmp.

  5. Next, each .txt file in _locales/lib/temp/{FILENAME}.md.txt is opened, and its contents stored as a JSON KV pair, with the key following the format {FILENAME}---content. This new JSON KV pair is appended to the JSON in the corresponding _locales/lib/temp/{FILENAME}.json.tmp file. The resulting JSON is written as _locales/lib/temp/{FILENAME}.json.
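The per-file steps above can be sketched in plain Python. This is an illustrative approximation under stated assumptions, not the project's implementation: the real pipeline uses yaml2json.py and pre-tx-push.jq, whereas here the front matter is passed in as an already-parsed dictionary, the K{NUM}- ordering prefix is omitted for brevity, only nested mappings (not lists) are handled, and the names split_front_matter and flatten are hypothetical.

```python
def split_front_matter(text):
    """Split a Jekyll post into (front_matter, body) at the second '---' line."""
    lines = text.splitlines(keepends=True)
    markers = [i for i, line in enumerate(lines) if line.strip() == "---"]
    if len(markers) < 2 or markers[0] != 0:
        raise ValueError("expected front matter delimited by two '---' lines")
    front = "".join(lines[1:markers[1]])
    body = "".join(lines[markers[1] + 1:])
    return front, body


def flatten(filename, front_matter, translatable, content):
    """Build flat {FILENAME}---{SUBKEY}...---{KEYNAME} pairs for whitelisted
    keys, then add the markdown body under {FILENAME}---content."""
    flat = {}

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, path + [key])
        elif path[-1] in translatable:
            flat["---".join([filename] + path)] = node

    walk(front_matter, [])
    flat[filename + "---content"] = content
    return flat


post = """---
title: Getting started
layout: post
---
Press the button.
"""
front, body = split_front_matter(post)

# Hand-parsed front matter; the whitelist plays the role of jqconfig.json.
parsed = {
    "title": "Getting started",
    "layout": "post",
    "checklist": {"action": {"title": "Set a PIN"}},
}
pairs = flatten("intro", parsed, {"title"}, body)
```

In this sketch, "layout" is dropped because it is not whitelisted, while the nested title is kept under the key intro---checklist---action---title.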

#### Final steps

Once this processing is completed, the contents of the _locales/lib/temp/{FILENAME}.json files are combined into a single file, tx_mdtoJSON.en.json. This is the file that is fed into Transifex to start the translation process.

Although Transifex accepts nested JSON, per their documentation, we have opted to create a flat JSON KV file in this manner in order to retain more control over the assignment of content keys, which must work with our very specific translation process.
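One practical consequence of the flat layout is that combining files is a plain dictionary merge, since every key already begins with its filename. The following is a rough sketch of that combination step; the function name combine and the sample data are assumptions made for illustration.

```python
import json


def combine(per_file_pairs):
    """Merge per-file flat dictionaries into one, refusing duplicate keys."""
    combined = {}
    for pairs in per_file_pairs:
        for key, value in pairs.items():
            if key in combined:
                raise ValueError("duplicate key: " + key)
            combined[key] = value
    return combined


pairs_intro = {"intro---title": "Getting started", "intro---content": "Press the button."}
pairs_faq = {"faq---title": "FAQ", "faq---content": "Ask away."}
combined = combine([pairs_intro, pairs_faq])

# Serialise the combined dictionary as flat Key-Value JSON.
output = json.dumps(combined, ensure_ascii=False, indent=2, sort_keys=True)
```

The duplicate check is a defensive choice in this sketch: two files producing the same key would otherwise silently overwrite each other's content.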

Transifex configuration

[Status: .tx config creation in progress]

Transifex is hooked into panicbutton.io's root in a .tx folder that contains the Transifex config file (.tx/config). The config file alerts the Transifex client to files in the repository that are subject to translation in Transifex; dictates the naming convention for the returned, translated files; and sets the file format (here, Key-Value JSON).

JSON -> Transifex: Registering a change

[Status: TXGH Integration TODO]

If the newly built JSON Key-Value files (in _locales, from tx_md2jsonKV.py) differ from the ones that they replace, TXGH, a Sinatra-based Transifex-GitHub integration server, registers this change to the files in _locales. It updates Transifex to show that each language's translation is no longer 100% complete. This cues translators to update each language's translations.

Transifex: Returning translated files to repository

[Status: TXGH integration TODO, script completed in panicbutton.io repo: _locales/lib/tx_jsonKV2md.py ]

When a translator completes a language's translation, and the Transifex repository registers 100% completion, TXGH pushes the newly translated JSON Key-Value files into panicbutton.io's git repository at _locales/<lang>/, overwriting the files that exist in this repository (the old translations).

This push triggers another script launched by travis.yaml, the Python script tx_jsonKV2md.py, which converts the JSON Key-Value files in _locales/<lang>/ back into markdown. The new markdown files are placed in _posts/<lang>/, replacing the markdown files from the previous round of translations.

Documentation for tx_jsonKV2md.py script

#### 1. Running the script

To run the tx_jsonKV2md.py script, navigate into the _locales/lib directory and run python tx_jsonKV2md.py

#### 2. What the script does

When translated files are returned from Transifex, they are processed by the tx_jsonKV2md.py script according to the following steps.

Transifex returns one JSON KV file for each translated language. These files are returned to _locales/{LANG}/ and they are named tx_jsonKV_{LANG}.json.

Like the JSON KV file that was sent to Transifex, these returned files are flat JSON KV files. Each key identifies the translated markdown filename and the corresponding YAML front-matter or markdown content that has been translated.

In the case of nested YAML front-matter, the subkeys that contain the translated key are also included in the JSON KV key name. For more information about this, see the description of pre-tx-push.jq in the markdown to JSON KV conversion documentation.

tx_jsonKV2md.py cycles through each language in config.py, performing the following actions for each.

Note: The language array in config.py is, at present, manually populated. In the future, it should be populated using some type of link to the panicbutton.io Transifex interface, which lists all available languages for translation. This is to be addressed in future iterations of the script.

  1. First, it creates a JSON dictionary from the file _locales/{LANG}/tx_jsonKV_{LANG}.json.

  2. Next, it splits this JSON dictionary into objects, each of which represents the translated content of one of the markdown files that were fed into the JSON KV file during the markdown -> JSON KV conversion process.

Each JSON key is added as an attribute of the object.

Nested JSON keys (i.e. those that are contained within subkeys) are stored as attributes nested inside of objects, with one object for each level of subkey. So, if there is a key, "title", inside of a subkey, "action", inside of a subkey, "checklist", then a "checklist" object will be created and appended to the markdown file object. Another object, "action", will be appended to the "checklist" object, and "title" will be added as a key to the "action" object. This allows the objects to grow in a tree-like structure that mirrors the YAML front-matter that will eventually result.

Note: the above is very difficult to describe, and I think it could be done with clearer language. To see the process, see function breakfiles and the helper functions that branch off of it.

Once all attributes have been assigned and sub-objects created, each markdown file's object is placed into a dictionary (jsonObj.obj_dict), the keys of which are the markdown filenames.

  3. For each markdown file object in the dictionary of objects jsonObj.obj_dict, if the object contains the top-level attribute "content", this attribute represents the markdown text of the file, not a piece of YAML front-matter. In this case, the "content" value is written to a text document, _locales/{LANG}/temp/{FILENAME}.md_translated.txt.

  4. Each markdown file's object is merged with the contents of the "original", untranslated JSON, located in _locales/lib/temp/{FILENAME}.json.orig. Translated content replaces untranslated content; this is possible because the keys of the translated and untranslated content are identical.

The resulting JSON dictionary, in which all translated content has replaced the original English content, is written as JSON to _locales/{LANG}/temp/{FILENAME}.md.trans.json. Importantly, this file contains only YAML front matter; the markdown content, which was set aside in step 3 and is not included in the .json.orig files, is excluded.

  5. Each .trans.json file is piped through the command line tool json2yaml, converting the JSON into YAML. This YAML content is written to _locales/{LANG}/temp/{FILENAME}.md.trans.yaml.

  6. Finally, the translated markdown file is assembled. A markdown file is created at _posts/{lang}/{FILENAME}.md.

  7. The YAML front matter at _locales/temp/{FILENAME}.trans.yaml is processed to (a) correctly order the contents, per the alphanumeric key; (b) remove the alphanumeric key from the YAML front matter item name; and (c) adjust the spacing, which is inflated by the json2yaml command line tool in step 5.

  8. The processed YAML front matter is written into the markdown file created at _posts/{lang}/{FILENAME}.md, between the necessary YAML demarcations, "---".

  9. The markdown text (_locales/{LANG}/temp/{FILENAME}.md_translated.txt) is opened and appended to _posts/{LANG}/{FILENAME}.md, completing the conversion process.
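The tree-building, merging, and reordering steps above may be easier to follow in code. The sketch below uses plain dictionaries instead of the objects built by breakfiles, so it approximates the idea rather than reproducing the script. The function names unflatten, merge, and ordered_items are hypothetical, and whether the flat keys returned by Transifex carry the K{NUM}- prefix is not assumed here; the prefix handling is shown separately.

```python
import re


def unflatten(flat):
    """Group flat 'FILE---sub---key' pairs into one nested dict per file."""
    files = {}
    for flat_key, value in flat.items():
        filename, *path = flat_key.split("---")
        if not path:
            raise ValueError("key has no '---' separator: " + flat_key)
        node = files.setdefault(filename, {})
        for part in path[:-1]:
            # One nested dict per level of subkey.
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return files


def merge(original, translated):
    """Return a copy of original with translated values overlaid recursively."""
    merged = dict(original)
    for key, value in translated.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


KEY_PREFIX = re.compile(r"^K(\d+)-(.+)$")


def ordered_items(front_matter):
    """Sort items by the number in their K{NUM}- prefix, then strip the prefix."""
    def order(item):
        match = KEY_PREFIX.match(item[0])
        return int(match.group(1)) if match else float("inf")

    return [
        (KEY_PREFIX.sub(r"\2", key), value)
        for key, value in sorted(front_matter.items(), key=order)
    ]
```

A short usage example, with invented sample content:

```python
flat = {
    "intro---title": "Primeros pasos",
    "intro---checklist---action---title": "Definir un PIN",
    "intro---content": "Pulse aqui.",
}
translated = unflatten(flat)["intro"]
body = translated.pop("content")  # markdown text, handled separately
original = {
    "title": "Getting started",
    "layout": "post",
    "checklist": {"action": {"title": "Set a PIN", "icon": "lock"}},
}
merged = merge(original, translated)
```

Sorting on the integer value of the prefix, rather than on the key string, keeps K10 after K2.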

Pushing Markdown files to Android Panic Button App

[Status: travis.yaml cloning and pushing to Android application TODO]

On each push to the panicbutton.io dev branch, travis.yaml performs a process to transform and push relevant information to the PanicButton Android application repository. First, travis.yaml git clones the PanicButton Android application's asset repository. The script then converts all of the markdown files in _posts/mobile and _posts/help (including all files in the _posts/mobile/<lang> and _posts/help/<lang> directories) into custom, PanicButton-specific JSON following templates located in /api/help.json and /api/mobile.json. These files are named mobile_<lang>.json and help_<lang>.json.

Upon creation, the files are copied into the newly cloned PanicButton App assets directory, over-writing the now-outdated, equivalently-named files.

Next, travis.yaml calls git add -A && git commit -m "Updating translations to the PanicButton Application" && git push origin dev. If the newly created JSON files, mobile_<lang>.json, differ from the previous versions (e.g. if the git push that called the script represents an updated translation), then the changes will be committed and pushed to the PanicButton Android Application's repository. This triggers a rebuild of the app. If the JSON files have not changed, git add -A stages nothing, git commit finds nothing to commit, and, because the commands are chained with &&, git push origin dev does not run. Nothing is pushed, and the intent is for this case to pass without raising an error.

Panic Button Android App Additional Transifex workflow

strings.XML

[Status: .tx config creation in progress. TXGH integration TODO]

The PanicButton Android app contains an additional Transifex workflow. The app has translatable material independent of the panicbutton.io source files in the /src/main/res/values/strings.xml file.

Thus, the PanicButton Android Application repository also houses a .tx folder with its own config file, which dictates the translation source, resulting file names, and format for strings.xml.

A TXGH integration in the PanicButton app repository will register a pushed change in the strings.xml file, updating strings.xml in the Transifex repository to show that translation is no longer complete and alerting translators that an update to the translation is necessary.

Upon completion of the translation, TXGH will return the translated files to the PanicButton repository, placing them in the directories /src/main/res/values-<lang>, with each language's file named strings.xml. Existing translations in each of these directories will be overwritten by this action from TXGH. The app will be rebuilt upon return of new files.