FreeDict HOWTO – FreeDict Build System

Sebastian Humenda edited this page Aug 19, 2018 · 19 revisions

FreeDict Build System

Introduction

The build system is based on make and is used to build/convert, validate and distribute dictionaries. It is the common entry point for most of the tools used within FreeDict.

A dictionary usually lives in our git repository. For a release, the build system converts the dictionaries into the available output formats and creates compressed archives, which can then be moved to their final destination.
Auto-imported dictionaries are an exception to this procedure. They are usually not kept in a git repository, but at a different location without version control. Once these dictionaries have been generated, the make build system is used as described.

A strength of FreeDict is its support for different dictionary platforms. Once a dictionary is available in TEI format, it can be converted to many other formats, to be used with dictionary applications, spell checkers (for this only the headwords or translation equivalents are taken), for printing a book using XSL-FO etc.

This is enabled by two factors. First, XML is purposely very flexible. Second, the tools for converting the TEI files are kept in one place (the tools module) and are shared between the dictionary modules.

The subsequent sections explain the most relevant aspects of the build system: the general structure, the Makefiles involved and their usage, and the API generation.

All paths are relative to the tools directory, if not stated otherwise.

Installation

You should have a local copy of the tools repository, for instance, by cloning it:

git clone https://github.com/freedict/tools

The path to this directory must not contain spaces; this is a restriction imposed by Make.

The environment variable FREEDICT_TOOLS

FreeDict's build system and its scripts need to find their files, which are located in the tools directory. This is done with the FREEDICT_TOOLS environment variable. It should be set and point to the tools directory.

On Unix-like systems, exporting the variable in the shell configuration as export FREEDICT_TOOLS=/path/... is enough. On Windows, the environment variable must be set in the system settings. Since the approach changes over time, it is best to search for the exact steps on the internet.
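The two requirements above (the variable is set, and the path contains no spaces) can be checked before running any make rule. The following is a minimal sketch, not part of the FreeDict tools: the function name resolve_tools_dir is hypothetical, and the check for mk/dicts.mk only relies on the file layout described later on this page.

```python
import os
from pathlib import Path


def resolve_tools_dir(environ=None):
    """Return the directory named by FREEDICT_TOOLS or raise ValueError.

    Hypothetical helper for illustration; the real build system performs
    its own lookup.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("FREEDICT_TOOLS")
    if not value:
        raise ValueError("FREEDICT_TOOLS is not set")
    # Make cannot handle paths with spaces, so reject them early.
    if any(ch.isspace() for ch in value):
        raise ValueError("FREEDICT_TOOLS must not contain spaces: {!r}".format(value))
    path = Path(value).expanduser()
    # mk/dicts.mk is the file every dictionary Makefile includes.
    if not (path / "mk" / "dicts.mk").is_file():
        raise ValueError("no mk/dicts.mk found below {}".format(path))
    return path
```

Calling resolve_tools_dir() without arguments inspects the current environment; passing a dictionary makes the check easy to try out without changing the shell configuration.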

Python Scripts

Some parts of the dictionary conversion and management require Python. To make this process painless, our build system will assist you in setting up the environment. Before you start, make sure that the following packages are installed:

  • Python >= 3.4
  • libicu-dev
  • python3-dev

(These are the package names on Debian and derived distributions.)

Afterwards, you can execute the mk_venv rule from the root directory of the tools repository. A virtual environment (venv) is Python's way of installing libraries and programs locally without affecting the system-wide installation. If you want to understand how this works and what this command does, use make mk_venv-help and the excellent tutorial from https://developer.akamai.com/blog/2017/06/21/building-virtual-python-environment/. For brevity, only the command for installing the virtual environment into the directory ../fd_venv is given below:

make mk_venv P=../fd_venv

Note: If the creation of your virtual environment fails with a Python traceback ending in FileNotFoundError: [Errno 2] No such file or directory: 'icu-config', you need to install the libicu headers. On Debian/Ubuntu, run sudo apt install libicu-dev.

Make System Structure

The tools directory contains, among other things:

  • XSL conversion style sheets for conversion into other formats
  • the mk directory with the heart of the make-based build system
  • importer scripts, which import dictionaries into FreeDict
  • the API generator
  • and much more

mk/dicts.mk

This file provides all the rules for building a dictionary and is included by the dictionary Makefile. It works on exactly one dictionary and implements all the logic for the conversion process. A minimal Makefile for a dictionary usually looks like this:

# Fallback, used only if the variable is unset. The comment sits on its own
# line because Make would otherwise keep the spaces before "#" in the value.
FREEDICT_TOOLS ?= ../../tools
DISTFILES = AUTHORS ChangeLog COPYING lg1-lg2.tei \
    freedict-P5.xml freedict-P5.rng freedict-P5.dtd freedict-dictionary.css INSTALL Makefile NEWS README
include $(FREEDICT_TOOLS)/mk/dicts.mk

The FREEDICT_TOOLS assignment sets the fallback for the variable. As said, it is better to have this variable set globally on the system. The DISTFILES variable lists all the files which should be distributed when building a release archive. The contents may vary. Most of the dictionaries follow GNU conventions and ship files like COPYING, AUTHORS, etc. FreeDict only mandates a ChangeLog, the Makefile, the dictionary (with licensing information) and some XML schemas.

mk/dicts.mk provides the following targets (as well as some more internally used targets). If you want a quick yet more extensive overview, just type make help.

all (or build)

The default target converts the TEI XML source into the supported output formats. Please run make list-platforms for a list of supported output formats.

changelog

Updating all the pieces of a TEI header for a new release can be tedious. This rule assists by updating the date, edition, extent, copyright year and change information. For the changelog entry, an editor is opened. The edition has to be given on the command line, for instance:

make E=1.8.2 changelog

Please note that this rule requires the value user_name and optionally full_name from the FreeDict configuration. Please see the section on how to create a FreeDict configuration for more details.

A help screen for this rule can be obtained using make changelog-help.

clean

This removes all non-source files that were generated while building the dictionary module.

deploy

This builds a release and deploys it to the location where releases belong, which the make system determines itself. It requires a FreeDict configuration. If you want to deploy a release again, use make FORCE=y deploy.

Note: After the deployment, you should use make api to generate a new API file.

list-platforms

This lists all supported output formats / platforms.

install

Install the dictionary to the local file system. The variables DESTDIR and PREFIX can be used to control the destination.

qa

This runs all quality assurance helpers of FreeDict. This is a strongly advised step before a new release of a dictionary.

release-PLATFORM

This puts a release file for the specified platform into the corresponding directory below $BUILD_DIR (usually ../build).

Example: make release-dictd

rm_duplicates

This tries to find duplicated entries or empty XML nodes and removes them. Afterwards, a human-readable diff of the changes is presented to the user.

validation

This target is used to check the TEI XML file against the FreeDict RNG schema. It is used to spot errors in the dictionary and should be used by each dictionary maintainer, to make sure that their dictionaries adhere to the rules.

version

Output the current version of the dictionary.

mk/dictroot.mk

This file is included by the top-level Makefile of the FreeDict repository and provides convenience functionality for all dictionaries at once. As for all Makefiles, make help will explain most of the relevant targets.

The default target invokes a build of all dictionaries in the repository for all available output formats. This is potentially a very time-consuming process, so it can be parallelized. Try make -j8 if you have a system with eight CPU cores.
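The job count does not have to be hard-coded. As a minimal sketch, the following derives it from the number of CPU cores reported by the operating system; the helper name parallel_build_command is hypothetical and only assembles the command line, it does not run make itself.

```python
import os


def parallel_build_command(jobs=None):
    """Build the argument list for a parallel make run.

    If no job count is given, use the number of CPU cores reported by the
    operating system, falling back to a single job.
    """
    if jobs is None:
        jobs = os.cpu_count() or 1
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["make", "-j{}".format(jobs)]
```

The resulting list can be handed to subprocess.run(..., check=True) from the top-level directory of the dictionary repository.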

As for each dictionary, there is an install rule. Additionally, the rule make install-restart will attempt to restart the dictd daemon after a successful installation. install-core will install all dictionaries, whereas install will also attempt to restart the involved services. The variables DESTDIR and PREFIX can be used to control the destination of the installed dictionaries.

tools/Makefile

Within the tools directory, there is a Makefile which defines targets relevant for the management of the tools. These targets are mostly relevant for project administrators.

Creating A FreeDict Configuration

Some of the commands available in the tools directory require a configuration. This configuration defines paths and user credentials, which are needed to access certain parts of FreeDict's release infrastructure or to automate the changelog creation.

A configuration has to be located in %LOCALAPPDATA% on Windows and at $HOME/.config/freedict/freedictrc on Unix-like systems. A minimal configuration could look like this:

[DEFAULT]
file_access_via = sshfs
api_output_path = ~/freedict/fd-dictionaries/build

[release]
user = humenda,freedict
local_path = ~/freedict/release

[generated]
user=humenda
local_path = ~/freedict/generated

[crafted]
local_path = ~/freedict/fd-dictionaries

The DEFAULT section contains global options. The file_access_via option determines the method used to access remote files of the project, including releases and auto-imported dictionaries. At the moment, SSHFS and Unison are supported (spelled in lower case in the configuration). SSHFS will mount the files as a remote file system (UNIX only) and Unison will synchronise these files with the server, so that you have a local copy to work with.

The api_output_path option specifies the directory into which the API files (freedict-database.xml and freedict-database.json) are written. It is also advisable to add user_name and full_name to the DEFAULT section, set to your GitHub user name and your real name respectively. They are used, for instance, by the make changelog rule.

The subsequent sections describe different locations for dictionaries. The crafted location is the repository with all hand-crafted dictionaries. The generated section describes a remote folder which contains all automatically imported dictionaries. Since these dictionaries are generated, it doesn't make sense to version-control them; only the importer script needs to be under version control. To access these generated files, a local path and a user name are mandatory. Other fields are server and remote_path, but these are set to the correct values by default. The sections generated and release work the same way, whereas the crafted section only offers the option to set a local path. For the crafted section, it is assumed that the dictionaries are accessed using git and are therefore kept up to date by other means.

If you want to skip a section for testing, e.g. the generated section, you can just write skip = yes as the first option in that section.
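The file uses the common INI syntax, so its structure can be explored with Python's standard configparser module. The sketch below is only an illustration and makes no claim about how the FreeDict tools read the file: the helper name active_sections is hypothetical, and the sample adds skip = yes to the generated section to show the effect of that option.

```python
import configparser

# Sample based on the configuration above, with the generated section skipped.
SAMPLE = """\
[DEFAULT]
file_access_via = sshfs
api_output_path = ~/freedict/fd-dictionaries/build

[release]
user = humenda,freedict
local_path = ~/freedict/release

[generated]
skip = yes
user = humenda
local_path = ~/freedict/generated

[crafted]
local_path = ~/freedict/fd-dictionaries
"""


def active_sections(parser):
    """Return the names of all sections not disabled with skip = yes."""
    return [name for name in parser.sections()
            if not parser[name].getboolean("skip", fallback=False)]


config = configparser.ConfigParser()
config.read_string(SAMPLE)

print(active_sections(config))              # ['release', 'crafted']
# Options from DEFAULT are visible in every section.
print(config["crafted"]["file_access_via"])  # sshfs
```

With configparser, values from the DEFAULT section act as fallbacks for every other section, which matches the description of DEFAULT as the place for global options.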

Make Rules

As usual, make help gives an overview of all commands. The following are used most frequently:

api

This generates the FreeDict API file with information about all available dictionaries and their release candidates. This target assumes that you have python3 and SSHFS or Unison and have set up a configuration file as explained in the previous section.

api-validation

There is a Relax NG schema to validate the contents of the generated API. For this to work, the configuration option api_output_path has to be set and a file has to exist at the specified location. This is the case if you have run make api before.

Besides the XML structure, the validation step also checks whether the date format is correct and whether the version adheres to the version.major.minor versioning scheme.
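To give an idea of what such checks can look like, here is a minimal sketch. It rests on two assumptions that this page does not confirm: that a version consists of exactly three dot-separated numbers (as in the 1.8.2 example used for the changelog rule), and that dates are written as YYYY-MM-DD. Both function names are hypothetical; the actual validation in the tools may be stricter or more lenient.

```python
import re
from datetime import datetime

# Assumption: three dot-separated numeric components, e.g. 1.8.2
VERSION_RE = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+")
# Assumption: ISO-style calendar dates, e.g. 2018-08-19
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_valid_version(text):
    """True if text consists of exactly three dot-separated numbers."""
    return VERSION_RE.fullmatch(text) is not None


def is_valid_date(text):
    """True if text is a zero-padded YYYY-MM-DD date that exists in the calendar."""
    if DATE_RE.fullmatch(text) is None:
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True
```

The regular expression rejects badly shaped strings, while strptime additionally rejects dates that do not exist, such as the 30th of February.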

install

This will install the tools to $DESTDIR/$PREFIX/share/freedict. The default is /usr/local/share/freedict.
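How the two variables combine can be illustrated with a few lines of Python. This is a sketch only: the function name install_dir is hypothetical, and the defaults (an empty DESTDIR and a PREFIX of /usr/local) are inferred from the default path stated above rather than taken from the Makefile.

```python
import posixpath


def install_dir(destdir="", prefix="/usr/local"):
    """Compose $DESTDIR/$PREFIX/share/freedict as a normalised path.

    Assumed defaults: empty DESTDIR and PREFIX=/usr/local.
    """
    # Strip redundant slashes so that the parts join cleanly.
    joined = "/".join([destdir.rstrip("/"), prefix.strip("/"), "share", "freedict"])
    return posixpath.normpath(joined)


print(install_dir())                    # /usr/local/share/freedict
print(install_dir("/tmp/stage", "/usr"))  # /tmp/stage/usr/share/freedict
```

DESTDIR is typically used by packagers to stage an installation into a temporary directory, while PREFIX selects the location on the final system.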

install-deps

This probes the current operating system and invokes the package manager to install the dependencies required for dictionary development and conversion. At the time of writing, Debian-based distributions and Arch GNU/Linux are supported.

mount

Releases and generated dictionaries are stored on remote machines and need to be made accessible. This can be done with either SSHFS or Unison. SSHFS can mount remote volumes securely, but may be undesirable on slower internet connections. Unison downloads and synchronises remote files with a local copy and only needs to transfer data if files have changed. This rule will either mount or synchronise the remote data, depending on the configuration.

need-update

This will parse the source of all dictionaries and the list of released files to detect unreleased changes. It will present a table with dictionaries to release.

To execute this target, a configuration has to exist. Please see the corresponding section of this chapter.

release

This builds a release tarball for the tools directory.

umount

Please see the section on mount for more details.

This rule unmounts remote shares if they were mounted before.