This is a new synchronized facsimile and transcription reader for the TEI files on
It is a fork of the Glossematics source code, with many changes to the TEI styling, metadata retrieval and page structure to fit these TEI files, which are quite different from the ones at
I downloaded the "everyman" dataset from and extracted every zip file.
The extracted TIF files were recursively converted and renamed using the following commands (taken from kuhumcst#20):
find . -name '*.tif' -exec mogrify -format jpg -quality 70 {} +
find . -name '*.jpg' -exec rename 's/(?<!\.tif)\.jpg$/.tif.jpg/' {} +
And to remove the remaining TIF files:
find . -name "*.tif" -type f -exec rm -f {} \;
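Before running the removal step above, it may be worth checking that every TIF really has a converted counterpart. The following is a minimal sketch, assuming the `name.tif` to `name.tif.jpg` naming produced by the rename command; the helper name `list_unconverted` is hypothetical and not part of this repository.

```shell
# Hypothetical helper: print every TIF below a directory that has no
# matching "<name>.tif.jpg" file next to it (the naming assumed above).
# Note: file names containing newlines are not handled by this sketch.
list_unconverted() {
    dir="${1:-.}"
    find "$dir" -name '*.tif' -type f | while IFS= read -r tif; do
        [ -f "$tif.jpg" ] || printf '%s\n' "$tif"
    done
}
```

Running `list_unconverted .` from the directory holding the extracted files prints nothing when every TIF has been converted, in which case deleting the originals should be safe.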
To create thumbnails for search results:
mkdir thumbs
find . -name '*.jpg' ! -path './thumbs/*' -exec convert '{}' -resize 360x640 -set filename:newname "%t.%e" 'thumbs/thumb-%[filename:newname]' \;
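To confirm that the thumbnail step covered every image, something like the following could be used. It is only a sketch, assuming the `thumbs/thumb-<file name>` naming from the command above; `list_missing_thumbs` is a hypothetical helper name.

```shell
# Hypothetical helper: print every JPEG below a directory that has no
# "thumbs/thumb-<basename>" counterpart yet. Files already inside the
# thumbs directory are skipped.
list_missing_thumbs() {
    root="${1:-.}"
    find "$root" -name '*.jpg' -type f ! -path "$root/thumbs/*" |
    while IFS= read -r img; do
        [ -f "$root/thumbs/thumb-$(basename "$img")" ] || printf '%s\n' "$img"
    done
}
```

Since all thumbnails are written to a single directory, two images with the same file name in different subdirectories would share one thumbnail; this check cannot tell those cases apart.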
The directory /etc/clarin-tei serves as the home directory of the system. The image and TEI files are to be found somewhere within the directory structure of /etc/clarin-tei/files, while this Git repository is cloned at /etc/clarin-tei/clarin-tei.
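The expected layout can be checked before starting the service. This is a minimal sketch; the helper name `check_layout` is hypothetical, and it only tests that the two directories named above exist.

```shell
# Hypothetical helper: verify the directory layout described above.
# The root defaults to /etc/clarin-tei; pass another path to check elsewhere.
check_layout() {
    root="${1:-/etc/clarin-tei}"
    status=0
    for sub in files clarin-tei; do
        if [ ! -d "$root/$sub" ]; then
            printf 'missing directory: %s\n' "$root/$sub" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling `check_layout` without arguments returns a non-zero exit status and names the missing directory on standard error if either /etc/clarin-tei/files or /etc/clarin-tei/clarin-tei is absent.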
The system requires Docker to run and is initialised as a systemd service:
cp system/clarin-tei.service /etc/systemd/system/clarin-tei.service
systemctl enable clarin-tei
systemctl start clarin-tei
Currently, this system requires a separate reverse proxy to be available on the public Internet.
For e.g. an nginx setup such as the one running on
, the following snippet should be included:
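location /clarin {
    include proxy_params;
    # assumed from the description below: forward to the local web service
    proxy_pass http://localhost:6789;
}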
This will proxy requests to the CLARIN TEI web service running on localhost:6789.
It might occasionally be necessary to debug the Docker container in production.
To stop and rebuild the Docker image from scratch with visible terminal output:
# run these as a superuser
cd /etc/clarin-tei/clarin-tei/docker
systemctl stop clarin-tei
CLARIN_TEI_FILES_DIR=/etc/clarin-tei/files docker-compose build --no-cache
systemctl start clarin-tei