Thesaurus Linguae Aegyptiae (TLA) backend.
Copyright (C) 2019-2023 Berlin-Brandenburgische Akademie der Wissenschaften
The TLA backend server is a Spring Boot application using Elasticsearch as a search engine.
TL;DR: run

```shell
SAMPLE_URL=http://aaew64.bbaw.de/resources/tla-data/tla-sample-20210115-1000t.tar.gz docker-compose up -d
```
There are two methods for getting this thing up and running.
Requirements:

- Docker Compose
- Create an environment variable file `.env` based on the template coming with this repo:

  ```shell
  cp .env.template .env
  ```

- Specify the location where a TLA corpus data archive can be downloaded using the `SAMPLE_URL` environment variable, e.g.:

  ```
  SAMPLE_URL=http://example.org/sample.tar.gz
  ```
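Combining the two steps above, a minimal `.env` for the Docker setup might look like this (the URL is a placeholder; the `ES_HOST`/`ES_PORT` overrides are optional, and the `localhost`/`9200` defaults shown are assumptions):

```
# where to download the TLA corpus data archive from (placeholder URL)
SAMPLE_URL=http://example.org/sample.tar.gz

# optional: only set these if your Elasticsearch instance does not
# run on the default host/port
#ES_HOST=localhost
#ES_PORT=9200
```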
Start the Docker container setup configured in `docker-compose.yml`:

```shell
docker-compose up -d
```
This will build and run three containers:

- `tla-es`: Elasticsearch container
- `tla-ingest`: temporarily executed instance of the backend application, used for populating the Elasticsearch container
- `tla-backend`: the actual backend app
The `tla-ingest` container will take its time downloading the TLA corpus data archive file and uploading it into Elasticsearch.
You can check its progress by taking a look at its log output:

```shell
docker logs -f tla-ingest
```
Requirements:

- Java 11
- Elasticsearch 7.15.2 or Docker Compose

Steps:

- This method requires you to provide a running Elasticsearch instance. Either you install it as an independent application from https://www.elastic.co/downloads/elasticsearch (for version 7.15.2 see https://www.elastic.co/de/downloads/past-releases/elasticsearch-7-15-2; e.g. as an MSI package), or, if you have Docker Compose, you can simply start one in a container using the configuration coming with this repository:

  ```shell
  docker-compose up -d es
  ```

  Before continuing, make sure Elasticsearch is running by checking the output of `docker ps --all` or by accessing its REST interface in a browser (change `9200` in case you set a different port via the `ES_PORT` environment variable). In case you have changed the host name or default port, follow the instructions above to make sure you have set the correct environment variables `ES_HOST` and `ES_PORT` (for `SAMPLE_URL`, see the next step).
- Once Elasticsearch is up and running, TLA corpus data needs to be loaded into it. In order to do so, you need at least to set the `SAMPLE_URL` environment variable to a URL pointing to a tar-compressed TLA corpus data file. One way to do this is to create a `.env` file in the directory containing this README (cf. the instructions above) and to set the variable `SAMPLE_URL` in there:

  ```
  SAMPLE_URL=http://aaew64.bbaw.de/resources/tla-data/tla-sample-20210115-1000t.tar.gz
  ```

  Make sure that the `ES_HOST` and `ES_PORT` lines in `.env` are either deleted or assigned a value; do not leave them empty.
- Finally, download TLA corpus data from the specified source and store it in Elasticsearch by running the `populate` Gradle task:

  ```shell
  ./gradlew populate
  ```

  (If you are on a Windows machine, use the `gradlew.bat` wrapper instead.)
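The warning about empty values can be checked mechanically. The following snippet is purely illustrative (it is not part of this repository): it writes a demo `.env`-style file and flags any variable that is assigned but left empty:

```shell
# Illustrative helper, not part of the repository: detect variables
# that are assigned but left empty in a .env-style file.
cat > .env.demo <<'EOF'
SAMPLE_URL=http://example.org/sample.tar.gz
ES_PORT=9200
EOF

# An assignment like "ES_HOST=" (nothing after the equals sign) is
# what breaks the setup; grep for exactly that pattern.
if grep -Eq '^(ES_HOST|ES_PORT|SAMPLE_URL)=[[:space:]]*$' .env.demo; then
  echo "found empty variable assignment"
else
  echo "ok"
fi
```

For the demo file above this prints `ok`; adding a bare `ES_HOST=` line would trigger the warning branch.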
Run the app using the `bootRun` task:

```shell
./gradlew tasks    # lists available gradle tasks
./gradlew bootRun
```
If you are on a Windows machine, you need to execute the `gradlew.bat` wrapper shipped with this repository.
There are 3 Gradle tasks for running tests:

- `:test`: run unit tests
- `:testSearch`: run search tests against a live Elasticsearch instance
- `:testAll`: run all of those tests
Note that due to the way Spring Data works, an Elasticsearch instance is required even for the unit tests,
although it may well be entirely empty. For the search tests however, the Elasticsearch instance must be fully
populated so that search results can actually be verified against the specified expectations. This means you must
have executed the `:populate` task (`./gradlew populate`) prior to executing `:testSearch` or `:testAll`.
Search tests are performed based on search scenarios specified in JSON files. The specification model can be
found in `SearchTestSpecs.java`. Individual specification
instances consist of at least a name and a search command. JSON files containing a list of several search test
specifications have to be located within the classpath directory set via the
application property `tla.searchtest.path`, each under a sub-directory
whose name identifies the entity service used to execute the contained search commands.
The paths used to identify the entity services can be found in the `@ModelClass`
annotations of the entity services.
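For orientation, a search test specification file could look roughly like the sketch below. Only the overall shape (a JSON list whose entries carry at least a name and a search command) is taken from this README; the concrete key names and the command body are made up for illustration, so consult `SearchTestSpecs.java` for the authoritative model:

```json
[
  {
    "name": "hypothetical scenario: find entities matching a term",
    "cmd": { "query": "example term" },
    "expect": { "minHits": 1 }
  }
]
```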
Test runs create JUnit and Jacoco reports at the usual output locations.
Limit test runs to single classes by using the `--tests` option:

```shell
./gradlew test --tests=QueryResultTest
```
Note: You can configure the Elasticsearch HTTP port to which the application will try to connect.
Both the Docker Compose configuration and the `bootRun` and `test` Gradle tasks read
it from the local `.env` file.
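If you want those values in your own shell session as well, they can be exported from the `.env` file manually. This one-liner is illustrative only and assumes simple `KEY=value` lines without spaces or quoting:

```shell
# Illustrative: export simple KEY=value pairs from a .env-style file
# into the current shell (no quoting/escaping support).
cat > .env.demo <<'EOF'
ES_PORT=9201
EOF
export $(grep -v '^#' .env.demo | xargs)
echo "$ES_PORT"   # prints 9201
```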
When running the application using the `bootRun` task, arguments can be passed either via the
`args` project property (comma-separated) or via the `--args` option (space-separated):

```shell
./gradlew bootRun -Pargs=--data-file=sample.tar.gz,--foo=bar
./gradlew bootRun --args="--data-file=sample.tar.gz --foo=bar"
```
Populate the database with a corpus dump and shut down afterwards:

```shell
./gradlew bootRun --args="--data-file=sample.tar.gz --shutdown"
```
There is a Gradle task for populating the backend app's Elasticsearch indices with corpus data obtained
from a URL specified via the `SAMPLE_URL` environment variable:

```shell
./gradlew populate
```
You can check for the newest versions of package dependencies by running:

```shell
./gradlew dependencyUpdates
```