Skip to content

Latest commit

 

History

History
165 lines (107 loc) · 6.1 KB

README.md

File metadata and controls

165 lines (107 loc) · 6.1 KB

mediaTUM View - Backend

Overview

We add several PostgreSQL views and functions to the original mediaTUM database design.

The new code lives in these PostgreSQL schemas:

Schema Description
config Tables for configuring the application (top-level folders, aspects, masks)
preprocess Tables ufts and aspect with preprocessed tsvector values for full text search.
entity Views (some of them materialized) that map the original generic node structure to the specific entities of interest.
api Types and functions that define the GraphQL schema and the resolvers implementing the GraphQL operations.
aux Helper functions

On top of that we use PostGraphile (formerly known as PostGraphQL), to expose the functions from PostgreSQL schema api (and optionally from schema debug) as a GraphQL service.

Prerequisites

PostgreSQL

In general, we use an up-to-date version of PostgreSQL.

Currently we are using version 13.3.

Minimum required version is 10.

Configuration Variables

We use environment variables to configure the database name and the relevant database users.

$ export MEDIATUM_DATABASE_NAME="mediatum"
$ export MEDIATUM_DATABASE_USER="mediatum"
$ export MEDIATUM_DATABASE_USER_VIEW_API="mediatum_view_api"

RUM

RUM is an extension for PostgreSQL that adds a new indexing method, similar to GIN. We use it for efficient ranked full-text search.

To install and integrate RUM do something like the following.

$ git clone https://github.com/postgrespro/rum
$ cd rum
$ make USE_PGXS=1

$ sudo make USE_PGXS=1 install

$ make USE_PGXS=1 installcheck
$ dropdb contrib_regression

$ psql -d $MEDIATUM_DATABASE_NAME -c "CREATE EXTENSION rum;"

Bilingual Dictionary

In order to make full text search working with English and German texts we use a custom made dictionary for configuring FTS in PostgreSQL. This dictionary (english_german) is similar to the builtin Snowball stemmers, but uses both language settings for stemming and produces lexeme variants if the stems differ.

The dictionary is realized as a PostgreSQL extension snowball_bilingual. It's source code is included in this repository.

To build and install it do something like the following.

$ cd backend/pg-extensions/snowball_bilingual
$ make

$ sudo make install

$ make installcheck
$ dropdb contrib_regression

$ psql -d $MEDIATUM_DATABASE_NAME -c "CREATE EXTENSION snowball_bilingual;"

PostGraphile

PostGraphile is a Node.js application, installed utilizing npm:

$ npm install

Installation

The new code lives in dedicated schemas as noted above. In order to create those schemas you may have to grant the permission to do so to the database user that is used to build the database application:

$ psql -d $MEDIATUM_DATABASE_NAME -c "GRANT CREATE ON DATABASE $MEDIATUM_DATABASE_NAME TO $MEDIATUM_DATABASE_USER;"

We need a new database user with restricted privileges that will be used to access the API functions. Note that to create this user you need the privilege CREATEROLE, e.g. as a superuser.

$ psql -d $MEDIATUM_DATABASE_NAME -c "CREATE ROLE $MEDIATUM_DATABASE_USER_VIEW_API LOGIN;"

To submit the new code to a running mediaTUM database execute the SQL and PL/pgSQL code with the appropriate scripts in bin/. Please review these scripts before using them.

$ bin/build-init
$ bin/build-preprocess
$ bin/build-api
$ bin/preprocess-all
$ bin/enable-update-triggers
$ bin/preprocess-catch-up

Preprocessing the fulltext and meta-data to build the FTS index takes some time, e.g. about 120 minutes for a mediaTUM installation with 700000 nodes.

Running PostGraphile

We use PostGraphile in library mode, embedded in a simple Express setup. The relevant options are set in src/ts/main.ts. Please review and customize these settings (e.g. database connection and binding the service to a network interface).

The backend may be run in one of two tiers, development and production.

$ export NODE_ENV=development # or production

Start the backend via npm:

$ npm start

In tier development you may explore the exposed API and its documentation via GraphiQL at the URL http://localhost:5000/graphiql.

For production use you may also consult the suggestions from the PostGraphile documentation.

Database Maintenance

You probably want to enable updating of the preprocessed data automatically by running the script bin/enable-update-triggers. Alternatively you may use the script bin/preprocess-catch-up periodically to add missing preprocessed data.

Note that changes to the table config.aspect_def don't trigger an update of table preprocess.aspect automatically. Please run bin/preprocess-catch-up manually in this case. This will also delete obsolete preprocessed aspect data if you have removed some aspect definitions.

We use some materialized views for performance reasons. Currently these are: folder, folder_node, mask, maskitem, metadatatype, metafield, which are all defined within schema entity. They are derived from the related node types in table mediatum.node.

These materialized views need to be kept up-to-date via the script bin/refresh-materialized-views, either periodically or when needed. Refreshing needs about 7 seconds, so it may be run e.g. hourly.

Deinstallation

You may drop all changes introduced by this backend via bin/drop-all.

You may also want to drop the newly introduced role used for accessing the API:

$ psql -d $MEDIATUM_DATABASE_NAME -c "DROP OWNED BY $MEDIATUM_DATABASE_USER_VIEW_API; DROP ROLE $MEDIATUM_DATABASE_USER_VIEW_API;"