SciCatLive

Get set up with an instance of SciCat to explore the metadata catalog. SciCatLive provides a flexible and easy way to learn about SciCat and its features for people who are looking to integrate SciCat into their environment. For a user guide, please see the original documentation.

This project requires docker and docker compose. The docker compose version must be later than 2.29.0 to support this project.

First stable version

Release v3.0 is the first stable and reviewed version of SciCatLive.

Steps

Windows specific instructions

⚠️ Running this project on Windows is not officially supported; you should use Windows Subsystem for Linux (WSL).

However, if you want to run it on Windows you have to be careful about:

Symbolic links. This project makes use of symbolic links, so Windows and git for Windows have to be configured to handle them.
End of lines, specifically in shell scripts. If you have the git config parameter core.autocrlf set to true, git will replace LF with CRLF, causing shell scripts and possibly other things to fail.
Path resolution. This project uses the variable ${PWD} to ease path resolution in bind mounts. In PowerShell/Command Prompt, the PWD environment variable doesn't exist, so you need to set it manually before running any docker compose command.

Clone the repository

git clone https://github.com/SciCatProject/scicatlive.git

Run with the following command inside the directory
```
docker compose up -d
```

Default setup

By running docker compose up -d these steps take place:

1. a mongodb container is created with some initial data.
2. the SciCat backend v4 container is created and connected to (1).
3. the SciCat frontend container is created and connected to (2).
4. a reverse proxy container is created and routes traffic to (2) and (3) through localhost subdomains, in the form: http://${service}.localhost. The frontend is available at simply http://localhost.
Some services have additional endpoints that can be explored in SciCatLive which would follow http://${service}.localhost/${prefix}. For example, the backend API can be explored through a Swagger UI at http://backend.localhost/explorer. For more information on the paths used by these routes see the original documentation for these services.
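As a rough illustration of the URL scheme described above, the following sketch builds the address for a given service and optional path prefix. The helper name `service_url` is hypothetical and not part of SciCatLive; it only encodes the two rules stated in this section (subdomains per service, frontend on plain localhost).

```shell
# Build the URL the reverse proxy is described as exposing for a service.
# Usage: service_url <service> [prefix]
# service_url is a hypothetical helper, not something shipped by SciCatLive.
service_url() {
  service="$1"
  prefix="${2:-}"
  if [ "$service" = "frontend" ]; then
    # The frontend is served from the bare hostname.
    host="localhost"
  else
    host="${service}.localhost"
  fi
  printf 'http://%s/%s\n' "$host" "$prefix"
}

service_url backend explorer   # prints http://backend.localhost/explorer
service_url frontend           # prints http://localhost/
```

Once the stack is up, the printed address can be passed to a browser or to a tool such as curl to check that the route answers.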

Extra services and features

SciCat has extra features as part of its core as well as integrating with external services.

SciCat features that extend the backend are:

Jobs - this mechanism posts to a message broker, which can then trigger downstream processes. To use it, a RabbitMQ server is enabled.
Elasticsearch - creates an elasticsearch service to provide full text search in the backend.

Services that can be integrated with SciCat are:

LDAP - authentication and authorization from an LDAP server
OIDC - authentication and authorization using an OIDC provider
SearchAPI - for better free text search in the metadata based on the PANOSC search-api
LandingPage - a public interface for published datasets
JupyterHub - Adds an instance of JupyterHub which demonstrates ingestion and extraction of metadata using pyscicat.

To enable one or more of these extra services, set the proper environment variable(s) and/or compose profile(s) from this table.

For a complete guide on how to customise or configure any service, including the default ones, please refer to these sections:

manually select the services
use docker compose env variables to enable features (supported values from this table)
use docker compose profiles to enable extra services (supported values from this table)
modify the service-specific config to customise specific services
add entrypoints to control startup logic

For a guide on how to add a new service, please refer to this section.

Dependencies

The diagram below shows the dependencies, including those of the extra services (if B depends on A, it is drawn as A --> B):

graph TD
   subgraph services
      subgraph backend
         backends[v3*/v4*]
      end
      mongodb --> backend
      backend --> frontend
      backend --> searchapi
      backend --> landingpage
      backend --> jupyter
   end

   proxy -.- services
   
   %% CSS Styling
   linkStyle 5 marker-end:none

We flag with * the services which have extra internal dependencies, which are not shared.

Select the services

The user can selectively decide which containers to spin up, and the dependencies will be resolved accordingly. The available services are in the services folder and are named consistently.

For example, one could decide to only run the backend by running (be aware that this will not run the proxy, so the service will not be available at backend.localhost):

docker compose up -d backend

(or a list of services, for example, with the proxy docker compose up -d backend proxy)

This will run, from the previous section, (1) and (2) but skip the rest.

Accordingly:

docker compose up -d frontend

Will run, from the previous section, (1), (2) and (3) but skip (4).

And

docker compose --profile search up -d searchapi

Will run, from the previous section, (1) and (2), skip (3) and (4), and add the searchapi service.

Make sure to check the backend compatibility when choosing services and setting docker compose env vars and profiles.

Docker compose env variables

Env variables are used to modify existing services whenever enabling a feature requires changes in multiple services. Compared to docker compose profiles, they also have the advantage of not needing a new profile each time a new combination of features becomes available. To set an env variable for docker compose, either assign it in the shell or change the .env file. To unset it later, either unset it in the shell or assign it an empty value, in the shell or in the .env file.

For example, to use the Jobs functionality of SciCat change JOBS_ENABLED to true before running your docker compose command or simply export it in the shell. For all env configuration options see here.
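A minimal sketch of the shell route, assuming a POSIX shell. Setting BE_VERSION=v3 alongside JOBS_ENABLED is an assumption drawn from the backend compatibility column of the configuration options table, not an extra requirement stated here. The docker compose call is left as a comment so the snippet can be tried without starting any container.

```shell
# Enable the Jobs feature for the current shell session.
export JOBS_ENABLED=true
# Assumption from the configuration table: Jobs is listed as compatible with backend v3.
export BE_VERSION=v3
echo "JOBS_ENABLED=${JOBS_ENABLED} BE_VERSION=${BE_VERSION}"

# With the variables exported, start the stack as usual:
#   docker compose up -d

# Disable the feature again: unset the variable, or assign it an empty value.
unset JOBS_ENABLED
export BE_VERSION=
echo "JOBS_ENABLED=${JOBS_ENABLED:-<unset>}"
```

Editing the same keys in the .env file has the same effect and persists across shell sessions.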

Docker compose profiles

Profiles are used when adding new services or grouping services together (they do not require changes in multiple services). To enable any, run docker compose --profile <PROFILE> up -d, or export the COMPOSE_PROFILES env variable as described here. If needed, the user can specify more than one profile in the CLI by repeating the flag, as in --profile <PROFILE1> --profile <PROFILE2>.

For example, docker compose --profile analysis sets up a jupyter hub with some notebooks for ingesting data into SciCat, as well as the related services (backend, mongodb, proxy). For more information on the profiles available in SciCatLive, see the following table.
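The two ways of selecting several profiles can be sketched as follows. The function `compose_up_cmd` is a hypothetical helper that only prints the command line it would run; the profile names analysis and search come from the configuration table.

```shell
# Print the docker compose command for a list of profiles.
# compose_up_cmd is a hypothetical helper for illustration; it does not run docker.
compose_up_cmd() {
  cmd="docker compose"
  for profile in "$@"; do
    cmd="${cmd} --profile ${profile}"
  done
  printf '%s up -d\n' "$cmd"
}

compose_up_cmd analysis search
# prints: docker compose --profile analysis --profile search up -d

# Equivalent selection through the environment (comma-separated list):
export COMPOSE_PROFILES=analysis,search
echo "COMPOSE_PROFILES=${COMPOSE_PROFILES}"
```

With COMPOSE_PROFILES exported, a plain docker compose up -d picks up the same profiles without any --profile flag.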

Docker compose profiles and env variables configuration options

| Type | Env key | Value: Service/Feature | Default | Backend Compatibility | Description | Other impacted services |
| --- | --- | --- | --- | --- | --- | --- |
| profile | `COMPOSE_PROFILES` | `analysis`: jupyter<br>`search`: searchapi,landingpage<br>`'*'`: jupyter,searchapi,landingpage | `''` | * | analysis: enables additional jupyter notebook with python SciCat SDK installed and example notebooks<br>search: enables a SciCat interface for standardized search and a public interface for published datasets | |
| env | `BE_VERSION` | `v3`: backend/v3<br>`v4`: backend/v4 | `v4` | as set | Sets the BE version to use in (2) of default setup to v3 | mongodb,frontend |
| env | `JOBS_ENABLED` | `true`: rabbitmq,archivemock,jobs feature | `''` | v3 | Creates a RabbitMQ message broker which the BE posts to and the archivemock listens to. It emulates the data long-term archive/retrieve workflow | |
| env | `ELASTIC_ENABLED` | `true`: elastic,elastic feature | `''` | v4 | Creates an elastic search service and sets the BE to use it for full-text searches | |
| env | `LDAP_ENABLED` | `true`: ldap auth | `''` | * | Creates an LDAP service and sets the BE to use it as authentication backend | |
| env | `OIDC_ENABLED` | `true`: oidc auth | `''` | * | Creates an OIDC identity provider and sets the BE to use it as authentication backend | |
| env | `DEV` | `true`: backend,frontend,searchapi,archivemock in DEV mode | `''` | * | The SciCat services' environment is prepared to ease the development in a standardized environment | |
| env | `<SERVICE>_HTTPS_URL` | `<URL>`: HTTPS termination | `''` | * | Requests the TLS certificate for the URL to LetsEncrypt through the proxy | |
| env | `DEV_BBACKUP` | `true`: bidirectional synchronization of DEV volume | `''` | * | Enables DEV bidirectional synchronization between ${PWD}/bbackup/${APP} on the host and the dev volume | |
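The backend compatibility column can be checked before starting the stack. The sketch below covers only the two version-specific rows of the table (JOBS_ENABLED for v3, ELASTIC_ENABLED for v4) and takes v4 as the default backend; `check_backend_compat` is a hypothetical helper, not part of SciCatLive.

```shell
# Compare the current env variables with the compatibility column of the table.
# check_backend_compat is a hypothetical helper written for this example.
check_backend_compat() {
  be_version="${BE_VERSION:-v4}"   # v4 is the documented default
  if [ "${JOBS_ENABLED:-}" = "true" ] && [ "$be_version" != "v3" ]; then
    echo "JOBS_ENABLED=true is listed for backend v3, but BE_VERSION=${be_version}" >&2
    return 1
  fi
  if [ "${ELASTIC_ENABLED:-}" = "true" ] && [ "$be_version" != "v4" ]; then
    echo "ELASTIC_ENABLED=true is listed for backend v4, but BE_VERSION=${be_version}" >&2
    return 1
  fi
  echo "configuration matches the compatibility column"
}

# Run the check in a subshell so the demo values do not leak into the session.
( BE_VERSION=v3; JOBS_ENABLED=true; ELASTIC_ENABLED=; check_backend_compat )
```

A natural use is to chain it in front of the start command, for example check_backend_compat && docker compose up -d.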

After optionally setting any configuration option, one can still select the services to run as described here.

DEV configuration

To provide a consistent environment where developers can work, the DEV=true option creates the SciCat services (see DEV from here for the list), but instead of running them, it just creates the base environment that each service requires. For example, for the backend, instead of running the web server, it creates a Node environment with git where one can develop and run the unit tests. This is useful because differences between environments often create collaboration problems. It should also provide an example of the configuration for running tests. Please refer to the services' README for additional information, or to the Dockerfile CMD of the components' GitHub repo if not specified otherwise. The DEV=true option affects the SciCat services only.

Please be patient when using DEV, as each container sets up the dev environment, including the requirements for testing, which might take a while to finish. To see if any special precaution is required to run the tests, refer to the compose.dev.test.yaml file, where the test files are referenced, and to their content. When DEV=true, if you want to run tests when the containers start, you can do so by including the compose.dev.test.yaml compose file.

docker compose -f compose.yaml -f .github/compose.dev.test.yaml ...

This is very convenient with VSCode: once the docker services are running, one can attach to a container and start developing using all VSCode features, including version control and debugging.

Please note that entrypoints when DEV=true are only run when the component's container is created for the first time. This is done to avoid clashes with local changes.

To ease writing DEV configuration, a dev template is provided here and each component inherits from it, as you can see here, setting the component-specific variables from the relative .env file.

⚠️ Docker compose applies a precedence mechanism whenever the same variable is defined in .env files in nested folders, with precedence to the folder where the default COMPOSE_FILE lives. This means that the current template cannot be used in case of nested components, at least for the parts where local variables are used. There is no conflict with variables defined multiple times in .env files at the same level.

⚠️ To prevent unpushed git changes from being lost when a container is restarted, the work folder of each service, when in DEV mode, is mounted to a docker volume, with naming convention ${COMPOSE_PROJECT_NAME}_<service>_dev. Make sure to commit and push the relevant changes frequently, especially before removing docker volumes.

⚠️ As the DEV containers pull from upstream/latest, there is no guarantee that they work outside of releases. If they fail to start, try, as a first option, to build the image from a tag (e.g. build context) using the TAG, and then git checkout that tag (e.g. set GITHUB_REPO including the branch, using the same syntax and value as the build context). You can achieve this by setting the GITHUB_REPO env variable in the component .env file (e.g. the frontend env file) as follows:

-  GITHUB_REPO=https://github.com/SciCatProject/frontend.git
+  GITHUB_REPO=https://github.com/SciCatProject/frontend.git#v4.4.1

The repo is checked out at that particular commit only if the docker volume does not yet exist.
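The `URL#ref` form follows the git build context syntax of docker, where the part after `#` names a branch, tag or commit. The sketch below only shows how such a value splits into its two parts with shell parameter expansion; it is not taken from the SciCatLive entrypoints, and the variable names `repo_url` and `repo_ref` are made up for the example.

```shell
# Value as it would appear in the component .env file.
GITHUB_REPO="https://github.com/SciCatProject/frontend.git#v4.4.1"

# Everything before the first '#' is the repository URL.
repo_url="${GITHUB_REPO%%#*}"

# Everything after the first '#' is the ref; empty when no '#' is present.
case "$GITHUB_REPO" in
  *"#"*) repo_ref="${GITHUB_REPO#*#}" ;;
  *)     repo_ref="" ;;
esac

echo "repository: ${repo_url}"
echo "ref: ${repo_ref:-<default branch>}"
```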

DEV bidirectional synchronization

Setting DEV_BBACKUP=true in the .env file enables bidirectional synchronization between the DEV volume of each component (e.g. frontend_dev) and a directory on the host placed at ${PWD}/bbackup/${APP}. This is sometimes convenient both to have a backup of the volume and to enable the use of additional tools installed on the host which require file access.

TLS configuration

You can enable TLS termination of desired services by setting <SERVICE>_HTTPS_URL to the full URL, including https://. The specified HTTPS URL will get a Let's Encrypt generated certificate through the proxy setting. For more details see the proxy instructions. After setting some URLs, the required changes in dependent services are automatically resolved, as explained for example here. Whenever possible, we use either the docker internal network or the localhost subdomains.

⚠️ Please make sure to set all required <SERVICE>_HTTPS_URL whenever enabling one, as mixing public URLs and localhost ones might be tricky. See, for example, what is described in the frontend documentation and the backend documentation.
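A small guard can catch a URL that lacks the https:// scheme before it is exported. This is only a sketch: the variable name BACKEND_HTTPS_URL is an assumed instance of the <SERVICE>_HTTPS_URL pattern, the hostname is a placeholder, and `is_https_url` is a hypothetical helper.

```shell
# Return success only for values that start with https:// and have a host part.
# is_https_url is a hypothetical helper written for this example.
is_https_url() {
  case "$1" in
    https://?*) return 0 ;;
    *)          return 1 ;;
  esac
}

# Placeholder hostname, used only for illustration.
BACKEND_HTTPS_URL="https://scicat-backend.example.org"

if is_https_url "$BACKEND_HTTPS_URL"; then
  export BACKEND_HTTPS_URL
  echo "BACKEND_HTTPS_URL=${BACKEND_HTTPS_URL}"
else
  echo "BACKEND_HTTPS_URL must be a full URL starting with https://" >&2
fi
```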

Service-specific config

Service-specific config can be changed whenever a service needs to be configured independently from the others.

Every service folder (inside the services parent directory) contains its configuration and some instructions, at least for the non-third-party containers.

For example, to configure the frontend, the user can change any file in the frontend config folder, for which instructions are available in the README file.

After any configuration change, docker compose up -d must be rerun, to allow loading the changes.

Entrypoints

Sometimes, it is useful to run init scripts (entrypoints) before the service starts. For example, for the frontend composability, it is useful to specify its configuration through multiple JSON files, with different scopes, which are then merged by an init script. For this reason, one can define common entrypoints and service-specific ones (e.g. backend v4 ones) which can be run inside the container, before the service starts (i.e. before the docker compose command is executed). Whenever these entrypoints are shared between services, it is recommended to place them in an entrypoints folder below the outermost service (e.g. this one).

To ease the iterative execution of multiple init scripts, one can leverage the loop_entrypoints utility, which loops alphabetically over /docker-entrypoints/*.sh and executes each. This is in use in some services (e.g. in the frontend), so one can add additional init steps by mounting them, one by one, as volumes inside the container in the /docker-entrypoints folder and naming them depending on the desired order (renaming the existing ones as well, if needed).
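The looping behaviour described above can be sketched as follows. This is not the project's actual utility: the function name `run_entrypoints` is made up, and running each script with sh (rather than sourcing it) is an assumption. The demo uses a scratch folder instead of /docker-entrypoints so it can be tried outside a container.

```shell
# Run every *.sh file in a folder, in the alphabetical order of the shell glob.
# run_entrypoints is a hypothetical stand-in for the loop described above.
run_entrypoints() {
  dir="${1:-/docker-entrypoints}"
  for script in "$dir"/*.sh; do
    # Skip the literal pattern when the folder holds no scripts.
    [ -f "$script" ] || continue
    echo "running ${script##*/}"
    # Stop at the first failing init script.
    sh "$script" || return 1
  done
}

# Demo: two init scripts whose names encode the desired order.
demo_dir="$(mktemp -d)"
printf 'echo "merge config"\n' > "$demo_dir/10-merge.sh"
printf 'echo "prepare env"\n' > "$demo_dir/00-prepare.sh"
run_entrypoints "$demo_dir"
```

In a container entrypoint, such a loop would typically be followed by exec "$@" so that the service command replaces the shell once the init scripts have finished.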

If the service does not support entrypoints yet, one needs to:

mount the loop_entrypoint.sh as a volume inside the container
mount any service-specific init script as a volume in the container in the folder /docker-entrypoints/*.sh, naming them sequentially, depending on the desired execution order
override the entrypoint field in the service
specify the service command

See for example here.

Add a new service

Please note that services should, in general, be defined by their responsibility, rather than by their underlying technology, and should be named so.

Basic

To add a new service (see the jupyter service for a minimal example):

1. create a dedicated folder in the services folder *
2. name it after the service
3. create the compose.yaml file
4. optionally, add a README.md file in the service
5. optionally, add the platform field, as described here
6. include the reference to (3) in the global compose include list *
7. optionally, update the main README.md

* if the service to add is not shared globally, but specific to one particular service or another implementation of the same component, add it to the services folder relative to the affected service, and in (6) add it to its inclusion list. See an example of a service relative services folder here and a relative inclusion list here.

Supported OS architectures

Since some images are not built with multi-arch, in particular the SciCat ones, make sure to specify the platform of the service in the compose file, when needed, to avoid possible issues when running docker compose up on different platforms, for example on a Mac with arm64 architecture. See for example the searchapi compose.
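The basic steps, together with the platform field, can be sketched as a small scaffolding function. Everything specific in it is hypothetical: the helper name `scaffold_service`, the service name myservice, the image name and the linux/amd64 platform value are placeholders, and the demo writes into a scratch directory rather than a real clone.

```shell
# Create services/<name>/ with a minimal compose.yaml and README.md.
# scaffold_service is a hypothetical helper; image and platform are placeholders.
scaffold_service() {
  root="$1"
  name="$2"
  mkdir -p "$root/services/$name"
  cat > "$root/services/$name/compose.yaml" <<EOF
services:
  ${name}:
    image: example/${name}:latest
    platform: linux/amd64
EOF
  printf '# %s\n' "$name" > "$root/services/$name/README.md"
}

# Demo in a scratch directory standing in for the repository root.
demo_root="$(mktemp -d)"
scaffold_service "$demo_root" myservice
cat "$demo_root/services/myservice/compose.yaml"
```

The remaining step, adding the new compose.yaml to the include list of the global compose file, is left as a manual edit.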

Advanced

To add a new service with advanced configuration (see the backend for an extensive example, and/or this PR which added the landingpage):

1. follow the steps from the basic section
2. optionally, include in the service-specific folder any service which is specific to the service and not shared by other, more general services, e.g. here. This folder should also include different versions of the same service, e.g. v3 and v4 here
3. optionally, if the service supports ENVs, leverage the include override feature from docker compose. For this:
   1. create a compose.base.yaml file, e.g. here, which should contain the base configuration, i.e. the one where all ENVs are unset and the features are disabled
   2. create the ENV-specific (e.g. ELASTIC_ENABLED) compose.<ENV>.yaml file, e.g. backend v4 compose.elastic.yaml, with the additional/override config specific to the enabled feature
   3. create a symlink from .empty.yaml to each .compose.<ENV>.yaml, e.g. here. This is used whenever the ENV is unset, as described in the next step
   4. use compose.yaml to merge the compose*.yaml files together, making sure to default to .compose.<ENV>.yaml whenever the ENV is not set. See an example here
   5. if the service is another version of an existing one, e.g. v3 and v4 versions of the backend service, add the selective include in the parent compose.yaml, e.g. here
   6. optionally, modify the compose workflow to add the toggle to the matrix. If the toggle depends on the changed files, remember to create the toggle configuration here and create the exclude rule in the workflow.
4. optionally, add entrypoints for init logic, as described here, e.g. like here, including any ENV-specific logic. Remember to set the environment variable in the compose.yaml file.
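The file layout behind the ENV toggle can be tried out in a scratch folder. The sketch below assumes the symlink is named .compose.elastic.yaml and points at .empty.yaml, and that an unset ENV resolves to that hidden file; the helper `toggle_file` is hypothetical and only mirrors the fallback rule stated in the steps above, not the actual include expression used in the repository.

```shell
# Scratch folder standing in for a service directory.
demo_dir="$(mktemp -d)"
: > "$demo_dir/compose.base.yaml"      # base config, all features disabled
: > "$demo_dir/compose.elastic.yaml"   # overrides for ELASTIC_ENABLED
: > "$demo_dir/.empty.yaml"            # empty placeholder
# Hidden fallback name that resolves to the empty placeholder.
ln -s .empty.yaml "$demo_dir/.compose.elastic.yaml"

# Pick the file to merge on top of compose.base.yaml.
# toggle_file is a hypothetical helper: $1 is the feature suffix, $2 the ENV value.
toggle_file() {
  if [ -n "$2" ]; then
    printf 'compose.%s.yaml\n' "$1"
  else
    printf '.compose.%s.yaml\n' "$1"
  fi
}

toggle_file elastic ""       # prints .compose.elastic.yaml
toggle_file elastic true     # prints compose.elastic.yaml
```

The point of the symlink is that the include list always names an existing file, so the merge succeeds whether or not the feature is enabled.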

General use of SciCat

To use SciCat, please refer to the original documentation.
