Software that automatically fills a Virtuoso database with Europeana datasets from the Europeana FTP server and updates those datasets regularly
- Check that the configuration file
    /src/main/resources/sparql-updater.user.properties
  is present and that its values are correct.
- Run
    mvn clean install
  to create the file /target/sparql-updater.jar. This jar contains the code that automatically loads datasets from the Europeana FTP server and writes them to Virtuoso. It also checks regularly whether datasets have been modified; if so, it updates Virtuoso by uploading the set again and deleting the old set.
- Run
    docker build . -t europeana/sparql-updater-virtuoso
  to create a Docker image containing both Virtuoso and sparql-updater.jar. This image also contains the sparql-updater.user.properties file, so don't push it to Docker Hub!
- Start the container using the file docker-compose-localtest.yml. The Virtuoso GUI is then available at http://localhost:8890/
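The first step, checking the configuration, can be scripted. The sketch below is a minimal Python illustration, assuming the file uses plain key=value lines. The key names in REQUIRED_KEYS are hypothetical placeholders, because this README does not list the actual properties; replace them with the keys your sparql-updater.user.properties really contains.

```python
import re
from pathlib import Path

# Hypothetical key names: replace with the keys the real
# sparql-updater.user.properties file actually uses.
REQUIRED_KEYS = ("virtuoso.host", "virtuoso.user", "virtuoso.password", "ftp.host")


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple 'key=value' / 'key: value' lines.

    Simplified: blank lines and comments starting with '#' or '!' are skipped;
    Java-style escapes and line continuations are NOT handled.
    """
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Split at the first '=' or ':' (with optional surrounding whitespace).
        parts = re.split(r"\s*[=:]\s*", line, maxsplit=1)
        props[parts[0]] = parts[1] if len(parts) > 1 else ""
    return props


def missing_keys(props: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not props.get(key, "").strip()]


def check_config(path: Path) -> list[str]:
    """Fail if the file is missing, otherwise report missing or empty keys."""
    if not path.is_file():
        raise FileNotFoundError(f"configuration file not found: {path}")
    # java.util.Properties.load(InputStream) reads ISO-8859-1, so do the same here.
    return missing_keys(parse_properties(path.read_text(encoding="latin-1")))


if __name__ == "__main__":
    # Path relative to the project root, as in the step above.
    config = Path("src/main/resources/sparql-updater.user.properties")
    try:
        missing = check_config(config)
    except FileNotFoundError as exc:
        print(exc)
    else:
        print("missing or empty keys:", ", ".join(missing) if missing else "none")
```

This only catches absent or empty values; whether a host name or password is actually correct can only be verified by connecting.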
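The update behaviour of sparql-updater.jar (reload a set when it has been modified) comes down to comparing modification times. The following Python sketch is not the jar's actual implementation: it assumes each dataset can be identified by an id and a last-modified timestamp, and the upload and delete_old callbacks are hypothetical stand-ins for the real FTP download and Virtuoso calls. It follows the order described in the build step: the new version is uploaded before the old one is deleted.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass(frozen=True)
class Action:
    dataset_id: str
    kind: str  # "load" = not yet in Virtuoso, "reload" = modified since last load


def plan_updates(remote: dict[str, datetime], loaded: dict[str, datetime]) -> list[Action]:
    """Compare FTP modification times with those of the versions already loaded.

    Datasets that disappeared from the FTP server are ignored here, because the
    README does not say how they are handled.
    """
    actions = []
    for dataset_id, modified in sorted(remote.items()):
        if dataset_id not in loaded:
            actions.append(Action(dataset_id, "load"))
        elif modified > loaded[dataset_id]:
            actions.append(Action(dataset_id, "reload"))
    return actions


def apply_actions(
    actions: list[Action],
    upload: Callable[[str], None],
    delete_old: Callable[[str], None],
) -> None:
    """Upload each set; for modified sets, delete the old version only after the upload."""
    for action in actions:
        upload(action.dataset_id)
        if action.kind == "reload":
            delete_old(action.dataset_id)
```

One plausible reason for uploading before deleting is that a set stays queryable while its new version is loaded, at the cost of temporarily storing both versions.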
Some things to be aware of:
- Loading all Europeana datasets will require around 500GB of disk space!
- For local testing purposes we use a hard-coded password (see the DBA_PASSWORD variable in the docker-compose-localtest.yml file). For production, change the credentials both in this .yml file and in the sparql-updater.user.properties file.
- After startup, a folder named
    /database
  is created relative to the startup location. This folder contains the Virtuoso database files, but also a folder named tmp-ingest, which holds the files downloaded from the FTP server and the files generated by the sparql-updater for ingestion. These files are deleted automatically when they are no longer needed.
- You can check which datasets are loaded using this SPARQL query:
    SELECT DISTINCT ?g WHERE { GRAPH ?g {?s a ?o} }
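Because loading all datasets needs around 500GB, it can be worth checking free disk space before starting. The sketch below uses Python's shutil.disk_usage on the startup location, where the /database folder will be created. The 500 * 10**9 byte threshold is an assumption that reads the figure above as decimal gigabytes.

```python
import shutil
from pathlib import Path

# Assumption: "around 500GB" read as decimal gigabytes (10**9 bytes).
REQUIRED_BYTES = 500 * 10**9


def free_bytes(path: Path) -> int:
    """Free space on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def has_enough_space(path: Path = Path("."), required: int = REQUIRED_BYTES) -> bool:
    """True if at least `required` bytes are free where the database will live."""
    return free_bytes(path) >= required


if __name__ == "__main__":
    here = Path(".")
    gb = free_bytes(here) / 10**9
    status = "OK" if has_enough_space(here) else "probably not enough for all datasets"
    print(f"{gb:.1f} GB free at {here.resolve()}: {status}")
```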
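The graph query above can also be run outside the GUI. This sketch assumes Virtuoso's SPARQL endpoint is reachable at http://localhost:8890/sparql (the usual /sparql path on the GUI port) and that it returns SPARQL JSON results when asked via the Accept header; the function names are illustrative only. The result may also list Virtuoso's own system graphs alongside the Europeana datasets.

```python
import json
import urllib.parse
import urllib.request

# Assumption: default Virtuoso SPARQL endpoint on the port used by the GUI.
ENDPOINT = "http://localhost:8890/sparql"
QUERY = "SELECT DISTINCT ?g WHERE { GRAPH ?g {?s a ?o} }"


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """GET request following the SPARQL protocol: the query goes in the 'query' parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def graph_names(results: dict) -> list[str]:
    """Extract the ?g values from a SPARQL JSON result document."""
    return [binding["g"]["value"] for binding in results["results"]["bindings"]]


def loaded_graphs(endpoint: str = ENDPOINT, timeout: float = 30.0) -> list[str]:
    with urllib.request.urlopen(build_request(endpoint, QUERY), timeout=timeout) as resp:
        return graph_names(json.load(resp))


if __name__ == "__main__":
    try:
        for graph in loaded_graphs():
            print(graph)
    except OSError as exc:  # URLError and timeouts are OSError subclasses
        print(f"could not reach {ENDPOINT}: {exc}")
```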
If you make changes to the sparql-updater, don't forget to:
- Rebuild the jar
- Rebuild the Docker image
- Recreate the container
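The three rebuild steps can be chained in a small helper script. This sketch reuses the mvn and docker build commands given earlier in this README; the docker-compose up -d --force-recreate invocation for recreating the container is an assumption, because the README does not give the exact command. By default it only prints the commands; pass --run to execute them, stopping at the first failing step.

```python
import shlex
import subprocess
import sys

COMPOSE_FILE = "docker-compose-localtest.yml"

# Order matters: the image is built from the freshly built jar.
STEPS = [
    ["mvn", "clean", "install"],                                          # rebuild the jar
    ["docker", "build", ".", "-t", "europeana/sparql-updater-virtuoso"],  # rebuild the image
    # Assumed invocation for recreating the container:
    ["docker-compose", "-f", COMPOSE_FILE, "up", "-d", "--force-recreate"],
]


def rebuild(dry_run: bool = True) -> list[str]:
    """Run (or just print) each step; check=True aborts on the first failure."""
    executed = []
    for cmd in STEPS:
        line = shlex.join(cmd)
        executed.append(line)
        if dry_run:
            print("would run:", line)
        else:
            subprocess.run(cmd, check=True)
    return executed


if __name__ == "__main__":
    rebuild(dry_run="--run" not in sys.argv)
```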