Specifying CrawlDB in Config
The crawldb is configured in sparkler-default.yaml. However, there are three copies of this config file in use:
- sparkler-core/conf/sparkler-default.yaml
- sparkler-core/sparkler-api/src/test/resources/sparkler-default.yaml
- sparkler-core/sparkler-app/src/test/resources/sparkler-default.yaml
Changes to the config file should be made across all 3 files for consistency.
The section of the config file that configures the crawldb is currently laid out as follows (subject to change):
crawldb.backend: solr
solr.uri: http://localhost:8983/solr/crawldb
elasticsearch.uri: http://localhost:9200
The 'crawldb.backend' field specifies which crawldb to use. Its value must match the prefix of one of the '*.uri' fields (for example, 'solr' selects 'solr.uri'). The following, for instance, selects elasticsearch as the crawldb:
crawldb.backend: elasticsearch
solr.uri: http://localhost:8983/solr/crawldb
elasticsearch.uri: http://localhost:9200
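To make the matching rule concrete, here is a minimal sketch of a startup check, assuming the configuration has already been loaded into a flat Map<String, Object> keyed by the names above, and assuming only crawldb backends use the '.uri' suffix. The class BackendCheck and its method requireBackendUri are hypothetical names for illustration; they are not part of Sparkler.

```java
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper (not part of Sparkler): checks that 'crawldb.backend'
// names a backend for which a matching '<backend>.uri' key exists.
class BackendCheck {
    static String requireBackendUri(Map<String, Object> conf) {
        Object backend = conf.get("crawldb.backend");
        if (backend == null) {
            throw new IllegalArgumentException("crawldb.backend is not set");
        }
        String uriKey = backend + ".uri";
        if (!conf.containsKey(uriKey)) {
            // Collect every '<name>.uri' prefix so the error lists the valid choices.
            TreeSet<String> available = new TreeSet<>();
            for (String key : conf.keySet()) {
                if (key.endsWith(".uri")) {
                    available.add(key.substring(0, key.length() - ".uri".length()));
                }
            }
            throw new IllegalArgumentException("crawldb.backend '" + backend
                    + "' has no matching " + uriKey + "; configured backends: " + available);
        }
        // Return the name of the key that holds the selected backend's URI.
        return uriKey;
    }
}
```

With the elasticsearch example above, the check passes and returns "elasticsearch.uri"; setting 'crawldb.backend' to a name without a matching '*.uri' entry makes it fail fast with the list of configured backends.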
To add a crawldb to this config file, add a '<name>.uri' field containing the new crawldb's URI and set 'crawldb.backend' to <name>. The following example uses a hypothetical crawldb called 'testdb':
crawldb.backend: testdb
solr.uri: http://localhost:8983/solr/crawldb
elasticsearch.uri: http://localhost:9200
testdb.uri: http://localhost:9999 # replace http://localhost:9999 with the appropriate URI
Constants.java holds an interface, Constants.key, through which config file entries can be referenced in code. For example:
import edu.usc.irds.sparkler.Constants;
Constants.key.CRAWLDB_BACKEND // the 'crawldb.backend' setting; its value may be, e.g., 'solr' or 'elasticsearch'
To get the crawldb URI, call the getDatabaseURI() method defined in SparklerConfiguration.java. This method uses Constants.key.CRAWLDB_BACKEND to determine which backend's URI to return. In code, this might look like:
import edu.usc.irds.sparkler.SparklerConfiguration;
config.getDatabaseURI(); // where config is a SparklerConfiguration instance
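The lookup described above amounts to two steps: read 'crawldb.backend', then read '<backend>.uri'. The following is a minimal sketch of that idea, assuming a flat Map<String, Object> stands in for a SparklerConfiguration instance. The names DbUriLookup, CRAWLDB_BACKEND_KEY and databaseUri are hypothetical; this is not Sparkler's actual implementation of getDatabaseURI().

```java
import java.util.Map;

// Hypothetical sketch of the backend -> URI lookup; not Sparkler's actual code.
class DbUriLookup {
    // Assumed key name, mirroring the 'crawldb.backend' field in sparkler-default.yaml.
    static final String CRAWLDB_BACKEND_KEY = "crawldb.backend";

    static String databaseUri(Map<String, Object> conf) {
        // Step 1: which backend is selected, e.g. "solr" or "elasticsearch".
        String backend = String.valueOf(conf.get(CRAWLDB_BACKEND_KEY));
        // Step 2: that backend's URI is stored under "<backend>.uri", e.g. "solr.uri".
        Object uri = conf.get(backend + ".uri");
        if (uri == null) {
            throw new IllegalStateException(
                    "No '" + backend + ".uri' entry for " + CRAWLDB_BACKEND_KEY + "=" + backend);
        }
        return uri.toString();
    }
}
```

Because the URI key is derived from the backend name, the 'testdb' configuration from earlier resolves to http://localhost:9999 with no change to the lookup itself.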