Storing data in SolrCloud
Download and unzip Solr (tested on 8.6.2): https://lucene.apache.org/solr/downloads.html
Then copy the crawldb schema into the Solr configsets directory:
cp -rf conf/solr/crawldb ${SOLR_HOME}/server/solr/configsets/
Start Solr in cloud mode: ./bin/solr -e cloud
Accept the defaults until you see the prompt: Please provide a name for your new collection
Name the collection crawldb, for consistency with the commands below.
Then accept the defaults until you see:
Please choose a configuration for the crawldb collection, available options are: _default or sample_techproducts_configs [_default]
Enter crawldb again to select the configset copied earlier.
Once Solr has started, the admin UI should list the Sparkler schema fields at: http://localhost:8983/solr/#/crawldb/schema
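The same check can be done from the command line. The sketch below is a minimal example, not part of Sparkler: it assumes Solr is listening on localhost:8983, that the collection is named crawldb, and that curl is installed. The SOLR_URL and COLLECTION variables and both function names are hypothetical; the /schema/fields path is Solr's Schema API endpoint for listing fields.

```shell
#!/bin/sh
# Assumptions (not from the original page): Solr base URL and collection name.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
COLLECTION="${COLLECTION:-crawldb}"

# Build the Schema API endpoint that lists the fields of a collection.
# $1 = Solr base URL, $2 = collection name
schema_fields_url() {
  printf '%s/%s/schema/fields\n' "$1" "$2"
}

# Fetch the field list as JSON.
# curl -f makes the command exit non-zero on an HTTP error status,
# for example when the collection does not exist.
check_schema() {
  curl -fsS "$(schema_fields_url "$SOLR_URL" "$COLLECTION")"
}
```

After sourcing the script, run check_schema once Solr is up. A JSON list of fields suggests the crawldb configset was picked up; an error suggests the collection name or configset choice should be rechecked.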
Then run a crawl, replacing sjob-<id> with the job id printed by the inject command:
./bin/sparkler.sh inject -su https://news.bbc.co.uk -cdb crawldb::localhost:9983
./bin/sparkler.sh crawl -cdb crawldb::localhost:9983 -id sjob-<id>
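In the -cdb value, the part before the :: (crawldb) is the collection name and the part after it is the SolrCloud address. With the cloud example, localhost:9983 is the embedded ZooKeeper that Solr starts alongside its own port 8983.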
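To make the format concrete, here is a small sketch that splits a -cdb value into its two parts using POSIX parameter expansion. The function names cdb_collection and cdb_address are hypothetical helpers for illustration only; Sparkler does its own parsing internally.

```shell
#!/bin/sh
# Hypothetical helpers: split "<collection>::<address>" at the first "::".

# Everything before the first "::" is the collection name.
cdb_collection() {
  printf '%s\n' "${1%%::*}"
}

# Everything after the first "::" is the SolrCloud address.
cdb_address() {
  printf '%s\n' "${1#*::}"
}

CDB="crawldb::localhost:9983"
echo "collection: $(cdb_collection "$CDB")"
echo "address:    $(cdb_address "$CDB")"
```

If the collection was given a different name during setup, only the part before the :: changes.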