User Guide
This section provides instructions for running SolRDF. It is divided into two parts because of the different architecture introduced with Solr 5. Prior to that (i.e. Solr 4.x), Solr was distributed as a JEE web application; since SolRDF is a Maven project, you could therefore use Maven to start up a live instance without downloading Solr yourself (Maven would do that for you, behind the scenes).
Solr 5, instead, is delivered as a standalone jar, so the SolRDF installation is different: it requires some manual steps in order to deploy configuration files and libraries into an external Solr, which needs to be downloaded separately.
First, you need Java 8, Apache Maven and Apache Solr installed on your machine. Open a new shell and type the following:
# cd /tmp
# git clone https://github.com/agazzarini/SolRDF.git solrdf-download
# cd solrdf-download/solrdf
# mvn clean install
At the end of the build, after seeing
[INFO] --------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Solr RDF plugin .................... SUCCESS [ 3.151 s]
[INFO] solrdf-core ........................ SUCCESS [ 10.191 s]
[INFO] solrdf-client ...................... SUCCESS [ 3.554 s]
[INFO] solrdf-integration-tests ........... SUCCESS [ 14.910 s]
[INFO] --------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] --------------------------------------------------------
[INFO] Total time: 32.065 s
[INFO] Finished at: 2015-10-20T14:42:09+01:00
[INFO] Final Memory: 43M/360M
you can find the solr-home directory, with everything required for running SolRDF, under
/tmp/solrdf-download/solrdf/solrdf-integration-tests/target/solrdf-integration-tests-1.1-dev/solrdf
We will refer to this directory as $SOLR_HOME. At this point, open a shell under the bin folder of your Solr installation and type:
> ./solr -p 8080 -s $SOLR_HOME -a "-Dsolr.data.dir=/work/data/solrdf"
Waiting to see Solr listening on port 8080 [/]
Started Solr server on port 8080 (pid=10934). Happy searching!
If you're using Solr 4.x, you can check out the solrdf-1.0 branch and use the automatic procedure described below for downloading, installing, and running it. There's no need to download Solr, as Maven will do that for you.
Open a new shell and type the following:
# cd /tmp
# git clone https://github.com/agazzarini/SolRDF.git solrdf-download
# cd solrdf-download/solrdf
# mvn clean install
# cd solrdf-integration-tests
# mvn clean package cargo:run
The very first time you run this command a lot of things will be downloaded, Solr included. At the end you should see something like this:
[INFO] Jetty 7.6.15.v20140411 Embedded started on port [8080]
[INFO] Press Ctrl-C to stop the container...
SolRDF is up and running!
Now let's add some data. You can do that in one of the following ways:
Open your favourite browser and type the following URL (the line has been split for readability):
http://localhost:8080/solr/store/update/bulk?commit=true
&update.contentType=application/n-triples
&stream.file=/tmp/solrdf-download/solrdf/solrdf-integration-tests/src/test/resources/sample-data/bsbm-generated-dataset.nt
This is an example with the bundled sample data. If the file resides somewhere else (e.g. remotely) you can use the stream.url parameter to indicate its URL. For example:
http://localhost:8080/solr/store/update/bulk?commit=true
&update.contentType=application/rdf%2Bxml
&stream.url=http://ec.europa.eu/eurostat/ramon/rdfdata/countries.rdf
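The split URLs above can also be assembled programmatically. A minimal sketch using only the Python standard library (the endpoint and parameter names are taken from the examples above; the helper function itself is hypothetical):

```python
from urllib.parse import urlencode

# Bulk-update endpoint of the "store" core, as used in the examples above.
BASE = "http://localhost:8080/solr/store/update/bulk"

def bulk_update_url(content_type, **stream_params):
    """Build a bulk-update URL with commit enabled (illustrative helper)."""
    params = {"commit": "true", "update.contentType": content_type}
    params.update(stream_params)
    return BASE + "?" + urlencode(params)

# Local file, N-Triples payload:
local = bulk_update_url(
    "application/n-triples",
    **{"stream.file": "/tmp/solrdf-download/solrdf/solrdf-integration-tests/"
                      "src/test/resources/sample-data/bsbm-generated-dataset.nt"})

# Remote file, RDF/XML payload (urlencode produces the %2B seen above):
remote = bulk_update_url(
    "application/rdf+xml",
    **{"stream.url": "http://ec.europa.eu/eurostat/ramon/rdfdata/countries.rdf"})
```

Note that urlencode takes care of escaping the '+' in application/rdf+xml, which is why the browser example above spells it application/rdf%2Bxml.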
Open a shell and type the following:
# curl -v "http://localhost:8080/solr/store/update/bulk?commit=true" \
-H "Content-Type: application/n-triples" \
--data-binary @/tmp/solrdf-download/solrdf/solrdf-integration-tests/src/test/resources/sample-data/bsbm-generated-dataset.nt
Ok, you just added (about) 5000 triples.
SolRDF is a fully compliant SPARQL 1.1 endpoint. To issue a query, just run something like this:
# curl "http://127.0.0.1:8080/solr/store/sparql" \
--data-urlencode "q=SELECT * WHERE { ?s ?p ?o } LIMIT 10" \
-H "Accept: application/sparql-results+json"
Or
# curl "http://127.0.0.1:8080/solr/store/sparql" \
--data-urlencode "q=SELECT * WHERE { ?s ?p ?o } LIMIT 10" \
-H "Accept: application/sparql-results+xml"
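The same query can be issued from code. A minimal sketch using only the Python standard library, mirroring curl's --data-urlencode behaviour (endpoint and core name are taken from the curl examples above; the request is prepared but not sent here):

```python
from urllib.parse import urlencode
from urllib.request import Request

SPARQL_ENDPOINT = "http://127.0.0.1:8080/solr/store/sparql"

def sparql_request(query, accept="application/sparql-results+json"):
    """Prepare a POST request carrying a URL-encoded SPARQL query."""
    body = urlencode({"q": query}).encode("ascii")
    return Request(SPARQL_ENDPOINT, data=body,
                   headers={"Accept": accept,
                            "Content-Type": "application/x-www-form-urlencoded"})

req = sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
# urllib.request.urlopen(req) would execute it against a running SolRDF instance.
```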
Note: the Hybrid mode has been temporarily disabled, as there are some issues that need to be fixed.
If the request contains a valid SPARQL query and at least one of the parameters listed below, SolRDF switches to a so-called "Hybrid" mode. This enables a set of interesting features like results pagination (without using the LIMIT keyword) and faceting (on the overall results of the SPARQL query).
Parameter | Description | Reference |
---|---|---|
rows | The maximum number of results returned in the response. Negative or invalid values default to 10. | Solr Wiki (rows) |
start | The start offset within the complete result set. Negative or invalid values default to 0. | Solr Wiki (start) |
facet | A boolean value that enables faceting (disabled by default). | Solr Wiki (facet) |
facet.field | The name of a field which should be treated as a facet. The parameter can be repeated in the request for multiple fields. | Solr Wiki (facet.field) |
For more information about Solr query and facet parameters see [1] and [2]. Remember that only the parameters listed in the table above are supported. Hopefully, support for the other parameters will be added gradually.
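Under the stated constraints (only the four parameters above are honoured), a hybrid-mode request can be put together like this; a sketch with a hypothetical helper, where facet.field is simply repeated for multiple facet fields:

```python
from urllib.parse import urlencode

def hybrid_query_url(query, rows=10, start=0, facet_fields=()):
    """Build a hybrid-mode request URL (illustrative; the endpoint
    name is taken from the earlier examples)."""
    params = [("q", query), ("rows", rows), ("start", start)]
    if facet_fields:
        params.append(("facet", "true"))
        # One facet.field entry per field, as described in the table above.
        params.extend(("facet.field", f) for f in facet_fields)
    return "http://localhost:8080/solr/store/sparql?" + urlencode(params)

url = hybrid_query_url("SELECT * WHERE { ?s ?p ?o }",
                       rows=2, start=100, facet_fields=["p"])
```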
When SolRDF runs in Hybrid mode, it will produce a response like this:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">25</int>
<int name="rows">2</int>
<int name="start">100</int>
<str name="query">SELECT *
WHERE
{ ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o}
</str>
</lst>
<result name="response" numFound="18176" start="100" maxScore="1.0">
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
<head>
<variable name="s" />
<variable name="o" />
</head>
<results>
<result>
<binding name="s">
<bnode>b0</bnode>
</binding>
<binding name="o">
<uri>http://purl.org/dc/terms/W3CDTF</uri>
</binding>
</result>
<result>
<binding name="s">
<uri>http://www.gutenberg.org/feeds/catalog.rdf#etext20867</uri>
</binding>
<binding name="o">
<uri>http://www.gutenberg.org/rdfterms/etext</uri>
</binding>
</result>
</results>
</sparql>
</result>
<lst name="facet_counts">
<lst name="facet_queries" />
<lst name="facet_fields">
<lst name="p">
<int name="<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>">18176</int>
</lst>
</lst>
<lst name="facet_dates" />
<lst name="facet_ranges" />
</lst>
</response>
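The hybrid response embeds a standard SPARQL results document inside Solr's XML envelope, so both layers can be pulled apart with an ordinary XML parser. A minimal sketch with the standard library, run against a trimmed copy of the response above:

```python
import xml.etree.ElementTree as ET

# Namespace of the embedded SPARQL results document.
SRX_NS = "{http://www.w3.org/2005/sparql-results#}"

# Trimmed copy of the hybrid-mode response shown above.
RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<response>
  <result name="response" numFound="18176" start="100" maxScore="1.0">
    <sparql xmlns="http://www.w3.org/2005/sparql-results#">
      <head><variable name="s"/><variable name="o"/></head>
      <results>
        <result>
          <binding name="s"><bnode>b0</bnode></binding>
          <binding name="o"><uri>http://purl.org/dc/terms/W3CDTF</uri></binding>
        </result>
      </results>
    </sparql>
  </result>
</response>"""

root = ET.fromstring(RESPONSE)
solr_result = root.find("result")              # Solr envelope: total hits, offset
num_found = int(solr_result.get("numFound"))
bindings = solr_result.findall(f".//{SRX_NS}result")   # SPARQL result rows
first_o = bindings[0].find(f"{SRX_NS}binding[@name='o']/{SRX_NS}uri").text
```

The Solr envelope carries the pagination data (numFound, start), while each namespaced result element is a regular SPARQL binding set.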
SolRDF supports a subset of the SPARQL 1.1 Graph Store HTTP Protocol specs [3]. The protocol describes a set of HTTP operations for managing a collection of RDF graphs:
HTTP METHOD | Description | Supported by SolRDF |
---|---|---|
GET | Retrieves the content of a graph (named or default) | Yes |
POST | Adds data to a given graph (named or default) | Yes |
PUT | Replaces data of a given graph (named or default) | No |
DELETE | Deletes a given graph (named or default) | No |
PUT and DELETE requests are not supported because in Solr those HTTP methods are reserved for REST operations related to schema and configuration. As a consequence, a custom handler (in this case Sparql11GraphStoreProtocolHandler) never gets a chance to be notified of PUT or DELETE requests.
The target graph of an HTTP request is indicated by means of the "graph" or "default" parameters. Some examples:
Request | Named Graph | Default Graph |
---|---|---|
/rdf-graph-store?default | No | Yes |
/rdf-graph-store | No | Yes |
/rdf-graph-store?graph=http://a.b.c | Yes | No |
As you can see, when neither parameter is present the request is assumed to refer to the default graph.
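The resolution rule in the table can be sketched as a small helper (hypothetical function name; only the two parameters mentioned by the protocol are considered):

```python
from urllib.parse import urlparse, parse_qs

def target_graph(request_uri):
    """Return the named graph URI, or None for the default graph,
    following the rules in the table above."""
    query = urlparse(request_uri).query
    # keep_blank_values=True so a bare '?default' still shows up as a parameter.
    params = parse_qs(query, keep_blank_values=True)
    if "graph" in params:
        return params["graph"][0]   # named graph
    return None                     # '?default', or no parameter at all
```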
A request that uses the HTTP GET method will retrieve an RDF payload that is a serialization of a given graph. Some examples:
> curl "http://localhost:8080/solr/store/rdf-graph-store"
> curl "http://localhost:8080/solr/store/rdf-graph-store?default"
> curl "http://localhost:8080/solr/store/rdf-graph-store?graph=http://a.b.c"
A request that uses the HTTP POST method will add an RDF payload to a given graph. Some examples:
> curl -X POST "http://localhost:8080/solr/store/rdf-graph-store" \
-H "Content-Type: application/n-triples" \
--data-binary @/path/to/your/datafile.nt
> curl -X POST "http://localhost:8080/solr/store/rdf-graph-store?default" \
-H "Content-Type: application/n-triples" \
--data-binary @/path/to/your/datafile.nt
> curl -X POST "http://localhost:8080/solr/store/rdf-graph-store?graph=http://a.b.c" \
-H "Content-Type: application/n-triples" \
--data-binary @/path/to/your/datafile.nt
[1] http://wiki.apache.org/solr/CommonQueryParameters
[2] https://wiki.apache.org/solr/SimpleFacetParameters
[3] http://www.w3.org/TR/sparql11-http-rdf-update