Lesson: Control indexing of your RDF metadata
Every ActiveFedora datastream can build a representation of itself to store in Solr by using the built-in method to_solr. This method requires that the datastream be associated with a model, so that the RDF can be written with the pid as the subject of its assertions. Let's take a look at what the default to_solr method gives us:
class DublinCoreDatastream < ActiveFedora::NtriplesRDFDatastream
  map_predicates do |map|
    map.title(in: RDF::DC)
    map.created(in: RDF::DC)
    #...
  end
end

class MyObj < ActiveFedora::Base
  has_metadata 'descMetadata', type: DublinCoreDatastream
end
m = MyObj.new(title: 'One Hundred Years of Solitude', created: '1967')
=> #<MyObj pid: nil, title: "One Hundred Years of Solitude", created: "1967">
m.descMetadata.to_solr
=> {}
As you can see, the default behavior is to return an empty hash, so nothing will be indexed in Solr by default. Now, let's tweak the behavior to produce a more interesting and useful document:
class DublinCoreDatastream < ActiveFedora::NtriplesRDFDatastream
  map_predicates do |map|
    map.title(in: RDF::DC) do |index|
      index.as :sortable, :searchable
    end
    map.created(in: RDF::DC) do |index|
      index.as :stored_searchable
    end
    #...
  end
end

class MyObj < ActiveFedora::Base
  has_metadata 'descMetadata', type: DublinCoreDatastream
end
m = MyObj.new(title: 'One Hundred Years of Solitude', created: '1967')
=> #<MyObj pid: nil, title: "One Hundred Years of Solitude", created: "1967">
m.descMetadata.to_solr
=> {"desc_metadata__title_si"=>"One Hundred Years of Solitude",
"desc_metadata__title_teim"=>["One Hundred Years of Solitude"],
"desc_metadata__created_tesim"=>["1967"]}
This time we can see that the Solr document has three fields, which are derived from the two data fields. You will notice that different arguments on the index.as line produce different suffixes in the output. These suffixes control how Solr indexes each field. The first one or two characters of the suffix determine the Solr field type. For example:
- dt = date
- s = string (not tokenized)
- te = text (tokenized with English assumptions)
- i = integer
- b = boolean
The remaining characters are flags:
- s = if present, stored (can be displayed after retrieval)
- i = if present, index this field (for searching)
- m = if present, multivalued (can't sort on multivalued fields)
See https://github.com/projecthydra/active_fedora/blob/master/lib/generators/active_fedora/config/solr/templates/solr_conf/conf/schema.xml#L13-L152 for an exhaustive list.
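To make the suffix convention concrete, here is a minimal sketch in plain Ruby that splits a field name into the type code and flags listed above. The helper decode_suffix and the TYPE_CODES table are hypothetical names invented for this illustration; they are not part of ActiveFedora or Solrizer, and the table only covers the five type codes shown in this lesson, not the full schema.

```ruby
# Hypothetical helper for illustration only; not an ActiveFedora or Solrizer API.
# Covers only the type codes listed in this lesson.
TYPE_CODES = {
  'dt' => :date,
  'te' => :text,
  's'  => :string,
  'i'  => :integer,
  'b'  => :boolean
}.freeze

# Returns a hash describing the suffix of a Solr field name,
# or nil if the suffix does not follow the convention above.
def decode_suffix(field_name)
  # The suffix is everything after the last underscore.
  suffix = field_name[/_([a-z]+)\z/, 1]
  return nil unless suffix

  # Try two-character type codes before one-character ones,
  # so that "tesim" is read as "te" + "sim" rather than failing on "t".
  code = TYPE_CODES.keys.sort_by { |k| -k.length }.find { |k| suffix.start_with?(k) }
  return nil unless code

  # Whatever follows the type code must be the flags, in the order s, i, m.
  flags = suffix[code.length..-1]
  return nil unless flags.match?(/\As?i?m?\z/)

  {
    type:        TYPE_CODES[code],
    stored:      flags.include?('s'),
    indexed:     flags.include?('i'),
    multivalued: flags.include?('m')
  }
end

# Field names taken from the to_solr output earlier in this lesson.
title_sort   = decode_suffix('desc_metadata__title_si')     # string, indexed only
title_search = decode_suffix('desc_metadata__title_teim')   # text, indexed, multivalued
created      = decode_suffix('desc_metadata__created_tesim') # text, stored, indexed, multivalued
```

Read this way, the _si field is a single-valued, unstored string (which is what makes it usable for sorting), while the _tesim field is the only one of the three whose value can be displayed after retrieval.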
Solrizer gives us some shortcuts that produce the appropriate suffixes:
- :stored_searchable
  - _tesim - for strings or text fields
  - _dtsim - for dates
  - _isim - for integers
- :searchable
  - _teim - for strings or text fields
  - _dtim - for dates
  - _iim - for integers
- :facetable
  - _sim
- :symbol
  - _ssim
- and others. See https://github.com/projecthydra/solrizer/blob/master/lib/solrizer/default_descriptors.rb
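The list above can be read as a lookup table from an index.as argument (and, for some arguments, a value type) to a suffix. The following is a rough sketch of that table in plain Ruby. The names MACRO_SUFFIXES and field_name are hypothetical and exist only for this illustration; Solrizer's real descriptors are defined in the file linked above. The :sortable entry is inferred from the _si field in this lesson's example output, and the "prefix__name" form is copied from that same output rather than from any documented API.

```ruby
# Illustrative lookup table only; not Solrizer's actual implementation.
# Entries are either a fixed suffix or a hash keyed by value type.
MACRO_SUFFIXES = {
  stored_searchable: { text: 'tesim', date: 'dtsim', integer: 'isim' },
  searchable:        { text: 'teim',  date: 'dtim',  integer: 'iim' },
  facetable:         'sim',
  symbol:            'ssim',
  sortable:          'si' # assumption: taken from the "title_si" example output
}.freeze

# Builds a field name in the shape seen in the to_solr output,
# e.g. "desc_metadata__title_teim". Hypothetical helper.
def field_name(prefix, name, macro, type = :text)
  entry = MACRO_SUFFIXES.fetch(macro) do
    raise ArgumentError, "unknown macro: #{macro.inspect}"
  end
  # Type-dependent macros pick the suffix by value type; others ignore the type.
  suffix = entry.is_a?(Hash) ? entry.fetch(type) : entry
  "#{prefix}__#{name}_#{suffix}"
end

# Reproduce the three keys from the example document in this lesson.
keys = [
  field_name('desc_metadata', 'title',   :sortable),
  field_name('desc_metadata', 'title',   :searchable),
  field_name('desc_metadata', 'created', :stored_searchable)
]
puts keys
```

Passing a different type, for example field_name('desc_metadata', 'created', :stored_searchable, :date), selects the _dtsim suffix from the same table.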
Go on to Lesson: using typed predicates in your models or return to the Tame your RDF Metadata with ActiveFedora landing page.