#90 Adds global text search field that includes TEI Header nodes #100

Draft
wants to merge 2 commits into
base: master

randalldfloyd
Contributor

This adds a new text search field that includes text nodes from the TEI header. Having a separate field lets the advanced search text field and the global site search behave differently.

@Conal-Tuohy
Collaborator

Conal-Tuohy commented Feb 4, 2021

I don't think the change to the XSLT is needed. The additional field element in the search-fields.xml file should do the trick by itself.

The update-schema-from-field-definitions.xsl stylesheet already includes code to automatically add all the fields which are defined in the search-fields.xml file.

NB that stylesheet also explicitly adds the three "full text" fields diplomatic, normalized, and introduction independently, because they aren't defined in the search-fields.xml file. The reason why those "full text" fields aren't also defined in search-fields.xml is a bit complicated, but in brief it's because we want to populate those fields with text which exactly matches the web pages (so that the web pages can have hits highlighted in them, based on matches returned by Solr's hit-highlighting). Those web pages are the output of a chain of quite complex XSLT transformations (which have to suppress orig or reg elements, etc), so although in theory you could extract equivalent text from the TEI using an XPath expression that also suppressed the appropriate elements' content, in practice it seemed to me unwise, since any discrepancy between the field value and the web page's content would break the hit highlighting.

@Conal-Tuohy
Collaborator

Conal-Tuohy commented Feb 5, 2021

NB if you want the new tei-header field to also appear in the search form, you would need to give it a label attribute, e.g.

	<field name="tei-header" label="Metadata" xpath="/TEI/teiHeader//text()"/>

Without a label attribute, a new field still gets added to the Solr schema, and the indexer still populates the Solr field (by evaluating its xpath attribute), but the field does not appear on the main search form. Fields without a label are invisible to the search UI, though they have their uses: the id field, for example, exists purely to provide a unique ID for the record in Solr.
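
As a rough illustration of that distinction, here is a minimal Python sketch (not the project's actual XSLT). It assumes a search-fields.xml shaped like the example line above, with a hypothetical fields root element and a hypothetical xpath for the id field; the function names are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical search-fields.xml content: the root element name and the
# xpath of the "id" field are assumptions made for this illustration.
SEARCH_FIELDS = """<fields>
    <field name="id" xpath="/TEI/@xml:id"/>
    <field name="tei-header" label="Metadata" xpath="/TEI/teiHeader//text()"/>
</fields>"""


def schema_field_names(fields_xml):
    """Every defined field is added to the Solr schema, labelled or not."""
    root = ET.fromstring(fields_xml)
    return [field.get("name") for field in root.iter("field")]


def search_form_fields(fields_xml):
    """Only fields carrying a label attribute are shown on the search form."""
    root = ET.fromstring(fields_xml)
    return [
        (field.get("name"), field.get("label"))
        for field in root.iter("field")
        if field.get("label") is not None
    ]


print(schema_field_names(SEARCH_FIELDS))
print(search_form_fields(SEARCH_FIELDS))
```

In this sketch both fields end up in the schema list, while only tei-header (labelled "Metadata") is returned for the search form.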

@Conal-Tuohy
Collaborator

The only other thing I'd be wary of is that this XPath expression might merge the content of adjacent elements into a single word if there is no white space between the elements, e.g.

<p>Blah blah ... blah</p><p>Blah blah blah.</p>

Would produce Blah blah ... blahBlah blah blah.

Maybe it would be safer to use the string-join() function to explicitly add white space between each text node? e.g. string-join(/TEI/teiHeader//text(), ' ')
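
To make the risk concrete, here is a small Python sketch under a few assumptions: the sample TEI has no namespace (matching the XPath above), the two p elements sit directly inside teiHeader purely for illustration, and plain concatenation of the text nodes stands in for whatever the indexer actually does with the node sequence. The " ".join(...) call plays the role of string-join(..., ' ').

```python
import xml.etree.ElementTree as ET

# Illustrative, namespace-free sample; real TEI headers are more structured.
TEI_SAMPLE = (
    "<TEI><teiHeader>"
    "<p>Blah blah ... blah</p><p>Blah blah blah.</p>"
    "</teiHeader><text><p>Body text</p></text></TEI>"
)


def header_text_nodes(tei_xml):
    """Collect the header's text nodes, roughly like /TEI/teiHeader//text()."""
    root = ET.fromstring(tei_xml)
    header = root.find("teiHeader")
    return list(header.itertext())


nodes = header_text_nodes(TEI_SAMPLE)

# Concatenating the nodes directly fuses the last word of the first
# paragraph with the first word of the second.
concatenated = "".join(nodes)

# Joining with a space keeps the words apart, as string-join(..., ' ') would.
joined = " ".join(nodes)

print(concatenated)
print(joined)
```

With the space separator, "blah" and "Blah" stay separate tokens, so a search for either word can still match.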

@randalldfloyd
Contributor Author

@Conal-Tuohy Thanks for the guidance on this, and for the additional comments you left in the issue conversation. They solved a major mystery for me: how the actual document text gets into the Solr fields once they are defined. Going by their names alone, I assumed the XProc steps and stylesheets you pointed out were only for transforming P5 to HTML at request time, so I never looked at them. Now I see that they are also used to build the Solr doc in the index pipeline.

randalldfloyd marked this pull request as draft February 28, 2022 15:00

@mdalmau
Collaborator

mdalmau commented Mar 8, 2022

@randalldfloyd: I am not really sure where we left off with this. Maybe when you get a breather (ha!) later in April, we can revisit?

@randalldfloyd
Contributor Author

@mdalmau
I'll tell you honestly what I remember from this, and then you can tell me whether it was just wishful thinking. After I put in a fair amount of work to demonstrate that the keyword search behavior could be altered, you sent a message to the group asking for xpaths that could be included in the indexing of the text search field. Someone (Bill maybe?) responded that they couldn't see what the real need for this was, or what the problem was with how it currently works, and nobody else responded, at least not on anything I was copied on. I had a test branch deployed somewhere, but it was probably lost in the moving around of services.

3 participants