This repository has been archived by the owner on May 28, 2024. It is now read-only.

Stop using deprecated textNoStem fields - use textUnstemmed instead #1069

Open
52 tasks
ndushay opened this issue Dec 8, 2023 · 0 comments
ndushay commented Dec 8, 2023

Prereq: integration tests in Argo for appropriate functionality of Solr fields.

The goal of this ticket is to stop using some deprecated Solr field types for Argo.

In argo-xx schema.xml:

    <!-- Text tokenized without stemming -->
    <dynamicField name="*_text_unstemmed_i"   type="textUnstemmed" indexed="true"  stored="false" multiValued="false"/>
    <dynamicField name="*_text_unstemmed_im"  type="textUnstemmed" indexed="true"  stored="false" multiValued="true"/>
    <dynamicField name="*_text_unstemmed_si"  type="textUnstemmed" indexed="true"  stored="true"  multiValued="false"/>
    <dynamicField name="*_text_unstemmed_sim" type="textUnstemmed" indexed="true"  stored="true"  multiValued="true"/>
    <!-- DEPRECATED:  textNoStem is a deprecated type -->
    <dynamicField name="*_text_nostem_i"  type="textNoStem" indexed="true"  stored="false" multiValued="false"/>
    <dynamicField name="*_text_nostem_im" type="textNoStem" indexed="true"  stored="false" multiValued="true"/>
...
snip
...
    <!-- DEPRECATED:  use textUnstemmed, as WordDelimiterFilterFactory is deprecated.  Analyzed Text, no Stemming or Synonyms -->
    <fieldtype name="textNoStem" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
        <!-- NFKC, case folding, diacritics removed -->
        <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" generateWordParts="1" catenateWords="1" splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1" catenateAll="0" preserveOriginal="0" stemEnglishPossessive="0"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>
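
The issue quotes the deprecated `textNoStem` type but not its replacement. As a rough illustration only, a non-deprecated equivalent could swap `WordDelimiterFilterFactory` for `WordDelimiterGraphFilterFactory`, which Solr expects to be followed by `FlattenGraphFilterFactory` in index-time analysis. The sketch below is an assumption about what such a type might look like; it is not the actual `textUnstemmed` definition from sul-solr-configs, and the filter options are simply carried over from `textNoStem` above.

```xml
<!-- HYPOTHETICAL sketch: not the real textUnstemmed definition from sul-solr-configs -->
<fieldType name="textUnstemmed" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- NFKC, case folding, diacritics removed -->
    <filter class="solr.ICUFoldingFilterFactory"/>
    <!-- options copied from textNoStem; assumed, not confirmed, for the new type -->
    <filter class="solr.WordDelimiterGraphFilterFactory" splitOnCaseChange="1" generateWordParts="1" catenateWords="1" splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1" catenateAll="0" preserveOriginal="0" stemEnglishPossessive="0"/>
    <!-- graph token streams must be flattened before indexing -->
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <!-- no FlattenGraphFilterFactory at query time, so the query parser can use the token graph -->
    <filter class="solr.WordDelimiterGraphFilterFactory" splitOnCaseChange="1" generateWordParts="1" catenateWords="1" splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1" catenateAll="0" preserveOriginal="0" stemEnglishPossessive="0"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

Splitting the analyzer into index and query variants is what lets the flatten step apply only at index time. Whether the real type does this should be checked against the deployed schema.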

Existing fields using type textNoStem

  • descriptive_text_nostem_i
  • source_id_text_nostem_i
  • author_text_nostem_im
  • contributor_text_nostem_im

NOTE: if a single stored field can serve both display and search, store it (e.g. the primary author field can be used for both if it is stored).

Steps

  • sul-solr-configs allow new fields
    • qa
    • stage
    • prod
  • argo's solr config files
  • dor-services-app solr config files
  • dor_indexing gem populates new fields of type textUnstemmed (in addition to old types)
    • PR for this
    • release new dor_indexing gem version
  • dor_indexing_app uses new gem release (and populates new fields)
    • PR for this
    • deployed to qa
    • deployed to stage
    • deployed to prod
  • dor-services-app uses new gem release (and populates new fields)
    • PR for this
    • deployed to qa
    • deployed to stage
    • deployed to prod
  • new field population tested
    • solr queries for argo-qa
    • solr queries for argo-stage
  • argo solr prod fields have begun populating
  • argo solr prod fields have finished populating
  • argo PR (on HOLD) for replacing usage of old field names
    • argo changes deployed to qa
    • argo changes deployed to stage
    • argo tested and vetted on qa and/or stage
  • sul-solr-configs argo_xx files PR to use new fields in search results for stage and qa
    • solr config changes deployed to qa
    • solr config changes deployed to stage
  • argo solr prod new fields have begun populating (starting: )
  • argo solr prod new fields have finished populating
  • roll out new fields in argo prod
    • add prod to argo PR (take out of HOLD) for replacing usage of old field names
      • argo changes deployed to prod
    • argo solr config file PRs to use new fields in search results
      • sul-solr config changes deployed to prod
      • argo repo gets solr config changes
      • dor-services-app repo gets solr config changes
  • remove old fields
    • argo
    • dor_indexing_app
    • indexes
      • qa
      • stage
      • prod
  • sul-solr-configs removes field types
  • argo Solr config files updated in the Solr cluster
    • qa
    • stage
    • prod
  • argo repo solr config changes
  • dor-services-app repo solr config changes
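
During the transition described in the steps above, each indexed document carries both the old and the new fields until the old ones are removed. A minimal sketch of such a document in Solr's XML update format follows. The `*_text_unstemmed_*` field names, the `id` value, and the field values are hypothetical placeholders chosen to match the dynamicField patterns quoted earlier; the names the indexing code actually emits may differ.

```xml
<!-- HYPOTHETICAL example document: id, values, and new field names are placeholders -->
<add>
  <doc>
    <field name="id">druid:bc123df4567</field>

    <!-- old fields (textNoStem), still populated until removal -->
    <field name="source_id_text_nostem_i">example:source-id-1</field>
    <field name="author_text_nostem_im">Example, Author A.</field>
    <field name="author_text_nostem_im">Example, Author B.</field>

    <!-- new fields (textUnstemmed), populated alongside the old ones -->
    <field name="source_id_text_unstemmed_i">example:source-id-1</field>
    <field name="author_text_unstemmed_im">Example, Author A.</field>
    <field name="author_text_unstemmed_im">Example, Author B.</field>
  </doc>
</add>
```

Because the `_i` and `_im` patterns are not stored, these fields cannot be read back from search results. Checking that they are populated means querying against them, for example with a field-existence query such as `source_id_text_unstemmed_i:*`.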
ndushay changed the title from "Get rid of deprecated textNoStem fields - use textUnstemmed instead" to "Stop using deprecated textNoStem fields - use textUnstemmed instead" on Feb 27, 2024