From 830a7ce94c8de2f8755a6e3311e5fb8317b98c55 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Wed, 21 Feb 2024 12:21:47 -0500 Subject: [PATCH 1/8] use built in sphinx-tabs capablity instead of homegrown --- docs/requirements.txt | 1 + docs/source/conf.py | 1 + docs/source/querqy/index.rst | 945 +++++++++++++++++++---------------- 3 files changed, 528 insertions(+), 419 deletions(-) diff --git a/docs/requirements.txt b/docs/requirements.txt index 857695b..f3964dc 100644 --- a/docs/requirements.txt +++ b/docs/requirements.txt @@ -1,2 +1,3 @@ sphinx==7.2.6 sphinx_rtd_theme==2.0.0 +sphinx-tabs==3.4.5 diff --git a/docs/source/conf.py b/docs/source/conf.py index 70ada5e..1057608 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -33,6 +33,7 @@ # ones. extensions = ['sphinx.ext.intersphinx', 'sphinx.ext.imgmath', + 'sphinx_tabs.tabs', 'sphinx.ext.ifconfig'] # Add any paths that contain templates here, relative to this directory. diff --git a/docs/source/querqy/index.rst b/docs/source/querqy/index.rst index 180ee42..77fed92 100644 --- a/docs/source/querqy/index.rst +++ b/docs/source/querqy/index.rst @@ -23,442 +23,549 @@ You might want to ... Installation ============ +.. tabs:: + + .. group-tab:: Elasticsearch + + .. rubric:: Installation under Elasticsearch + + * Stop Elasticsearch if it is running. + * Open a shell and :code:`cd` into your Elasticsearch directory. + * Run Elasticsearch's plugin install script: + + .. code-block:: shell + + ./bin/elasticsearch-plugin install + + Select your version below and we will generate the install command for you: + + .. raw:: html + + +
+
+ + + .. code-block:: shell + + + ./bin/elasticsearch-plugin install \ + "https://repo1.maven.org/maven2/org/querqy/querqy-elasticsearch/1.7.es892.0/querqy-elasticsearch-1.7.es892.0.zip" + + * Answer :code:`yes` to the security related questions (Querqy needs special + permissions to load query rewriters dynamically). + * When you start Elasticsearch, you should see an INFO log message + :code:`loaded plugin [querqy]`. + + .. group-tab:: OpenSearch + + .. rubric:: Installation under OpenSearch + + * Stop OpenSearch if it is running. + * Open a shell and :code:`cd` into your OpenSearch directory. + * Run OpenSearch's plugin install script: + + .. code-block:: shell + + ./bin/opensearch-plugin install + + Querqy is available for OpenSearch 2.3.0. + + .. code-block:: shell + + + ./bin/opensearch-plugin install \ + "https://repo1.maven.org/maven2/org/querqy/opensearch-querqy/1.0.os2.3.0/opensearch-querqy-1.0.os2.3.0.zip" + + * Answer :code:`yes` to the security related questions (Querqy needs special + permissions to load query rewriters dynamically). + * When you start OpenSearch, you should see an INFO log message + :code:`loaded plugin [querqy]`. + + .. group-tab:: Solr + + .. rubric:: Installation under Solr + + The Querqy plugin is installed as a .jar file. + + .. warning:: When upgrading your Querqy version, please make sure to read the + :doc:`release notes! `! + + * Download the Querqy .jar file that matches your Solr version from the table + below. 
+ + + +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Solr version | Querqy version [#]_ | + +================+==========================================================================================================================================================+ + | 9.1.0 | :download:`5.5.lucene900.1` | + +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 9.0.0 | :download:`5.5.lucene900.1` | + +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 8.11.x | :download:`5.5.lucene811.1` | + +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 8.10.x | :download:`5.4.lucene810.1` | + +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 8.9.x | :download:`5.4.lucene810.1` | + +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 8.8.x | :download:`5.4.lucene810.1` | + +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 8.7.x | :download:`5.4.lucene810.1` | + +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 8.6.x | :download:`5.4.lucene810.1` | + 
+----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 8.5.x | :download:`5.4.lucene810.1` | + +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 8.4.x | :download:`5.4.lucene810.1` | + +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 8.3.x | :download:`5.4.lucene810.1` | + +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 8.2.x | :download:`5.4.lucene810.1` | + +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 8.1.x | :download:`5.4.lucene810.1` | + +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 8.0.x | :download:`5.2.lucene800.0` | + +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 7.7.x | :download:`5.2.lucene720.2` | + +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 7.6.x | :download:`5.2.lucene720.2` | + +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 7.5.x | :download:`5.2.lucene720.2` | + 
+----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 7.4.x | :download:`5.2.lucene720.2` | + +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 7.3.x | :download:`5.2.lucene720.2` | + +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 7.2.x | :download:`5.2.lucene720.2` | + +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + + .. [#] For older Solr versions, please see `here `_. + + You can also browse the `Central Maven Repository`_ and pick + `jar-with-dependencies` from the Downloads dropdown of the corresponding + Querqy version. + + * Put the .jar file into `Solr's lib folder`_. We currently recommend putting querqy into the + :code:`/server/solr-webapp/webapp/WEB-INF/lib/` folder. + * Add the Querqy request handler (Querqy 5 only), the Querqy query parser and + the Querqy query component to your ``solrconfig.xml`` file: + + **Querqy 5** + + .. code-block:: xml + + + + + + + + + + + **Querqy 4** + + .. code-block:: xml + + + + + + + + + .. _`Solr's lib folder`: https://solr.apache.org/guide/solr/latest/configuration-guide/libs.html + .. _`Central Maven Repository`: https://search.maven.org/artifact/org.querqy/querqy-solr -.. include:: se-section.txt - - -.. rst-class:: elasticsearch - -.. raw:: html - -
- -.. rubric:: Installation under Elasticsearch/OpenSearch - -* Stop Elasticsearch/OpenSearch if it is running. -* Open a shell and :code:`cd` into your Elasticsearch/OpenSearch directory. -* Run Elasticsearch/OpenSearch's plugin install script: - -.. code-block:: shell - - ./bin/elasticsearch-plugin install - -Or - -.. code-block:: shell - - ./bin/opensearch-plugin install - -The :code:`` depends on your search engine version: - -Elasticsearch -.............. - -Select your version below and we will generate the install command for you: - - -.. raw:: html - - -

- - -.. rst-class:: elasticsearch-version - - -.. code-block:: shell - - - ./bin/elasticsearch-plugin install \ - "https://repo1.maven.org/maven2/org/querqy/querqy-elasticsearch/1.7.es892.0/querqy-elasticsearch-1.7.es892.0.zip" - - -OpenSearch -.......... - -Querqy is available for OpenSearch 2.3.0. - -.. code-block:: shell - - - ./bin/opensearch-plugin install \ - "https://repo1.maven.org/maven2/org/querqy/opensearch-querqy/1.0.os2.3.0/opensearch-querqy-1.0.os2.3.0.zip" - -* Answer :code:`y`\es to the security related questions (Querqy needs special - permissions to load query rewriters dynamically). -* When you start Elasticsearch/OpenSearch, you should see an INFO log message - :code:`loaded plugin [querqy]`. - -.. raw:: html - -
- - -.. rst-class:: solr - -.. raw:: html - -
- -.. rubric:: Installation under Solr - -The Querqy plugin is installed as a .jar file. - -.. warning:: When upgrading your Querqy version, please make sure to read the - :doc:`release notes! `! - -* Download the Querqy .jar file that matches your Solr version from the table - below. - - - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Solr version | Querqy version [#]_ | - +================+==========================================================================================================================================================+ - | 9.1.0 | :download:`5.5.lucene900.1` | - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | 9.0.0 | :download:`5.5.lucene900.1` | - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | 8.11.x | :download:`5.5.lucene811.1` | - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | 8.10.x | :download:`5.4.lucene810.1` | - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | 8.9.x | :download:`5.4.lucene810.1` | - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | 8.8.x | :download:`5.4.lucene810.1` | - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | 
8.7.x | :download:`5.4.lucene810.1` | - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | 8.6.x | :download:`5.4.lucene810.1` | - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | 8.5.x | :download:`5.4.lucene810.1` | - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | 8.4.x | :download:`5.4.lucene810.1` | - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | 8.3.x | :download:`5.4.lucene810.1` | - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | 8.2.x | :download:`5.4.lucene810.1` | - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | 8.1.x | :download:`5.4.lucene810.1` | - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | 8.0.x | :download:`5.2.lucene800.0` | - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | 7.7.x | :download:`5.2.lucene720.2` | - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | 7.6.x | :download:`5.2.lucene720.2` | 
- +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | 7.5.x | :download:`5.2.lucene720.2` | - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | 7.4.x | :download:`5.2.lucene720.2` | - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | 7.3.x | :download:`5.2.lucene720.2` | - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - | 7.2.x | :download:`5.2.lucene720.2` | - +----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ - - .. [#] For older Solr versions, please see `here `_. - - You can also browse the `Central Maven Repository`_ and pick - `jar-with-dependencies` from the Downloads dropdown of the corresponding - Querqy version. - -* Put the .jar file into `Solr's lib folder`_. We currently recommend putting querqy into the - :code:`/server/solr-webapp/webapp/WEB-INF/lib/` folder. -* Add the Querqy request handler (Querqy 5 only), the Querqy query parser and - the Querqy query component to your ``solrconfig.xml`` file: - -**Querqy 5** - -.. code-block:: xml - - - - - - - - - - -**Querqy 4** - -.. code-block:: xml - - - - - - - - -.. _`Solr's lib folder`: https://solr.apache.org/guide/solr/latest/configuration-guide/libs.html -.. _`Central Maven Repository`: https://search.maven.org/artifact/org.querqy/querqy-solr - -.. raw:: html - -
.. _querqy-making-queries: Making queries using Querqy =========================== -.. include:: se-section.txt - -.. rst-class:: elasticsearch - - -.. raw:: html - -
- -Querqy defines its own query builder which can be executed with a rich set of -parameters. We will walk through these parameters step by step, starting with a -minimal query, which does not use any rewriter, then adding a 'Common Rules -Rewriter' and finally explaining the full set of parameters, many of them not -related to query rewriting but to search relevance tuning in general. - - -.. rubric:: Minimal Query - -:code:`POST /myindex/_search` - -.. code-block:: JSON - :linenos: - - { - "query": { - "querqy": { - "matching_query": { - "query": "notebook" - }, - "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"] - } - } - } - -Querqy provides a new query builder, :code:`querqy` (line #3), that can be used -in a query just like any other Elasticsearch query type. The -:code:`matching_query` (#4) defines the query for which documents will be -matched and retrieved. - -The matching query is different from boosting queries which would only influence -the ranking but not the matching. We will later see that Querqy allows to -specify information for boosting outside the matching_query object and that the -set of matching documents can be changed in query rewriting, for example, by -adding synonyms or by deleting query tokens. - -The :code:`query` element (#5) contains the query string. In most cases this is -just the query string as it was typed into the search box by the user. - -The list of :code:`query_fields` (#7) specifies in which fields to search. A -field name can have an optional field weight. In the example, the field weight -for title is 3.0. The default field weight is 1.0. Field weights must be -positive. We will later see that the query_fields can be applied to parts of the -querqy query other than the matching_query as well. That's why the query_fields -list is not a child element of the matching_query. - -The combination of a query string with a list of fields and field weights -resembles Elasticsearch's built-in :code:`multi_match` query. 
We will later see -that there are some differences in matching and scoring. - - -.. rubric:: Querqy inside the known Elasticsearch Query DSL - -The following example shows, how easy it is to replace a Elasticsearch query type like :code:`multi_match` with a Querqy :code:`matching_query`, so you can profit from Querqy's rewriters. -Let's say you have an index that contains forum posts and want to find a certain post in the topic "hobby", that was made 10-12 days ago and was about "fishing". - -A simple `Boolean query `__ with a :code:`multi_match` and a :code:`match` query inside the :code:`must` occurrence and a :code:`range` query in the :code:`filter` occurrence should do the trick. - -:code:`POST /index/_search` - -.. code-block:: JSON - :linenos: - - { - "query": { - "bool": { - "must": [ - { - "match": { - "topic": "hobby" +.. tabs:: + + .. group-tab:: Elasticsearch + + Querqy defines its own query builder which can be executed with a rich set of + parameters. We will walk through these parameters step by step, starting with a + minimal query, which does not use any rewriter, then adding a 'Common Rules + Rewriter' and finally explaining the full set of parameters, many of them not + related to query rewriting but to search relevance tuning in general. + + + .. rubric:: Minimal Query + + :code:`POST /myindex/_search` + + .. code-block:: JSON + :linenos: + + { + "query": { + "querqy": { + "matching_query": { + "query": "notebook" + }, + "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"] + } + } + } + + Querqy provides a new query builder, :code:`querqy` (line #3), that can be used + in a query just like any other Elasticsearch query type. The + :code:`matching_query` (#4) defines the query for which documents will be + matched and retrieved. + + The matching query is different from boosting queries which would only influence + the ranking but not the matching. 
We will later see that Querqy allows you to + specify information for boosting outside the matching_query object and that the + set of matching documents can be changed in query rewriting, for example, by + adding synonyms or by deleting query tokens. + + The :code:`query` element (#5) contains the query string. In most cases this is + just the query string as it was typed into the search box by the user. + + The list of :code:`query_fields` (#7) specifies in which fields to search. A + field name can have an optional field weight. In the example, the field weight + for title is 3.0. The default field weight is 1.0. Field weights must be + positive. We will later see that the query_fields can be applied to parts of the + querqy query other than the matching_query as well. That's why the query_fields + list is not a child element of the matching_query. + + The combination of a query string with a list of fields and field weights + resembles Elasticsearch's built-in :code:`multi_match` query. We will later see + that there are some differences in matching and scoring. + + + .. rubric:: Querqy inside the known Elasticsearch Query DSL + + The following example shows how easy it is to replace an Elasticsearch query type like :code:`multi_match` with a Querqy :code:`matching_query`, so you can profit from Querqy's rewriters. + Let's say you have an index that contains forum posts and want to find a certain post in the topic "hobby" that was made 10-12 days ago and was about "fishing". + + A simple `Boolean query `__ with a :code:`multi_match` and a :code:`match` query inside the :code:`must` occurrence and a :code:`range` query in the :code:`filter` occurrence should do the trick. + + :code:`POST /index/_search` + + .. 
code-block:: JSON + :linenos: + + { + "query": { + "bool": { + "must": [ + { + "match": { + "topic": "hobby" + } + }, + { + "multi_match": { + "query": "fishing", + "fields": ["title", "content"] + } + } + ], + "filter": [ + { + "range": { + "dateField": { + "gte": "now-12d", + "lte": "now-10d" + } + } + } + ] } - }, - { - "multi_match": { - "query": "fishing", - "fields": ["title", "content"] + } + } + + + To use the :code:`matching_query` from the :code:`querqy` query builder, your request would look like this: + + :code:`POST /myindex/_search` + + .. code-block:: JSON + :linenos: + :emphasize-lines: 11 + + { + "query": { + "bool": { + "must": [ + { + "match": { + "topic": "hobby" + } + }, + { + "querqy": { + "matching_query": { + "query": "fishing" + }, + "query_fields": ["title", "content"], + "rewriters": ["my_replace_rewriter", "my_common_rules"] + } + } + ], + "filter": [ + { + "range": { + "dateField": { + "gte": "now-12d", + "lte": "now-10d" + } + } + } + ] } } - ], - "filter": [ - { - "range": { - "dateField": { - "gte": "now-12d", - "lte": "now-10d" + } + + + + As you can see, to use a :code:`matching_query` instead of a :code:`multi_match` you need to use :code:`querqy` (line #11) as a "wrapper" for the :code:`matching_query`. + + + .. group-tab:: OpenSearch + + Querqy defines its own query builder which can be executed with a rich set of + parameters. We will walk through these parameters step by step, starting with a + minimal query, which does not use any rewriter, then adding a 'Common Rules + Rewriter' and finally explaining the full set of parameters, many of them not + related to query rewriting but to search relevance tuning in general. + + + .. rubric:: Minimal Query + + :code:`POST /myindex/_search` + + .. 
code-block:: JSON + :linenos: + + { + "query": { + "querqy": { + "matching_query": { + "query": "notebook" + }, + "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"] } - } } - ] } - } - } - - -To use the :code:`matching_query` from the :code:`querqy` query builder, your request would look like this: - -:code:`POST /myindex/_search` - -.. code-block:: JSON - :linenos: - :emphasize-lines: 11 - - { - "query": { - "bool": { - "must": [ - { - "match": { - "topic": "hobby" - } - }, - { - "querqy": { - "matching_query": { - "query": "fishing" - }, - "query_fields": ["title", "content"], - "rewriters": ["my_replace_rewriter", "my_common_rules"] + + Querqy provides a new query builder, :code:`querqy` (line #3), that can be used + in a query just like any other OpenSearch query type. The + :code:`matching_query` (#4) defines the query for which documents will be + matched and retrieved. + + The matching query is different from boosting queries which would only influence + the ranking but not the matching. We will later see that Querqy allows you to + specify information for boosting outside the matching_query object and that the + set of matching documents can be changed in query rewriting, for example, by + adding synonyms or by deleting query tokens. + + The :code:`query` element (#5) contains the query string. In most cases this is + just the query string as it was typed into the search box by the user. + + The list of :code:`query_fields` (#7) specifies in which fields to search. A + field name can have an optional field weight. In the example, the field weight + for title is 3.0. The default field weight is 1.0. Field weights must be + positive. We will later see that the query_fields can be applied to parts of the + querqy query other than the matching_query as well. That's why the query_fields + list is not a child element of the matching_query. 
+ + The combination of a query string with a list of fields and field weights + resembles OpenSearch's built-in :code:`multi_match` query. We will later see + that there are some differences in matching and scoring. + + + .. rubric:: Querqy inside the known OpenSearch Query DSL + + The following example shows how easy it is to replace an OpenSearch query type like :code:`multi_match` with a Querqy :code:`matching_query`, so you can profit from Querqy's rewriters. + Let's say you have an index that contains forum posts and want to find a certain post in the topic "hobby" that was made 10-12 days ago and was about "fishing". + + A simple `Boolean query `__ with a :code:`multi_match` and a :code:`match` query inside the :code:`must` occurrence and a :code:`range` query in the :code:`filter` occurrence should do the trick. + + :code:`POST /index/_search` + + .. code-block:: JSON + :linenos: + + { + "query": { + "bool": { + "must": [ + { + "match": { + "topic": "hobby" + } + }, + { + "multi_match": { + "query": "fishing", + "fields": ["title", "content"] + } + } + ], + "filter": [ + { + "range": { + "dateField": { + "gte": "now-12d", + "lte": "now-10d" + } + } + } + ] } } - ], - "filter": [ - { - "range": { - "dateField": { - "gte": "now-12d", - "lte": "now-10d" - } + } + + + To use the :code:`matching_query` from the :code:`querqy` query builder, your request would look like this: + + :code:`POST /myindex/_search` + + .. 
code-block:: JSON + :linenos: + :emphasize-lines: 11 + + { + "query": { + "bool": { + "must": [ + { + "match": { + "topic": "hobby" + } + }, + { + "querqy": { + "matching_query": { + "query": "fishing" + }, + "query_fields": ["title", "content"], + "rewriters": ["my_replace_rewriter", "my_common_rules"] + } + } + ], + "filter": [ + { + "range": { + "dateField": { + "gte": "now-12d", + "lte": "now-10d" + } + } + } + ] } } - ] - } - } - } - - - -As you can see, to use a :code:`matching_query` instead of a :code:`multi_match` you need to use :code:`querqy` (line #11) as a "wrapper" for the :code:`matching_query`. - -.. raw:: html - -
- -.. rst-class:: solr - -.. raw:: html - -
- -If you followed the instructions for installing Querqy, you have configured a -Querqy query parser in your solrconfig.xml file. This query parser can be used -with a rich set of parameters. We will walk through these parameters step by -step, starting with a minimal query, which does not use any rewriter, then -adding a 'Common Rules Rewriter' and finally explaining the full set of -parameters, many of them not related to query rewriting but to search relevance -tuning in general. - -We will not encode URL parameters in the example for better readability. - -.. rubric:: Minimal Query - - -:code:`/solr/mycollection/select?q=notebook&defType=querqy&qf=title^3.0 brand^2.1 shortSummary` - - -The Querqy query parser is enabled using the :code:`defType` parameter. - -As usual in Solr, the :code:`q`\ parameter defines the query for which -documents will be matched and retrieved. In most cases the value of parameter q -is just the query string as it was typed into the search box by the user. Querqy -query rewriting can add boosting information outside that query or change the -set of matching documents, for example, by adding synonyms or by deleting query -tokens. - - -The :code:`qf` parameter specifies in which fields to search. A field name can -have an optional field weight. In the example, the field weight for title is -3.0. The default field weight is 1.0. Field weights must be positive. - -The use of the q and qf parameters resembles Solr's built-in :code:`dismax` and -:code:`edismax` query parsers. We will later see that there are some differences -in how scoring works. - -.. raw:: html - -
+ } + + + + As you can see, to use a :code:`matching_query` instead of a :code:`multi_match` you need to use :code:`querqy` (line #11) as a "wrapper" for the :code:`matching_query`. + + .. group-tab:: Solr + + If you followed the instructions for installing Querqy, you have configured a + Querqy query parser in your solrconfig.xml file. This query parser can be used + with a rich set of parameters. We will walk through these parameters step by + step, starting with a minimal query, which does not use any rewriter, then + adding a 'Common Rules Rewriter' and finally explaining the full set of + parameters, many of them not related to query rewriting but to search relevance + tuning in general. + + We will not encode URL parameters in the example for better readability. + + .. rubric:: Minimal Query + + + :code:`/solr/mycollection/select?q=notebook&defType=querqy&qf=title^3.0 brand^2.1 shortSummary` + + + The Querqy query parser is enabled using the :code:`defType` parameter. + + As usual in Solr, the :code:`q`\ parameter defines the query for which + documents will be matched and retrieved. In most cases the value of parameter q + is just the query string as it was typed into the search box by the user. Querqy + query rewriting can add boosting information outside that query or change the + set of matching documents, for example, by adding synonyms or by deleting query + tokens. + + + The :code:`qf` parameter specifies in which fields to search. A field name can + have an optional field weight. In the example, the field weight for title is + 3.0. The default field weight is 1.0. Field weights must be positive. + + The use of the q and qf parameters resembles Solr's built-in :code:`dismax` and + :code:`edismax` query parsers. We will later see that there are some differences + in how scoring works. 
+ Where to go next ================ From 8d183c8c98e5d37c16a0fbe6335a3bb81503f244 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Wed, 21 Feb 2024 12:36:45 -0500 Subject: [PATCH 2/8] Migrated to sphinx native tabs --- README.md | 23 +- docs/source/querqy/rewriters.rst | 755 +++++++++++++++---------------- 2 files changed, 395 insertions(+), 383 deletions(-) diff --git a/README.md b/README.md index ab567ef..5e90013 100644 --- a/README.md +++ b/README.md @@ -48,4 +48,25 @@ Browse the docs at [http://localhost:8000](http://localhost:8000): ``` docker run -it -p 8000:80 -v $PWD/docs/build/html:/usr/share/caddy/ caddy -``` \ No newline at end of file +``` + + +### Tabs Structure + +Here is a template for the tabs: + +``` +.. tabs:: + + .. group-tab:: Elasticsearch + + Elasticsearch tab content - tab set 1 + + .. group-tab:: OpenSearch + + OpenSearch tab content - tab set 1 + + .. group-tab:: Solr + + Solr tab content - tab set 1 +``` diff --git a/docs/source/querqy/rewriters.rst b/docs/source/querqy/rewriters.rst index ca360d0..d084aaa 100644 --- a/docs/source/querqy/rewriters.rst +++ b/docs/source/querqy/rewriters.rst @@ -24,392 +24,383 @@ configured in principle. As search engines differ in how configurations are supplied to them, select your search engine below. - -.. include:: se-section.txt - -.. rst-class:: elasticsearch - -Rewriters in Elasticsearch/OpenSearch -------------------------------------- - -Querqy adds a REST endpoint to Elasticsearch/OpenSearch for managing rewriters at - -:code:`/_querqy/rewriter` - -Creating/configuring a 'Common Rules rewriter': - - -:code:`PUT /_querqy/rewriter/common_rules` - -.. code-block:: JSON - :linenos: - - { - "class": "querqy.elasticsearch.rewriter.SimpleCommonRulesRewriterFactory", - "config": { - "rules" : "notebook =>\nSYNONYM: laptop" - } - } - -.. include:: rewriters/hint-opensearch.txt - -Rewriter definitions are uploaded by sending a PUT request to the rewriter -endpoint. 
The last part of the request URL path (:code:`common_rules`) will -become the name of the rewriter. - -A rewriter definition must contain a class element (line #2). Its value -references an implementation of a querqy.elasticsearch.ESRewriterFactory which -will provide the rewriter that we want to use. - -The rewriter definition can also have a config object (#3) which contains the -rewriter-specific configuration. - -In the case of the SimpleCommonRulesRewriter, the configuration must contain the -rewriting rules (#4). Remember to escape line breaks etc. when you include your -rules in a JSON document. - - -We can now apply one or more rewriters to a query: - -:code:`POST /myindex/_search` - -.. code-block:: JSON - :linenos: - - { - "query": { - "querqy": { - "matching_query": { - "query": "notebook" - }, - "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"], - "rewriters": ["common_rules"] +.. tabs:: + + .. group-tab:: Elasticsearch/OpenSearch + + Querqy adds a REST endpoint to Elasticsearch/OpenSearch for managing rewriters at + + :code:`/_querqy/rewriter` + + Creating/configuring a 'Common Rules rewriter': + + + :code:`PUT /_querqy/rewriter/common_rules` + + .. code-block:: JSON + :linenos: + + { + "class": "querqy.elasticsearch.rewriter.SimpleCommonRulesRewriterFactory", + "config": { + "rules" : "notebook =>\nSYNONYM: laptop" + } } - } - } - -The rewriters are added to the -:ref:`minimal query that we constructed earlier ` using a -list of named :code:`rewriters` (line #8). This list contains the rewrite chain -- the list of rewriters in the order in which they will be applied and in which -they will manipulate the query. The above example contains only a single -rewriter. - -Rewriters are referenced in the :code:`rewriters` element either just by their -name or by the :code:`name` property of an object which allows to pass request -parameters to the rewriter. 
The following example shows two rewriters, one of -them with additional parameters: - -:code:`POST /myindex/_search` - -.. code-block:: JSON - :linenos: - - { - "query": { - "querqy": { - "matching_query": { - "query": "notebook" - }, - "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"], - "rewriters": [ - "word_break", - { - "name": "common_rules", - "params": { - "criteria": { - "filter": "$[?(!@.prio || @.prio == 1)]" - } - } - } - ] + + .. include:: rewriters/hint-opensearch.txt + + Rewriter definitions are uploaded by sending a PUT request to the rewriter + endpoint. The last part of the request URL path (:code:`common_rules`) will + become the name of the rewriter. + + A rewriter definition must contain a class element (line #2). Its value + references an implementation of a querqy.elasticsearch.ESRewriterFactory which + will provide the rewriter that we want to use. + + The rewriter definition can also have a config object (#3) which contains the + rewriter-specific configuration. + + In the case of the SimpleCommonRulesRewriter, the configuration must contain the + rewriting rules (#4). Remember to escape line breaks etc. when you include your + rules in a JSON document. + + + We can now apply one or more rewriters to a query: + + :code:`POST /myindex/_search` + + .. code-block:: JSON + :linenos: + + { + "query": { + "querqy": { + "matching_query": { + "query": "notebook" + }, + "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"], + "rewriters": ["common_rules"] + } + } } - } - } - - -The first rewriter, word_break (line #9), is just referenced by its name (we -will see a 'word break rewriter' configuration later. The second rewriter -is called in a JSON object. Its :code:`name` property references the rewriter -definition by the rewriter name, 'common_rules' (#11). The :code:`params` object -(#12) is passed to the rewriter. - -In the example, params contains a :code:`criteria` object (#13). 
This parameter -is specific to the Common Rules rewriter. The filter expression in the example -ensures that only rules that either have a prio property set to 1 or that don't -have any prio property at all will be applied. - -In the above example rewrite chain, the word_break rewriter will be applied -before the common_rules rewriter due to the order of the rewriters in the -:code:`rewriters` JSON list element. - -Updating and deleting rewriters -............................... - -To update a rewriter configuration, just send the updated configuration in a -:code:`PUT` request to the same rewriter URL again. - -To delete a rewriter, send a request with HTTP method :code:`DELETE` to the -rewriter URL. For example, - -:code:`DELETE /_querqy/rewriter/common_rules` - -will delete your common_rules rewriter. - - -.. rst-class:: solr - -Rewriter configuration in Solr ------------------------------- - -.. include:: hint-querqy-5-solr.txt - -**Querqy 5** - -Querqy adds a URL endpoint to Solr for managing rewriters. When you set up -Querqy in :code:`solrconfig.xml`, you've added a request handler for this: - -.. code-block:: XML - - - -You can then manage your rewriters at - -:code:`http://:/solr/mycollection/querqy/rewriter` - -Creating/configuring a 'Common Rules rewriter': - - -| :code:`POST /solr/mycollection/querqy/rewriter/common_rules?action=save` -| :code:`Content-Type: application/json` - -.. code-block:: JSON - :linenos: - - { - "class": "querqy.solr.rewriter.commonrules.CommonRulesRewriterFactory", - "config": { - "rules" : "notebook =>\nSYNONYM: laptop" - } - } - -Rewriter definitions are uploaded by sending a POST request and appending the -:code:`action=save` parameter to the rewriter endpoint. The last part of the -request URL path (:code:`common_rules`) will become the name of the rewriter. - -A rewriter definition must contain a class element (line #2). 
Its value -references an implementation of a querqy.solr.SolrRewriterFactoryAdapter which -will provide the rewriter that we want to use. - -The rewriter definition can also have a config object (#3-5), which contains the -rewriter-specific configuration. In the case of the CommonRulesRewriterFactory, -the configuration must contain the rewriting rules (#4). Remember to escape line -breaks etc. when you include your rules in a JSON document. - -If you work with SolrJ, you can create your configuration request using a -request that comes with most of the Querqy-supplied rewriters. Just look out for -the :code:`*ConfigRequestBuilder` classes in the Java packages under -:code:`querqy.solr.rewriter`. - -Once we have managed our rewriter configuration, We can apply one or more -rewriters to a query: - -:code:`GET /solr/mycollection/select?q=notebook&defType=querqy&querqy.rewriters=common_rules&qf=title^3.0...` - -The parameter :code:`defType=querqy` enables the Querqy query parser. The -optional parameter :code:`querqy.rewriters` contains a list of comma-separated -rewriter names. These rewriters form the rewrite chain and they are processed in -their order of occurrence. In this specific example, we only used the rewriter -that we defined in our POST request above and we reference it by its name -:code:`common_rules`. Had we configured another rewriter under -:code:`/solr/mycollection/querqy/rewriter/replace`, we could apply the -'replace' rewriter before the 'common_rules' rewriter using the URL parameter -:code:`querqy.rewriters=replace,common_rules`. - -By default, Solr will reply with a :code:`400 Bad Request` response, if a -rewriter that was passed in in the 'querqy.rewriters' parameter does not exist. -Please see :ref:`this section ` in the 'Advanced Solr -Plugin Configuration' documentation for an option to ignore missing rewriters. - -Updating and deleting rewriters (Querqy 5) -.......................................... 
- -To update a rewriter configuration, just send the updated configuration in a -POST request with :code:`action=save` to the same rewriter URL again. - -To delete a rewriter, send a POST request with :code:`action=delete` to the -rewriter URL. For example, - -:code:`POST /solr/mycollection/querqy/rewriter/common_rules?action=delete` - -will delete your common_rules rewriter. - -Getting rewriter information (Querqy 5) -....................................... - -You can get a list of configured rewriters at: - -:code:`GET /solr/mycollection/querqy/rewriter` - -To retrieve the configuration of a specific rewriter, you can make a GET call -against its endpoint. In the case of the :code:`common_rules` rewriter above, -the call would be: - -:code:`GET /solr/mycollection/querqy/rewriter/common_rules` - - -**Querqy 4** - -The rewrite chain is configured at the Querqy query parser in solrconfig.xml: - -.. code-block:: xml - :linenos: - :emphasize-lines: 6,8-22 - - - - - - - - - - commonRules - - querqy.solr.SimpleCommonRulesRewriterFactory - - - rules.txt - - - + + The rewriters are added to the + :ref:`minimal query that we constructed earlier ` using a + list of named :code:`rewriters` (line #8). This list contains the rewrite chain + - the list of rewriters in the order in which they will be applied and in which + they will manipulate the query. The above example contains only a single + rewriter. + + Rewriters are referenced in the :code:`rewriters` element either just by their + name or by the :code:`name` property of an object which allows to pass request + parameters to the rewriter. The following example shows two rewriters, one of + them with additional parameters: + + :code:`POST /myindex/_search` + + .. 
code-block:: JSON + :linenos: + + { + "query": { + "querqy": { + "matching_query": { + "query": "notebook" + }, + "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"], + "rewriters": [ + "word_break", + { + "name": "common_rules", + "params": { + "criteria": { + "filter": "$[?(!@.prio || @.prio == 1)]" + } + } + } + ] + } + } + } + + + The first rewriter, word_break (line #9), is just referenced by its name (we + will see a 'word break rewriter' configuration later). The second rewriter + is called in a JSON object. Its :code:`name` property references the rewriter + definition by the rewriter name, 'common_rules' (#11). The :code:`params` object + (#12) is passed to the rewriter. + + In the example, params contains a :code:`criteria` object (#13). This parameter + is specific to the Common Rules rewriter. The filter expression in the example + ensures that only rules that either have a prio property set to 1 or that don't + have any prio property at all will be applied. + + In the above example rewrite chain, the word_break rewriter will be applied + before the common_rules rewriter due to the order of the rewriters in the + :code:`rewriters` JSON list element. + + .. rubric:: Updating and deleting rewriters + + To update a rewriter configuration, just send the updated configuration in a + :code:`PUT` request to the same rewriter URL again. + + To delete a rewriter, send a request with HTTP method :code:`DELETE` to the + rewriter URL. For example, + + :code:`DELETE /_querqy/rewriter/common_rules` + + will delete your common_rules rewriter. + + + + .. group-tab:: Solr + + .. include:: hint-querqy-5-solr.txt + + **Querqy 5** + + Querqy adds a URL endpoint to Solr for managing rewriters. When you set up + Querqy in :code:`solrconfig.xml`, you've added a request handler for this: + + .. 
code-block:: XML + + + + You can then manage your rewriters at + + :code:`http://:/solr/mycollection/querqy/rewriter` + + Creating/configuring a 'Common Rules rewriter': + + + | :code:`POST /solr/mycollection/querqy/rewriter/common_rules?action=save` + | :code:`Content-Type: application/json` + + .. code-block:: JSON + :linenos: + + { + "class": "querqy.solr.rewriter.commonrules.CommonRulesRewriterFactory", + "config": { + "rules" : "notebook =>\nSYNONYM: laptop" + } + } + + Rewriter definitions are uploaded by sending a POST request and appending the + :code:`action=save` parameter to the rewriter endpoint. The last part of the + request URL path (:code:`common_rules`) will become the name of the rewriter. + + A rewriter definition must contain a class element (line #2). Its value + references an implementation of a querqy.solr.SolrRewriterFactoryAdapter which + will provide the rewriter that we want to use. + + The rewriter definition can also have a config object (#3-5), which contains the + rewriter-specific configuration. In the case of the CommonRulesRewriterFactory, + the configuration must contain the rewriting rules (#4). Remember to escape line + breaks etc. when you include your rules in a JSON document. + + If you work with SolrJ, you can create your configuration request using a + request that comes with most of the Querqy-supplied rewriters. Just look out for + the :code:`*ConfigRequestBuilder` classes in the Java packages under + :code:`querqy.solr.rewriter`. + + Once we have managed our rewriter configuration, We can apply one or more + rewriters to a query: + + :code:`GET /solr/mycollection/select?q=notebook&defType=querqy&querqy.rewriters=common_rules&qf=title^3.0...` + + The parameter :code:`defType=querqy` enables the Querqy query parser. The + optional parameter :code:`querqy.rewriters` contains a list of comma-separated + rewriter names. These rewriters form the rewrite chain and they are processed in + their order of occurrence. 
In this specific example, we only used the rewriter + that we defined in our POST request above and we reference it by its name + :code:`common_rules`. Had we configured another rewriter under + :code:`/solr/mycollection/querqy/rewriter/replace`, we could apply the + 'replace' rewriter before the 'common_rules' rewriter using the URL parameter + :code:`querqy.rewriters=replace,common_rules`. + + By default, Solr will reply with a :code:`400 Bad Request` response if a + rewriter that was passed in the 'querqy.rewriters' parameter does not exist. + Please see :ref:`this section ` in the 'Advanced Solr + Plugin Configuration' documentation for an option to ignore missing rewriters. + + .. rubric:: Updating and deleting rewriters (Querqy 5) + + To update a rewriter configuration, just send the updated configuration in a + POST request with :code:`action=save` to the same rewriter URL again. + + To delete a rewriter, send a POST request with :code:`action=delete` to the + rewriter URL. For example, + + :code:`POST /solr/mycollection/querqy/rewriter/common_rules?action=delete` + + will delete your common_rules rewriter. + + .. rubric:: Getting rewriter information (Querqy 5) + + You can get a list of configured rewriters at: + + :code:`GET /solr/mycollection/querqy/rewriter` + + To retrieve the configuration of a specific rewriter, you can make a GET call + against its endpoint. In the case of the :code:`common_rules` rewriter above, + the call would be: + + :code:`GET /solr/mycollection/querqy/rewriter/common_rules` + + + **Querqy 4** + + The rewrite chain is configured at the Querqy query parser in solrconfig.xml: + + .. code-block:: xml + :linenos: + :emphasize-lines: 6,8-22 + - - - - - -The :code:`lst` element :code:`rewriteChain` (line #6) serves as a container for -the rewriters. - -Each rewriter is defined in a :code:`rewriter` :code:`lst` element (#11). 
- -All rewriters must have a :code:`class` property (#15) that specifies a factory -for creating the rewriter. - -The :code:`id` property (#13) is optional. In some cases the id is used to route -request parameters to a specific rewriter. - -The 'id' and 'class' properties are the only properties that are available for -all rewriters. Rewriters can have additional properties that will only have a -meaning for the specific rewriter implementation. - -In the example, the ``rules`` property specifies the resource that contains rule -definitions for the 'Common Rules Rewriter'. Resources are files that are either -kept in ZooKeeper as part of the configset (SolrCloud) or in the 'conf' folder -of a Solr core in standalone or master-slave Solr. They can be gzipped, which -will be auto-detected by Querqy, regardless of the file name. If you keep your -files in ZooKeeper, remember the maximum file size in ZooKeeper (default: 1 MB). - - -Example: Configuring rewriter via curl (Querqy 5) -................................................. - -.. note:: - In these examples we use :code:`curl` and :code:`jq` to retrieve and edit - rewriter configuration from a running Solr installation. We assume, that - the Solr instance is reachable at :code:`http://localhost:8983`. Configure - your Solr target using the environment variables below. - -**List configured rewriters** - -This will list all configured rewriters as JSON response. Use the -rewriters :code:`id` to retrieve it's details using the subsequent -examples. - -.. code-block:: console - :linenos: - - SOLR_URL="http://localhost:8983" - SOLR_COLLECTION="collection" - curl -s "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter" \ - | jq '.response.rewriters' - -.. 
code-block:: JSON - :linenos: - - { - "filter": { - "id": "filter", - "path": "/querqy/rewriter/filter" - }, - "synonyms": { - "id": "synonyms", - "path": "/querqy/rewriter/synonyms" - } - } - -**Get rules for a single rewriter** - -This example will return the Querqy rules configured for a single rewriter as raw -output on the console. - -.. code-block:: console - :linenos: - - SOLR_URL="http://localhost:8983" - SOLR_COLLECTION="collection" - QUERQY_REWRITER="synonyms" - curl -s "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter/${QUERQY_REWRITER}" \ - | jq -r '.rewriter.definition.config.rules' - -**Edit rules for a single rewriter** - -Downloads the Querqy rules for a single rewriter into -a temporary file to edit. - -.. code-block:: console - :linenos: - - SOLR_URL="http://localhost:8983" - SOLR_COLLECTION="collection" - QUERQY_REWRITER="synonyms" - curl -s "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter/${QUERQY_REWRITER}" \ - | jq -r '.rewriter.definition.config.rules' \ - > /tmp/${QUERQY_REWRITER}.txt - -Edit the Querqy rules in :code:`/tmp/${QUERQY_REWRITER}.txt`. Afterwards upload them -using the following :code:`curl` call. - -.. code-block:: console - :linenos: - - curl -s "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter/${QUERQY_REWRITER}" \ - | jq -r --arg rules "$(cat /tmp/${QUERQY_REWRITER}.txt)" \ - '.rewriter.definition | .config.rules |= $rules' \ - | curl -X POST -H "Content-Type: application/json" --data-binary @- \ - "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter/${QUERQY_REWRITER}?action=save" + + + + + + + + commonRules + + querqy.solr.SimpleCommonRulesRewriterFactory + + + rules.txt + + + + + + + + + + The :code:`lst` element :code:`rewriteChain` (line #6) serves as a container for + the rewriters. + + Each rewriter is defined in a :code:`rewriter` :code:`lst` element (#11). + + All rewriters must have a :code:`class` property (#15) that specifies a factory + for creating the rewriter. 
+ + The :code:`id` property (#13) is optional. In some cases the id is used to route + request parameters to a specific rewriter. + + The 'id' and 'class' properties are the only properties that are available for + all rewriters. Rewriters can have additional properties that will only have a + meaning for the specific rewriter implementation. + + In the example, the ``rules`` property specifies the resource that contains rule + definitions for the 'Common Rules Rewriter'. Resources are files that are either + kept in ZooKeeper as part of the configset (SolrCloud) or in the 'conf' folder + of a Solr core in standalone or master-slave Solr. They can be gzipped, which + will be auto-detected by Querqy, regardless of the file name. If you keep your + files in ZooKeeper, remember the maximum file size in ZooKeeper (default: 1 MB). + + + .. rubric:: Example: Configuring rewriter via curl (Querqy 5) + + .. note:: + In these examples we use :code:`curl` and :code:`jq` to retrieve and edit + rewriter configuration from a running Solr installation. We assume that + the Solr instance is reachable at :code:`http://localhost:8983`. Configure + your Solr target using the environment variables below. + + **List configured rewriters** + + This will list all configured rewriters as a JSON response. Use a + rewriter's :code:`id` to retrieve its details using the subsequent + examples. + + .. code-block:: console + :linenos: + + SOLR_URL="http://localhost:8983" + SOLR_COLLECTION="collection" + curl -s "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter" \ + | jq '.response.rewriters' + + .. code-block:: JSON + :linenos: + + { + "filter": { + "id": "filter", + "path": "/querqy/rewriter/filter" + }, + "synonyms": { + "id": "synonyms", + "path": "/querqy/rewriter/synonyms" + } + } + + **Get rules for a single rewriter** + + This example will return the Querqy rules configured for a single rewriter as raw + output on the console. + + .. 
code-block:: console + :linenos: + + SOLR_URL="http://localhost:8983" + SOLR_COLLECTION="collection" + QUERQY_REWRITER="synonyms" + curl -s "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter/${QUERQY_REWRITER}" \ + | jq -r '.rewriter.definition.config.rules' + + **Edit rules for a single rewriter** + + Downloads the Querqy rules for a single rewriter into + a temporary file to edit. + + .. code-block:: console + :linenos: + + SOLR_URL="http://localhost:8983" + SOLR_COLLECTION="collection" + QUERQY_REWRITER="synonyms" + curl -s "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter/${QUERQY_REWRITER}" \ + | jq -r '.rewriter.definition.config.rules' \ + > /tmp/${QUERQY_REWRITER}.txt + + Edit the Querqy rules in :code:`/tmp/${QUERQY_REWRITER}.txt`. Afterwards upload them + using the following :code:`curl` call. + + .. code-block:: console + :linenos: + + curl -s "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter/${QUERQY_REWRITER}" \ + | jq -r --arg rules "$(cat /tmp/${QUERQY_REWRITER}.txt)" \ + '.rewriter.definition | .config.rules |= $rules' \ + | curl -X POST -H "Content-Type: application/json" --data-binary @- \ + "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter/${QUERQY_REWRITER}?action=save" + .. _querqy-list-of-rewriters: From 5223a1d95249d176c8ffb6589cdcc934c1d965ab Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Wed, 21 Feb 2024 14:09:00 -0500 Subject: [PATCH 3/8] migrate to sphinx tabs --- README.md | 15 + docs/source/querqy/rewriters/common-rules.rst | 944 ++++++++---------- 2 files changed, 448 insertions(+), 511 deletions(-) diff --git a/README.md b/README.md index 5e90013..950d2fc 100644 --- a/README.md +++ b/README.md @@ -70,3 +70,18 @@ Here is a template for the tabs: Solr tab content - tab set 1 ``` + +or + + +``` +.. tabs:: + + .. group-tab:: Elasticsearch/OpenSearch + + Elasticsearch/OpenSearch tab content - tab set 1 + + .. 
group-tab:: Solr + + Solr tab content - tab set 1 +``` diff --git a/docs/source/querqy/rewriters/common-rules.rst b/docs/source/querqy/rewriters/common-rules.rst index 6a72e89..9840107 100644 --- a/docs/source/querqy/rewriters/common-rules.rst +++ b/docs/source/querqy/rewriters/common-rules.rst @@ -20,92 +20,78 @@ with two exceptions: Configuring rules ================= -.. include:: ../se-section.txt - -.. rst-class:: elasticsearch - -.. raw:: html - -
- -The rules for the 'Common Rules Rewriter' are passed as the value of the -``rules`` element when you create a configuration with the -SimpleCommonRulesRewriterFactory in Elasticsearch/OpenSearch. - -``PUT /_querqy/rewriter/common_rules`` - -.. code-block:: JSON - :linenos: - :emphasize-lines: 4 - - { - "class": "querqy.elasticsearch.rewriter.SimpleCommonRulesRewriterFactory", - "config": { - "rules" : "notebook =>\nSYNONYM: laptop" - } - } - -.. include:: hint-opensearch.txt - -.. raw:: html - -
- -.. rst-class:: solr - -.. raw:: html - -
- -**Querqy 5** - -The rules for the 'Common Rules Rewriter' are passed as the ``rules`` property -in the rewriter configuration: - -| :code:`POST /solr/mycollection/querqy/rewriter/common_rules?action=save` -| :code:`Content-Type: application/json` - -.. code-block:: JSON - :linenos: - :emphasize-lines: 4 - - { - "class": "querqy.solr.rewriter.commonrules.CommonRulesRewriterFactory", - "config": { - "rules" : "notebook =>\nSYNONYM: laptop" - } - } - -Remember to JSON-escape your rules. - - -**Querqy 4** - -The rules for the 'Common Rules Rewriter' are maintained in the resource that -you configured as property ``rules`` for the -SimpleCommonRulesRewriterFactory. - -.. code-block:: xml - :linenos: - :emphasize-lines: 5 - - - - - querqy.solr.SimpleCommonRulesRewriterFactory - rules.txt - - - - -This `rules` files must be in UTF-8 character encoding. The maximum file size of -is 1 MB if Solr runs as SolrCloud and if you didn't change the maximum file size -in Zookeeper. You can however gzip the file - Querqy will automatically detect -this and uncompress the file. - -.. raw:: html +.. tabs:: + + .. group-tab:: Elasticsearch/OpenSearch + + The rules for the 'Common Rules Rewriter' are passed as the value of the + ``rules`` element when you create a configuration with the + SimpleCommonRulesRewriterFactory in Elasticsearch/OpenSearch. + + ``PUT /_querqy/rewriter/common_rules`` + + .. code-block:: JSON + :linenos: + :emphasize-lines: 4 + + { + "class": "querqy.elasticsearch.rewriter.SimpleCommonRulesRewriterFactory", + "config": { + "rules" : "notebook =>\nSYNONYM: laptop" + } + } + + .. include:: hint-opensearch.txt + + + .. group-tab:: Solr + + **Querqy 5** + + The rules for the 'Common Rules Rewriter' are passed as the ``rules`` property + in the rewriter configuration: + + | :code:`POST /solr/mycollection/querqy/rewriter/common_rules?action=save` + | :code:`Content-Type: application/json` + + .. 
code-block:: JSON + :linenos: + :emphasize-lines: 4 + + { + "class": "querqy.solr.rewriter.commonrules.CommonRulesRewriterFactory", + "config": { + "rules" : "notebook =>\nSYNONYM: laptop" + } + } + + Remember to JSON-escape your rules. + + + **Querqy 4** + + The rules for the 'Common Rules Rewriter' are maintained in the resource that + you configured as property ``rules`` for the + SimpleCommonRulesRewriterFactory. + + .. code-block:: xml + :linenos: + :emphasize-lines: 5 + + + + + querqy.solr.SimpleCommonRulesRewriterFactory + rules.txt + + + + + This ``rules`` file must be in UTF-8 character encoding. The maximum file size + is 1 MB if Solr runs as SolrCloud and if you didn't change the maximum file size + in ZooKeeper. You can, however, gzip the file - Querqy will automatically detect + this and uncompress the file. -
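For the JSON-based rewriter configuration shown above, the JSON-escaping of rules can be left to a JSON library instead of being done by hand. The following is a minimal sketch, assuming the rules are held in a Python string; the class name is taken from the Querqy 5 Solr example, and the variable names are hypothetical.

```python
import json

# Rules as they would be written in a plain text file, with a real line break.
rules_text = """notebook =>
SYNONYM: laptop"""

# Rewriter definition as in the Querqy 5 Solr example above.
definition = {
    "class": "querqy.solr.rewriter.commonrules.CommonRulesRewriterFactory",
    "config": {"rules": rules_text},
}

# json.dumps escapes the line break as the two characters '\' and 'n'.
body = json.dumps(definition)
print(body)
```

The resulting string can be used as the request body; parsing it again yields the original rules text, line break included.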
Structure of a rule ------------------- @@ -241,11 +227,6 @@ the current wildcard implementation, which might be removed in the future: * The wildcard can only occur at the very end of the input matching. * It cannot be combined with the right-hand input boundary marker (..."). -.. rst-class:: solr - -.. raw:: html - -
**Querqy 5** @@ -289,9 +270,6 @@ sub-group (like in ``smart* AND (mobile app*)``). However, a boolean input expression that contains a wildcard cannot be combined with SYNONYM or DELETE instructions. -.. raw:: html - -
SYNONYM rules ------------- @@ -445,36 +423,40 @@ a simple parser that splits on whitespace and marks tokens prefixed by ``-`` as 'querqyParser' in the `configuration` to set a different parser.) A special case are right-hand side definitions that start with ``*``. The -string following the \* will be treated as a query in the syntax of the +string following the ``*`` will be treated as a query in the syntax of the search engine. In the following example we favour a certain price range as an interpretation of -'cheap' and penalise documents from category 'accessories' using raw Solr -queries: +'cheap' and penalise documents from category 'accessories': -.. code-block:: Text - :linenos: +.. tabs:: - cheap notebook => - UP(10): * price:[350 TO 450] - DOWN(20): * category:accessories + .. group-tab:: Elasticsearch/OpenSearch -The same example in Elasticsearch/OpenSearch: + .. code-block:: Text + :linenos: + + cheap notebook => + UP(10): * {"range": {"price": {"gte": 350, "lte": 450}}} + DOWN(20): * {"term": {"category": "accessories"}} + + + .. group-tab:: Solr + + .. code-block:: Text + :linenos: + + cheap notebook => + UP(10): * price:[350 TO 450] + DOWN(20): * category:accessories -.. code-block:: Text - :linenos: - cheap notebook => - UP(10): * {"range": {"price": {"gte": 350, "lte": 450}}} - DOWN(20): * {"term": {"category": "accessories"}} FILTER rules ------------ -.. include:: ../se-section.txt - Filter rules work similar to UP and DOWN rules, but instead of moving search results up or down the result list they restrict search results to those that match the filter query. The following rule looks similar to the 'iphone' example @@ -489,29 +471,34 @@ not 'case': FILTER: -case The filter is applied to all query fields defined in the -:raw-html:`'generated.query_fields' or 'query_fields' -'gqf' or 'qf'` :ref:`request parameters `. +``generated.query_fields`` or ``query_fields`` in Elasticsearch/OpenSearch or ``gqf`` or ``qf`` in Solr. 
In the case of a required keyword ('apple') the filter matches if the keyword occurs in one or more query fields. The negative filter ('-case') only matches documents where the keyword occurs in none of the query fields. The right-hand side of filter instructions accepts raw queries. To completely -exclude results from category 'accessories' for query 'notebook' you would -write in Solr: +exclude results from category 'accessories' for query 'notebook': -.. code-block:: Text - :linenos: +.. tabs:: - notebook => - FILTER: * -category:accessories + .. group-tab:: Elasticsearch/OpenSearch -The same filter in Elasticsearch/OpenSearch: + .. code-block:: Text + :linenos: + + notebook => + FILTER: * {"bool": { "must_not": [ {"term": {"category":"accessories"}}]}} + + + .. group-tab:: Solr + + .. code-block:: Text + :linenos: + + notebook => + FILTER: * -category:accessories -.. code-block:: Text - :linenos: - notebook => - FILTER: * {"bool": { "must_not": [ {"term": {"category":"accessories"}}]}} DELETE rules @@ -574,59 +561,50 @@ The following restrictions apply to delete rules: DECORATE rules -------------- -.. include:: ../se-section.txt - -.. note:: - - This feature is only available for Solr. - - -.. rst-class:: solr - -.. raw:: html - -
- -Decorate rules are not strictly query rewriting rules but they are quite handy -to add query-dependent information to search results. For example, in online -shops there are almost always a few search queries that have nothing to do with -the products in the shop but with deliveries, T&C, FAQs and other service -information. A decorate rule matches those search terms and adds the configured -information to the search results: - -.. code-block:: Text - :linenos: - - faq => - DECORATE: REDIRECT /service/faq - - -The Solr response will then contain an array 'querqy_decorations' with the -right-hand side expressions of the matching decorate rules: - -.. code-block:: xml - :linenos: - :emphasize-lines: 5-8 - - - ... - ... - ... - - REDIRECT /service/faq - ... - - - -Querqy does not inspect the right-hand side of the decorate instruction -('REDIRECT /service/faq') but returns the configured value 'as is'. You could -even configure a JSON-formatted value in this place but you have to assure that -the value does not contain any line break. - -.. raw:: html - -
- +.. tabs:: + + .. group-tab:: Elasticsearch/OpenSearch + + This feature is only available for Solr. + + .. group-tab:: Solr + + Decorate rules are not strictly query rewriting rules but they are quite handy + to add query-dependent information to search results. For example, in online + shops there are almost always a few search queries that have nothing to do with + the products in the shop but with deliveries, T&C, FAQs and other service + information. A decorate rule matches those search terms and adds the configured + information to the search results: + + .. code-block:: Text + :linenos: + + faq => + DECORATE: REDIRECT /service/faq + + + The Solr response will then contain an array 'querqy_decorations' with the + right-hand side expressions of the matching decorate rules: + + .. code-block:: xml + :linenos: + :emphasize-lines: 5-8 + + + ... + ... + ... + + REDIRECT /service/faq + ... + + + + Querqy does not inspect the right-hand side of the decorate instruction + ('REDIRECT /service/faq') but returns the configured value 'as is'. You could + even configure a JSON-formatted value in this place but you have to ensure that + the value does not contain any line break. + Properties: ordering, filtering and tracking of rules ----------------------------------------------------- @@ -783,102 +761,75 @@ and selecting rules depending on the context. We will tell Querqy to only apply the first rule after sorting them by the 'priority' property in descending order. -.. include:: ../se-section.txt - -.. rst-class:: solr - -.. raw:: html - -
- -In order to enable rule selection we need to make sure that a rewriter ID has -been configured for the Common Rules rewriter in solrconfig.xml (Querqy 4 only, -Querqy 5 provides the ID automatically by using the rewriter configuration API): - -.. code-block:: xml - :linenos: - :emphasize-lines: 8 - - - - - - - common1 - querqy.solr.SimpleCommonRulesRewriterFactory - rules.txt - - - - - -.. raw:: html - -
- -We can order the rules by the value of the 'priority' property in descending -order and tell Querqy that it should only apply the rule with the highest -priority using the following request parameters: - -.. rst-class:: elasticsearch - -.. raw:: html - -
- -``POST /myindex/_search`` - -.. code-block:: JSON - :linenos: - :emphasize-lines: 12-15 - - { - "query": { - "querqy": { - "matching_query": { - "query": "notebook" - }, - "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"], - "rewriters": [ - { - "name": "common_rules", - "params": { - "criteria": { - "sort": "priority desc", - "limit": 1 - } - } - } - ] - } - } - } - -.. raw:: html - -
- - -.. rst-class:: solr - -.. raw:: html - -
- -.. code-block:: Text - - querqy.common1.criteria.sort=priority desc - querqy.common1.criteria.limit=1 - -The parameters have a common prefix 'querqy.common1.criteria' where 'common1' -matches the rewriter ID that was configured in solrconfig.xml. This allows us to -scope the rule selection and ordering per rewriter. - -.. raw:: html +.. tabs:: + + .. group-tab:: Elasticsearch/OpenSearch + + ``POST /myindex/_search`` + + .. code-block:: JSON + :linenos: + :emphasize-lines: 12-15 + + { + "query": { + "querqy": { + "matching_query": { + "query": "notebook" + }, + "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"], + "rewriters": [ + { + "name": "common_rules", + "params": { + "criteria": { + "sort": "priority desc", + "limit": 1 + } + } + } + ] + } + } + } -
+ .. group-tab:: Solr + + In order to enable rule selection we need to make sure that a rewriter ID has + been configured for the Common Rules rewriter in solrconfig.xml (Querqy 4 only, + Querqy 5 provides the ID automatically by using the rewriter configuration API): + + .. code-block:: xml + :linenos: + :emphasize-lines: 8 + + + + + + + common1 + querqy.solr.SimpleCommonRulesRewriterFactory + rules.txt + + + + + + We can order the rules by the value of the 'priority' property in descending + order and tell Querqy that it should only apply the rule with the highest + priority using the following request parameters: + + .. code-block:: Text + + querqy.common1.criteria.sort=priority desc + querqy.common1.criteria.limit=1 + + The parameters have a common prefix 'querqy.common1.criteria' where 'common1' + matches the rewriter ID that was configured in solrconfig.xml. This allows us to + scope the rule selection and ordering per rewriter. ``sort`` specifies the property to sort by and the sort order, which can take the values 'asc' and 'desc' @@ -897,56 +848,53 @@ there are for the top priority value. The problem can be solved by adding another parameter: -.. rst-class:: elasticsearch - -.. raw:: html - -
- - -``POST /myindex/_search`` - - -.. code-block:: JSON - :linenos: - :emphasize-lines: 15 +.. tabs:: + + .. group-tab:: Elasticsearch/OpenSearch + + ``POST /myindex/_search`` + + + .. code-block:: JSON + :linenos: + :emphasize-lines: 15 + + { + "query": { + "querqy": { + "matching_query": { + "query": "notebook" + }, + "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"], + "rewriters": [ + { + "name": "common_rules", + "params": { + "criteria": { + "sort": "priority desc", + "limit": 1, + "limitByLevel": true + } + } + } + ] + } + } + } - { - "query": { - "querqy": { - "matching_query": { - "query": "notebook" - }, - "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"], - "rewriters": [ - { - "name": "common_rules", - "params": { - "criteria": { - "sort": "priority desc", - "limit": 1, - "limitByLevel": true - } - } - } - ] - } - } - } + .. group-tab:: Solr + .. code-block:: Text + :emphasize-lines: 3 + + querqy.common1.criteria.sort=priority desc + querqy.common1.criteria.limit=1 + querqy.common1.criteria.limitByLevel=true -.. raw:: html -
-.. rst-class:: solr -.. code-block:: Text - :emphasize-lines: 3 - querqy.common1.criteria.sort=priority desc - querqy.common1.criteria.limit=1 - querqy.common1.criteria.limitByLevel=true @@ -960,35 +908,38 @@ would select the first 5 elements in the list [10, 10, 8, 8, 8, 5, 4, 4]. Rules can also be filtered by properties using `JsonPath`_ expressions, where the general parameter syntax is: -.. rst-class:: elasticsearch - -.. code-block:: JSON - :linenos: - :emphasize-lines: 9 - - { - "query": { - "querqy": { - "rewriters": [ - { - "name": "common_rules", - "params": { - "criteria": { - "filter": "" - } - } - } - ] - } - } - } +.. tabs:: + + .. group-tab:: Elasticsearch/OpenSearch + + .. code-block:: JSON + :linenos: + :emphasize-lines: 9 + + { + "query": { + "querqy": { + "rewriters": [ + { + "name": "common_rules", + "params": { + "criteria": { + "filter": "" + } + } + } + ] + } + } + } + .. group-tab:: Solr -.. rst-class:: solr + .. code-block:: Text + + querqy.common1.criteria.filter= -.. code-block:: Text - querqy.common1.criteria.filter= The properties that where defined at a given Querqy rule are considered a JSON document and a rule filter matches the rule if the JsonPath expression @@ -1034,186 +985,157 @@ Reference Configuration ------------- -.. include:: ../se-section.txt - -.. rst-class:: elasticsearch - -.. raw:: html - -
- -``PUT /_querqy/rewriter/common_rules`` - -.. code-block:: JSON - :linenos: - - { - "class": "querqy.elasticsearch.rewriter.SimpleCommonRulesRewriterFactory", - "config": { - "rules" : "notebook =>\nSYNONYM: laptop", - "ignoreCase": true, - "querqyParser": "querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory" - } - } - -rules - The rule definitions - - Default: (empty = no rules) - -ignoreCase - Ignore case in input matching for rules? - - Default: ``true`` - -querqyParser - The querqy.rewrite.commonrules.QuerqyParserFactory to use for parsing strings - from the right-hand side of rules into query objects - - Default: ``querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory`` - -.. raw:: html - -
- -.. rst-class:: solr - -.. raw:: html - -
- -**Querqy 5** - -.. code-block:: JSON - - { - "class": "querqy.solr.rewriter.commonrules.CommonRulesRewriterFactory", - "config": { - "rules" : "notebook =>\nSYNONYM: laptop", - "ignoreCase" : true, - "buildTermCache": true, - "boostMethod": "MULTIPLICATIVE", - "allowBooleanInput": true, - "querqyParser": "querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory" - } - } - - -**Querqy 4** - - -.. code-block:: xml - - - querqy.solr.SimpleCommonRulesRewriterFactory - rules.txt - true - true - MULTIPLICATIVE - querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory - - -rules - *Querqy 5*: A property containing the rules for rewriting. Remember to escape - the rules for JSON. - - *Querqy 4*: The rule definitions file containing the rules for rewriting. The - file is kept in the configset of the collection in ZooKeeper (SolrCloud) or in - the 'conf' folder of the Solr core in standalone or master-slave Solr. - - Note that the default maximum file size in ZooKeeper is 1 MB. For Querqy 4, - the file can be gzipped. Querqy will auto-detect whether the file is - compressed, regardless of the file name. Querqy 5 will compress and split - files automatically. - - Required. - -ignoreCase - Ignore case in input matching for rules? - - Default: ``true`` - -buildTermCache - Whether to build a term cache from matching terms. This is a optimization - that might not be feasable for very large rule lists. - - Default: ``true`` - -boostMethod - *Querqy 5.4*: How to combine UP/DOWN boosts with the score of the main user - query. Available methods are ADDITIVE and MULTIPLICATIVE. - - Default: ``ADDITIVE`` - -allowBooleanInput - *Querqy 5.0*: Whether to interpret the rule input definitions as boolean - expressions. - - Default: ``false`` - -querqyParser - The querqy.rewrite.commonrules.QuerqyParserFactory to use for parsing strings - from the right-hand side of rules into query objects - - Default: ``querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory`` - - -.. raw:: html - -
- +.. tabs:: + + .. group-tab:: Elasticsearch/OpenSearch + + ``PUT /_querqy/rewriter/common_rules`` + + .. code-block:: JSON + :linenos: + + { + "class": "querqy.elasticsearch.rewriter.SimpleCommonRulesRewriterFactory", + "config": { + "rules" : "notebook =>\nSYNONYM: laptop", + "ignoreCase": true, + "querqyParser": "querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory" + } + } + + rules + The rule definitions + + Default: (empty = no rules) + + ignoreCase + Ignore case in input matching for rules? + + Default: ``true`` + + querqyParser + The querqy.rewrite.commonrules.QuerqyParserFactory to use for parsing strings + from the right-hand side of rules into query objects + + Default: ``querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory`` + + .. group-tab:: Solr + + **Querqy 5** + + .. code-block:: JSON + + { + "class": "querqy.solr.rewriter.commonrules.CommonRulesRewriterFactory", + "config": { + "rules" : "notebook =>\nSYNONYM: laptop", + "ignoreCase" : true, + "buildTermCache": true, + "boostMethod": "MULTIPLICATIVE", + "allowBooleanInput": true, + "querqyParser": "querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory" + } + } + + + **Querqy 4** + + + .. code-block:: xml + + + querqy.solr.SimpleCommonRulesRewriterFactory + rules.txt + true + true + MULTIPLICATIVE + querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory + + + rules + *Querqy 5*: A property containing the rules for rewriting. Remember to escape + the rules for JSON. + + *Querqy 4*: The rule definitions file containing the rules for rewriting. The + file is kept in the configset of the collection in ZooKeeper (SolrCloud) or in + the 'conf' folder of the Solr core in standalone or master-slave Solr. + + Note that the default maximum file size in ZooKeeper is 1 MB. For Querqy 4, + the file can be gzipped. Querqy will auto-detect whether the file is + compressed, regardless of the file name. Querqy 5 will compress and split + files automatically. + + Required. 
+ + ignoreCase + Ignore case in input matching for rules? + + Default: ``true`` + + buildTermCache + Whether to build a term cache from matching terms. This is an optimization + that might not be feasible for very large rule lists. + + Default: ``true`` + + boostMethod + *Querqy 5.4*: How to combine UP/DOWN boosts with the score of the main user + query. Available methods are ADDITIVE and MULTIPLICATIVE. + + Default: ``ADDITIVE`` + + allowBooleanInput + *Querqy 5.0*: Whether to interpret the rule input definitions as boolean + expressions. + + Default: ``false`` + + querqyParser + The querqy.rewrite.commonrules.QuerqyParserFactory to use for parsing strings + from the right-hand side of rules into query objects + + Default: ``querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory`` + Request ------- -.. rst-class:: elasticsearch - -.. raw:: html - -
- -.. code-block:: JSON - :linenos: - :emphasize-lines: 8-13 - - { - "query": { - "querqy": { - "rewriters": [ - { - "name": "common_rules", - "params": { - "criteria": { - "filter": "", - "sort": "", - "limit": 1, - "limitByLevel": true - } - } - } - ] - } - } - } - -.. raw:: html - -
- -.. rst-class:: solr - -.. raw:: html - -
- -Parameters must be prefixed by ``querqy..`` +.. tabs:: + + .. group-tab:: Elasticsearch/OpenSearch + + .. code-block:: JSON + :linenos: + :emphasize-lines: 8-13 + + { + "query": { + "querqy": { + "rewriters": [ + { + "name": "common_rules", + "params": { + "criteria": { + "filter": "", + "sort": "", + "limit": 1, + "limitByLevel": true + } + } + } + ] + } + } + } + -Example: ``querqy.common1.criteria.sort=priority desc`` - set 'criteria.sort' -for rewriter 'common1'. + .. group-tab:: Solr -.. raw:: html + Parameters must be prefixed by ``querqy..`` + + Example: ``querqy.common1.criteria.sort=priority desc`` - set 'criteria.sort' + for rewriter 'common1'. -
criteria.filter Only apply rules that match the filter. A JsonPath_ expression that is From 26450a3fe1703e7050172a8b6b0495d121000cc7 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Mon, 26 Feb 2024 12:40:37 -0500 Subject: [PATCH 4/8] migrate more --- docs/source/querqy/rewriters/replace.rst | 129 ++++----- docs/source/querqy/rewriters/word-break.rst | 298 +++++++++----------- 2 files changed, 190 insertions(+), 237 deletions(-) diff --git a/docs/source/querqy/rewriters/replace.rst b/docs/source/querqy/rewriters/replace.rst index 3e2662b..0e66d69 100644 --- a/docs/source/querqy/rewriters/replace.rst +++ b/docs/source/querqy/rewriters/replace.rst @@ -31,78 +31,63 @@ Setup As a first step, the Replace Rewriter is configured -.. include:: ../se-section.txt - -.. rst-class:: elasticsearch - -.. raw:: html - -
- -``PUT /_querqy/rewriter/replace`` - -.. code-block:: JSON - :linenos: - :emphasize-lines: 4-7 - - { - "class": "querqy.elasticsearch.rewriter.ReplaceRewriterFactory", - "config": { - "rules": "mobiles => mobile", - "ignoreCase": true, - "inputDelimiter": ";", - "querqyParser": "querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory" - } - } - -.. include:: hint-opensearch.txt - -.. raw:: html - -
- -.. rst-class:: solr - -.. raw:: html - -
- -**Querqy 5** - -| :code:`POST /solr/mycollection/querqy/rewriter/replace?action=save` -| :code:`Content-Type: application/json` - -.. code-block:: JSON - :linenos: - :emphasize-lines: 4-7 - - { - "class": "querqy.solr.rewriter.replace.ReplaceRewriterFactory", - "config": { - "rules": "mobiles => mobile", - "ignoreCase": true, - "inputDelimiter": ";", - "querqyParser": "querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory" - } - } - - -**Querqy 4** - -.. code-block:: xml - :linenos: - - - querqy.solr.contrib.ReplaceRewriterFactory - replace-rules.txt - true - ; - querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory - - -.. raw:: html - -
+.. tabs:: + + .. group-tab:: Elasticsearch/OpenSearch + + ``PUT /_querqy/rewriter/replace`` + + .. code-block:: JSON + :linenos: + :emphasize-lines: 4-7 + + { + "class": "querqy.elasticsearch.rewriter.ReplaceRewriterFactory", + "config": { + "rules": "mobiles => mobile", + "ignoreCase": true, + "inputDelimiter": ";", + "querqyParser": "querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory" + } + } + + .. include:: hint-opensearch.txt + + .. group-tab:: Solr + + **Querqy 5** + + | :code:`POST /solr/mycollection/querqy/rewriter/replace?action=save` + | :code:`Content-Type: application/json` + + .. code-block:: JSON + :linenos: + :emphasize-lines: 4-7 + + { + "class": "querqy.solr.rewriter.replace.ReplaceRewriterFactory", + "config": { + "rules": "mobiles => mobile", + "ignoreCase": true, + "inputDelimiter": ";", + "querqyParser": "querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory" + } + } + + + **Querqy 4** + + .. code-block:: xml + :linenos: + + + querqy.solr.contrib.ReplaceRewriterFactory + replace-rules.txt + true + ; + querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory + + The replace rules must be specified in a property ``rules`` (Elasticsearch, Querqy 5 for Solr). Remember to JSON-escape the value of this property. diff --git a/docs/source/querqy/rewriters/word-break.rst b/docs/source/querqy/rewriters/word-break.rst index 6aa8aa2..ca088b3 100644 --- a/docs/source/querqy/rewriters/word-break.rst +++ b/docs/source/querqy/rewriters/word-break.rst @@ -32,126 +32,108 @@ category and product type fields. Setting up a Word Break Rewriter ================================ -.. include:: ../se-section.txt - -.. rst-class:: elasticsearch - -.. raw:: html - -
- -``PUT /_querqy/rewriter/word_break`` - -.. code-block:: JSON - :linenos: - :emphasize-lines: 4-11 - - { - "class": "querqy.elasticsearch.rewriter.WordBreakCompoundRewriterFactory", - "config": { - "dictionaryField" : "dictionary", - "lowerCaseInput": true, - "decompound": { - "maxExpansions": 5, - "verifyCollation": true - }, - "reverseCompoundTriggerWords": ["for"], - "morphology": "GERMAN" - } - } - -.. include:: hint-opensearch.txt - -.. raw:: html - -
- -.. rst-class:: solr - -.. raw:: html - -
- -**Querqy 5** - -| :code:`POST /solr/mycollection/querqy/rewriter/word_break?action=save` -| :code:`Content-Type: application/json` - -Querqy 5.3 and greater: - -.. code-block:: JSON - :linenos: - - { - "class": "querqy.solr.rewriter.wordbreak.WordBreakCompoundRewriterFactory", - "config": { - "dictionaryField" : "dictionary", - "lowerCaseInput": true, - "decompound": { - "maxExpansions": 5, - "verifyCollation": true, - "morphology": "GERMAN" - }, - "compound": { - "morphology": "GERMAN" - }, - - "reverseCompoundTriggerWords": ["for"], - "protectedWords": ["slipper"] - } - } - -For backward compatibility, you can configure ``morphology`` still as in -Querqy for Solr < 5.3 (= 'above' the ``decompound``/``compound`` level) but it -would then only be applied for decompounding, mimicking the behaviour of versions -< 5.3. - -Querqy 5.0 to 5.2: - -.. code-block:: JSON - :linenos: - - { - "class": "querqy.solr.rewriter.wordbreak.WordBreakCompoundRewriterFactory", - "config": { - "dictionaryField" : "dictionary", - "lowerCaseInput": true, - "decompound": { - "maxExpansions": 5, - "verifyCollation": true - }, - "morphology": "GERMAN", - "reverseCompoundTriggerWords": ["for"], - "protectedWords": ["slipper"] +.. tabs:: + + .. group-tab:: Elasticsearch/OpenSearch + + ``PUT /_querqy/rewriter/word_break`` + + .. code-block:: JSON + :linenos: + :emphasize-lines: 4-11 + + { + "class": "querqy.elasticsearch.rewriter.WordBreakCompoundRewriterFactory", + "config": { + "dictionaryField" : "dictionary", + "lowerCaseInput": true, + "decompound": { + "maxExpansions": 5, + "verifyCollation": true + }, + "reverseCompoundTriggerWords": ["for"], + "morphology": "GERMAN" + } } - } - -**Querqy 4** - -.. code-block:: xml - :linenos: - - - querqy.solr.contrib.WordBreakCompoundRewriterFactory - f1 - true - 5 - true - GERMAN - - for - - - slipper - wissenschaft - - - - -.. raw:: html - -
- + + .. include:: hint-opensearch.txt + + .. group-tab:: Solr + + **Querqy 5** + + | :code:`POST /solr/mycollection/querqy/rewriter/word_break?action=save` + | :code:`Content-Type: application/json` + + Querqy 5.3 and greater: + + .. code-block:: JSON + :linenos: + + { + "class": "querqy.solr.rewriter.wordbreak.WordBreakCompoundRewriterFactory", + "config": { + "dictionaryField" : "dictionary", + "lowerCaseInput": true, + "decompound": { + "maxExpansions": 5, + "verifyCollation": true, + "morphology": "GERMAN" + }, + "compound": { + "morphology": "GERMAN" + }, + + "reverseCompoundTriggerWords": ["for"], + "protectedWords": ["slipper"] + } + } + + For backward compatibility, you can configure ``morphology`` still as in + Querqy for Solr < 5.3 (= 'above' the ``decompound``/``compound`` level) but it + would then only be applied for decompounding, mimicking the behaviour of versions + < 5.3. + + Querqy 5.0 to 5.2: + + .. code-block:: JSON + :linenos: + + { + "class": "querqy.solr.rewriter.wordbreak.WordBreakCompoundRewriterFactory", + "config": { + "dictionaryField" : "dictionary", + "lowerCaseInput": true, + "decompound": { + "maxExpansions": 5, + "verifyCollation": true + }, + "morphology": "GERMAN", + "reverseCompoundTriggerWords": ["for"], + "protectedWords": ["slipper"] + } + } + + **Querqy 4** + + .. code-block:: xml + :linenos: + + + querqy.solr.contrib.WordBreakCompoundRewriterFactory + f1 + true + 5 + true + GERMAN + + for + + + slipper + wissenschaft + + The Word Break Rewriter is backed by a dictionary of known words. The @@ -182,21 +164,9 @@ word splits. For example, the word 'action' will not be split into 'act + ion' as long as the 'act' and 'ion' do not co-occur in the dictionaryField of a document. - -.. rst-class:: solr - -.. raw:: html - -
- -Words provided on the list of ``protectedWords`` will be exempt from +.. hint:: When using Solr, words provided on the list of ``protectedWords`` will be exempt from decompounding. -.. raw:: html - -
- - By default, it is assumed that words that together form compound word were just joined together without changing their form. But in some languages @@ -234,42 +204,40 @@ Reference Configuration ------------- -.. include:: ../se-section.txt - -.. rst-class:: elasticsearch - -.. raw:: html - -
-``PUT /_querqy/rewriter/word_break`` - -.. code-block:: JSON - :linenos: - - { - "class": "querqy.elasticsearch.rewriter.WordBreakCompoundRewriterFactory", - "config": { - "dictionaryField" : "dictionary", - "minSuggestionFreq": 3, - "minBreakLength": 4, - "maxCombineLength": 30, - "lowerCaseInput": true, - "decompound": { - "maxExpansions": 5, - "verifyCollation": true - }, - "morphology": "GERMAN", - "reverseCompoundTriggerWords": ["for", "from", "of"], - "alwaysAddReverseCompounds": true - - } - } - -.. raw:: html +.. tabs:: + + .. group-tab:: Elasticsearch/OpenSearch + + ``PUT /_querqy/rewriter/word_break`` + + .. code-block:: JSON + :linenos: + + { + "class": "querqy.elasticsearch.rewriter.WordBreakCompoundRewriterFactory", + "config": { + "dictionaryField" : "dictionary", + "minSuggestionFreq": 3, + "minBreakLength": 4, + "maxCombineLength": 30, + "lowerCaseInput": true, + "decompound": { + "maxExpansions": 5, + "verifyCollation": true + }, + "morphology": "GERMAN", + "reverseCompoundTriggerWords": ["for", "from", "of"], + "alwaysAddReverseCompounds": true + + } + } + -
+ .. group-tab:: Solr + Needs example + dictionaryField The field containing the words for splitting compounds. Should be lowercased. From 2b573f732bfcd31c36f68787c70c533fecf74df0 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Mon, 26 Feb 2024 13:00:03 -0500 Subject: [PATCH 5/8] and some more converted --- docs/source/querqy/more-about-queries.rst | 1089 +++++++++--------- docs/source/querqy/rewriters/number-unit.rst | 108 +- 2 files changed, 564 insertions(+), 633 deletions(-) diff --git a/docs/source/querqy/more-about-queries.rst b/docs/source/querqy/more-about-queries.rst index bc9c653..69ab1f0 100644 --- a/docs/source/querqy/more-about-queries.rst +++ b/docs/source/querqy/more-about-queries.rst @@ -9,55 +9,43 @@ More about queries In 'Getting started with Querqy' we showed how to build :ref:`a minimal query ` with Querqy: -.. rst-class:: elasticsearch -.. raw:: html - -
- -:code:`POST /myindex/_search` - -.. code-block:: JSON - :linenos: - :emphasize-lines: 3,5,7 - - { - "query": { - "querqy": { - "matching_query": { - "query": "notebook" - }, - "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"] - } - } - } - - - - -All we had to do was to use a ``querqy`` query (line #3), define a query string -for matching (#5) and specify which fields to query (#7). - -.. raw:: html - -
- -.. rst-class:: solr - -.. raw:: html - -
- -:code:`/solr/mycollection/select?q=notebook&defType=querqy&qf=title^3.0 brand^2.1 shortSummary` - - -All we had to do was to use the Querqy query parser (``defType=querqy``), -define a query string for matching (``q=...``) and specify which fields to query -(``qf=...``). - -.. raw:: html - -
+.. tabs:: + + .. group-tab:: Elasticsearch/OpenSearch + + :code:`POST /myindex/_search` + + .. code-block:: JSON + :linenos: + :emphasize-lines: 3,5,7 + + { + "query": { + "querqy": { + "matching_query": { + "query": "notebook" + }, + "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"] + } + } + } + + + + + All we had to do was to use a ``querqy`` query (line #3), define a query string + for matching (#5) and specify which fields to query (#7). + + .. group-tab:: Solr + + :code:`/solr/mycollection/select?q=notebook&defType=querqy&qf=title^3.0 brand^2.1 shortSummary` + + + All we had to do was to use the Querqy query parser (``defType=querqy``), + define a query string for matching (``q=...``) and specify which fields to query + (``qf=...``). + Querqy has many more query parameters. We will introduce a few underlying concepts before we explain them in the Reference section. @@ -112,542 +100,501 @@ are no boost queries on the Querqy query use the query paramer ``querqy.rq={!ltr Reference --------- -.. include:: se-section.txt - -.. rst-class:: elasticsearch - -.. raw:: html - -
- -:code:`POST /myindex/_search` - -.. code-block:: JSON - :linenos: - - { - "query": { - - "querqy": { - - "matching_query": { - "query": "notebook", - "similarity_scoring": "dfc", - "weight": 0.75 - }, - - "query_fields": [ - "title^3.0", "brand^2.1", "shortSummary" - ], - - "minimum_should_match": "100%", - "tie_breaker": 0.01, - "field_boost_model": "prms", - - "rewriters": [ - "word_break", - { - "name": "common_rules", - "params": { - "criteria": { - "filter": "$[?(!@.prio || @.prio == 1)]" - } - } - } - ], - - "boosting_queries": { - "rewritten_queries": { - "use_field_boost": false, - "similarity_scoring": "off", - "positive_query_weight": 1.2, - "negative_query_weight": 2.0 - }, - "phrase_boosts": { - "full": { - "fields": ["title", "brand^4"], - "slop": 2 - }, - "bigram": { - "fields": ["title"], - "slop": 3 - }, - "trigram": { - "fields": ["title", "brand", "shortSummary"], - "slop": 6 - }, - "tie_breaker": 0.5 - } - }, - - "generated" : { - "query_fields": [ - "title^2.0", "brand^1.5", "shortSummary^0.0007" - ], - "field_boost_factor": 0.8 - } - - } - } - } - -.. raw:: html - -
+.. tabs:: + + .. group-tab:: Elasticsearch/OpenSearch + + :code:`POST /myindex/_search` + + .. code-block:: JSON + :linenos: + + { + "query": { + + "querqy": { + + "matching_query": { + "query": "notebook", + "similarity_scoring": "dfc", + "weight": 0.75 + }, + + "query_fields": [ + "title^3.0", "brand^2.1", "shortSummary" + ], + + "minimum_should_match": "100%", + "tie_breaker": 0.01, + "field_boost_model": "prms", + + "rewriters": [ + "word_break", + { + "name": "common_rules", + "params": { + "criteria": { + "filter": "$[?(!@.prio || @.prio == 1)]" + } + } + } + ], + + "boosting_queries": { + "rewritten_queries": { + "use_field_boost": false, + "similarity_scoring": "off", + "positive_query_weight": 1.2, + "negative_query_weight": 2.0 + }, + "phrase_boosts": { + "full": { + "fields": ["title", "brand^4"], + "slop": 2 + }, + "bigram": { + "fields": ["title"], + "slop": 3 + }, + "trigram": { + "fields": ["title", "brand", "shortSummary"], + "slop": 6 + }, + "tie_breaker": 0.5 + } + }, + + "generated" : { + "query_fields": [ + "title^2.0", "brand^1.5", "shortSummary^0.0007" + ], + "field_boost_factor": 0.8 + } + + } + } + } + + .. group-tab:: Solr + + Need example .. rubric:: Global parameters and matching query -.. rst-class:: elasticsearch +.. tabs:: + + .. group-tab:: Elasticsearch/OpenSearch + + query_field + The list of fields in which to search for query terms. A field weight can be + appended to the field name using the ``^``\-symbol. Field weights are positive + integer or decimal numbers. The default field weight is ``1.0`` + + Required + + minimum_should_match + *The minimum number of query clauses that must match for a document to be + returned.* (Copied from Elasticsearch's `match query documentation `_, + which also see for valid parameter values). + + The minimum number of query clauses is counted across fields. 
For example, + if the query ``a b`` is searched in ``"query_fields":["f1", "f2"]`` with + ``"minimum_should_match":"100%"``, the two terms need not match in the same + field so that a document matching ``f1:a`` and ``f2:b`` will be included in + the result set. + + Default: ``1`` + + tie_breaker + When a query term ``a`` is searched across fields (``f1``, ``f2`` and ``f3``), + the query is expanded into term queries (``f1:a``, ``f2:a``, ``f3:a``). The + expanded query will use as its own score the score of the highest scoring term + query plus the sum of the scores of the other term queries multiplied with + ``tie_breaker``. Let's assume that ``f2:a`` produces the highest score, the + resulting score will be + ``score(f2:a) + tie_breaker * (score(f1:a) + score(f3:a))``. + + Default: ``0.0`` + + field_boost_model + Values: ``fixed`` ``prms`` + + Querqy allows to choose between two approaches for field boosting in scoring: + + * ``fixed``: field boosts are specified at field names in 'query_fields'. + The same field weight will be used across all query terms for a given query + field. + * ``prms``: field boosts are derived from the distribution of the query terms + in the index. More specifically, they are derived from the probability that + a given query term occurs in a given field in the index. For example, given + the query 'apple iphone black' with query fields 'brand', 'category' and + 'color', the term 'apple' will in most data sets have a greater probability + and weight for the 'brand' field compared to 'category' and 'color', whereas + 'black' will have the greatest probability in the 'color' field. [1]_ + + Field weights specified in 'query_fields' will be ignored if + 'field_boost_model' is set to 'prms'. 
+ + Default: ``fixed`` + + matching_query.similarity_scoring + Values: ``dfc`` ``on`` ``off`` + + Controls how Lucene's scoring implementation (= *similarity*) is used when an + input query term is expanded across fields and when it is expanded during + query rewriting: + + * ``dfc``: 'document frequency correction' - use the same document frequency + value for all terms that are derived from the same input query term. For + example, let 'a b' be the input query and let it be rewritten to + '(f1:a \| f2:a \| ((f1:x \| f2:x) \| (f1:y \| f2:x)) (f1:b \| f2:b)' by + synonym and field expansion, then + '(f1:a \| f2:a \| ((f1:x \| f2:x) \| (f1:y \| f2:x))' (all derived from 'a') + will use the same document frequency value. More specifically, Querqy will + use the maximum document frequency of these terms as the document frequency + value for all of them. Similarly, the maximum document frequency of + '(f1:b | f2:b)' will be used for these two terms. + * ``off``: Ignore the output of Lucene's similarity scoring. Only field boosts + will be used for scoring. + * ``on``: Use Lucene's similarity scoring output. Note that field + boosting (normally part of Lucene similarity scoring) is handled outside + the similarity in Querqy and it can be configured using the + 'field_boost_model' parameter. + + Default: ``dfc`` + + matching_query.weight + A weight that is multiplied with the score that is produced by the matching + query before the score of the boosting queries is added. + + Default: ``1.0`` + + .. group-tab:: Solr + + qf (query fields) + The list of fields in which to search for query terms. A field weight can be + appended to the field name using the ``^``\-symbol. Field weights are + positive integer or decimal numbers. The default field weight is ``1.0``. See + Solr Documentation for parameter value syntax. 
[2]_ + + Example: ``qf=title^3 brand^2.1 shortDescription^0.2`` + + Required + + mm (minimum should match) + The minimum number of optional query clauses that must match for a document + to be returned. + + The minimum number of query clauses is counted across fields. For example, + if the query ``a b`` is searched in ``qf=f1 f2`` with ``mm=100%``, the two + terms need not match in the same field so that a document matching ``f1:a`` + and ``f2:b`` will be included in the result set. See Solr Documentation + for value syntax. [2]_ + + Example: ``mm=100% 2<-1`` + + Default: ``1`` + + tie (tie breaker) + When a query term ``a`` is searched across fields (``f1``, ``f2`` and ``f3``), + the query is expanded into term queries (``f1:a``, ``f2:a``, ``f3:a``). The + expanded query will use as its own score the score of the highest scoring term + query plus the sum of the scores of the other term queries multiplied with + ``tie``. Let's assume that ``f2:a`` produces the highest score, the + resulting score will be + ``score(f2:a) + tie * (score(f1:a) + score(f3:a))``. [2]_ + + Default: ``0.0`` + + fbm (field boost model) + Values: ``fixed`` ``prms`` + + Querqy allows to choose between two approaches for field boosting in scoring: + + * ``fixed``: field boosts are specified at field names in 'query_fields'. + The same field weight will be used across all query terms for a given query + field. + * ``prms``: field boosts are derived from the distribution of the query terms + in the index. More specifically, they are derived from the probability that + a given query term occurs in a given field in the index. For example, given + the query 'apple iphone black' with query fields 'brand', 'category' and + 'color', the term 'apple' will in most data sets have a greater probability + and weight for the 'brand' field compared to 'category' and 'color', whereas + 'black' will have the greatest probability in the 'color' field. 
[1]_ + + Field weights specified in ``qf`` will be ignored if + 'fbm' is set to 'prms'. + + Default: ``fixed`` + + uq.similarityScore + Values: ``dfc`` ``on`` ``off`` + + Controls how Lucene's scoring implementation (= *similarity*) is used when an + input query term is expanded across fields and when it is expanded during + query rewriting: + + * ``dfc``: 'document frequency correction' - use the same document frequency + value for all terms that are derived from the same input query term. For + example, let 'a b' be the input query and let it be rewritten to + '(f1:a \| f2:a \| ((f1:x \| f2:x) \| (f1:y \| f2:y))) (f1:b \| f2:b)' by + synonym and field expansion, then + '(f1:a \| f2:a \| ((f1:x \| f2:x) \| (f1:y \| f2:y)))' (all derived from 'a') + will use the same document frequency value. More specifically, Querqy will + use the maximum document frequency of these terms as the document frequency + value for all of them. Similarly, the maximum document frequency of + '(f1:b | f2:b)' will be used for these two terms. + * ``off``: Ignore the output of Lucene's similarity scoring. Only field boosts + will be used for scoring. + * ``on``: Use Lucene's similarity scoring output. Note that field + boosting (normally part of Lucene similarity scoring) is handled outside + the similarity in Querqy and that it can be configured using the + 'fbm' parameter. + + Default: ``dfc`` + + uq.boost + A weight that is multiplied with the score that is produced by the matching + query before the score of the boosting queries is added. + + Default: ``1.0`` -.. raw:: html - -
- -query_field - The list of fields in which to search for query terms. A field weight can be - appended to the field name using the ``^``\-symbol. Field weights are positive - integer or decimal numbers. The default field weight is ``1.0`` - - Required - -minimum_should_match - *The minimum number of query clauses that must match for a document to be - returned.* (Copied from Elasticsearch's `match query documentation `_, - which also see for valid parameter values). - - The minimum number of query clauses is counted across fields. For example, - if the query ``a b`` is searched in ``"query_fields":["f1", "f2"]`` with - ``"minimum_should_match":"100%"``, the two terms need not match in the same - field so that a document matching ``f1:a`` and ``f2:b`` will be included in - the result set. - - Default: ``1`` - -tie_breaker - When a query term ``a`` is searched across fields (``f1``, ``f2`` and ``f3``), - the query is expanded into term queries (``f1:a``, ``f2:a``, ``f3:a``). The - expanded query will use as its own score the score of the highest scoring term - query plus the sum of the scores of the other term queries multiplied with - ``tie_breaker``. Let's assume that ``f2:a`` produces the highest score, the - resulting score will be - ``score(f2:a) + tie_breaker * (score(f1:a) + score(f3:a))``. - - Default: ``0.0`` - -field_boost_model - Values: ``fixed`` ``prms`` - - Querqy allows to choose between two approaches for field boosting in scoring: - - * ``fixed``: field boosts are specified at field names in 'query_fields'. - The same field weight will be used across all query terms for a given query - field. - * ``prms``: field boosts are derived from the distribution of the query terms - in the index. More specifically, they are derived from the probability that - a given query term occurs in a given field in the index. 
For example, given - the query 'apple iphone black' with query fields 'brand', 'category' and - 'color', the term 'apple' will in most data sets have a greater probability - and weight for the 'brand' field compared to 'category' and 'color', whereas - 'black' will have the greatest probability in the 'color' field. [1]_ - - Field weights specified in 'query_fields' will be ignored if - 'field_boost_model' is set to 'prms'. - - Default: ``fixed`` - -matching_query.similarity_scoring - Values: ``dfc`` ``on`` ``off`` - - Controls how Lucene's scoring implementation (= *similarity*) is used when an - input query term is expanded across fields and when it is expanded during - query rewriting: - - * ``dfc``: 'document frequency correction' - use the same document frequency - value for all terms that are derived from the same input query term. For - example, let 'a b' be the input query and let it be rewritten to - '(f1:a \| f2:a \| ((f1:x \| f2:x) \| (f1:y \| f2:x)) (f1:b \| f2:b)` by - synonym and field expansion, then - '(f1:a \| f2:a \| ((f1:x \| f2:x) \| (f1:y \| f2:x))' (all derived from 'a') - will use the same document frequency value. More specifically, Querqy will - use the maximum document frequency of these terms as the document frequency - value for all of them. Similarily, the maximum document frequency of - '(f1:b | f2:b)' will be used for these two terms. - * ``off``: Ignore the output of Lucene's similarity scoring. Only field boosts - will be used for scoring. - * ``on``: Use Lucene's similarity scoring output. Note that field - boosting (normally part of Lucene similarity scoring) is handled outside - the similarity in Querqy and it can be configured using the - 'field_boost_model' parameter. - - Default: ``dfc`` - -matching_query.weight - A weight that is multiplied with the score that is produced by the matching - query before the score of the boosting queries is added. - - Default: ``1.0`` - -.. raw:: html - -
- -.. rst-class:: solr - -.. raw:: html - -
- -qf (query fields) - The list of fields in which to search for query terms. A field weight can be - appended to the field name using the ``^``\-symbol. Field weights are - positive integer or decimal numbers. The default field weight is ``1.0``. See - Solr Documentation for parameter value syntax. [2]_ - - Example: ``qf=title^3 brand^2.1 shortDescription^0.2`` - - Required - -mm (minimum should match) - The minimum number of optional query clauses that must match for a document - to be returned. - - The minimum number of query clauses is counted across fields. For example, - if the query ``a b`` is searched in ``qf=f1 f2`` with ``mm=100%``, the two - terms need not match in the same field so that a document matching ``f1:a`` - and ``f2:b`` will be included in the result set. See Solr Documentation - for value syntax. [2]_ - - Example: ``mm=100% 2<-1`` - Default: ``1`` - -tie (tie breaker) - When a query term ``a`` is searched across fields (``f1``, ``f2`` and ``f3``), - the query is expanded into term queries (``f1:a``, ``f2:a``, ``f3:a``). The - expanded query will use as its own score the score of the highest scoring term - query plus the sum of the scores of the other term queries multiplied with - ``tie``. Let's assume that ``f2:a`` produces the highest score, the - resulting score will be - ``score(f2:a) + tie * (score(f1:a) + score(f3:a))``. [2]_ - - Default: ``0.0`` - -fbm (field boost model) - Values: ``fixed`` ``prms`` - - Querqy allows to choose between two approaches for field boosting in scoring: - - * ``fixed``: field boosts are specified at field names in 'query_fields'. - The same field weight will be used across all query terms for a given query - field. - * ``prms``: field boosts are derived from the distribution of the query terms - in the index. More specifically, they are derived from the probability that - a given query term occurs in a given field in the index. 
For example, given - the query 'apple iphone black' with query fields 'brand', 'category' and - 'color', the term 'apple' will in most data sets have a greater probability - and weight for the 'brand' field compared to 'category' and 'color', whereas - 'black' will have the greatest probability in the 'color' field. [1]_ - - Field weights specified in 'query_fields' will be ignored if - 'fbm' is set to 'prms'. - - Default: ``fixed`` - -uq.similarityScore - Values: ``dfc`` ``on`` ``off`` - - Controls how Lucene's scoring implementation (= *similarity*) is used when an - input query term is expanded across fields and when it is expanded during - query rewriting: - - * ``dfc``: 'document frequency correction' - use the same document frequency - value for all terms that are derived from the same input query term. For - example, let 'a b' be the input query and let it be rewritten to - '(f1:a \| f2:a \| ((f1:x \| f2:x) \| (f1:y \| f2:x)) (f1:b \| f2:b)` by - synonym and field expansion, then - '(f1:a \| f2:a \| ((f1:x \| f2:x) \| (f1:y \| f2:x))' (all derived from 'a') - will use the same document frequency value. More specifically, Querqy will - use the maximum document frequency of these terms as the document frequency - value for all of them. Similarily, the maximum document frequency of - '(f1:b | f2:b)' will be used for these two terms. - * ``off``: Ignore the output of Lucene's similarity scoring. Only field boosts - will be used for scoring. - * ``on``: Use Lucene's similarity scoring output. Note that field - boosting (normally part of Lucene similarity scoring) is handled outside - the similarity in Querqy and that it can be configured using the - 'fbm' parameter. - - Default: ``dfc`` - -uq.boost - A weight that is multiplied with the score that is produced by the matching - query before the score of the boosting queries is added. - - Default: ``1.0`` - -.. raw:: html - -
.. rubric:: Boosting queries -.. rst-class:: elasticsearch - -.. raw:: html - -
- -boosting_queries - Controls sub-queries that do not influence the matching of documents but - contribute to the score of documents that are retrieved by the - 'matching_query'. A 'querqy' query allows to control two main types of - boosting queries: - - #. ``rewritten_queries`` - boost queries that are produced as part of query - rewriting - #. ``phrase_boosts`` - (partial) phrases that are derived from the query - string for boosting documents that contain corresponding phrase matches - - Scores from both types of boosting queries will be *added* to the score of the - 'matching_query'. - -boosting_queries.rewritten_queries.use_field_boost - If ``true``, the scores of the boost queries will include field weights. A - field boost of ``1.0`` will be used otherwise. - - Default: ``true`` - - -boosting_queries.rewritten_queries.similarity_scoring - Values: ``dfc`` ``on`` ``off`` - - Controls how Lucene's scoring implementation (= *similarity*) is used when the - boosting query is expanded across fields. - - * ``dfc``: 'document frequency correction' - use the same document frequency - (df) value for all term queries that are produced from the same boost - term. Querqy will use the maximum document frequency of the produced terms - as the df value for all of them. If the 'matching_query' also uses - 'similarity_scoring=dfc', the maximum (df) of the matching query will be - added to the df of the boosting query terms in order to put the (dfs) of - the two query parts on a comparable scale and to avoid giving extremely - high weight to very sparse boost terms. - * ``off``: Ignore the output of Lucene's similarity scoring. - * ``on``: Use Lucene's similarity scoring output. 
- - Default: ``dfc`` - -boosting_queries.rewritten_queries.positive_query_weight / .negative_query_weight` - Query rewriting in Querqy can produce boost queries that either promote - matching documents to the top of the search result (positive boost) or that - push matching documents to the bottom of the search result list (negative - boost). - - Scores of positive boost queries are multiplied with 'positive_query_weight'. - Scores of negative boost queries are multiplied with `negative_query_weight`. - Both weights must be positive decimal numbers. Note that increasing the value - of 'negative_query_weight' means to demote matching documents more strongly. - - Default: ``1.0`` - -boosting_queries.phrase_boosts.full / .bigram / .trigram / .tie_breaker` - Unlike 'rewritten_queries', ``phrase_boosts`` can be applied regardless of - query rewriting. If enabled, a boost query will be created from phrases - which are derived from the query string. Documents matching this boost query - will be promoted to towards the top of the search result. - - The parameter objects ``full``, ``bigram`` and ``trigram`` control how phrase - boost queries will be formed: - - - ``full``: boosts documents that contain the entire input query as a phrase - - ``bigram``: creates phrase queries for boosting from pairs of adjacent query - tokens - - ``trigram``: creates phrase queries for boosting from triples of adjacent - query tokens - - The ``fields`` lists under each of these parameters define the fields and - their weights in which the phrases will be looked up. The ``slop`` defines the - number of positions the phrase tokens are allowed to shift while still - counting as a phrase. A 'slop' of two or greater allows for token - transposition (compare Elasticsearch's `Match phrase query `_). - The default 'slop' is 0. - - Depending on the number of query tokens, a matching 'full' phrase query can - imply one or more 'bigram' and 'trigram' matches. 
The scores of these matches - will be summed up, which can quickly result in a very large score for - documents that match a long full query phrase. Setting ``tie_breaker`` for - 'phrase_boosts' to a low value will reduce this aggregation effect. Querqy - will use the highest score produced by 'full', 'bigram' and 'trigram' matches - and multiply the score of the remaining phrase matches with the 'tie_breaker' - value. A 'tie_breaker' of 0.0 - which is the default value - will only use the - highest score. - - The concept of phrase boosting is very similar to the pf/pf2/pf3/ps/ps2/ps3 - parameters of Solr's `Extended DisMax `_ / `DisMax `_ - query parsers. However, Querqy adds control over the aggregation of the scores - from the different phrase boost types using the 'tie_breaker'. - - The score produced by 'phrase_boosts' is added to the boost of the - 'matching_query'. - -.. raw:: html - -
- -.. rst-class:: solr - -.. raw:: html - -
- -qboost.fieldBoost - Values: ``on`` ``off`` - - If ``on``, the scores of the boost queries that are produced by query - rewriting will include field weights. A field boost of ``1.0`` will be used - otherwise. - - Default: ``on`` - +.. tabs:: + + .. group-tab:: Elasticsearch/OpenSearch + + boosting_queries + Controls sub-queries that do not influence the matching of documents but + contribute to the score of documents that are retrieved by the + 'matching_query'. A 'querqy' query lets you control two main types of + boosting queries: + + #. ``rewritten_queries`` - boost queries that are produced as part of query + rewriting + #. ``phrase_boosts`` - (partial) phrases that are derived from the query + string for boosting documents that contain corresponding phrase matches + + Scores from both types of boosting queries will be *added* to the score of the + 'matching_query'. + + boosting_queries.rewritten_queries.use_field_boost + If ``true``, the scores of the boost queries will include field weights. A + field boost of ``1.0`` will be used otherwise. + + Default: ``true`` + + + boosting_queries.rewritten_queries.similarity_scoring + Values: ``dfc`` ``on`` ``off`` + + Controls how Lucene's scoring implementation (= *similarity*) is used when the + boosting query is expanded across fields. + + * ``dfc``: 'document frequency correction' - use the same document frequency + (df) value for all term queries that are produced from the same boost + term. Querqy will use the maximum document frequency of the produced terms + as the df value for all of them. If the 'matching_query' also uses + 'similarity_scoring=dfc', the maximum (df) of the matching query will be + added to the df of the boosting query terms in order to put the (dfs) of + the two query parts on a comparable scale and to avoid giving extremely + high weight to very sparse boost terms. + * ``off``: Ignore the output of Lucene's similarity scoring. + * ``on``: Use Lucene's similarity scoring output.
+ + Default: ``dfc`` + + boosting_queries.rewritten_queries.positive_query_weight / .negative_query_weight + Query rewriting in Querqy can produce boost queries that either promote + matching documents to the top of the search result (positive boost) or that + push matching documents to the bottom of the search result list (negative + boost). + + Scores of positive boost queries are multiplied with 'positive_query_weight'. + Scores of negative boost queries are multiplied with 'negative_query_weight'. + Both weights must be positive decimal numbers. Note that increasing the value + of 'negative_query_weight' demotes matching documents more strongly. + + Default: ``1.0`` + + boosting_queries.phrase_boosts.full / .bigram / .trigram / .tie_breaker + Unlike 'rewritten_queries', ``phrase_boosts`` can be applied regardless of + query rewriting. If enabled, a boost query will be created from phrases + which are derived from the query string. Documents matching this boost query + will be promoted towards the top of the search result. + + The parameter objects ``full``, ``bigram`` and ``trigram`` control how phrase + boost queries will be formed: + + - ``full``: boosts documents that contain the entire input query as a phrase + - ``bigram``: creates phrase queries for boosting from pairs of adjacent query + tokens + - ``trigram``: creates phrase queries for boosting from triples of adjacent + query tokens + + The ``fields`` lists under each of these parameters define the fields and + their weights in which the phrases will be looked up. The ``slop`` defines the + number of positions the phrase tokens are allowed to shift while still + counting as a phrase. A 'slop' of two or greater allows for token + transposition (compare Elasticsearch's `Match phrase query `_). + The default 'slop' is 0. + + Depending on the number of query tokens, a matching 'full' phrase query can + imply one or more 'bigram' and 'trigram' matches.
The scores of these matches + will be summed up, which can quickly result in a very large score for + documents that match a long full query phrase. Setting ``tie_breaker`` for + 'phrase_boosts' to a low value will reduce this aggregation effect. Querqy + will use the highest score produced by 'full', 'bigram' and 'trigram' matches + and multiply the score of the remaining phrase matches with the 'tie_breaker' + value. A 'tie_breaker' of 0.0 - which is the default value - will only use the + highest score. + + The concept of phrase boosting is very similar to the pf/pf2/pf3/ps/ps2/ps3 + parameters of Solr's `Extended DisMax `_ / `DisMax `_ + query parsers. However, Querqy adds control over the aggregation of the scores + from the different phrase boost types using the 'tie_breaker'. + + The score produced by 'phrase_boosts' is added to the boost of the + 'matching_query'. + + .. group-tab:: Solr + + qboost.fieldBoost + Values: ``on`` ``off`` + + If ``on``, the scores of the boost queries that are produced by query + rewriting will include field weights. A field boost of ``1.0`` will be used + otherwise. + + Default: ``on`` + + + qboost.similarityScore + Values: ``dfc`` ``on`` ``off`` + + Controls how Lucene's scoring implementation (= *similarity*) is used when the + boosting query is expanded across fields. + + * ``dfc``: 'document frequency correction' - use the same document frequency + (df) value for all term queries that are produced from the same boost + term. Querqy will use the maximum document frequency of the produced terms + as the df value for all of them. If the 'uq.similarityScore' also uses + 'dfc', the maximum (df) of the matching query will be added to the df of the + boosting query terms in order to put the (dfs) of the two query parts on a + comparable scale and to avoid giving extremely high weight to very sparse + boost terms. + * ``off``: Ignore the output of Lucene's similarity scoring. + * ``on``: Use Lucene's similarity scoring output. 
+ + Default: ``dfc`` + + qboost.weight / .negWeight + Query rewriting in Querqy can produce boost queries that either promote + matching documents to the top of the search result (positive boost) or that + push matching documents to the bottom of the search result list (negative + boost). + + Scores of positive boost queries are multiplied with 'qboost.weight'. + Scores of negative boost queries are multiplied with 'qboost.negWeight'. + Both weights must be positive decimal numbers. Note that increasing the value + of 'qboost.negWeight' demotes matching documents more strongly. + + Default: ``1.0`` + + pf/pf2/pf3/ps/ps2/ps3/qpf.tie (phrase boosts) + Phrase boosts can be applied regardless of query rewriting. If enabled, a + boost query will be created from phrases which are derived from the query + string, either using the entire query as a phrase for boosting + (pf/ps), or using bigrams (pf2/ps2) or trigrams (pf3/ps3) as a phrase. + + This works very similarly to the same parameters in Solr (see Solr's + `DisMax `__ + and `eDismax `__ + Query Parsers) but Querqy adds another parameter, ``qpf.tie``, to control how + the scores from 'pf', 'pf2' and 'pf3' are combined: a long query that matches + as a phrase will boost the entire query as a phrase and a lot of bigram and + trigram sub-query phrases at the same time, producing a very high boost. + + Setting ``qpf.tie`` to a low value will reduce this aggregation + effect. Querqy will use the highest score produced by 'pf', 'pf2' and 'pf3' + matches and multiply the score of the remaining phrase matches with the + 'qpf.tie' value. A 'qpf.tie' of 0.0 will only use the highest score.
+ + Example: ``pf=name^0.8 brand&pf2=brand&ps=2&ps2=0&qpf.tie=0.01`` + + Defaults: + + * ``pf``/``pf2``/``pf3``: (empty, no phrase boosting) + * ``ps``: ``0.0`` + * ``ps2/ps3``: value copied from ``ps`` + * ``qpf.tie``: ``0.0`` + + bf/bq/boost + Additive boost function (``bf``), additive boost query (``bq``) and + multiplicative boost query (``boost``). Same as in Solr's `DisMax `__ + and `eDismax `__ + Query Parsers. + + querqy.rq + Same as in Solr's `rq parameter `_ but only applies + the RankQuery when the Querqy query does not contain any boosts. + -qboost.similarityScore - Values: ``dfc`` ``on`` ``off`` - - Controls how Lucene's scoring implementation (= *similarity*) is used when the - boosting query is expanded across fields. - - * ``dfc``: 'document frequency correction' - use the same document frequency - (df) value for all term queries that are produced from the same boost - term. Querqy will use the maximum document frequency of the produced terms - as the df value for all of them. If the 'uq.similarityScore' also uses - 'dfc', the maximum (df) of the matching query will be added to the df of the - boosting query terms in order to put the (dfs) of the two query parts on a - comparable scale and to avoid giving extremely high weight to very sparse - boost terms. - * ``off``: Ignore the output of Lucene's similarity scoring. - * ``on``: Use Lucene's similarity scoring output. - - Default: ``dfc`` - -qboost.weight / .negWeight` - Query rewriting in Querqy can produce boost queries that either promote - matching documents to the top of the search result (positive boost) or that - push matching documents to the bottom of the search result list (negative - boost). - - Scores of positive boost queries are multiplied with 'qboost.weight'. - Scores of negative boost queries are multiplied with `qboost.negWeight`. - Both weights must be positive decimal numbers.
Note that increasing the value - of 'qboost.negWeight' means to demote matching documents more strongly. - - Default: ``1.0`` - -pf/pf2/pf3/ps/ps2/ps3/qpf.tie (phrase boosts) - Phrase boosts can be applied regardless of query rewriting. If enabled, a - boost query will be created from phrases which are derived from the query - string, either turning using the entire query into as a phrase for boosting - (pf/ps), or using bigrams (pf2/ps2) or trigrams (pf3/ps3) as a phrase. - - This works very similar to the same parameters Solr (see Solr's - `DisMax `__ - and `eDismax `__ - Query Parsers) but Querqy adds another parameter, ``qpf.tie`` to control how - the scores from 'pf', 'pf2' and 'pf3' are combined: a long query that matches - as a phrase, will boost the entire query as a phrase and a lot of bigram and - trigram sub-query phrases at the same time, producing a very high boost. - - Setting ``qpf.tie`` to a low value will reduce this aggregation - effect. Querqy will use the highest score produced by 'pf', 'pf2' and 'pf3' - matches and multiply the score of the remaining phrase matches with the - 'qpf.tie' value. A 'qpf.tie' of 0.0 will only use the highest score. - - Example: ``pf=name^0.8 brand&pf2=brand&ps=2$ps2=0&ppf.tie=0.01`` - - Defaults: - - * ``pf``/``pf2``/``pf3``: (empty, no phrase boosting) - * ``ps``: ``0.0`` - * ``ps2/ps3``: value copied from ``ps`` - * ``qpf.tie``: ``0.0`` - -bf/bq/boost - Additive boost function (``bf``), additive boost query (``bq``) and - multiplicative boost query (``boost``). Same as in Solr's `DisMax `__ - and `eDismax `__ - Query Parsers. - -querqy.rq - Same as in Solr's `rq parameter `_ but only applies - the RankQuery when the Querqy query does not contain any boosts. - -.. raw:: html - -
.. rubric:: Generated query parts -.. rst-class:: elasticsearch - -.. raw:: html - -
- - -generated.query_fields - The list of fields and their weights for matching generated query terms like - synonyms or boost queries. If no 'query_fields' are specified for the - generated query parts, the global 'query_fields' will be used. - - Default: copy from global 'query_fields' - - -generated.field_boost_factor - A factor that is multiplied with the field weights of the generated query - terms. This factor can be used to apply a penalty to all terms that were not - entered by the user but inserted as part of the query rewriting, for example, - to give synonyms a smaller weight compared to the original term. - - This factor is applied regardless of where the 'query_fields' for generated - terms are defined, i.e. in the 'query_fields' of the 'generated' object or - globally. - - Default: ``1.0`` - -.. raw:: html - -
- -.. rst-class:: solr - -.. raw:: html - -
- -gqf (generated query fields) - The list of fields and their weights for matching generated query terms like - synonyms or boost queries. If no 'generated query fields' are specified, the - global value from ``qf`` will be used. - - Example: ``qf=name^3 brand^1.2 ean&gqf=name^2.4 brand^0.9`` - - Default: copy from global ``qf`` - -gbf (generated boost factor) - A factor that is multiplied with the field weights of the generated query - terms. This factor can be used to apply a penalty to all terms that were not - entered by the user but inserted as part of the query rewriting, for example, - to give synonyms a smaller weight compared to the original term. - - This factor is applied regardless of where the query fields for generated - terms are defined, i.e. in ``gqf`` (generated query fields) or ``qf`` - (globally). - - Default: ``1.0`` - -.. raw:: html - -
+.. tabs:: + + .. group-tab:: Elasticsearch/OpenSearch + + generated.query_fields + The list of fields and their weights for matching generated query terms like + synonyms or boost queries. If no 'query_fields' are specified for the + generated query parts, the global 'query_fields' will be used. + + Default: copy from global 'query_fields' + + + generated.field_boost_factor + A factor that is multiplied with the field weights of the generated query + terms. This factor can be used to apply a penalty to all terms that were not + entered by the user but inserted as part of the query rewriting, for example, + to give synonyms a smaller weight compared to the original term. + + This factor is applied regardless of where the 'query_fields' for generated + terms are defined, i.e. in the 'query_fields' of the 'generated' object or + globally. + + Default: ``1.0`` + + + .. group-tab:: Solr + + gqf (generated query fields) + The list of fields and their weights for matching generated query terms like + synonyms or boost queries. If no 'generated query fields' are specified, the + global value from ``qf`` will be used. + + Example: ``qf=name^3 brand^1.2 ean&gqf=name^2.4 brand^0.9`` + + Default: copy from global ``qf`` + + gbf (generated boost factor) + A factor that is multiplied with the field weights of the generated query + terms. This factor can be used to apply a penalty to all terms that were not + entered by the user but inserted as part of the query rewriting, for example, + to give synonyms a smaller weight compared to the original term. + + This factor is applied regardless of where the query fields for generated + terms are defined, i.e. in ``gqf`` (generated query fields) or ``qf`` + (globally). 
+ + Default: ``1.0`` + diff --git a/docs/source/querqy/rewriters/number-unit.rst b/docs/source/querqy/rewriters/number-unit.rst index d79a2dc..2c2f763 100644 --- a/docs/source/querqy/rewriters/number-unit.rst +++ b/docs/source/querqy/rewriters/number-unit.rst @@ -41,68 +41,52 @@ For Solr, the JSON file is put into the ZooKeeper; for Elasticsearch/OpenSearch, the JSON is put into a string value for the property ``config``. -.. include:: ../se-section.txt - -.. rst-class:: elasticsearch - -.. raw:: html - -
- -``PUT /_querqy/rewriter/numberunit`` - -.. code-block:: JSON - :linenos: - - { - "class": "querqy.elasticsearch.rewriter.NumberUnitRewriterFactory", - "config": { - "config": "{ \"numberUnitDefinitions\": [ ... ] }" - } - } - -.. include:: hint-opensearch.txt - -.. raw:: html - -
- -.. rst-class:: solr - -.. raw:: html - -
- -**Querqy 5** - -| :code:`POST /solr/mycollection/querqy/rewriter/number_unit?action=save` -| :code:`Content-Type: application/json` - -.. code-block:: JSON - :linenos: - - { - "class": "querqy.solr.rewriter.numberunit.NumberUnitRewriterFactory", - "config": { - "config": "{ \"numberUnitDefinitions\": [ ... ] }" - } - } - - -**Querqy 4** - -.. code-block:: xml - :linenos: - - - querqy.solr.contrib.NumberUnitRewriterFactory - number-unit-config.json - - -.. raw:: html - -
- +.. tabs:: + + .. group-tab:: Elasticsearch/OpenSearch + + ``PUT /_querqy/rewriter/numberunit`` + + .. code-block:: JSON + :linenos: + + { + "class": "querqy.elasticsearch.rewriter.NumberUnitRewriterFactory", + "config": { + "config": "{ \"numberUnitDefinitions\": [ ... ] }" + } + } + + .. include:: hint-opensearch.txt + + .. group-tab:: Solr + + **Querqy 5** + + | :code:`POST /solr/mycollection/querqy/rewriter/number_unit?action=save` + | :code:`Content-Type: application/json` + + .. code-block:: JSON + :linenos: + + { + "class": "querqy.solr.rewriter.numberunit.NumberUnitRewriterFactory", + "config": { + "config": "{ \"numberUnitDefinitions\": [ ... ] }" + } + } + + + **Querqy 4** + + .. code-block:: xml + :linenos: + + + querqy.solr.contrib.NumberUnitRewriterFactory + number-unit-config.json + + Configuring filter queries ========================== From 21df2e12120b2643eabad3f9c80486fe312afbb2 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Mon, 26 Feb 2024 13:54:40 -0500 Subject: [PATCH 6/8] tough one! --- .../logging-and-debugging-rewriters.rst | 1256 ++++++++--------- 1 file changed, 596 insertions(+), 660 deletions(-) diff --git a/docs/source/querqy/logging-and-debugging-rewriters.rst b/docs/source/querqy/logging-and-debugging-rewriters.rst index fbdfd58..4057301 100644 --- a/docs/source/querqy/logging-and-debugging-rewriters.rst +++ b/docs/source/querqy/logging-and-debugging-rewriters.rst @@ -37,192 +37,181 @@ The Querqy plugin finally produces a Lucene query. This query can be inspected u the standard means of your search engine to inspect the Lucene queries that the search engine produces from its query DSL. -.. rst-class:: elasticsearch - -.. raw:: html - -
- -In Elasticsearch/OpenSearch this means that you can call -``https://://_validate/query?explain=true`` and submit the -usual Querqy search query in the request body. For example: - - -| :code:`GET https://://_validate/query?explain=true` -| :code:`Content-Type: application/json` - - -.. code-block:: JSON - - { - "query":{ - "querqy":{ - "matching_query":{ - "query":"laptop" - }, - "query_fields":[ - "title^23", - "name", - "shortSummary" - ], - "rewriters":[ - "common_rules" - ] - } - } - } - -If the above case had a CommonRulesRewriter ``common_rules`` defined with rules - -.. code-block:: - - laptop => - SYNONYM: notebook - UP(100): AMD - DOWN(50): sleeve - -the output of ``_validate/query?explain=true`` will look like this: - -.. code-block:: JSON - :linenos: - :emphasize-lines: 12 - - { - "_shards":{ - "total":1, - "successful":1, - "failed":0 - }, - "valid":true, - "explanations":[ - { - "index":"myindex", - "valid":true, - "explanation":"+(+(name:notebook | shortSummary:laptop | title:laptop^23.0 | title:notebook^23.0 | shortSummary:notebook | name:laptop) AdditiveBoostFunction(100.0,query(+(name:amd | shortSummary:amd | title:amd^23.0),def=0.0)) AdditiveBoostFunction(-50.0,query(+(shortSummary:sleeve name:sleeve title:sleeve^23.0),def=0.0)))" - } - ] - } - -Line 12 contains the string representation of the parsed Lucene query and you -will probably recognise the notebook / laptop synonyms. It also shows -AdditiveBoostFunction sub-queries. ``AdditiveBoostFunction`` is a custom Lucene -query that is provided by Querqy to deal with UP/DOWN boosting. It especially -avoids producing negative document scores, which are not allowed by Lucene, and -it guarantees that documents that match for both UP(100) and DOWN(100) yield -the same score like documents that match neither UP(100) nor DOWN(100). - - -.. raw:: html - -
- - -.. rst-class:: solr - -.. raw:: html - -
- -In Solr this means that you can see the Lucene query in the parsedquery section -of the debug output that you will get if you append ``debug=true`` or -``debugQuery=true`` to a search request (see `Solr documentation `_). - -For example, the following debug output would be produced for the search request -``/select?defType=querqy&q=notebook&querqy.rewriters=common&qf=name%20cat&debug=true`` - -.. code-block:: JSON - - "debug": { - "parsedquery":"+DisjunctionMaxQuery((cat:notebook | name:notebook | name:laptop | cat:laptop)) FunctionQuery(AdditiveBoostFunction(100.0,query(+(cat:AMD | name:amd),def=0.0))) FunctionQuery(AdditiveBoostFunction(-100.0,query(+(cat:sleeve name:sleeve),def=0.0)))", - "parsedquery_toString":"+(cat:notebook | name:notebook | name:laptop | cat:laptop) AdditiveBoostFunction(100.0,query(+(cat:AMD | name:amd),def=0.0)) AdditiveBoostFunction(-100.0,query(+(cat:sleeve name:sleeve),def=0.0))" - } - - -provided that there was a CommonRulesRewriter rewriter defined for the name -``common`` with the following rules: - -.. code-block:: - - notebook => - SYNONYM: laptop - UP(100): AMD - DOWN(50): sleeve - - -You will probably recognise the notebook / laptop synonyms in the parsed query -representation in the debug output. It also shows AdditiveBoostFunction -sub-queries. ``AdditiveBoostFunction`` is a custom Lucene query that is provided -by Querqy to deal with UP/DOWN boosting. It especially avoids producing negative -document scores, which are not allowed by Lucene, and it guarantees that -documents that match for both UP(100) and DOWN(100) yield the same score like -documents that match neither UP(100) nor DOWN(100). - -.. rst-class:: solr - - -Querqy details in Solr debug mode -================================= - -.. warning:: Note that the additional details that Querqy provides to Solr's - debug output have changed in structure and content with the release of - 'Querqy for Solr' version **5.5.lucene900.0**. 
- - -Calling a Solr SearchHandler with ``debugQuery=true`` will add -an additional section ``querqy`` to the ``debug`` section in Solr's response, -for example: - -.. code-block:: JSON - :linenos: - :emphasize-lines: 2-4 - - "debug": { - "querqy": { - "parser":"querqy.parser.WhiteSpaceQuerqyParser", - "rewrite":{ - "rewriteChain":[ - { - "rewriterId":"common", - "actions":[ - { - "message":"notebook#0", - "match":{ - "term":"notebook", - "type":"exact" - }, - "instructions":[ - { - "type":"down", - "param":"100", - "value":"sleeve" - }, - { - "type":"up", - "param":"100", - "value":"AMD" - }, - { - "type":"synonym", - "value":"laptop" - } - ] - } - ] +.. tabs:: + + .. group-tab:: Elasticsearch/OpenSearch + + In Elasticsearch/OpenSearch this means that you can call + ``https://://_validate/query?explain=true`` and submit the + usual Querqy search query in the request body. For example: + + + | :code:`GET https://://_validate/query?explain=true` + | :code:`Content-Type: application/json` + + + .. code-block:: JSON + + { + "query":{ + "querqy":{ + "matching_query":{ + "query":"laptop" + }, + "query_fields":[ + "title^23", + "name", + "shortSummary" + ], + "rewriters":[ + "common_rules" + ] + } + } + } + + If the above case had a CommonRulesRewriter ``common_rules`` defined with rules + + .. code-block:: + + laptop => + SYNONYM: notebook + UP(100): AMD + DOWN(50): sleeve + + the output of ``_validate/query?explain=true`` will look like this: + + .. 
code-block:: JSON + :linenos: + :emphasize-lines: 12 + + { + "_shards":{ + "total":1, + "successful":1, + "failed":0 + }, + "valid":true, + "explanations":[ + { + "index":"myindex", + "valid":true, + "explanation":"+(+(name:notebook | shortSummary:laptop | title:laptop^23.0 | title:notebook^23.0 | shortSummary:notebook | name:laptop) AdditiveBoostFunction(100.0,query(+(name:amd | shortSummary:amd | title:amd^23.0),def=0.0)) AdditiveBoostFunction(-50.0,query(+(shortSummary:sleeve name:sleeve title:sleeve^23.0),def=0.0)))" + } + ] + } + + Line 12 contains the string representation of the parsed Lucene query and you + will probably recognise the notebook / laptop synonyms. It also shows + AdditiveBoostFunction sub-queries. ``AdditiveBoostFunction`` is a custom Lucene + query that is provided by Querqy to deal with UP/DOWN boosting. It especially + avoids producing negative document scores, which are not allowed by Lucene, and + it guarantees that documents that match for both UP(100) and DOWN(100) yield + the same score like documents that match neither UP(100) nor DOWN(100). + + .. group-tab:: Solr + + In Solr this means that you can see the Lucene query in the parsedquery section + of the debug output that you will get if you append ``debug=true`` or + ``debugQuery=true`` to a search request (see `Solr documentation `_). + + For example, the following debug output would be produced for the search request + ``/select?defType=querqy&q=notebook&querqy.rewriters=common&qf=name%20cat&debug=true`` + + .. 
code-block:: JSON + + "debug": { + "parsedquery":"+DisjunctionMaxQuery((cat:notebook | name:notebook | name:laptop | cat:laptop)) FunctionQuery(AdditiveBoostFunction(100.0,query(+(cat:AMD | name:amd),def=0.0))) FunctionQuery(AdditiveBoostFunction(-100.0,query(+(cat:sleeve name:sleeve),def=0.0)))", + "parsedquery_toString":"+(cat:notebook | name:notebook | name:laptop | cat:laptop) AdditiveBoostFunction(100.0,query(+(cat:AMD | name:amd),def=0.0)) AdditiveBoostFunction(-100.0,query(+(cat:sleeve name:sleeve),def=0.0))" + } + + + provided that there was a CommonRulesRewriter rewriter defined for the name + ``common`` with the following rules: + + .. code-block:: + + notebook => + SYNONYM: laptop + UP(100): AMD + DOWN(50): sleeve + + + You will probably recognise the notebook / laptop synonyms in the parsed query + representation in the debug output. It also shows AdditiveBoostFunction + sub-queries. ``AdditiveBoostFunction`` is a custom Lucene query that is provided + by Querqy to deal with UP/DOWN boosting. It especially avoids producing negative + document scores, which are not allowed by Lucene, and it guarantees that + documents that match for both UP(100) and DOWN(100) yield the same score like + documents that match neither UP(100) nor DOWN(100). + + + .. rubric:: Querqy details in Solr debug mode + + .. warning:: Note that the additional details that Querqy provides to Solr's + debug output have changed in structure and content with the release of + 'Querqy for Solr' version **5.5.lucene900.0**. + + + Calling a Solr SearchHandler with ``debugQuery=true`` will add + an additional section ``querqy`` to the ``debug`` section in Solr's response, + for example: + + .. 
code-block:: JSON + :linenos: + :emphasize-lines: 2-4 + + "debug": { + "querqy": { + "parser":"querqy.parser.WhiteSpaceQuerqyParser", + "rewrite":{ + "rewriteChain":[ + { + "rewriterId":"common", + "actions":[ + { + "message":"notebook#0", + "match":{ + "term":"notebook", + "type":"exact" + }, + "instructions":[ + { + "type":"down", + "param":"100", + "value":"sleeve" + }, + { + "type":"up", + "param":"100", + "value":"AMD" + }, + { + "type":"synonym", + "value":"laptop" + } + ] + } + ] + } + ] + } } - ] - } - } - } + } + + The output tells you under ``parser`` which + :ref:`query string parser ` Querqy used for processing + the user's query string. + + The section under ``rewrite`` contains information about how the query was + processed in the rewrite chain. The content of this section is the same as the + output that is produced by Querqy's :ref:`Info Logging ` with + parameter ``querqy.rewriteLogging=details`` with the difference that it is added + to the debug section here. + + -The output tells you under ``parser`` which -:ref:`query string parser ` Querqy used for processing -the user's query string. -The section under ``rewrite`` contains information about how the query was -processed in the rewrite chain. The content of this section is the same as the -output that is produced by Querqy's :ref:`Info Logging ` with -parameter ``querqy.rewriteLogging=details`` with the difference that it is added -to the debug section here. @@ -259,16 +248,7 @@ Setting up Info Logging Setting up Info Logging requires two steps: Configuring sinks and enabling the logging per request. - -.. include:: se-section.txt - -.. rst-class:: solr - -.. raw:: html - -
- -.. warning:: Note that the configuration of info logging and the required +.. warning:: With Solr, note that the configuration of info logging and the required parameters have changed in an incompatible way with the release of 'Querqy for Solr' version **5.5.lucene900.0**. The documentation for info logging in older Querqy versions can be found :ref:`here `. @@ -277,9 +257,6 @@ logging per request. call Solr with ``debugQuery=true`` instead of Info Logging, depending on your use case. -.. raw:: html - -
Sinks _____ @@ -288,157 +265,144 @@ To use Info Logging we need a mapping between each rewriter and the sink(s) to which this rewriter should send its log output. This mapping is configured within the rewriter definition: -.. rst-class:: elasticsearch - -.. raw:: html - -
- -``PUT /_querqy/rewriter/common_rules`` - -.. code-block:: JSON - :linenos: - :emphasize-lines: 6-8 - - { - "class": "querqy.elasticsearch.rewriter.SimpleCommonRulesRewriterFactory", - "config": { - "rules" : "notebook =>\nSYNONYM: laptop" - }, - "info_logging": { - "sinks": ["log4j"] - } - } - -.. include:: rewriters/hint-opensearch.txt - - - -As you probably recognise at this stage, the example shows the configuration for -a Common Rules Rewriter. Lines 6-8 are new. They contain the configuration for -Info Logging. The ``sink`` property is a list of named sinks to which this -rewriter should send its log messages. - -In this case, the list contains only one -element, ``log4j``, which is a predefined sink that routes the output to the -Log4j logging framework, which is used in Elasticsearch and Opensearch and which -can be configured further. At the current stage, ``log4j`` is the only available -sink for Info Logging under Elasticsearch/Opensearch and it is not possible -(yet) to provide a custom sink implementation. - -The output in Log4j will look like this (using a file appender): - -.. code-block:: text - :linenos: - - [2021-03-26T13:23:43,006][INFO ][q.e.i.Log4jSink ] [node_s_0]DETAIL[ QUERQY ] {"id":"id-1001","msg":{"common_rules1":[{"APPLIED_RULES":["msg1"]}],"common_rules2":[{"APPLIED_RULES":["msg2"]}]}} - [2021-03-26T13:28:47,454][INFO ][q.e.i.Log4jSink ] [node_s_0]REWRITER_ID[ QUERQY ] {"id":"id-1002","msg":["common_rules"]} - -Let's decompose this. ``DETAIL[ QUERQY ]`` (line 1) and ``REWRITER_ID[ QUERQY ]`` -(line 2) are `Log4j markers -`_ that Querqy -provides and that you can use to `filter log messages `_. -The `DETAIL` and `REWRITER_ID` markers correspond to the output types that you -can set per request and that are described below. They are both children of a -common parent marker `QUERQY`. - -The log message itself is a small JSON document. 
The ``msg`` element contains -the messages as they were produced by the rewriters with the rewriter IDs -(such as `common_rules1`) as keys and further rewriter-specific information as -values. - -The ``id`` element is an ID that can be passed per request to help trace -requests across nodes (see below). - - -.. raw:: html - -
- - -.. rst-class:: solr +.. tabs:: -.. raw:: html + .. group-tab:: Elasticsearch/OpenSearch -
- -| :code:`POST /solr/mycollection/querqy/rewriter/common_rules?action=save` -| :code:`Content-Type: application/json` - -.. code-block:: JSON - :linenos: - :emphasize-lines: 6-8 - - { - "class": "querqy.solr.rewriter.commonrules.CommonRulesRewriterFactory", - "config": { - "rules" : "notebook =>\nSYNONYM: laptop" - }, - "info_logging": { - "sinks": ["response"] - } - } - - -As you probably recognise at this stage, the example shows the configuration for -a Common Rules Rewriter. Lines 6-8 are new. They contain the configuration for -Info Logging. The ``sink`` property is a list of named sinks to which this -rewriter sends its log messages. - -In this case, the list contains only one element, ``response``, which is a -predefined sink that adds the Info Logging output to the search response -returned by Solr. - -.. _custom_solr_sinks: - -Expert: Predefined and custom sinks in Solr -........................................... - -The ``response`` sink is currently the only predefined sink that comes with -Querqy for Solr. However, you can use your own sink by implementing the -``querqy.infologging.Sink`` interface and making it available by adding the -following configuration to the ``QuerqyRewriterRequestHandler`` in -``solrconfig.xml``: - - -.. code-block:: xml - - - - - customSink1 - my.name.CustomSinkOne - - - customSink2 - my.name.CustomSinkTwo - - - - - -and then add the mappings to the sink(s) in the rewriter configurations: - - | :code:`POST /solr/mycollection/querqy/rewriter/common_rules?action=save` - | :code:`Content-Type: application/json` - -.. 
code-block:: JSON - :linenos: - :emphasize-lines: 6-8 - - { - "class": "querqy.solr.rewriter.commonrules.CommonRulesRewriterFactory", - "config": { - "rules" : "notebook =>\nSYNONYM: laptop" - }, - "info_logging": { - "sinks": ["response", "customSink1", "customSink2"] - } - } - -As the sink mappings are configured per rewriter, you can decide per rewriter -to which sink you want to send their Info Logging output and even have one sink -per rewriter. + ``PUT /_querqy/rewriter/common_rules`` + + .. code-block:: JSON + :linenos: + :emphasize-lines: 6-8 + + { + "class": "querqy.elasticsearch.rewriter.SimpleCommonRulesRewriterFactory", + "config": { + "rules" : "notebook =>\nSYNONYM: laptop" + }, + "info_logging": { + "sinks": ["log4j"] + } + } + + .. include:: rewriters/hint-opensearch.txt + + + + As you probably recognise at this stage, the example shows the configuration for + a Common Rules Rewriter. Lines 6-8 are new. They contain the configuration for + Info Logging. The ``sink`` property is a list of named sinks to which this + rewriter should send its log messages. + + In this case, the list contains only one + element, ``log4j``, which is a predefined sink that routes the output to the + Log4j logging framework, which is used in Elasticsearch and Opensearch and which + can be configured further. At the current stage, ``log4j`` is the only available + sink for Info Logging under Elasticsearch/Opensearch and it is not possible + (yet) to provide a custom sink implementation. + + The output in Log4j will look like this (using a file appender): + + .. code-block:: text + :linenos: + + [2021-03-26T13:23:43,006][INFO ][q.e.i.Log4jSink ] [node_s_0]DETAIL[ QUERQY ] {"id":"id-1001","msg":{"common_rules1":[{"APPLIED_RULES":["msg1"]}],"common_rules2":[{"APPLIED_RULES":["msg2"]}]}} + [2021-03-26T13:28:47,454][INFO ][q.e.i.Log4jSink ] [node_s_0]REWRITER_ID[ QUERQY ] {"id":"id-1002","msg":["common_rules"]} + + Let's decompose this. 
``DETAIL[ QUERQY ]`` (line 1) and ``REWRITER_ID[ QUERQY ]`` + (line 2) are `Log4j markers + `_ that Querqy + provides and that you can use to `filter log messages `_. + The `DETAIL` and `REWRITER_ID` markers correspond to the output types that you + can set per request and that are described below. They are both children of a + common parent marker `QUERQY`. + + The log message itself is a small JSON document. The ``msg`` element contains + the messages as they were produced by the rewriters with the rewriter IDs + (such as `common_rules1`) as keys and further rewriter-specific information as + values. + + The ``id`` element is an ID that can be passed per request to help trace + requests across nodes (see below). + + .. group-tab:: Solr + + | :code:`POST /solr/mycollection/querqy/rewriter/common_rules?action=save` + | :code:`Content-Type: application/json` + + .. code-block:: JSON + :linenos: + :emphasize-lines: 6-8 + + { + "class": "querqy.solr.rewriter.commonrules.CommonRulesRewriterFactory", + "config": { + "rules" : "notebook =>\nSYNONYM: laptop" + }, + "info_logging": { + "sinks": ["response"] + } + } + + + As you probably recognise at this stage, the example shows the configuration for + a Common Rules Rewriter. Lines 6-8 are new. They contain the configuration for + Info Logging. The ``sink`` property is a list of named sinks to which this + rewriter sends its log messages. + + In this case, the list contains only one element, ``response``, which is a + predefined sink that adds the Info Logging output to the search response + returned by Solr. + + .. _custom_solr_sinks: + + .. rubric:: Expert: Predefined and custom sinks in Solr + + The ``response`` sink is currently the only predefined sink that comes with + Querqy for Solr. However, you can use your own sink by implementing the + ``querqy.infologging.Sink`` interface and making it available by adding the + following configuration to the ``QuerqyRewriterRequestHandler`` in + ``solrconfig.xml``: + + + .. 
code-block:: xml + + + + + customSink1 + my.name.CustomSinkOne + + + customSink2 + my.name.CustomSinkTwo + + + + + + and then add the mappings to the sink(s) in the rewriter configurations: + + | :code:`POST /solr/mycollection/querqy/rewriter/common_rules?action=save` + | :code:`Content-Type: application/json` + + .. code-block:: JSON + :linenos: + :emphasize-lines: 6-8 + + { + "class": "querqy.solr.rewriter.commonrules.CommonRulesRewriterFactory", + "config": { + "rules" : "notebook =>\nSYNONYM: laptop" + }, + "info_logging": { + "sinks": ["response", "customSink1", "customSink2"] + } + } + + As the sink mappings are configured per rewriter, you can decide per rewriter + to which sink you want to send their Info Logging output and even have one sink + per rewriter. .. _logging-per-request: @@ -446,330 +410,302 @@ per rewriter. Enabling info logging per request _________________________________ -.. rst-class:: solr - -.. raw:: html - -
+.. tabs:: -Once you have set up you sinks and mapped rewriters to sinks, you can start -using it. To trigger the rewriters to send logging output to the sinks, you need -to pass the following request parameters to enable the logging per request: + .. group-tab:: Elasticsearch/OpenSearch -querqy.rewriteLogging.rewriters - A comma-separated list of rewriter IDs for which info logging should be - enabled. Use ``querqy.rewriteLogging.rewriters=*`` if you want to enable it - for all rewriters in the rewrite chain. - Note that not all rewriters have implemented info logging. The are also - expected to remain 'silent' if they did not modify the query. - -querqy.rewriteLogging - Values: ``details`` ``rewriter_id`` ``off`` - - Defines the type of the output. - - * ``details``: Gives you all details that the rewriter produces as logging - output. For example, the CommonRulesRewriter will return information about - the rules that were applied and the log message that was configured. - * ``rewriter_id``: Only returns the IDs of the rewriters. - * ``off``: Returns nothing at all. - - Default: ``off`` - -Examples: - -| :code:`GET https://:/solr//select?q=notebook&querqy.rewriters=common&querqy.rewriteLogging.rewriters=common&querqy.rewriteLogging=rewriter_id` - -returns - -.. code-block:: JSON - :linenos: - :emphasize-lines: 4-10 - - { - "responseHeader":{ }, - "response":{ }, - "querqyRewriteLogging":{ - "rewriteChainLogging":[ - { - "rewriterId":"common" - } - ] - } - } - -provided that rewriter 'common' changed the query. - -The same query with detailed output (setting ``querqy.rewriteLogging=details``): - -| :code:`GET https://:/solr//select?q=notebook&querqy.rewriters=common&querqy.rewriteLogging.rewriters=common&querqy.rewriteLogging=details` - - -.. 
code-block:: JSON - :linenos: - :emphasize-lines: 4-35 - - { - "responseHeader":{}, - "response":{}, - "querqyRewriteLogging":{ - "rewriteChainLogging":[ + Once you have mapped rewriters to sinks, you can start using Info Logging. To + trigger the rewriters to send logging output to the sinks, you need + to enable Info Logging in your search requests: + + :code:`POST /myindex/_search` + + .. code-block:: JSON + :linenos: + :emphasize-lines: 12-15 + { - "actions":[ - { - "message":"notebook#0", - "match":{ - "term":"notebook", - "type":"exact" - }, - "instructions":[ - { - "type":"down", - "param":"100", - "value":"sleeve" - }, - { - "type":"up", - "param":"100", - "value":"AMD" - }, - { - "type":"synonym", - "value":"laptop" + "query": { + "querqy": { + "matching_query": { + "query": "notebook" + }, + "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"], + "rewriters": [ + "word_break", + "common_rules" + ], + "info_logging": { + "id":"REQ-ID-0043", + "type": "DETAIL" + } } - ] } - ], - "rewriterId":"common" } - ] - } - } - -The ``actions`` element is specific to the CommonRulesRewriter. It reflects that -the following block in the rule definition was applied and it should be easy to -map the ``instructions`` output with the following rule -definition: - -.. code-block:: - - notebook => - SYNONYM: laptop - UP(100): AMD - DOWN(50): sleeve - -Should more than one such blocks of rules be applied to a query, they would each -occur as their own object in the ``actions`` list of the Info Logging output. - -Besides ``instructions`` output, we also get a ``match`` and a ``message`` -element. ``match`` tells what input triggered the application of rules and how it was -matched. In this case, *notebook* was matched exactly. - -Had we used a wildcard in the rule, the logging output would -still tell us the full matching term and also that the type is *affix* for -the above query: - -.. code-block:: - - note* => - SYNONYM: laptop - ... 
- -...would thus produce the following output for query *notebook*: - -.. code-block:: JSON - - { - "match":{ - "term":"notebook", - "type":"affix" - } - } - -The ``message`` element of the ``action`` was auto-generated above: - -.. code-block:: JSON - :linenos: - :emphasize-lines: 7 - - { - "querqyRewriteLogging":{ - "rewriteChainLogging":[ + + + Info Logging is controlled by the properties specified under `info_logging` + (lines 12-15). You can set the properties as follows: + + `type` + Values: ``DETAIL`` ``REWRITER_ID`` ``NONE`` + + Controls whether a logging output is generated at all together with the format + of the output. It can take the values: + + * ``DETAIL`` - Logs all details that the rewriter produces as logging + output. + * ``REWRITER_ID`` - Only logs the IDs of the rewriters. + * ``NONE`` - Logs nothing at all. + + Default: ``NONE`` + + `id` + An identifier. This can be used for identifying search requests. For example, + when you use more than one shard, the same search request will be executed on + more than one shard and create a log message on each shard. You can use this + ID to trace and aggregate the messages across shards. It is up to the client + that makes the search request to supply the ID. + + Default: not set + + For examples of the output format for types ``DETAIL`` and ``REWRITER_ID`` see + the Log4j sink output above. It is up to the individual rewriter what log + message it emits for type ``DETAIL``. + + .. group-tab:: Solr + + Once you have set up your sinks and mapped rewriters to sinks, you can start + using Info Logging. To trigger the rewriters to send logging output to the sinks, you need + to pass the following request parameters to enable the logging per request: + + querqy.rewriteLogging.rewriters + A comma-separated list of rewriter IDs for which info logging should be + enabled. Use ``querqy.rewriteLogging.rewriters=*`` if you want to enable it + for all rewriters in the rewrite chain. 
+ Note that not all rewriters have implemented info logging. They are also + expected to remain 'silent' if they did not modify the query. + + querqy.rewriteLogging + Values: ``details`` ``rewriter_id`` ``off`` + + Defines the type of the output. + + * ``details``: Gives you all details that the rewriter produces as logging + output. For example, the CommonRulesRewriter will return information about + the rules that were applied and the log message that was configured. + * ``rewriter_id``: Only returns the IDs of the rewriters. + * ``off``: Returns nothing at all. + + Default: ``off`` + + Examples: + + | :code:`GET https://:/solr//select?q=notebook&querqy.rewriters=common&querqy.rewriteLogging.rewriters=common&querqy.rewriteLogging=rewriter_id` + + returns + + .. code-block:: JSON + :linenos: + :emphasize-lines: 4-10 + { - "actions":[ - { - "message":"notebook#0", - "match": {}, - "instructions": [] - } - - ] - } - ] - } - } - - -``"notebook#0"`` is a generated from the input `notebook` and a count of rule -definition blocks. In this case it is the first block in our rule definitions -(the count starts at 0). - - -.. raw:: html - -
- -.. rst-class:: elasticsearch - -.. raw:: html - -
- -Once you have mapped rewriters to sinks, you can start using Info Logging. To -trigger the rewriters to send logging output to the sinks, you need -to enable Info Logging in your search requests: - -:code:`POST /myindex/_search` - -.. code-block:: JSON - :linenos: - :emphasize-lines: 12-15 - - { - "query": { - "querqy": { - "matching_query": { - "query": "notebook" - }, - "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"], - "rewriters": [ - "word_break", - "common_rules" - ], - "info_logging": { - "id":"REQ-ID-0043", - "type": "DETAIL" + "responseHeader":{ }, + "response":{ }, + "querqyRewriteLogging":{ + "rewriteChainLogging":[ + { + "rewriterId":"common" } + ] } - } - } - - -Info Logging is controlled by the properties specified under `info_logging` -(lines 12-15). You can set the properties as follows: - -`type` - Values: ``DETAIL`` ``REWRITER_ID`` ``NONE`` - - Controls whether a logging output is generated at all together with the format - of the output. It can take the values: - - * ``DETAIL`` - Logs all details that the rewriter produces as logging - output. - * ``REWRITER_ID`` - Only logs the IDs of the rewriters. - * ``NONE`` - Logs nothing at all. - - Default: ``NONE`` - -`id` - An identifier. This can be used for identifying search requests. For example, - when you use more than one shard, the same search request will be executed on - more than one shard and create a log message on each shard. You can use this - ID to trace and aggregate the messages across shards. It is up to the client - that makes the search request to supply the ID. - - Default: not set - -For examples of the output format for types ``DETAIL`` and ``REWRITER_ID`` see -the Log4j sink output above. It is up to the individual rewriter what log -message the emit for type ``DETAIL``. - - -.. raw:: html - -
- -.. rst-class:: solr - -.. raw:: html - -
- - -This default log message can be overridden in the rule definitions using the -``_log`` and ``_id_`` properties: - - -.. code-block:: text - :emphasize-lines: 5,11,16 - - notebook => - SYNONYM: laptop - DELETE: cheap - @_id: "ID1" - @_log: "Log message for notebook" - - samusng => - SYNONYM: samsung - @{ - "_id": "ID2", - "_log": "Log message for samusng typo", - } - - 32g => - SYNONYM: 32gb - @_id: "ID3" - -The query 'samusng notebook 32g' will now produce the following log messages -(we're skipping the `instructions` details): - -.. code-block:: JSON - :emphasize-lines: 7,15,23 - - - { - "querqyRewriteLogging":{ - "rewriteChainLogging":[ + } + + provided that rewriter 'common' changed the query. + + The same query with detailed output (setting ``querqy.rewriteLogging=details``): + + | :code:`GET https://:/solr//select?q=notebook&querqy.rewriters=common&querqy.rewriteLogging.rewriters=common&querqy.rewriteLogging=details` + + + .. code-block:: JSON + :linenos: + :emphasize-lines: 4-35 + { - "actions":[ - { - "message":"Log message for samusng typo", - "match":{ - "term":"samusng", - "type":"exact" - }, - "instructions":[ ] - }, - { - "message":"Log message for notebook", - "match":{ - "term":"notebook", - "type":"exact" - }, - "instructions":[ ] - }, - { - "message":"ID3", - "match":{ - "term":"32g", - "type":"exact" - }, - "instructions":[ ] + "responseHeader":{}, + "response":{}, + "querqyRewriteLogging":{ + "rewriteChainLogging":[ + { + "actions":[ + { + "message":"notebook#0", + "match":{ + "term":"notebook", + "type":"exact" + }, + "instructions":[ + { + "type":"down", + "param":"100", + "value":"sleeve" + }, + { + "type":"up", + "param":"100", + "value":"AMD" + }, + { + "type":"synonym", + "value":"laptop" + } + ] + } + ], + "rewriterId":"common" + } + ] + } + } + + The ``actions`` element is specific to the CommonRulesRewriter. 
It reflects that + the following block in the rule definition was applied and it should be easy to + map the ``instructions`` output to the following rule + definition: + + .. code-block:: + + notebook => + SYNONYM: laptop + UP(100): AMD + DOWN(50): sleeve + + Should more than one such block of rules be applied to a query, they would each + occur as their own object in the ``actions`` list of the Info Logging output. + + Besides ``instructions`` output, we also get a ``match`` and a ``message`` + element. ``match`` tells us what input triggered the application of rules and how it was + matched. In this case, *notebook* was matched exactly. + + Had we used a wildcard in the rule, the logging output would + still tell us the full matching term and also that the type is *affix* for + the above query: + + .. code-block:: + + note* => + SYNONYM: laptop + ... + + ...would thus produce the following output for query *notebook*: + + .. code-block:: JSON + + { + "match":{ + "term":"notebook", + "type":"affix" } - ], - "rewriterId":"common" } - ] - } - - } - - -As the third block doesn't have a '_log' property, the ``_id`` property (*ID3*) will be -used as the message, and if that didn't exist, we'd fall back to the -`#` scheme that we saw above. - - -.. raw:: html - -
+ + The ``message`` element of the ``action`` was auto-generated above: + + .. code-block:: JSON + :linenos: + :emphasize-lines: 7 + + { + "querqyRewriteLogging":{ + "rewriteChainLogging":[ + { + "actions":[ + { + "message":"notebook#0", + "match": {}, + "instructions": [] + } + + ] + } + ] + } + } + + + ``"notebook#0"`` is generated from the input `notebook` and a count of rule + definition blocks. In this case it is the first block in our rule definitions + (the count starts at 0). + + This default log message can be overridden in the rule definitions using the + ``_log`` and ``_id`` properties: + + + .. code-block:: text + :emphasize-lines: 5,11,16 + + notebook => + SYNONYM: laptop + DELETE: cheap + @_id: "ID1" + @_log: "Log message for notebook" + + samusng => + SYNONYM: samsung + @{ + "_id": "ID2", + "_log": "Log message for samusng typo", + } + + 32g => + SYNONYM: 32gb + @_id: "ID3" + + The query 'samusng notebook 32g' will now produce the following log messages + (we're skipping the `instructions` details): + + .. code-block:: JSON + :emphasize-lines: 7,15,23 + + + { + "querqyRewriteLogging":{ + "rewriteChainLogging":[ + { + "actions":[ + { + "message":"Log message for samusng typo", + "match":{ + "term":"samusng", + "type":"exact" + }, + "instructions":[ ] + }, + { + "message":"Log message for notebook", + "match":{ + "term":"notebook", + "type":"exact" + }, + "instructions":[ ] + }, + { + "message":"ID3", + "match":{ + "term":"32g", + "type":"exact" + }, + "instructions":[ ] + } + ], + "rewriterId":"common" + } + ] + } + + } + + + As the third block doesn't have a '_log' property, the ``_id`` property (*ID3*) will be + used as the message, and if that didn't exist, we'd fall back to the + `#` scheme that we saw above. 
From ab70e0fa13064e2503220b6d3b01d839fd8e2fdc Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Mon, 26 Feb 2024 13:56:07 -0500 Subject: [PATCH 7/8] fix warnings generated --- docs/source/querqy/more-about-queries.rst | 6 ++---- docs/source/querqy/rewriters/word-break.rst | 3 +-- 2 files changed, 3 insertions(+), 6 deletions(-) diff --git a/docs/source/querqy/more-about-queries.rst b/docs/source/querqy/more-about-queries.rst index 69ab1f0..31eed66 100644 --- a/docs/source/querqy/more-about-queries.rst +++ b/docs/source/querqy/more-about-queries.rst @@ -374,10 +374,8 @@ Reference 'matching_query'. A 'querqy' query allows to control two main types of boosting queries: - #. ``rewritten_queries`` - boost queries that are produced as part of query - rewriting - #. ``phrase_boosts`` - (partial) phrases that are derived from the query - string for boosting documents that contain corresponding phrase matches + #. ``rewritten_queries`` - boost queries that are produced as part of query rewriting + #. ``phrase_boosts`` - (partial) phrases that are derived from the query string for boosting documents that contain corresponding phrase matches Scores from both types of boosting queries will be *added* to the score of the 'matching_query'. diff --git a/docs/source/querqy/rewriters/word-break.rst b/docs/source/querqy/rewriters/word-break.rst index ca088b3..ef6d20c 100644 --- a/docs/source/querqy/rewriters/word-break.rst +++ b/docs/source/querqy/rewriters/word-break.rst @@ -164,8 +164,7 @@ word splits. For example, the word 'action' will not be split into 'act + ion' as long as the 'act' and 'ion' do not co-occur in the dictionaryField of a document. -.. hint:: When using Solr, words provided on the list of ``protectedWords`` will be exempt from -decompounding. +.. hint:: When using Solr, words provided on the list of ``protectedWords`` will be exempt from decompounding. 
By default, it is assumed that words that together form compound word From 31882b700c0be70ce939b86893541e3bb426103f Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Mon, 26 Feb 2024 14:04:02 -0500 Subject: [PATCH 8/8] redo release notes --- docs/source/querqy/release-notes.rst | 355 ++++++++++++--------------- 1 file changed, 161 insertions(+), 194 deletions(-) diff --git a/docs/source/querqy/release-notes.rst b/docs/source/querqy/release-notes.rst index e7f858b..b2c4434 100644 --- a/docs/source/querqy/release-notes.rst +++ b/docs/source/querqy/release-notes.rst @@ -4,197 +4,164 @@ Release notes ============= -.. include:: se-section.txt - - -.. rst-class:: solr - -.. raw:: html - -
- -.. warning:: Querqy configuration has changed in an incompatible way with - the introduction of Querqy v5 for Solr. Make sure to follow the documentation - for your Querqy version below. See :doc:`here` for - detailed information about changes and a migration guide to :doc:`Querqy 5 for - Solr ` - -.. rst-class:: solr - - -Major changes in Querqy for Solr 5.5.1 -====================================== - -This version re-implements info logging and introduces some **breaking changes** -that will affect you if - -- you are using Info Logging, or -- rely on the debug output format, or -- you are using a custom rewriter implementation - -Please see :ref:`the documentation ` for -details of Info Logging and debugging. - -Notes on migration -------------------- - -To migrate **info logging**: - -- ``solrconfig.xml``: If you have been using only the built-in info logging that adds - log information to the request response, you can just remove all configuration - related to info logging from solrconfig.xml (the ```` - under the Querqy query parser element). If you have been using a custom sink, please - see section :ref:`custom_solr_sinks` for how to configure it in the new - version. -- Add the rewriter-to-sink mapping to the configuration of each rewriter that - you want to log. For example: - - .. code-block:: JSON - :linenos: - :emphasize-lines: 6-8 - - { - "class": "querqy.solr.rewriter.commonrules.CommonRulesRewriterFactory", - "config": { - "rules" : "notebook =>\nSYNONYM: laptop" - }, - "info_logging": { - "sinks": ["response"] - } - } - - The `response` sink is predefined and adds log information to the Solr - response. If you are using a custom sink, you will have to add its name to the - list of ``sinks`` here. -- To enable Info Logging per request, the parameter ``querqy.infoLogging=on`` is - no longer used. You can instead just use - ``querqy.rewriteLogging.rewriters=*&querqy.rewriteLogging=details``. 
Please - see the documentation about :ref:`logging-per-request` for the more - fine-grained control over the response format that these parameters provide. -- The format of the logging information that is being added to the Solr response - has changed. The response key has changed from ``querqy.infoLog`` to - ``querqyRewriteLogging`` and the log payload has changed in content and - structure. - -Changes in **debug** output: - -- The debug output (returned for ``debugQuery=true``) is available in the - response under a new key (``debug/querqy/rewrite``) and has changed in - structure and content. - -Changes affecting **custom Rewriter** implementations: - -- The signature of method ``rewrite(2x)`` of the - ``querqy.rewrite.QueryRewriter`` interface has changed to: - - | :code:`RewriterOutput rewrite(ExpandedQuery query, SearchEngineRequestAdapter searchEngineRequestAdapter)` - - This means that the method no longer returns the rewritten ``ExpandedQuery`` - but returns the ExpandedQuery together with the info logging output wrapped - into a ``RewriterOutput`` object. This implies that the info logging - information is no longer passed to the request context via the - SearchEngineRequestAdapter. - - - -Changes in Querqy for Solr 5.4.1 -================================ - -- Bumping jackson-databind and json-smart versions - `(#348) `__. -- Do not rely on system character encoding settings but assure that input stream - bytes are interpreted as UTF-8 `(#346) `__. - - -Major changes in Querqy for Solr 5.4.0 -====================================== - -- The Common Rules Rewriter can now produce multiplicative UP/DOWN boosts - `(#328) `__. - -Changes in Querqy for Solr 5.3.2 -====================================== - -- Improved scoring for new - :code:`multiMatchTie` `(#327) `__. - -Changes in Querqy for Solr 5.3.1 -====================================== - -- Bugfix related using single term synonyms with new - :code:`multiMatchTie` `(#315) `__. 
- - - -Major changes in Querqy for Solr 5.3.0 -====================================== - -- The Word Break Rewriter now applies language specific morphology also - for compounding `(#282) `__. See - ``morphology`` in the :ref:`Word Break Rewriter ` - configuration. -- You can now configure the path under which rewriter configurations will be - stored in ZooKeeper `(#263) `__. - For more information, see the :ref:`zkDataDirectory ` - property in the Querqy RequestHandler configuration -- Introduce :code:`multiMatchTie` to avoid higher score if document matches more than - one synonym `(#281) `__ (experimental). - -.. raw:: html - -
- - -.. rst-class:: elasticsearch - -.. raw:: html - -
- -Relase notes for Querqy for **OpenSearch** can be found `here `__. - -Querqy for Elasticsearch 1.6es852.0 -==================================== - - - Release for Elasticsearch 8.5.2 - -Querqy for Elasticsearch 1.6es843.0 -==================================== - - - Release for Elasticsearch 8.4.3 - -Querqy for Elasticsearch 1.6es841.0 -==================================== - - - Release for Elasticsearch 8.4.1 - -Querqy for Elasticsearch 1.6es833.0 -==================================== - - - Release for Elasticsearch 8.3.3 - -Querqy for Elasticsearch 1.6es823.0 -==================================== - - - Release for Elasticsearch 8.2.3 - -Querqy for Elasticsearch 1.6es813.0 -==================================== - - - Release for Elasticsearch 8.1.3 - -Querqy for Elasticsearch 1.6es801.0 -==================================== - - - Release for Elasticsearch 8.0.1 - - Adding compound morphology to WordBreakCompoundRewriter `(#22) `__ - - -Querqy for Elasticsearch 1.5es7172.0 -==================================== - - - Release for Elasticsearch 7.17.2 - -.. raw:: html - -
+.. tabs:: + + .. group-tab:: Elasticsearch + .. rubric:: Querqy for Elasticsearch 1.6es852.0 + + - Release for Elasticsearch 8.5.2 + + .. rubric:: Querqy for Elasticsearch 1.6es843.0 + + - Release for Elasticsearch 8.4.3 + + .. rubric:: Querqy for Elasticsearch 1.6es841.0 + + - Release for Elasticsearch 8.4.1 + + .. rubric:: Querqy for Elasticsearch 1.6es833.0 + + - Release for Elasticsearch 8.3.3 + + .. rubric:: Querqy for Elasticsearch 1.6es823.0 + + - Release for Elasticsearch 8.2.3 + + .. rubric:: Querqy for Elasticsearch 1.6es813.0 + + - Release for Elasticsearch 8.1.3 + + .. rubric:: Querqy for Elasticsearch 1.6es801.0 + + - Release for Elasticsearch 8.0.1 + - Adding compound morphology to WordBreakCompoundRewriter `(#22) `__ + + + .. rubric:: Querqy for Elasticsearch 1.5es7172.0 + + - Release for Elasticsearch 7.17.2 + + .. group-tab:: OpenSearch + + Release notes for Querqy for **OpenSearch** can be found `here `__. + + + .. group-tab:: Solr + + .. warning:: Querqy configuration has changed in an incompatible way with + the introduction of Querqy v5 for Solr. Make sure to follow the documentation + for your Querqy version below. See :doc:`here` for + detailed information about changes and a migration guide to :doc:`Querqy 5 for + Solr ` + + + .. rubric:: Major changes in Querqy for Solr 5.5.1 + + This version re-implements info logging and introduces some **breaking changes** + that will affect you if + + - you are using Info Logging, or + - rely on the debug output format, or + - you are using a custom rewriter implementation + + Please see :ref:`the documentation ` for + details of Info Logging and debugging. + + Notes on migration + + To migrate **info logging**: + + - ``solrconfig.xml``: If you have been using only the built-in info logging that adds + log information to the request response, you can just remove all configuration + related to info logging from solrconfig.xml (the ```` + under the Querqy query parser element). 
If you have been using a custom sink, please + see section :ref:`custom_solr_sinks` for how to configure it in the new + version. + - Add the rewriter-to-sink mapping to the configuration of each rewriter that + you want to log. For example: + + .. code-block:: JSON + :linenos: + :emphasize-lines: 6-8 + + { + "class": "querqy.solr.rewriter.commonrules.CommonRulesRewriterFactory", + "config": { + "rules" : "notebook =>\nSYNONYM: laptop" + }, + "info_logging": { + "sinks": ["response"] + } + } + + The `response` sink is predefined and adds log information to the Solr + response. If you are using a custom sink, you will have to add its name to the + list of ``sinks`` here. + - To enable Info Logging per request, the parameter ``querqy.infoLogging=on`` is + no longer used. You can instead just use + ``querqy.rewriteLogging.rewriters=*&querqy.rewriteLogging=details``. Please + see the documentation about :ref:`logging-per-request` for the more + fine-grained control over the response format that these parameters provide. + - The format of the logging information that is being added to the Solr response + has changed. The response key has changed from ``querqy.infoLog`` to + ``querqyRewriteLogging`` and the log payload has changed in content and + structure. + + Changes in **debug** output: + + - The debug output (returned for ``debugQuery=true``) is available in the + response under a new key (``debug/querqy/rewrite``) and has changed in + structure and content. + + Changes affecting **custom Rewriter** implementations: + + - The signature of method ``rewrite(2x)`` of the + ``querqy.rewrite.QueryRewriter`` interface has changed to: + + | :code:`RewriterOutput rewrite(ExpandedQuery query, SearchEngineRequestAdapter searchEngineRequestAdapter)` + + This means that the method no longer returns the rewritten ``ExpandedQuery`` + but returns the ExpandedQuery together with the info logging output wrapped + into a ``RewriterOutput`` object. 
This implies that the info logging + information is no longer passed to the request context via the + SearchEngineRequestAdapter. + + + + .. rubric:: Changes in Querqy for Solr 5.4.1 + + - Bumping jackson-databind and json-smart versions + `(#348) `__. + - Do not rely on system character encoding settings but ensure that input stream + bytes are interpreted as UTF-8 `(#346) `__. + + + .. rubric:: Major changes in Querqy for Solr 5.4.0 + + - The Common Rules Rewriter can now produce multiplicative UP/DOWN boosts + `(#328) `__. + + .. rubric:: Changes in Querqy for Solr 5.3.2 + + - Improved scoring for new + :code:`multiMatchTie` `(#327) `__. + + .. rubric:: Changes in Querqy for Solr 5.3.1 + + - Bugfix related to using single term synonyms with new + :code:`multiMatchTie` `(#315) `__. + + + + .. rubric:: Major changes in Querqy for Solr 5.3.0 + + - The Word Break Rewriter now applies language-specific morphology also + for compounding `(#282) `__. See + ``morphology`` in the :ref:`Word Break Rewriter ` + configuration. + - You can now configure the path under which rewriter configurations will be + stored in ZooKeeper `(#263) `__. + For more information, see the :ref:`zkDataDirectory ` + property in the Querqy RequestHandler configuration. + - Introduce :code:`multiMatchTie` to avoid a higher score if a document matches more than + one synonym `(#281) `__ (experimental).
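For the 5.5.1 migration notes in this patch, the per-request logging parameters that replace ``querqy.infoLogging=on`` can be assembled as in the following Python sketch. The two ``querqy.rewriteLogging`` parameters and their values are taken from the notes; the helper name ``rewrite_logging_params`` and the ``q`` parameter are assumptions added for illustration.

```python
from urllib.parse import urlencode


def rewrite_logging_params(query: str) -> str:
    """Build a query string that enables rewrite logging for one request.

    Hypothetical helper: only the two querqy.rewriteLogging parameters come
    from the release notes; "q" is an assumed user query parameter.
    """
    params = {
        "q": query,
        "querqy.rewriteLogging.rewriters": "*",   # log all rewriters
        "querqy.rewriteLogging": "details",       # detailed log output
    }
    # Keep "*" unescaped so the string reads as it does in the docs.
    return urlencode(params, safe="*")


query_string = rewrite_logging_params("notebook")
```

According to the notes, the log output then appears in the Solr response under the ``querqyRewriteLogging`` key, provided the rewriter maps to the ``response`` sink.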