diff --git a/en/reference/schema-reference.html b/en/reference/schema-reference.html
index ce2731ea15..2ce0ebe693 100644
--- a/en/reference/schema-reference.html
+++ b/en/reference/schema-reference.html
@@ -151,6 +151,9 @@

Elements

filter-threshold
rank
rank-type
+ weakand
+ stopword-limit
+ adjust-target
constant
onnx-model
stemming
@@ -1692,6 +1695,17 @@

rank-profile

Zero or more The rank-type of a field in this profile.
+weakand
+ Zero or one
+
+

+ Tunes the weakAnd algorithm to automatically
+ exclude terms and documents with expected low query significance based on document frequency
+ statistics present in the document corpus. This makes matching faster at the cost of potentially
+ reduced recall.
+

+
+
@@ -3944,6 +3958,101 @@

rank-type

+

weakand

+

+ Contained in rank-profile. +

+

+ Tunes the weakAnd algorithm to automatically
+ exclude terms and documents with expected low query significance based on document frequency
+ statistics present in the document corpus. This makes matching faster at the cost of potentially
+ reduced recall.
+

+
+weakand {
+    [body]
+}
+
+

+ Note that all document frequency calculations are done using content node-local document
+ statistics (i.e. global significance
+ does not have an effect). This means results may differ across different content nodes and/or
+ content node groups.
+

+

+The body of a weakand statement consists of:
+

+ + + + + + + + + + + + + + + + + + + + +
Property | Occurrence | Description
stopword-limit | Zero or one
+

+ A number in the range [0, 1].
+ Represents the maximum normalized document frequency a query term can have in the
+ corpus (i.e. the fraction of all documents in which the term occurs at least once) before
+ it is considered a stopword and dropped entirely from the
+ weakAnd evaluation. This makes matching faster at the cost of
+ producing more hits. Dropped terms are not exposed as part of ranking.
+

+

+ Example: +

stopword-limit: 0.60
+ This will drop all query terms that occur in at least 60% of the documents. +

+

+ Using stopword-limit is similar to explicitly removing stopwords
+ from the query up front, but has the benefit of dynamically adapting to the
+ actual document corpus without having to know, or specify, a set of stopwords.
+

+
adjust-target | Zero or one
+

+ A number in the range [0, 1] representing normalized document frequency.
+ Used to derive a per-query document score threshold, where documents scoring
+ lower than the threshold will not be considered as potential hits from the
+ weakAnd operator. The threshold is set equal to the score of
+ the query term whose document frequency is closest to the
+ configured adjust-target value.
+

+

+ This can be used to efficiently exclude documents that only match terms that
+ occur very frequently in the document corpus. Such terms are likely to be stopwords
+ that have low semantic value for the query, and excluding documents that contain only
+ such terms is likely to have only a minor impact on recall.
+

+

+ This makes overall matching faster by reducing the number of hits produced by
+ the weakAnd operator.
+

+

+ Example: +

adjust-target: 0.01
+ This excludes documents that only match terms occurring in more than approximately 1%
+ of the document corpus. The actual threshold is query-specific and based on the score
+ of the query term whose document frequency is closest to 1%.
+

+

+ adjust-target can be used together with stopword-limit
+ to efficiently prune both terms and documents with low significance when processing queries.
+

+
+
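As a rough illustration of how the two properties described above might be combined, the following sketch places a weakand block inside a rank-profile. The profile name tuned_weakand is hypothetical, and the values are simply the ones used in the examples above, not recommended settings.

```
# Hypothetical rank-profile; the name is an assumption for illustration only.
rank-profile tuned_weakand {
    weakand {
        # Drop query terms that occur in at least 60% of the documents.
        stopword-limit: 0.60
        # Derive the score threshold from the query term whose
        # document frequency is closest to 1%.
        adjust-target: 0.01
    }
}
```

Since both properties trade recall for speed and rely on content node-local statistics, suitable values will likely depend on the corpus and should be checked against representative queries.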

summary-to

@@ -3961,7 +4070,6 @@

summary-to

-

summary

Contained in field or