diff --git a/en/reference/schema-reference.html b/en/reference/schema-reference.html
index ce2731ea15..2ce0ebe693 100644
--- a/en/reference/schema-reference.html
+++ b/en/reference/schema-reference.html
@@ -151,6 +151,9 @@
+ Tunes the weakAnd algorithm to automatically
+ exclude terms and documents with expected low query significance based on document frequency
+ statistics present in the document corpus. This makes matching faster at the cost of potentially
+ reduced recall.
+
+
+ Contained in rank-profile.
+
+
+ Tunes the weakAnd algorithm to automatically
+ exclude terms and documents with expected low query significance based on document frequency
+ statistics present in the document corpus. This makes matching faster at the cost of potentially
+ reduced recall.
+
+
+weakand {
+    [body]
+}
+
+
+ Note that all document frequency calculations use document statistics local to each content node
+ (i.e. global significance does not have an effect). This means results may differ across
+ content nodes and/or content node groups.
+
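To illustrate why node-local statistics can change results, here is a minimal Python sketch. It assumes each content node sees only its own documents (modeled as sets of terms) and that a query term is dropped once its node-local normalized document frequency reaches the stopword-limit. The function names and data are hypothetical; this is not Vespa's implementation.

```python
def normalized_doc_freq(term: str, docs: list[set[str]]) -> float:
    """Fraction of this node's documents that contain `term` at least once."""
    if not docs:
        return 0.0
    return sum(1 for terms in docs if term in terms) / len(docs)


def keep_non_stopwords(query: list[str], docs: list[set[str]], stopword_limit: float) -> list[str]:
    """Hypothetical: drop query terms whose node-local normalized document
    frequency reaches stopword_limit (assumed: a term exactly at the limit is dropped)."""
    return [t for t in query if normalized_doc_freq(t, docs) < stopword_limit]


# Two content nodes, each holding a different subset of the corpus.
node_a = [{"the", "vespa"}, {"the"}, {"the", "search"}, {"the", "engine"}, {"vespa"}]
node_b = [{"the", "vespa"}, {"search"}, {"the"}, {"vespa"}, {"engine"}]

query = ["the", "vespa"]
print(keep_non_stopwords(query, node_a, 0.60))  # "the" is in 4/5 = 0.8 of docs: prints ['vespa']
print(keep_non_stopwords(query, node_b, 0.60))  # "the" is in 2/5 = 0.4 of docs: prints ['the', 'vespa']
```

The same term "the" is treated as a stopword on node A but kept on node B, so the two nodes end up evaluating different sets of query terms for the same query.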
+
+The body of a weakand statement consists of:
+
Property | Occurrence | Description |
---|---|---|
stopword-limit | Zero to one |
+
+ A number in the range [0, 1].
+ Represents the maximum normalized document frequency a query term can have in the
+ corpus (i.e. the fraction of documents in which the term occurs at least once) before
+ it's considered a stopword and dropped entirely from being a part of the
+
+ Example:
+ stopword-limit: 0.60
+ This will drop all query terms that occur in at least 60% of the documents.
+
+
+ Using |
+
adjust-target | Zero to one |
+
+ A number in the range [0, 1] representing normalized document frequency.
+ Used to derive a per-query document score threshold, where documents scoring
+ lower than the threshold will not be considered as potential hits from the
+
+
+ This can be used to efficiently exclude documents that only match terms that
+ occur very frequently in the document corpus. Such terms are likely to be stopwords
+ that have low semantic value for the query, so excluding documents that contain only
+ such terms is likely to have only a minor impact on recall.
+
+
+ This makes overall matching faster by reducing the number of hits produced by
+ the
+ Example:
+ adjust-target: 0.01
+ This excludes documents whose matching query terms all occur in more than approximately 1%
+ of the document corpus. The actual threshold is query-specific and is derived from the score
+ of the query term whose document frequency is closest to 1%.
+
+
+ |
+
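The adjust-target behavior described in the table above can be sketched in Python. This is a hypothetical model, not Vespa's implementation: it assumes an IDF-style term score of `log(N / df)`, a document score equal to the sum of the scores of the query terms it matches, and a threshold equal to the score of the query term whose normalized document frequency is closest to adjust-target. All function names and data are illustrative.

```python
import math


def term_score(doc_freq: int, num_docs: int) -> float:
    # Assumption: IDF-style term score; Vespa's actual weakAnd term scoring may differ.
    return math.log(num_docs / doc_freq)


def adjust_target_threshold(query_df: dict[str, int], num_docs: int, adjust_target: float) -> float:
    """Score of the query term whose normalized document frequency is closest to adjust_target."""
    closest = min(query_df, key=lambda t: abs(query_df[t] / num_docs - adjust_target))
    return term_score(query_df[closest], num_docs)


def candidate_hits(docs: dict[str, set[str]], query_df: dict[str, int],
                   num_docs: int, adjust_target: float) -> list[str]:
    """Keep documents whose summed matched-term score is not below the threshold."""
    threshold = adjust_target_threshold(query_df, num_docs, adjust_target)
    hits = []
    for doc_id, terms in docs.items():
        score = sum(term_score(query_df[t], num_docs) for t in terms if t in query_df)
        if score >= threshold:  # documents scoring lower than the threshold are excluded
            hits.append(doc_id)
    return hits


# Hypothetical corpus of 1000 documents: "the" occurs in 90% of them,
# "search" in 5% and "vespa" in 1%.
query_df = {"the": 900, "search": 50, "vespa": 10}
docs = {
    "d1": {"the"},
    "d2": {"the", "search"},
    "d3": {"vespa"},
    "d4": {"the", "vespa"},
}
print(candidate_hits(docs, query_df, num_docs=1000, adjust_target=0.01))  # prints ['d3', 'd4']
```

With adjust-target set to 0.01, the threshold becomes the score of "vespa", so d1 and d2, which match only the more frequent terms "the" and "search", are no longer considered as potential hits.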
@@ -3961,7 +4070,6 @@
Contained in field or