diff --git a/en/reference/schema-reference.html b/en/reference/schema-reference.html index ad42bcc34e..ce2731ea15 100644 --- a/en/reference/schema-reference.html +++ b/en/reference/schema-reference.html @@ -148,6 +148,7 @@

Elements

post-filter-threshold approximate-threshold target-hits-max-adjustment-factor + filter-threshold rank rank-type constant @@ -1659,13 +1660,33 @@

rank-profile

See post-filter-threshold for more details.

- This parameter no effect in streaming search. + This parameter has no effect in streaming search. +

+ + +filter-threshold + Zero or one + +

+ Threshold value (in the range [0.0, 1.0]) that decides when matching in index fields is treated as filtering. + This applies to query terms whose estimated hit ratio (in the range [0.0, 1.0]) is above the filter-threshold. + Use this to optimize query performance when searching large text index fields, + by allowing a per-query combination of rank: filter and rank: normal behavior. + This parameter can be overridden per index field; see field-level filter-threshold + for a more detailed description with tradeoffs. +

+

+ In testing with various text datasets (e.g. Wikipedia), a filter-threshold setting of 0.05 has proven to be a good starting point. + +

+

+ This parameter has no effect in streaming search.
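
The per-query split described above can be illustrated with a minimal Python sketch. It is not Vespa code: the function name `split_terms` and the input dictionary of estimated hit ratios are hypothetical, and the only rule taken from the text is that a term is treated as a filter when its estimated hit ratio is above the filter-threshold.

```python
def split_terms(hit_ratios, filter_threshold=0.05):
    """Partition query terms into (filter_terms, ranked_terms).

    hit_ratios is a hypothetical mapping from query term to its estimated
    hit ratio, i.e. the estimated fraction of documents it matches.
    """
    if not 0.0 <= filter_threshold <= 1.0:
        raise ValueError("filter_threshold must be in the range [0.0, 1.0]")
    # Terms strictly above the threshold get filter (boolean) matching.
    filter_terms = [t for t, r in hit_ratios.items() if r > filter_threshold]
    # All other terms keep normal matching with full ranking information.
    ranked_terms = [t for t, r in hit_ratios.items() if r <= filter_threshold]
    return filter_terms, ranked_terms


if __name__ == "__main__":
    estimates = {"the": 0.9, "vespa": 0.001, "edge": 0.05}
    print(split_terms(estimates))
```

With the illustrative estimates above, only the very common term lands in the filter group; a term whose ratio equals the threshold is still ranked normally, since the text says "above".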

rank Zero or more -Specify if the field is used for ranking. +Specify rank settings of a field in this profile. rank-type Zero or more @@ -3807,6 +3828,51 @@

rank

for how to annotate query terms as filters.

+

filter-threshold

+

+ Contained in a rank-profile. + Used to optimize query performance when searching large text index fields, + by allowing a per-query combination of rank: filter and rank: normal behavior. + See profile-level filter-threshold for how to use the same value for all index fields. +

+
+rank [field-name] {
+    filter-threshold: 0.05
+}
+
+ + + + + + + +
Setting Description
filter-threshold +

+ Threshold value (in the range [0.0, 1.0]) that decides when matching in this index field is treated as filtering. + This applies to query terms whose estimated hit ratio (in the range [0.0, 1.0]) is above the filter-threshold. + In that case, fast bitvector data structures are used, similar to when the field is set to rank: filter. + This saves CPU and disk I/O during matching and typically results in faster query evaluation, + with the downside being that only a boolean signal is available for ranking (the document being a match or not). + BM25 handles this by assuming one occurrence of the query term in the document, + and the field length being equal to the average field length. +

+

+ Use this to optimize query performance when searching large text index fields with e.g. + the WeakAND query operator and BM25 ranking. + Query terms that are common in the corpus (e.g. stopwords) are treated as filters with faster matching and simplified ranking, + while other query terms are handled as usual with full ranking. +

+

+ In testing with various text datasets (e.g. Wikipedia), a filter-threshold setting of 0.05 has proven to be a good starting point. + +

+

+ This setting is only relevant for index fields, + and cannot be used in combination with rank: filter. + It has no effect in streaming search. +

+
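
The BM25 simplification for filter-treated terms can be checked with a small Python sketch. It assumes the textbook BM25 term formula with parameters `k1` and `b`. The function names and default values are illustrative and not taken from this change. Under the two assumptions stated above (one occurrence, field length equal to the average), the length normalization cancels and the term score reduces to the term's inverse document frequency weight.

```python
import math


def bm25_term(idf, tf, field_len, avg_field_len, k1=1.2, b=0.75):
    """Textbook BM25 contribution of a single query term (assumed formula)."""
    norm = k1 * (1.0 - b + b * field_len / avg_field_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)


def bm25_term_as_filter(idf, avg_field_len, k1=1.2, b=0.75):
    """Score when only a boolean match is known.

    Assumes one occurrence of the term and a field length equal to the
    average field length, as described for filter-treated terms.
    """
    return bm25_term(idf, 1, avg_field_len, avg_field_len, k1, b)


if __name__ == "__main__":
    # With tf = 1 and field_len == avg_field_len the fraction equals 1,
    # so the result matches idf up to floating-point rounding.
    print(math.isclose(bm25_term_as_filter(2.0, 100.0), 2.0))
```

The practical consequence is that a filter-treated term still contributes to the BM25 score, but every matching document receives the same contribution for that term.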

query-command