diff --git a/en/reference/schema-reference.html b/en/reference/schema-reference.html index ad42bcc34e..ce2731ea15 100644 --- a/en/reference/schema-reference.html +++ b/en/reference/schema-reference.html @@ -148,6 +148,7 @@
- This parameter no effect in streaming search. + This parameter has no effect in streaming search. +
+ + ++ Threshold value (in the range [0.0, 1.0]) deciding when matching in index fields should be treated as filters. + This happens for query terms with estimated hit ratios (in the range [0.0, 1.0]) that are above the filter-threshold. + Use this to optimize query performance when searching large text index fields, + by allowing a per-query combination of rank: filter and rank: normal behavior. + This parameter can be overridden per index field; see field-level filter-threshold + for a more detailed description with tradeoffs. +
++ In testing with various text datasets (e.g. Wikipedia), a filter-threshold setting of 0.05 has proven to be a good starting point. + +
++ This parameter has no effect in streaming search.
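The decision described above can be sketched in a few lines of Python. This is only an illustration of the documented rule, not Vespa's implementation: the function names are hypothetical, and treating "no threshold configured" as "never filter" is an assumption made for the sketch.

```python
from typing import Optional


def effective_threshold(profile_threshold: Optional[float],
                        field_threshold: Optional[float]) -> Optional[float]:
    """Pick the threshold for one index field.

    A field-level filter-threshold, when set, overrides the profile-level one.
    """
    if field_threshold is not None:
        return field_threshold
    return profile_threshold


def treated_as_filter(estimated_hit_ratio: float,
                      threshold: Optional[float]) -> bool:
    """Return True when a query term should be matched as a filter."""
    if threshold is None:
        # Assumption for this sketch: no threshold means normal ranking.
        return False
    # The rule applies to hit ratios strictly above the threshold.
    return estimated_hit_ratio > threshold


# Hypothetical estimated hit ratios for three query terms.
terms = {"the": 0.62, "search": 0.04, "bitvector": 0.0003}
threshold = effective_threshold(profile_threshold=0.05, field_threshold=None)
filters = {term for term, ratio in terms.items()
           if treated_as_filter(ratio, threshold)}
```

With the profile-level value of 0.05, only the very common term ends up in `filters`; the other two keep full ranking.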
+ Contained in a rank-profile. + Used to optimize query performance when searching large text index fields, + by allowing a per-query combination of rank: filter and rank: normal behavior. + See profile-level filter-threshold for how to use the same value for all index fields. +
++rank [field-name] { + filter-threshold: 0.05 +} ++
Setting | Description |
---|---|
filter-threshold |
+ + Threshold value (in the range [0.0, 1.0]) deciding when matching in this index field should be treated as a filter. + This happens for query terms with estimated hit ratios (in the range [0.0, 1.0]) that are above the filter-threshold. + Fast bitvector data structures are then used, similar to when the field is set to rank: filter. + This saves CPU and disk I/O during matching and typically results in faster query evaluation, + with the downside being that only a boolean signal is available for ranking (the document being a match or not). + BM25 handles this by assuming one occurrence of the query term in the document, + and the field length being equal to the average field length. + ++ Use this to optimize query performance when searching large text index fields with e.g. + the WeakAND query operator and BM25 ranking. + Query terms that are common in the corpus (e.g. stopwords) are treated as filters with faster matching and simplified ranking, + while other query terms are handled as usual with full ranking. + ++ In testing with various text datasets (e.g. Wikipedia), a filter-threshold setting of 0.05 has proven to be a good starting point. + + ++ This setting is only relevant for index fields, + and cannot be used in combination with rank: filter. + It has no effect in streaming search. + + |
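To see what the BM25 simplification means for ranking, the sketch below evaluates the textbook BM25 per-term formula with the two assumptions stated in the description: one occurrence of the term, and a field length equal to the average field length. The function names, the `k1` and `b` values, and the example `idf` are assumptions chosen for illustration; this is not Vespa's implementation.

```python
import math


def bm25_term(tf: float, field_len: float, avg_field_len: float,
              idf: float, k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook BM25 contribution of a single query term."""
    length_norm = 1.0 - b + b * (field_len / avg_field_len)
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * length_norm)


def bm25_term_as_filter(idf: float, avg_field_len: float,
                        k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 contribution when only a boolean match signal is available.

    Assumes one occurrence (tf = 1) and field_len == avg_field_len.
    """
    return bm25_term(1.0, avg_field_len, avg_field_len, idf, k1, b)


idf = 0.47  # hypothetical inverse document frequency of a common term
score = bm25_term_as_filter(idf, avg_field_len=350.0)
```

Under these two assumptions the length normalization becomes 1 and the fraction cancels, so the term contributes its inverse document frequency alone, independent of `k1`, `b`, and the actual field content. Common terms have a low inverse document frequency, which is why the loss of ranking detail for them tends to be small.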