V6.0.1
Note: v6.0.0 was only available for a few minutes due to a NuGet publishing error. v6.0.1 should be considered the first official v6 release.
There are a few breaking changes in this release, most of which are due to types being renamed. Guidance on how to deal with them can be found below.
New features
- Score boosting!
  - Score boosting as part of a query: grand^3 will boost the score of words matching "grand".
  - Boosting of object fields, e.g. .WithField("Name", c => c.Name, scoreBoost: 1.5D)
  - Boosting object scores based on a freshness date, e.g. the date it was last updated.
  - Boosting object scores based on a magnitude value, e.g. a star rating.
- Custom stemmers
- Characters can now be escaped in LIFTI queries, and field names in LIFTI queries can contain spaces.
- Enhanced query execution logic
- Removed the dependency on System.Collections.Immutable: only the netstandard2 version of the library now pulls in any dependencies. For net6 to net8, only built-in types are used.
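The freshness and magnitude options can be pictured with a small sketch. It assumes a simple linear model in which the lowest value in the index gets a multiplier of 1, the highest gets the full boost, and everything in between is interpolated. BoostMultiplier and its parameters are hypothetical and not part of LIFTI's API, and LIFTI's actual calculation may well differ.

```csharp
using System;

// Hypothetical helper (not part of LIFTI): linearly maps a value in [min, max]
// onto a score multiplier in [1, maxBoost].
static double BoostMultiplier(double value, double min, double max, double maxBoost)
{
    if (max <= min) return 1.0;                       // no spread in values: nothing to boost
    double normalized = (value - min) / (max - min);  // 0 for the lowest value, 1 for the highest
    return 1.0 + normalized * (maxBoost - 1.0);
}

// Freshness: express each date as days since the oldest date in the index.
var oldest = new DateTime(2023, 1, 1);
var newest = new DateTime(2024, 1, 1);
var lastUpdated = new DateTime(2023, 7, 2, 12, 0, 0); // exactly half-way through 2023
double freshness = BoostMultiplier(
    (lastUpdated - oldest).TotalDays, 0, (newest - oldest).TotalDays, maxBoost: 2.0);

// Magnitude: a 4-star rating on a 1-5 scale, with a maximum boost of 3.
double magnitude = BoostMultiplier(4, 1, 5, maxBoost: 3.0);

Console.WriteLine($"freshness x{freshness}, magnitude x{magnitude}");
```

With these inputs the half-way date yields a multiplier of 1.5 and the 4-star rating yields 2.5. Normalizing against the range of values in the index, rather than against fixed bounds, keeps the boost meaningful whatever the date or rating scale happens to be.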
Performance increases
A significant amount of work has gone into improving the performance and memory usage of index construction, index (de)serialization and searching.
All benchmarks were run with BenchmarkDotNet:
BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22631.3007)
Intel Core i7-1065G7 CPU 1.30GHz, 1 CPU, 8 logical and 4 physical cores
The results below are a comparison of the previous v5 version of LIFTI against the code in the v6.0.0 branch, running on .NET 8.
Index construction
Populating an index with 200 Wikipedia entries in a single batch
v5 Mean (μs) | v5 Allocated (KB) | v6 Mean (μs) | v6 Allocated (KB) |
---|---|---|---|
1,134.2 | 567,623.8 | 952.6 | 286,617.6 |
Populating each of the 200 Wikipedia entries one at a time (i.e. a new snapshot created after each document)
v5 Mean (μs) | v5 Allocated (KB) | v6 Mean (μs) | v6 Allocated (KB) |
---|---|---|---|
4,284.4 | 1,370,649.9 | 1,212.4 | 613,540.2 |
Searching
Many individual optimisations make for some nice gains across various query types, including:
- Merge sorting results during unions and intersections for queries containing more than one part
- Optimised collection of affected results during wildcard and fuzzy match query parts
- Early application of field filters when matching results
- Weighting of query parts to determine the optimal execution order, so that documents eliminated by one part of the query don't need to be collected by the others
Query | v5 Mean (μs) | v5 Allocated (KB) | v6 Mean (μs) | v6 Allocated (KB) |
---|---|---|---|---|
"also has a" | 169.74 | 379.19 | 52.71 | 122.97 |
(confiscation & th*) \| "and they" | 1,203.69 | 1,557.29 | 105.23 | 185.02 |
* | 193,333.07 | 103,612.99 | 62,298.80 | 13,152.30 |
?and ?they ?also | 1,725.66 | 1,658.12 | 439.60 | 243.45 |
and \| they | 417.70 | 819.98 | 104.23 | |
and ~ they | 132.89 | 294.22 | 42.20 | 95.61 |
and ~10> they | 132.64 | 297.67 | 43.34 | 97.04 |
and > they | 214.03 | 455.75 | 106.16 | 169.17 |
and they also | 283.82 | 565.34 | 56.02 | 109.51 |
co*on | 445.27 | 798.77 | 180.04 | 263.47 |
con??* | 2.21 | 2.30 | 1.96 | 1.97 |
confiscation | 4.03 | 2.70 | 3.66 | 2.29 |
th* | 2,277.00 | 2,914.76 | 569.76 | 412.60 |
Title=?great | 416.08 | 399.17 | 108.86 | 34.50 |
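The merge-sorting and query-part weighting optimisations can be illustrated with a minimal sketch: an AND query is evaluated cheapest-first over ascending lists of document IDs, using single-pass merge intersection and union. The postings dictionary, and the use of a term's match count as its weighting, are assumptions made for illustration; this is not how LIFTI's CalculateWeighting or IntermediateQueryResult are actually implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Intersect two ascending lists of document IDs in a single linear pass.
static List<int> MergeIntersect(IReadOnlyList<int> a, IReadOnlyList<int> b)
{
    var result = new List<int>();
    int i = 0, j = 0;
    while (i < a.Count && j < b.Count)
    {
        if (a[i] == b[j]) { result.Add(a[i]); i++; j++; }
        else if (a[i] < b[j]) i++;
        else j++;
    }
    return result;
}

// Union two ascending lists of document IDs, emitting shared IDs only once.
static List<int> MergeUnion(IReadOnlyList<int> a, IReadOnlyList<int> b)
{
    var result = new List<int>(a.Count + b.Count);
    int i = 0, j = 0;
    while (i < a.Count && j < b.Count)
    {
        if (a[i] == b[j]) { result.Add(a[i]); i++; j++; }
        else if (a[i] < b[j]) result.Add(a[i++]);
        else result.Add(b[j++]);
    }
    while (i < a.Count) result.Add(a[i++]);
    while (j < b.Count) result.Add(b[j++]);
    return result;
}

// Hypothetical index contents: term -> ascending document IDs.
var postings = new Dictionary<string, int[]>
{
    ["and"] = new[] { 1, 2, 3, 4, 5, 6, 7, 8 },
    ["they"] = new[] { 2, 4, 6, 8 },
    ["confiscation"] = new[] { 4, 9 },
};

// Evaluate an AND query cheapest-first. The weighting is simply the number of
// matches for each term (an assumption standing in for a real cost estimate).
List<int> EvaluateAnd(params string[] terms)
{
    List<int>? candidates = null;
    foreach (var term in terms.OrderBy(t => postings[t].Length))
    {
        candidates = candidates is null
            ? postings[term].ToList()
            : MergeIntersect(candidates, postings[term]);

        if (candidates.Count == 0) break; // no candidates left: skip the remaining parts
    }
    return candidates ?? new List<int>();
}

Console.WriteLine(string.Join(",", EvaluateAnd("and", "they")));                             // 2,4,6,8
Console.WriteLine(string.Join(",", EvaluateAnd("and", "they", "confiscation")));             // 4
Console.WriteLine(string.Join(",", MergeUnion(postings["they"], postings["confiscation"]))); // 2,4,6,8,9
```

Evaluating the most selective term first means larger lists are only ever merged against an already-reduced candidate set, and the loop stops as soon as no candidates remain. Because every list is kept sorted, each intersection or union is linear in the combined list lengths.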
Deprecated:
- ItemMetadata.Item/DocumentMetadata.Item -> use the Key property
- IFullTextIndex.Items -> use the Metadata property
- FullTextIndexBuilder.WithDuplicateItemBehavior -> use the WithDuplicateKeyBehavior method
- IndexOptions.DuplicateItemBehavior -> use the DuplicateKeyBehavior property
- ScoredToken.ItemId -> use the DocumentId property
- QueryTokenMatch.ItemId -> use the DocumentId property
- ItemMetadata.Count -> use IndexMetadata.DocumentCount
- ItemMetadata.GetMetadata -> use IndexMetadata.GetDocumentMetadata
Technically breaking
- IdPool and IIdPool are now internal. These weren't really exposed before anyway.
- Removed the IItemMetadata interface: DocumentMetadata is used going forwards.
- QueryContext no longer has an ApplyTo method.
- IIndexNavigator: added a Snapshot property.
- IIndexNavigator: added overloads for GetExactMatches and GetExactAndChildMatches that allow the current QueryContext to be passed in, so that unnecessary results are not collected.
- IIndexNavigator: new methods AddExactMatches and AddExactAndChildMatches that allow you to efficiently collect matches using a DocumentMatchCollector before converting it to an IntermediateQueryResult.
- IQueryPart now has a double CalculateWeighting(Func<IIndexNavigator> navigatorCreator) method to help the query processing logic evaluate the most efficient order of execution.
- The TItem generic type parameter has been renamed to TObject.
- All query part types are now sealed.
- New method IIndexNavigator.ExactMatchCount().
- IntermediateQueryResult constructors are no longer public.
- Index serialization interfaces have been reworked. This shouldn't affect anyone, because it was technically impossible to write your own serializers based upon them: there were no publicly accessible methods for rehydrating an index.
- IIndexNavigatorBookmark now implements IDisposable. You don't technically have to dispose it, but doing so will return it to a pool and allow it to be reused.
Querying changes
ScoredFieldMatch is now quite different and can no longer be publicly constructed. The only place you would have encountered it is in a custom scorer, and that's no longer necessary.
Several types that were only likely to have been used internally have been removed:
- FieldMatch
- QueryTokenMatch
- CompositeTokenMatchLocation
- SingleTokenMatchLocation
- ITokenLocationMatch
- TokenLocationMatch
Breaking
- DuplicateItemBehavior enum -> renamed to DuplicateKeyBehavior
- DuplicateItemBehavior.ReplaceItem -> use DuplicateKeyBehavior.Replace instead
- IQueryContext -> just use the concrete QueryContext. This affects IQueryPart.Evaluate, as it now takes a QueryContext.
- IIndexNodeFactory.CreateNode now takes the concrete types ChildNodeMap and DocumentTokenMatchMap instead of ImmutableDictionary and ImmutableList respectively.
- A maximum of 31 different object types can now be configured against a single FullTextIndexBuilder (i.e. 31 distinct calls to WithObjectTokenization). If anyone is actually indexing more than 31 object types, I'd be very interested to understand your scenario!
The rest of these will only affect you if you are explicitly referencing the type names in your code:
- ItemPhrases -> renamed to DocumentPhrases
- ItemMetadata -> renamed to DocumentMetadata
- IItemStore -> renamed to IIndexMetadata