V6.0.1

@mikegoatly mikegoatly released this 16 Jan 23:07
· 20 commits to master since this release

Note: v6.0.0 was only available for a few minutes due to a NuGet publishing error. v6.0.1 should be considered the first official v6 release.

There are a couple of breaking changes in this release, most of which are due to the renaming of types. Guidance on how to deal with them can be found below.

New features

Performance increases

A significant amount of work went into improving the performance and memory usage of index construction, index (de)serialization and searching.

All tests were run with BenchmarkDotNet:
BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22631.3007)
Intel Core i7-1065G7 CPU 1.30GHz, 1 CPU, 8 logical and 4 physical cores
The results below are a comparison of the previous v5 version of LIFTI against the code in the v6.0.0 branch, running on .NET 8.

Index construction

Populating an index with 200 Wikipedia entries in a single batch

v5 Mean (μs)    v5 Allocated (KB)    v6 Mean (μs)    v6 Allocated (KB)
1,134.2         567,623.8            952.6           286,617.6

Populating each of the 200 Wikipedia entries one at a time (i.e. a new snapshot created after each document)

v5 Mean (μs)    v5 Allocated (KB)    v6 Mean (μs)    v6 Allocated (KB)
4,284.4         1,370,649.9          1,212.4         613,540.2

Searching

Lots of individual optimisations make for some nice gains across various query types, including:

  • Merge sorting results during unions and intersections for queries containing more than one part
  • Optimised collection of affected results during wildcard and fuzzy match query parts
  • Early application of field filters when matching results
  • Weighting of query parts to determine the optimal execution order, so that documents can be eliminated from collection in other parts of the query

Query v5 Mean (μs) v5 Allocated (KB) v6 Mean (μs) v6 Allocated (KB)
"also has a" 169.74 379.19 52.71 122.97
(confiscation & th*) | "and they" 1,203.69 1,557.29 105.23 185.02
* 193,333.07 103,612.99 62,298.80 13,152.30
?and ?they ?also 1,725.66 1,658.12 439.60 243.45
and they 417.70 819.98 104.23
and ~ they 132.89 294.22 42.20 95.61
and ~10> they 132.64 297.67 43.34 97.04
and > they 214.03 455.75 106.16 169.17
and they also 283.82 565.34 56.02 109.51
co*on 445.27 798.77 180.04 263.47
con??* 2.21 2.30 1.96 1.97
confiscation 4.03 2.70 3.66 2.29
th* 2,277.00 2,914.76 569.76 412.60
Title=?great 416.08 399.17 108.86 34.50
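
The merge-based union and intersection mentioned in the list of search optimisations can be sketched as follows. This is only a minimal illustration of the general technique, assuming each query part yields its matches as an ascending, duplicate-free list of document ids. The `SortedMerge` class and its method names are hypothetical and are not LIFTI's actual implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only (not part of LIFTI).
public static class SortedMerge
{
    // Intersects two ascending, duplicate-free id lists in a single pass.
    public static List<int> Intersect(IReadOnlyList<int> left, IReadOnlyList<int> right)
    {
        var result = new List<int>(Math.Min(left.Count, right.Count));
        int i = 0, j = 0;
        while (i < left.Count && j < right.Count)
        {
            if (left[i] == right[j])
            {
                // Present on both sides: keep it and advance both cursors.
                result.Add(left[i]);
                i++;
                j++;
            }
            else if (left[i] < right[j])
            {
                i++; // left id cannot appear later in right, so skip it
            }
            else
            {
                j++; // right id cannot appear later in left, so skip it
            }
        }

        return result;
    }

    // Unions two ascending, duplicate-free id lists, keeping the output sorted.
    public static List<int> Union(IReadOnlyList<int> left, IReadOnlyList<int> right)
    {
        var result = new List<int>(left.Count + right.Count);
        int i = 0, j = 0;
        while (i < left.Count && j < right.Count)
        {
            if (left[i] == right[j])
            {
                // Shared id: emit it once.
                result.Add(left[i]);
                i++;
                j++;
            }
            else if (left[i] < right[j])
            {
                result.Add(left[i++]);
            }
            else
            {
                result.Add(right[j++]);
            }
        }

        // Copy whatever remains on either side.
        while (i < left.Count) result.Add(left[i++]);
        while (j < right.Count) result.Add(right[j++]);
        return result;
    }
}
```

Because both inputs are already sorted, each operation visits every element at most once and needs no hash set or re-sort of the combined results.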

Deprecated

ItemMetadata.Item/DocumentMetadata.Item -> use Key property
IFullTextIndex.Items -> use Metadata property
FullTextIndexBuilder.WithDuplicateItemBehavior -> use WithDuplicateKeyBehavior method
IndexOptions.DuplicateItemBehavior -> use DuplicateKeyBehavior property
ScoredToken.ItemId -> use DocumentId property
QueryTokenMatch.ItemId -> use DocumentId property
ItemMetadata.Count -> use IndexMetadata.DocumentCount
ItemMetadata.GetMetadata -> use IndexMetadata.GetDocumentMetadata
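
As a rough guide to what these renames look like in calling code, here is a minimal sketch assuming the v6 LIFTI NuGet package is referenced. The `MigrationSketch` class and `RunAsync` method are hypothetical names used only for illustration. The v5 equivalents in the comments are inferred from the rename lists in these notes, including the DuplicateKeyBehavior enum rename listed under Breaking.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Lifti;

// Hypothetical wrapper class, for illustration only.
public static class MigrationSketch
{
    // Builds a small index and returns the document count together with
    // the keys of the documents matching "fox".
    public static async Task<(int Count, List<int> Keys)> RunAsync()
    {
        // v5: .WithDuplicateItemBehavior(DuplicateItemBehavior.ReplaceItem)
        var index = new FullTextIndexBuilder<int>()
            .WithDuplicateKeyBehavior(DuplicateKeyBehavior.Replace)
            .Build();

        await index.AddAsync(1, "An early draft");
        await index.AddAsync(1, "The quick brown fox"); // same key: replaces the draft
        await index.AddAsync(2, "The lazy dog");

        // v5: index.Items.Count
        var count = index.Metadata.DocumentCount;

        // Each search result exposes the key the document was indexed under.
        var keys = index.Search("fox").Select(result => result.Key).ToList();

        return (count, keys);
    }
}
```

The deprecated members should still compile in v6 with warnings, so a migration like this can be done incrementally.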

Technically breaking

IdPool and IIdPool are now internal - These weren't really exposed before anyway
Removed interface IItemMetadata - just using DocumentMetadata going forwards
QueryContext no longer has ApplyTo method
IIndexNavigator: added Snapshot property
IIndexNavigator: added overloads for GetExactMatches and GetExactAndChildMatches that allow for the current QueryContext to be passed in so unnecessary results are not collected.
IIndexNavigator: added methods AddExactMatches and AddExactAndChildMatches that allow you to efficiently collect matches using a DocumentMatchCollector before converting it to an IntermediateQueryResult.
IQueryPart now has a double CalculateWeighting(Func<IIndexNavigator> navigatorCreator) method to help the query processing logic evaluate the most efficient order of execution.
The TItem generic type parameter has been renamed to TObject.
All query part types are now sealed
New method IIndexNavigator.ExactMatchCount()
IntermediateQueryResult constructors are no longer public
Index serialization interfaces have been reworked. This shouldn't affect anyone because it was technically impossible to write your own serializers based upon them due to a lack of publicly accessible methods for rehydrating an index.
IIndexNavigatorBookmark now implements IDisposable - you don't technically have to dispose it, but doing so will return it to a pool and allow it to be reused.

Querying changes

ScoredFieldMatch is now quite different and no longer publicly constructable. The only place you would have encountered this is in a custom scorer, and that's no longer necessary.

Several types that are only likely to have been used internally are gone:

  • FieldMatch
  • QueryTokenMatch
  • CompositeTokenMatchLocation
  • SingleTokenMatchLocation
  • ITokenLocationMatch
  • TokenLocationMatch

Breaking

DuplicateItemBehavior enum -> renamed to DuplicateKeyBehavior
DuplicateItemBehavior.ReplaceItem -> use DuplicateKeyBehavior.Replace instead
IQueryContext -> use the concrete QueryContext instead. This affects IQueryPart.Evaluate, which now takes a QueryContext.
IIndexNodeFactory.CreateNode now takes concrete types ChildNodeMap and DocumentTokenMatchMap instead of ImmutableDictionary and ImmutableList respectively.
A maximum of 31 different object types can now be configured against a single FullTextIndexBuilder (i.e. 31 distinct calls to WithObjectTokenization) - if anyone is actually indexing more than 31 object types, I'd be very interested to understand your scenario!

The rest of these will only affect you if you are explicitly referencing the type names in your code:

ItemPhrases -> renamed to DocumentPhrases
ItemMetadata -> renamed to DocumentMetadata
IItemStore -> renamed to IIndexMetadata