Releases: mikegoatly/lifti

v6.3.0

03 Mar 19:28

#14 - Added support for fluent query building, for example:

index.Query()
    .ExactMatch("west")
    .And.ExactMatch("wing")
    .Execute()

Full fluent querying documentation is here.

Full Changelog: v6.2.0...v6.3.0

v6.2.0

01 Mar 23:24

Very minor update. I realised while writing some more detailed documentation to explain query plans that the execution plan node kind CompositePositionalIntersect was technically identical to PositionalIntersect. As such, I've obsoleted it for removal in the next major version.

v6.1.0

23 Feb 16:14
16dc5af

Adds support for obtaining query execution plans for queries (#110)

The Blazor demo application demonstrates the query execution plans generated for queries executed against it:

[Image: query execution plan displayed in the Blazor demo application]

Technically breaking

These are very unlikely to cause an issue (if they do, please let me know):

New method overloads on IFullTextIndex/FullTextIndex:

Search(IQuery! query, QueryExecutionOptions options = QueryExecutionOptions.None)
Search(string! searchText, QueryExecutionOptions options = QueryExecutionOptions.None)

New method on ISearchResults:
ISearchResults.GetExecutionPlan() -> Lifti.QueryExecutionPlan

v6.0.1

16 Jan 23:07

Note: v6.0.0 was only available for a few minutes due to a NuGet publishing error. v6.0.1 should be considered the first official v6 release.

There are a couple of breaking changes in this release, most of which are due to renaming of types. Some guidance can be found below for how to deal with them.

New features

Performance increases

There was a significant amount of work done to improve performance and memory usage of building an index, index (de)serialization and searching.

All tests were run with BenchmarkDotNet:
BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22631.3007)
Intel Core i7-1065G7 CPU 1.30GHz, 1 CPU, 8 logical and 4 physical cores
The results below are a comparison of the previous v5 version of LIFTI against the code in the v6.0.0 branch, running on .NET 8.

Index construction

Populating an index with 200 Wikipedia entries in a single batch

v5 Mean (μs)  v5 Allocated (KB)  v6 Mean (μs)  v6 Allocated (KB)
     1,134.2          567,623.8         952.6          286,617.6

Populating each of the 200 Wikipedia entries one at a time (i.e. a new snapshot created after each document)

v5 Mean (μs)  v5 Allocated (KB)  v6 Mean (μs)  v6 Allocated (KB)
     4,284.4        1,370,649.9       1,212.4          613,540.2

Searching

Lots of individual optimisations make for some nice gains across various query types, including:

  • Merge sorting results during unions and intersections for queries containing more than one part
  • Optimised collection of affected results during wildcard and fuzzy match query parts
  • Early application of field filters when matching results
  • Weighting of query parts to work out the optimal execution order, so that documents can be eliminated early from collection in other parts of the query
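To illustrate the merge-style handling of unions and intersections mentioned above, here is a minimal sketch, not LIFTI's actual implementation. It assumes each query part produces a list of document ids in ascending order; the Intersect and Union helpers and the sample ids are hypothetical. Because both inputs are sorted, each list is walked once and no hash set or re-sort is needed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: intersect two ascending lists of document ids in one pass.
static List<int> Intersect(IReadOnlyList<int> left, IReadOnlyList<int> right)
{
    var result = new List<int>(Math.Min(left.Count, right.Count));
    int l = 0, r = 0;
    while (l < left.Count && r < right.Count)
    {
        if (left[l] == right[r])
        {
            // Present in both inputs: keep it and advance both cursors
            result.Add(left[l]);
            l++;
            r++;
        }
        else if (left[l] < right[r])
        {
            l++; // left is behind, so it can never match the current right value
        }
        else
        {
            r++;
        }
    }

    return result;
}

// Hypothetical helper: union two ascending lists, keeping the output sorted and distinct.
static List<int> Union(IReadOnlyList<int> left, IReadOnlyList<int> right)
{
    var result = new List<int>(left.Count + right.Count);
    int l = 0, r = 0;
    while (l < left.Count && r < right.Count)
    {
        if (left[l] == right[r]) { result.Add(left[l]); l++; r++; }
        else if (left[l] < right[r]) { result.Add(left[l++]); }
        else { result.Add(right[r++]); }
    }

    // Copy whatever remains once one side is exhausted
    while (l < left.Count) result.Add(left[l++]);
    while (r < right.Count) result.Add(right[r++]);
    return result;
}

// Made-up document ids matched by two query parts
var docsWithAnd = new[] { 1, 4, 7, 9 };
var docsWithThey = new[] { 2, 4, 9, 12 };

Console.WriteLine(string.Join(", ", Intersect(docsWithAnd, docsWithThey))); // 4, 9
Console.WriteLine(string.Join(", ", Union(docsWithAnd, docsWithThey)));     // 1, 2, 4, 7, 9, 12
```

Both helpers run in time proportional to the combined length of the inputs, which is the kind of saving a sorted merge offers over collecting into dictionaries first.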

Query                              v5 Mean (μs)  v5 Allocated (KB)  v6 Mean (μs)  v6 Allocated (KB)
"also has a"                             169.74             379.19         52.71             122.97
(confiscation & th*) | "and they"      1,203.69           1,557.29        105.23             185.02
*                                    193,333.07         103,612.99     62,298.80          13,152.30
?and ?they ?also                       1,725.66           1,658.12        439.60             243.45
and they                                 417.70             819.98        104.23
and ~ they                               132.89             294.22         42.20              95.61
and ~10> they                            132.64             297.67         43.34              97.04
and > they                               214.03             455.75        106.16             169.17
and they also                            283.82             565.34         56.02             109.51
co*on                                    445.27             798.77        180.04             263.47
con??*                                     2.21               2.30          1.96               1.97
confiscation                               4.03               2.70          3.66               2.29
th*                                    2,277.00           2,914.76        569.76             412.60
Title=?great                             416.08             399.17        108.86              34.50

Deprecated:

ItemMetadata.Item/DocumentMetadata.Item -> use Key property
IFullTextIndex.Items -> use Metadata property
FullTextIndexBuilder.WithDuplicateItemBehavior -> use WithDuplicateKeyBehavior method
IndexOptions.DuplicateItemBehavior -> use DuplicateKeyBehavior property
ScoredToken.ItemId -> use DocumentId property
QueryTokenMatch.ItemId -> use DocumentId property
ItemMetadata.Count -> IndexMetadata.DocumentCount
ItemMetadata.GetMetadata -> IndexMetadata.GetDocumentMetadata

Technically breaking

IdPool and IIdPool are now internal - These weren't really exposed before anyway
Removed interface IItemMetadata - just using DocumentMetadata going forwards
QueryContext no longer has ApplyTo method
IIndexNavigator: added Snapshot property
IIndexNavigator: added overloads for GetExactMatches and GetExactAndChildMatches that allow for the current QueryContext to be passed in so unnecessary results are not collected.
IIndexNavigator: new additional methods AddExactMatches and AddExactAndChildMatches that allow you to efficiently collect matches using a DocumentMatchCollector before converting it to an IntermediateQueryResult.
IQueryPart now has a double CalculateWeighting(Func<IIndexNavigator> navigatorCreator) method to help the query processing logic evaluate the most efficient order of execution.
TItem generic type parameter name has been renamed to TObject.
All query part types are now sealed
New method IIndexNavigator.ExactMatchCount()
IntermediateQueryResult constructors are no longer public
Index serialization interfaces have been reworked. This shouldn't affect anyone because it was technically impossible to write your own serializers based upon them due to a lack of publicly accessible methods for rehydrating an index.
IIndexNavigatorBookmark now implements IDisposable - you don't technically have to dispose it, but doing so will return it to a pool and allow it to be reused.

Querying changes

ScoredFieldMatch is now quite different and no longer publicly constructable. The only place you would have encountered this is in a custom scorer, and that's no longer necessary.

Several types that are only likely to have been used internally are gone:

  • FieldMatch
  • QueryTokenMatch
  • CompositeTokenMatchLocation
  • SingleTokenMatchLocation
  • ITokenLocationMatch
  • TokenLocationMatch

Breaking

DuplicateItemBehavior enum -> renamed to DuplicateKeyBehavior
DuplicateItemBehavior.ReplaceItem -> use DuplicateKeyBehavior.Replace instead
IQueryContext -> just use the concrete QueryContext. This affects IQueryPart.Evaluate, which now takes a QueryContext.
IIndexNodeFactory.CreateNode now takes concrete types ChildNodeMap and DocumentTokenMatchMap instead of ImmutableDictionary and ImmutableList respectively.
A maximum of 31 different object types can now be configured against a single FullTextIndexBuilder (i.e. 31 distinct calls to WithObjectTokenization) - if anyone is actually indexing more than 31 object types, I'd be very interested to understand your scenario!

The rest of these will only affect you if you are explicitly referencing the type names in your code:

ItemPhrases -> renamed to DocumentPhrases
ItemMetadata -> renamed to DocumentMetadata
IItemStore -> renamed to IIndexMetadata

v5.0.0

05 Jul 20:07
0ddbcf7

New features in v5.0.0

  • Dynamic fields
  • More detailed field information
  • Smaller binary serialized files

Acknowledgements

Thanks to @kampilan and @h0lg for their thoughts on the design for dynamic fields!

Dynamic fields

v5.0.0 introduces support for dynamic fields, where fields are dynamically registered with the index as it is populated:

var index = new FullTextIndexBuilder<int>()
    .WithObjectTokenization<Customer>(o => o
        .WithKey(c => c.Id)
        .WithDynamicFields("Tags", c => c.TagDictionary, "Tag_")
        .WithDynamicFields(
            "Questions", 
            c => c.Questions, 
            q => q.QuestionName, 
            q => q.QuestionResponse, 
            "Question_")
    )
    .Build();

Indexing this object against the index:

new Customer 
{
    TagDictionary = new Dictionary<string, string>
    {
        { "Foo", "Some text here" }
    },
    Questions = new List<Question>
    {
        new Question { QuestionName = "FavoriteColor", QuestionResponse = "My favorite color is blue" }
    }
}

Will cause two fields to be registered with text:

Tag_Foo -> "Some text here"
Question_FavoriteColor -> "My favorite color is blue"

More detailed field information

The FieldLookup property of an index now provides additional information about fields.

Smaller binary serialized files

The binary serializer has been rewritten to support dynamic fields. In addition, it now writes integers in a variable-length encoding, using as few bytes as possible. Indexes serialized with the new approach will be about 30-50% of the size produced by the old serializer.
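As a rough illustration of why variable-length integers shrink a file, here is a sketch of a common 7-bits-per-byte scheme (the same general idea as LEB128). The release notes do not describe LIFTI's exact byte layout, so treat this as an assumption; WriteVarUInt32 and ReadVarUInt32 are hypothetical names, not LIFTI APIs.

```csharp
using System;
using System.IO;

// Hypothetical helper: write 7 bits per byte, lowest bits first.
// The high bit of each byte signals that more bytes follow.
static void WriteVarUInt32(Stream stream, uint value)
{
    while (value >= 0x80)
    {
        stream.WriteByte((byte)((value & 0x7F) | 0x80));
        value >>= 7;
    }

    stream.WriteByte((byte)value);
}

// Hypothetical helper: reverse of WriteVarUInt32.
static uint ReadVarUInt32(Stream stream)
{
    uint result = 0;
    int shift = 0;
    while (true)
    {
        int b = stream.ReadByte();
        if (b < 0) throw new EndOfStreamException();

        result |= (uint)(b & 0x7F) << shift;
        if ((b & 0x80) == 0) return result; // no continuation bit: done

        shift += 7;
        if (shift > 28) throw new InvalidDataException("Too many bytes for a 32-bit value");
    }
}

// Small values, which dominate things like ids and offsets, need fewer than 4 bytes
foreach (var sample in new uint[] { 1, 300, 100_000 })
{
    using var buffer = new MemoryStream();
    WriteVarUInt32(buffer, sample);
    long written = buffer.Length;
    buffer.Position = 0;
    Console.WriteLine($"{sample} -> {written} byte(s), reads back as {ReadVarUInt32(buffer)}");
}
```

Under this scheme 1 takes one byte, 300 takes two and 100,000 takes three, compared with four bytes each for a fixed-width 32-bit integer.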

Old serialized versions of the index can still be read, as long as the index builder definition remains unchanged.

Breaking Changes

None of these should affect you unless you're doing something really off-the-wall and unexpected.

IIndexedFieldLookup has new methods on it, IsKnownField and AllFieldNames.
IndexedFieldDetails has changed from being a struct to an abstract class and no longer implements IEquatable<IndexedFieldDetails>.
The IndexedFieldLookup class is now internal.

v4.0.1

02 Jan 16:42

Fixed a bug in the optimised positional intersection logic #65

v4.0.0

22 Dec 09:39

New features:

Phrase extraction from search result #57

Use the new CreateMatchPhrasesAsync methods on search results returned by the index to produce the set of matched phrases by combining them with the original source text:

foreach (var result in await results.CreateMatchPhrasesAsync(i => books.First(x => x.BookId == i)))
{
    Console.WriteLine($"{result.SearchResult.Key} ({result.SearchResult.Score})");

    foreach (var fieldPhrase in result.FieldPhrases)
    {
        Console.Write($"  {fieldPhrase.FoundIn}: ");
        Console.WriteLine(string.Join(", ", fieldPhrase.Phrases.Select(x => $"\"{x}\"")));
    }
}

Index thesaurus #63

Define synonym, hyponym and hypernym relationships between words, so that searches can be performed against words that were not in the original source text.

var index = new FullTextIndexBuilder<int>()
    .WithDefaultThesaurus(o => o
        .WithSynonyms("big", "large")
        .WithHyponyms("dog", "poodle", "beagle"))
    .Build();

Ignoring characters during tokenization #59

Configures the tokenizer to ignore certain characters as it is parsing input.

var index = new FullTextIndexBuilder<int>()
    .WithDefaultTokenization(o => o
        .IgnoreCharacters('<', '>')
    )
    .Build();

Performance improvements

Enforcing the order of matched token locations while processing queries has allowed a couple of optimisations when merging the results of some query parts. You will notice a small perf bump when using operators that require positional matching of words, e.g. sequential words "word1 word2", preceding words word1 ~> word2 and near words word1 ~ word2.
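To show why ordered token locations help, here is a small sketch of a sequential-word check ("word1 word2"). It is not LIFTI's code: it assumes each word's locations in a field are available as ascending token indexes, and FindSequential is a hypothetical helper. Because both lists are ordered, the cursor into the second list only ever moves forward.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns each location of the first word that is
// immediately followed by the second word. Both inputs must be ascending.
static List<int> FindSequential(IReadOnlyList<int> first, IReadOnlyList<int> second)
{
    var matches = new List<int>();
    int s = 0;
    foreach (var position in first)
    {
        // Skip locations of the second word at or before the current position.
        // They can never follow this or any later location of the first word.
        while (s < second.Count && second[s] <= position) s++;
        if (s == second.Count) break;

        if (second[s] == position + 1) matches.Add(position);
    }

    return matches;
}

// Made-up token indexes for two words within one field
var word1Locations = new[] { 3, 10, 25 };
var word2Locations = new[] { 4, 12, 26 };

Console.WriteLine(string.Join(", ", FindSequential(word1Locations, word2Locations))); // 3, 25
```

Without the ordering guarantee, each location of the first word would need a lookup or a scan across all locations of the second.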

And a new logo!

It looks a bit more professional to have a logo when looking for LIFTI in nuget, so here it is:

[Image: LIFTI logo]

Multi-targeted platforms

From the v4 release, the LIFTI package will multi-target different platforms:

[Image: list of target platforms]

Behavioral changes

For fuzzy matching queries, maxEditDistance now defaults to termLength / 3 (it was termLength / 2). This provides better matches out of the box.
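A quick worked example of the two defaults, assuming integer division (the notes don't say how fractions are rounded); OldDefault and NewDefault are hypothetical names used only for illustration:

```csharp
using System;

// Assumption: both defaults use integer division, so fractions are dropped
static int OldDefault(int termLength) => termLength / 2;
static int NewDefault(int termLength) => termLength / 3;

foreach (var term in new[] { "cat", "great", "confiscation" })
{
    Console.WriteLine(
        $"{term} ({term.Length} chars): old max edits = {OldDefault(term.Length)}, new max edits = {NewDefault(term.Length)}");
}
```

Under that assumption a five-letter term such as "great" goes from tolerating two edits to one, and a twelve-letter term from six to four.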

Breaking changes

Most of these shouldn't affect the common usage patterns for LIFTI if you're just using out-of-the-box features. The ones to watch for are the change of return type from IFullTextIndex.Search and change to the IQueryParser interface if you're implementing your own query parser.

Async methods

All async methods can now be passed an optional CancellationToken. This has primarily had an effect on the IFullTextIndex interface.

Overloads have been introduced where async delegates can optionally be provided with the CancellationToken during index building:

  • Reading field text asynchronously
  • Index modification actions (WithIndexModificationAction)

ITokenizer

  • ITokenizer renamed to IIndexTokenizer to improve differentiation between tokenization for indexes and queries
  • New method: IIndexTokenizer.IsSplitCharacter(char character)
  • Return type of Process changed from IReadOnlyList<Token> to IReadOnlyCollection<Token>

IFullTextIndex

  • IFullTextIndex implements new interface IIndexTokenizerProvider
  • Search methods return a new interface ISearchResults<T>. The interface implements IEnumerable though, so impact should be limited. This allows for the new CreateMatchPhrases method.
  • Added property IThesaurus DefaultThesaurus to expose the default thesaurus for the index.
  • BeginBatchChange and CommitBatchChangeAsync were implemented by FullTextIndex - they've been added to the interface for parity.

IQueryParser

IQueryParser.Parse signature changed from:

IQuery Parse(IIndexedFieldLookup fieldLookup, string queryText, ITokenizer tokenizer)

to

IQuery Parse(IIndexedFieldLookup fieldLookup, string queryText, IIndexTokenizerProvider tokenizerProvider)

This change allows you to access the tokenizers for different fields as well as the default tokenizer for
the index, which is all that was accessible previously. Custom query parsers can be fixed up by using
var tokenizer = tokenizerProvider.DefaultTokenizer to get the default index tokenizer.

IIndexNavigator

  • New method overload IIndexNavigator.Process(string)

v3.5.2

04 Nov 13:58
7758ffa

Fixed a bug when indexing similar words in a single batch where the intra-node text breaks multiple times. #54

v3.5.1

20 Oct 22:05

After publishing v3.5.0 I noticed that wildcard query parts also weren't applying field filters to their results - this release fixes that.

v3.5.0

20 Oct 21:34
  • Fixed an issue where fuzzy match search results weren't honoring field filters - queries such as title=?great will now only return fuzzy matched results on the required field (in this example, title), as would be expected.
  • Added an extension method IFullTextIndex.ParseQuery as a convenience wrapper around IFullTextIndex.QueryParser.Parse to save passing in the required dependencies from the index itself.
  • Added ToString on Query - this is useful when you want to get a textual representation of the query itself. Previously you had to call Query.Root.ToString(), which wasn't very discoverable.