Releases · mikegoatly/lifti

03 Mar 19:28

v6.3.0

b66b0d4

v6.3.0 Latest

#14 - Added support for fluent query building, for example:

index.Query()
    .ExactMatch("west")
    .And.ExactMatch("wing")
    .Execute()

Full fluent querying documentation is here.

Full Changelog: v6.2.0...v6.3.0

01 Mar 23:24

mikegoatly

v6.2.0

2b9ebb6

V6.2.0

Very minor update. I realised while writing some more detailed documentation to explain query plans that the execution plan node kind CompositePositionalIntersect was technically identical to PositionalIntersect. As such, I've obsoleted it for removal in the next major version.

23 Feb 16:14

mikegoatly

v6.1.0

16dc5af

V6.1.0

Adds support for obtaining query execution plans for queries (#110)

The Blazor demo application demonstrates the query execution plans generated for queries executed against it:

Technically breaking

Although very unlikely to cause an issue (if they do, please let me know):

New method overloads on IFullTextIndex/FullTextIndex:

Search(IQuery! query, QueryExecutionOptions options = QueryExecutionOptions.None)
Search(string! searchText, QueryExecutionOptions options = QueryExecutionOptions.None)

New method on ISearchResults:
ISearchResults.GetExecutionPlan() -> Lifti.QueryExecutionPlan

16 Jan 23:07

mikegoatly

v6.0.1

eca9c19

V6.0.1

Note: v6.0.0 was only available for a few minutes due to a NuGet publishing error. v6.0.1 should be considered the first official v6 release.

There are a couple of breaking changes in this release, most of which are due to renaming of types. Some guidance can be found below for how to deal with them.

New features

Score boosting!
- Score boosting as part of a query - grand^3 will boost the score of words matching "grand".
- Boosting of object fields - .WithField("Name", c => c.Name, scoreBoost: 1.5D).
- Boosting object scores based on a freshness date, e.g. the date it was last updated.
- Boosting object scores based on a magnitude value, e.g. a star rating.
Custom stemmers
Characters can now be escaped in LIFTI queries and field names in LIFTI queries can contain spaces.
Enhanced query execution logic
Removed dependency on System.Collections.Immutable - only the netstandard2 version of the library now pulls in any dependencies. For net6 to net8, only built in types are used.

Performance increases

There was a significant amount of work done to improve performance and memory usage of building an index, index (de)serialization and searching.

All tests were run with BenchmarkDotNet:
BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22631.3007)
Intel Core i7-1065G7 CPU 1.30GHz, 1 CPU, 8 logical and 4 physical cores
The results below are a comparison of the previous v5 version of LIFTI against the code in the v6.0.0 branch, running on .NET 8.

Index construction

Populating an index with 200 Wikipedia entries in a single batch

v5 Mean (μs)	v5 Allocated (KB)	v6 Mean (μs)	v6 Allocated (KB)
1,134.2	567,623.8	952.6	286,617.6

Populating each of the 200 Wikipedia entries one at a time (i.e. a new snapshot created after each document)

v5 Mean (μs)	v5 Allocated (KB)	v6 Mean (μs)	v6 Allocated (KB)
4,284.4	1,370,649.9	1,212.4	613,540.2

Searching

Lots of individual optimisations including:

Merge sorting results during unions and intersections for queries containing more than one part
Optimised collection of affected results during wildcard and fuzzy match query parts
Early application of field filters when matching results
Weighting of query parts to analyse optimal execution order so that documents can be eliminated from collection in other parts of the query.

make for some nice gains for various query types.

Query	v5 Mean (μs)	v5 Allocated (KB)	v6 Mean (μs)	v6 Allocated (KB)
"also has a"	169.74	379.19	52.71	122.97
(confiscation & th*) | "and they"	1,203.69	1,557.29	105.23	185.02
*	193,333.07	103,612.99	62,298.80	13,152.30
?and ?they ?also	1,725.66	1,658.12	439.60	243.45
and | they	417.70	819.98	104.23	(missing)
and ~ they	132.89	294.22	42.20	95.61
and ~10> they	132.64	297.67	43.34	97.04
and > they	214.03	455.75	106.16	169.17
and they also	283.82	565.34	56.02	109.51
co*on	445.27	798.77	180.04	263.47
con??*	2.21	2.30	1.96	1.97
confiscation	4.03	2.70	3.66	2.29
th*	2,277.00	2,914.76	569.76	412.60
Title=?great	416.08	399.17	108.86	34.50
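
The first optimisation listed above, merging sorted results during intersections, can be illustrated with a minimal sketch. This is not LIFTI's code: it assumes each query part yields document ids in ascending order, and the SortedMerge class and Intersect method are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of LIFTI.
public static class SortedMerge
{
    // Intersects two ascending lists of document ids in a single pass,
    // advancing whichever side currently holds the smaller id.
    public static List<int> Intersect(IReadOnlyList<int> left, IReadOnlyList<int> right)
    {
        var result = new List<int>();
        int i = 0, j = 0;

        while (i < left.Count && j < right.Count)
        {
            if (left[i] == right[j])
            {
                // The id is present on both sides, so it survives the intersection.
                result.Add(left[i]);
                i++;
                j++;
            }
            else if (left[i] < right[j])
            {
                i++;
            }
            else
            {
                j++;
            }
        }

        return result;
    }
}
```

Because both inputs are already ordered, the pass is linear in the combined length of the two lists and needs no hash set or re-sorting of the output.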

Deprecated:

ItemMetadata.Item/DocumentMetadata.Item -> use Key property
IFullTextIndex.Items -> use Metadata property
FullTextIndexBuilder.WithDuplicateItemBehavior -> use WithDuplicateKeyBehavior method
IndexOptions.DuplicateItemBehavior -> use DuplicateKeyBehavior property
ScoredToken.ItemId -> use DocumentId property
QueryTokenMatch.ItemId -> use DocumentId property
ItemMetadata.Count -> IndexMetadata.DocumentCount
ItemMetadata.GetMetadata -> IndexMetadata.GetDocumentMetadata

Technically breaking

IdPool and IIdPool are now internal - These weren't really exposed before anyway
Removed interface IItemMetadata - just using DocumentMetadata going forwards
QueryContext no longer has ApplyTo method
IIndexNavigator: added Snapshot property
IIndexNavigator: added overloads for GetExactMatches and GetExactAndChildMatches that allow for the current QueryContext to be passed in so unnecessary results are not collected.
IIndexNavigator: new additional methods AddExactMatches and AddExactAndChildMatches that allow you to efficiently collect matches using a DocumentMatchCollector before converting it to an IntermediateQueryResult.
IQueryPart now has double CalculateWeighting(Func<IIndexNavigator> navigatorCreator) method to help the query processing logic evaluate the most efficient order of execution.
TItem generic type parameter name has been renamed to TObject.
All query part types are now sealed
New method IIndexNavigator.ExactMatchCount()
IntermediateQueryResult constructors are no longer public
Index serialization interfaces have been reworked. This shouldn't affect anyone because it was technically impossible to write your own serializers based upon them due to a lack of publicly accessible methods for rehydrating an index.
IIndexNavigatorBookmark now implements IDisposable - you don't technically have to dispose it, but doing so will return it to a pool and allow it to be reused.

Querying changes

ScoredFieldMatch is now quite different and no longer publicly constructable. The only place you would have encountered this is in a custom scorer, and that's no longer necessary.

Several types that are only likely to have been used internally are gone:

FieldMatch
QueryTokenMatch
CompositeTokenMatchLocation
SingleTokenMatchLocation
ITokenLocationMatch
TokenLocationMatch

Breaking

DuplicateItemBehavior enum -> renamed to DuplicateKeyBehavior
DuplicateItemBehavior.ReplaceItem -> use DuplicateKeyBehavior.Replace instead
IQueryContext -> just use the concrete QueryContext. This affects IQueryPart.Evaluate, which now takes a QueryContext.
IIndexNodeFactory.CreateNode now takes concrete types ChildNodeMap and DocumentTokenMatchMap instead of ImmutableDictionary and ImmutableList respectively.
A maximum of 31 different object types can now be configured against a single FullTextIndexBuilder (i.e. 31 distinct calls to WithObjectTokenization) - if anyone is actually indexing more than 31 object types, I'd be very interested to understand your scenario!

The rest of these will only affect you if you are explicitly referencing the type names in your code:

ItemPhrases -> renamed to DocumentPhrases
ItemMetadata -> renamed to DocumentMetadata
IItemStore -> renamed to IIndexMetadata

05 Jul 20:07

mikegoatly

v5.0.0

0ddbcf7

v5.0.0

New features in v5.0.0

Dynamic fields
More detailed field information
Smaller binary serialized files

Acknowledgements

Thanks to @kampilan and @h0lg for their thoughts on the design for dynamic fields!

Dynamic fields

v5.0.0 introduces support for dynamic fields, where fields are dynamically registered with the index as it is populated:

var index = new FullTextIndexBuilder<int>()
    .WithObjectTokenization<Customer>(o => o
        .WithKey(c => c.Id)
        .WithDynamicFields("Tags", c => c.TagDictionary, "Tag_")
        .WithDynamicFields(
            "Questions", 
            c => c.Questions, 
            q => q.QuestionName, 
            q => q.QuestionResponse, 
            "Question_")
    )
    .Build();

Indexing this object against the index:

new Customer 
{
    TagDictionary = new Dictionary<string, string>
    {
        { "Foo", "Some text here" }
    },
    Questions = new List<Question>
    {
        new Question { QuestionName = "FavoriteColor", QuestionResponse = "My favorite color is blue" }
    }
}

Will cause two fields to be registered with text:

Tag_Foo -> "Some text here"
Question_FavoriteColor -> "My favorite color is blue"

More detailed field information

The FieldLookup property of an index now provides additional information about fields.

Smaller binary serialized files

The binary serializer has been rewritten to support dynamic fields. In addition, it now writes integers in a variable length encoding, using as few bytes as possible. Indexes serialized using this new approach will be about 30-50% of the size produced by the old serializer.

Old serialized versions of the index can still be read, as long as the index builder definition remains unchanged.
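
To give a feel for why a variable length integer encoding shrinks the file, here is a minimal sketch of one common scheme: 7 data bits per byte, with the high bit marking that more bytes follow. Whether LIFTI's serializer uses exactly this layout is an assumption, and the VarInt class is a hypothetical name used only for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of LIFTI; shows a generic 7-bit varint scheme.
public static class VarInt
{
    // Encodes a value using 7 bits per byte. The high bit of each byte is set
    // when another byte follows, so small values occupy a single byte.
    public static byte[] Encode(uint value)
    {
        var bytes = new List<byte>();
        while (value >= 0x80)
        {
            bytes.Add((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }

        bytes.Add((byte)value);
        return bytes.ToArray();
    }

    // Reverses Encode, reading bytes until one without the continuation bit.
    public static uint Decode(byte[] data)
    {
        uint result = 0;
        var shift = 0;
        foreach (var b in data)
        {
            result |= (uint)(b & 0x7F) << shift;
            if ((b & 0x80) == 0)
            {
                break;
            }

            shift += 7;
        }

        return result;
    }
}
```

With this scheme a value such as 5 takes one byte and 300 takes two, compared with four bytes each for a fixed-width 32-bit integer. Ids and counts in an index tend to be small, which is where most of the saving would come from.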

Breaking Changes

None of these should affect you unless you're doing something really off-the-wall and unexpected.

IIndexedFieldLookup has new methods on it, IsKnownField and AllFieldNames.
IndexedFieldDetails has changed from being a struct to an abstract class and no longer implements IEquatable<IndexedFieldDetails>.
The IndexedFieldLookup class is now internal.

02 Jan 16:42

mikegoatly

v4.0.1

7db17f3

v4.0.1

Fixed a bug in the optimised positional intersection logic #65

22 Dec 09:39

mikegoatly

v4.0.0

3b471de

v4.0.0

New features:

Phrase extraction from search result #57

Use the new CreateMatchPhrasesAsync methods on search results returned by the index to produce the set of matched phrases by combining them with the original source text:

foreach (var result in await results.CreateMatchPhrasesAsync(i => books.First(x => x.BookId == i)))
{
    Console.WriteLine($"{result.SearchResult.Key} ({result.SearchResult.Score})");

    foreach (var fieldPhrase in result.FieldPhrases)
    {
        Console.Write($"  {fieldPhrase.FoundIn}: ");
        Console.WriteLine(string.Join(", ", fieldPhrase.Phrases.Select(x => $"\"{x}\"")));
    }
}

Index thesaurus #63

Define synonym, hyponym and hypernym relationships between words, so that searches can be performed against words that were not in the original source text.

var index = new FullTextIndexBuilder<int>()
    .WithDefaultThesaurus(o => o
        .WithSynonyms("big", "large")
        .WithHyponyms("dog", "poodle", "beagle"))
    .Build();

Ignoring characters during tokenization #59

Configures the tokenizer to ignore certain characters as it is parsing input.

var index = new FullTextIndexBuilder<int>()
    .WithDefaultTokenization(o => o
        .IgnoreCharacters('<', '>')
    )
    .Build();

Performance improvements

Enforcing the order of matched token locations while processing queries has allowed a couple of optimisations when merging the results of some query parts. You will notice a small perf bump when using operators that require positional matching of words, e.g. sequential words "word1 word2", preceding words word1 ~> word2 and near words word1 ~ word2.

And a new logo!

It looks a bit more professional to have a logo when looking for LIFTI in nuget, so here it is:

Multi-targeted platforms

From the v4 release, the LIFTI package will multi-target different platforms:

Behavioral changes

For fuzzy matching queries, maxEditDistance now defaults to termLength / 3 (it was termLength / 2). This provides better matches out of the box.
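
As a worked example of the change in default, the sketch below compares the two formulas. It assumes the division is integer division, which the note above does not state, and the FuzzyDefaults class and its method names are hypothetical.

```csharp
// Hypothetical helper, not part of LIFTI; assumes integer division.
public static class FuzzyDefaults
{
    // Default maximum edit distance before v4: half the term length.
    public static int PreV4MaxEditDistance(int termLength) => termLength / 2;

    // Default maximum edit distance from v4: a third of the term length.
    public static int V4MaxEditDistance(int termLength) => termLength / 3;
}
```

Under that assumption, a five-letter term such as "great" allowed up to 2 edits by default before v4 and allows 1 from v4 onwards, so fewer loosely related words are matched.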

Breaking changes

Most of these shouldn't affect the common usage patterns for LIFTI if you're just using out-of-the-box features. The ones to watch for are the change of return type from IFullTextIndex.Search and change to the IQueryParser interface if you're implementing your own query parser.

Async methods

All async methods can now be passed an optional CancellationToken. This has primarily had an effect on the IFullTextIndex interface.

Overloads have been introduced where async delegates can optionally be provided with the CancellationToken during index building:

Reading field text asynchronously
Index modification actions (WithIndexModificationAction)

ITokenizer

ITokenizer renamed to IIndexTokenizer to improve differentiation between tokenization for indexes and queries
New method: IIndexTokenizer.IsSplitCharacter(char character)
Return type of Process changed from IReadOnlyList<Token> to IReadOnlyCollection<Token>

IFullTextIndex

IFullTextIndex implements new interface IIndexTokenizerProvider
Search methods return a new interface ISearchResults<T>. The interface implements IEnumerable though, so impact should be limited. This allows for the new CreateMatchPhrases method.
Added property IThesaurus DefaultThesaurus to expose the default thesaurus for the index.
BeginBatchChange and CommitBatchChangeAsync were implemented by FullTextIndex - they've been added to the interface for parity.

IQueryParser

IQueryParser.Parse signature changed from:

IQuery Parse(IIndexedFieldLookup fieldLookup, string queryText, ITokenizer tokenizer)

to:

IQuery Parse(IIndexedFieldLookup fieldLookup, string queryText, IIndexTokenizerProvider tokenizerProvider)

This change allows you to access the tokenizers for different fields as well as the default tokenizer for
the index, which is all that was accessible previously. Custom query parsers can be fixed up by using
var tokenizer = tokenizerProvider.DefaultTokenizer to get the default index tokenizer.

IIndexNavigator

New method overload IIndexNavigator.Process(string)

04 Nov 13:58

mikegoatly

v3.5.2

7758ffa

v3.5.2

Fixed a bug when indexing similar words in a single batch where the intra-node text breaks multiple times. #54

20 Oct 22:05

mikegoatly

v3.5.1

6dff0f2

v3.5.1

After publishing v3.5.0 I also noticed that wildcard query parts also weren't applying field filters to their results - this release fixes that.

20 Oct 21:34

mikegoatly

v3.5.0

8a351c1

v3.5.0

Fixed an issue where fuzzy match search results weren't honoring field filters - queries such as title=?great will now only return fuzzy matched results on the required field (in this example, title), as would be expected.
Added an extension method IFullTextIndex.ParseQuery as a convenience wrapper around IFullTextIndex.QueryParser.Parse to save passing in the required dependencies from the index itself.
Added ToString on Query - this is useful when you want to get a textual representation of the query itself. Previously you had to call Query.Root.ToString(), which wasn't very discoverable.
