Commit

Merge pull request #43 from mikegoatly/v3.0.0
V3.0.0
mikegoatly authored Feb 8, 2022
2 parents f0a4c75 + abe3550 commit 0ae4424
Showing 94 changed files with 4,279 additions and 1,338 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -330,3 +330,5 @@ ASALocalRun/

# MFractors (Xamarin productivity tool) working folder
.mfractor/

.hugo_build.lock
2 changes: 2 additions & 0 deletions README.md
@@ -8,6 +8,8 @@ A lightweight full text indexer for .NET

[Read the documentation](https://mikegoatly.github.io/lifti/docs) - there's lots of useful information and examples there.

[Check out some sample code](https://github.com/mikegoatly/lifti/tree/master/samples/TestConsole) - the repo contains examples that can be run as a console application.

## Try it out!

[Use LIFTI in a Blazor app](https://mikegoatly.github.io/lifti/blazor-sample) - try out various queries against Wikipedia content
12 changes: 8 additions & 4 deletions azure-pipelines.yml
@@ -8,9 +8,9 @@ trigger:
- test

variables:
majorVersion: 2
minorVersion: 1
patchVersion: 1
majorVersion: 3
minorVersion: 0
patchVersion: 0
project: src/Lifti.Core/Lifti.Core.csproj
testProject: test/Lifti.Tests/Lifti.Tests.csproj
buildConfiguration: 'Release'
@@ -25,7 +25,11 @@ stages:
pool:
vmImage: 'windows-latest'

steps:
steps:
- task: UseDotNet@2
inputs:
packageType: 'sdk'
version: '6.0.x'
- task: DotNetCoreCLI@2
displayName: "NuGet Restore"
inputs:
36 changes: 36 additions & 0 deletions docs/content/en/docs/Index construction/WithQueryParser.md
@@ -0,0 +1,36 @@
---
title: "WithQueryParser"
linkTitle: "WithQueryParser"
weight: 5
description: >
Prescribes how the QueryParser implementation should be configured for the index.
---

## Providing a complete `IQueryParser` implementation

`FullTextIndexBuilder<TKey> WithQueryParser(IQueryParser queryParser)`

Allows you to provide your own implementation of `IQueryParser` capable of parsing text into an `IQuery`.

## Configuring the default LIFTI `QueryParser`

`FullTextIndexBuilder<TKey> WithQueryParser(Func<QueryParserBuilder, QueryParserBuilder> optionsBuilder)`

By default LIFTI parses query text using the [LIFTI query syntax](../searching). The behavior of the parser can
be tweaked using this overload.

`QueryParserBuilder.AssumeFuzzySearchTerms()`
When enabled, any parsed search term that doesn't contain a wildcard operator is
treated as a fuzzy match, i.e. you don't need to prefix search terms with `?`.

`QueryParserBuilder.WithQueryParserFactory(Func<QueryParserOptions, IQueryParser>)`
Given a `QueryParserOptions`, creates the implementation of `IQueryParser`. You can use this to provide a
custom query parsing strategy.

### Example usage

``` csharp
var index = new FullTextIndexBuilder<int>()
.WithQueryParser(o => o.AssumeFuzzySearchTerms())
.Build();
```
39 changes: 37 additions & 2 deletions docs/content/en/docs/Scoring/_index.md
@@ -6,6 +6,41 @@ description: >
How does LIFTI score results?
---

LIFTI uses a version of the [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) algorithm to score search results. At the simplest level this means that search results will come back ordered by relevance.
LIFTI uses a version of the [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) algorithm to score search results. At the simplest level this means that search results will come back ordered by relevance. Fuzzy matching affects the
scores for search results depending on the distance between the target word and the search term.

One nice feature of LIFTI is that you also get each field scored independently. The overall score for a document is just the sum of these, but you could easily re-order the results by one field over another should you so wish.
One nice feature of LIFTI is that you also get each field scored independently. The overall score for a document is just the sum of these, but you could easily re-order the results by one field over another should you so wish.

`OrderByField` is a convenience method that can re-order results by a single field:

``` csharp
var index = new FullTextIndexBuilder<int>()
.WithObjectTokenization<Customer>(o => o
.WithKey(c => c.Id)
.WithField("Name", c => c.Name)
.WithField("Profile", c => c.ProfileHtml, textExtractor: new XmlTextExtractor())
)
.Build();

await index.AddAsync(new Customer { Id = 1, Name = "Joe Bloggs", ProfileHtml = "<a>Something else something</a>" });
await index.AddAsync(new Customer { Id = 2, Name = "Joe Something", ProfileHtml = "<a>Something else</a>" });

// Searching for "Something" will result in ID 2 being ordered before ID 1.
// "Something" appears twice in each document overall, however document 2 has fewer words, therefore the matches
// are more statistically significant.
var results = index.Search("something");
PrintSearchResults(results);

// Output
// 2
// 1
// But if you only consider the "Profile" field, "Something" appears only once in document 2
// and twice in document 1, therefore document 1 will come first.
results = results.OrderByField("Profile");
PrintSearchResults(results);

// Output
// 1
// 2
```
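
The relationship between per-field scores and the overall score can be sketched without LIFTI at all. The snippet below is only an illustration of "total score is the sum of the field scores" and of re-ordering by a single field. `FieldScore`, `ScoredDocument` and `ScoreOrdering` are hypothetical names invented for this sketch, not LIFTI types, and it makes no attempt to reproduce how LIFTI calculates the scores themselves.

``` csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in types for illustration only; these are not LIFTI's result types.
public sealed record FieldScore(string Field, double Score);

public sealed record ScoredDocument(int Key, IReadOnlyList<FieldScore> Fields)
{
    // The overall score is the sum of the per-field scores.
    public double TotalScore => Fields.Sum(f => f.Score);

    // The score for one field, or 0 if the document had no match in that field.
    public double ScoreFor(string field) =>
        Fields.Where(f => f.Field == field).Sum(f => f.Score);
}

public static class ScoreOrdering
{
    // Default ordering: highest total score first.
    public static IEnumerable<ScoredDocument> ByTotal(IEnumerable<ScoredDocument> docs) =>
        docs.OrderByDescending(d => d.TotalScore);

    // Alternative ordering: highest score for a single field first.
    public static IEnumerable<ScoredDocument> ByField(IEnumerable<ScoredDocument> docs, string field) =>
        docs.OrderByDescending(d => d.ScoreFor(field));
}
```

With made-up scores where document 2 has the higher total but document 1 has the higher "Profile" score, `ByTotal` yields 2 then 1 and `ByField(docs, "Profile")` yields 1 then 2, mirroring the output in the example above.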
35 changes: 34 additions & 1 deletion docs/content/en/docs/Searching/_index.md
@@ -12,8 +12,10 @@ description: >
Example|Meaning
-|-
West|**West** must appear in the text.
?Wst|Words that [fuzzy match](#fuzzy-matching) with **wst** must appear in the text.
title=West|**West** must appear in the ***title*** field of an indexed object.
doc*|Words that start with **doc** (e.g. *document*) must appear in the text.
doc*|Words that start with **doc** (e.g. *document*) must appear in the text. [See wildcard matching](#wildcard-matching)
%%ing|Words that start with any two characters and end with **ing**, e.g. *doing*. [See wildcard matching](#wildcard-matching)
west&nbsp;&&nbsp;wing|The words **west** and **wing** must appear in the text.
west&nbsp;wing|The words **west** and **wing** must appear in the text - the default operator is & if none is specified between search words.
west&nbsp;\|&nbsp;wing|The words **west** or **wing** must appear in the text.
@@ -31,6 +33,37 @@ Example|Meaning
-|-
"west wing" ~ "oval office"|**West wing** must appear near **Oval Office**

### Fuzzy matching

LIFTI uses [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) to perform fuzzy matches between a search term and tokens in the index.
The distance between two words is the number of edits that are required to match them, including:

* insertions: fid would match fi**n**d
* deletions: foood would match food
* substitutions: frnd would match f**i**nd
* transpositions: fnid would match f**in**d. Transpositions are a special case: although two characters are affected, the swap is counted as a single edit.

The resulting Levenshtein distance between any matched term and the search term is used to reduce the score of the match. This means that documents containing
words that are closer matches will typically be surfaced higher up in the search results.

To prevent a [combinatorial explosion](https://en.wikipedia.org/wiki/Combinatorial_explosion) of potential matches, LIFTI currently limits the maximum number
of allowed edits to 3, and the number of sequential edits to 1. This means that as of now:

* **feed** will *not* match **food** because it requires two sequential edits
* **redy** will *not* match **friendly** because it requires 4 insertions
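
To make the edit counts above concrete, here is a minimal sketch of an edit distance in which an adjacent transposition counts as a single edit (the "optimal string alignment" variant). This is a generic textbook implementation and an assumption about how best to illustrate the idea; it is not LIFTI's implementation, and it does not enforce the limits on total or sequential edits. `EditDistance.Compute` is a hypothetical name.

``` csharp
using System;

// Illustrative only: a textbook edit distance, not LIFTI's fuzzy matcher.
public static class EditDistance
{
    public static int Compute(string source, string target)
    {
        // d[i, j] holds the distance between the first i characters of source
        // and the first j characters of target.
        var d = new int[source.Length + 1, target.Length + 1];
        for (var i = 0; i <= source.Length; i++) d[i, 0] = i;
        for (var j = 0; j <= target.Length; j++) d[0, j] = j;

        for (var i = 1; i <= source.Length; i++)
        {
            for (var j = 1; j <= target.Length; j++)
            {
                var cost = source[i - 1] == target[j - 1] ? 0 : 1;

                // Deletion, insertion, or substitution (free when the characters match).
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                    d[i - 1, j - 1] + cost);

                // Adjacent transposition: two swapped characters count as one edit.
                if (i > 1 && j > 1
                    && source[i - 1] == target[j - 2]
                    && source[i - 2] == target[j - 1])
                {
                    d[i, j] = Math.Min(d[i, j], d[i - 2, j - 2] + 1);
                }
            }
        }

        return d[source.Length, target.Length];
    }
}
```

For the examples above, `EditDistance.Compute("fnid", "find")` returns 1, `EditDistance.Compute("feed", "food")` returns 2 and `EditDistance.Compute("redy", "friendly")` returns 4.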

### Defaulting search terms to fuzzy matching

By default LIFTI will treat a search term as an exact match; however, [you can configure the index](../index-construction/withqueryparser/#configuring-the-default-lifti-queryparser) so that any search term (apart from those containing wildcards)
will be treated as a fuzzy match.

### Wildcard matching

Any search term containing `*` or `%` will be considered a wildcard match, where:

* `*` matches zero or more characters
* `%` matches any single character
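
The two wildcard rules can be sketched as a small standalone matcher. This only illustrates the semantics of `*` and `%` against a single word, under the assumption that comparison is exact and case-sensitive; LIFTI matches against tokens in its index rather than using code like this, and `WildcardMatcher.IsMatch` is a hypothetical name.

``` csharp
// Illustrative only: shows the meaning of * and %, not how LIFTI evaluates wildcards.
public static class WildcardMatcher
{
    public static bool IsMatch(string pattern, string word)
    {
        int p = 0, w = 0;
        int starPattern = -1; // position of the most recent * in the pattern
        int starWord = 0;     // position in the word where that * started matching

        while (w < word.Length)
        {
            if (p < pattern.Length && pattern[p] == '*')
            {
                // Remember the *, and initially let it match zero characters.
                starPattern = p++;
                starWord = w;
            }
            else if (p < pattern.Length && (pattern[p] == '%' || pattern[p] == word[w]))
            {
                // % matches any single character; otherwise characters must be equal.
                p++;
                w++;
            }
            else if (starPattern >= 0)
            {
                // Mismatch: backtrack and let the last * absorb one more character.
                p = starPattern + 1;
                w = ++starWord;
            }
            else
            {
                return false;
            }
        }

        // Any trailing * can match zero characters.
        while (p < pattern.Length && pattern[p] == '*') p++;
        return p == pattern.Length;
    }
}
```

Here `WildcardMatcher.IsMatch("doc*", "document")` and `WildcardMatcher.IsMatch("%%ing", "doing")` return `true`, while `WildcardMatcher.IsMatch("%%ing", "ring")` returns `false` because only one character precedes **ing**.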

## Query Operators

### Basic word matches
4 changes: 4 additions & 0 deletions docs/content/en/docs/Searching/search-results.md
@@ -6,6 +6,10 @@ description:
Results from `FullTextIndex<T>.Search` are an enumeration of `SearchResult<T>`
---

## Search result order

Search results are returned sorted according to the total document score, in descending order. See [scoring](../../scoring) for more information.

## SearchResult&lt;T&gt;

### T Item { get; }
@@ -7,4 +7,4 @@ description: >
file whenever changes are made to it.
---

First you will need to make sure that the index is deserialized before use, as demonstrated [here](..), and add an [index modification hook](../Index%20construction/WithIndexModificationAction) to serialize the index whenever a new snapshot is created.
First you will need to make sure that the index is deserialized before use, as demonstrated [here](..), and add an [index modification hook](../../index-construction/withindexmodificationaction) to serialize the index whenever a new snapshot is created.
2 changes: 1 addition & 1 deletion docs/content/en/docs/faq/_index.md
@@ -21,4 +21,4 @@ Yes, this is managed by two mechanisms:

### Can I automatically serialize an index when it changes?

Yes, you need to add a hook to `FullTextIndexBuilder<TKey>.WithIndexModificationAction`. There's an [example here](../Index%20construction/WithIndexModificationAction).
Yes, you need to add a hook to `FullTextIndexBuilder<TKey>.WithIndexModificationAction`. There's an [example here](../index-construction/withindexmodificationaction).