Feat/selectable analyzer #53

Merged
merged 8 commits into from
Apr 29, 2024
3 changes: 2 additions & 1 deletion docs/Managing-Indexes.md
@@ -19,6 +19,7 @@ Fill out the search index form, populating the fields with your custom values.
- Channel Name - the index will only be triggered by web page item creation or modification in the selected website channel
- Indexing Strategy - the indexing strategy specified in code during dependency registration of custom indexing strategies.
- If you want the default strategy to appear here, register it explicitly in the `IServiceCollection.AddKenticoLucene()` method
- Lucene Analyzer - the Lucene analyzer that the index uses to analyze text.

Now, configure the web page paths and content types that the search index depends on by clicking the Add New Path button
or clicking an existing path in the table at the top of the index configuration form.
@@ -33,4 +34,4 @@ or clicking an existing path in the table at the top of the index configuration

All reusable content item modifications will trigger an event to generate an `IndexEventReusableItemModel` for your custom index strategy class to process, as long as the content item has a language variant matching one of the languages selected for the index. You can use this to index reusable content items in addition to web page items by returning the reusable content item content as an `IIndexEventItemModel` from the strategy `FindItemsToReindex` method.

> Note: There currently no UI to allow administrators to configure which types of reusable content items trigger indexing. This could be added in a future update.
> Note: There is currently no UI to allow administrators to configure which types of reusable content items trigger indexing. This could be added in a future update.
71 changes: 71 additions & 0 deletions docs/Text-analyzing.md
@@ -0,0 +1,71 @@
# Select Lucene text Analyzer

In the admin UI, you can select the `Analyzer` that the selected strategy uses to rebuild an index. By default, the only available `Analyzer` is the `StandardAnalyzer`. To use a different analyzer, register it with the application services when you register Lucene.

```csharp
// Program.cs
using Lucene.Net.Analysis.Cz;

// Registers all services and enables custom indexing behavior
services.AddKenticoLucene(builder =>
{
    // Register strategies ...
    builder.RegisterAnalyzer<CzechAnalyzer>("Czech analyzer");
});
```

Now the `CzechAnalyzer` is available for selection in the admin UI under Lucene Analyzer. The selected strategy uses this analyzer for reindexing. You can also retrieve the same analyzer for index querying: it is available on the `LuceneIndex` instance under `LuceneAnalyzer`.

```csharp
public class SimpleSearchService
{
    // Other class members ...
    public LuceneSearchResultModel<DancingGoatSearchResultModel> GlobalSearch(
        string indexName,
        string? searchText,
        int pageSize = 20,
        int page = 1)
    {
        var index = luceneIndexManager.GetRequiredIndex(indexName);
        var query = GetTermQuery(searchText, index);
        // ...
    }

    private Query GetTermQuery(string? searchText, LuceneIndex index)
    {
        // Here we retrieve the analyzer instance which we can use for our queries
        var analyzer = index.LuceneAnalyzer;
        var queryBuilder = new QueryBuilder(analyzer);

        var booleanQuery = new BooleanQuery();

        if (!string.IsNullOrWhiteSpace(searchText))
        {
            booleanQuery = AddToTermQuery(booleanQuery, queryBuilder.CreatePhraseQuery(nameof(DancingGoatSearchResultModel.Title), searchText, PHRASE_SLOP), 5);
            booleanQuery = AddToTermQuery(booleanQuery, queryBuilder.CreateBooleanQuery(nameof(DancingGoatSearchResultModel.Title), searchText, Occur.SHOULD), 0.5f);

            if (booleanQuery.GetClauses().Count() > 0)
            {
                return booleanQuery;
            }
        }

        return new MatchAllDocsQuery();
    }

    // Other class members ...
}
```
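
Before selecting an analyzer for an index, it can help to see which terms it actually produces, because the same analyzer is applied to both the indexed text and the query text. The following is a minimal sketch, not part of this package: `AnalyzerInspector` and `Tokenize` are hypothetical names, and the field name `"Title"` is only a placeholder. It relies on the standard Lucene.NET token stream API and assumes the `Lucene.Net` and `Lucene.Net.Analysis.Common` packages are referenced.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;        // StandardAnalyzer, used in the usage note
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;                     // LuceneVersion, used in the usage note

// Hypothetical helper for inspecting analyzer output; not part of Kentico.Xperience.Lucene.
public static class AnalyzerInspector
{
    // Runs the analyzer over the text and collects the terms it emits.
    public static List<string> Tokenize(Analyzer analyzer, string text, string fieldName = "Title")
    {
        var terms = new List<string>();

        using TokenStream stream = analyzer.GetTokenStream(fieldName, text);
        var termAttribute = stream.AddAttribute<ICharTermAttribute>();

        // The token stream contract requires Reset() before the first
        // IncrementToken() call and End() after the last one.
        stream.Reset();
        while (stream.IncrementToken())
        {
            terms.Add(termAttribute.ToString());
        }
        stream.End();

        return terms;
    }
}
```

For example, `AnalyzerInspector.Tokenize(new StandardAnalyzer(LuceneVersion.LUCENE_48), "The Quick Brown Foxes")` yields the terms `quick`, `brown`, and `foxes`: the text is lowercased and the English stop word "the" is dropped. Passing `index.LuceneAnalyzer` instead shows what the analyzer selected for that index does with the same input.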

The `Analyzer` takes a `LuceneVersion` to match version compatibility across Lucene releases. By default, the latest version, `LuceneVersion.LUCENE_48`, is used. You can override the version when registering the application services as follows.

```csharp
// Program.cs

// Registers all services and enables custom indexing behavior
services.AddKenticoLucene(builder =>
{
    // Register strategies ...
    // Register analyzers
    builder.SetAnalyzerLuceneVersion(LuceneVersion.LUCENE_47);
});
```
4 changes: 4 additions & 0 deletions docs/Usage-Guide.md
@@ -30,6 +30,10 @@ See [Managing search indexes](Managing-Indexes.md)

See [Search index querying](Search-index-querying.md)

## Using Lucene Analyzer

See [Text analyzing](Text-analyzing.md)

## Implementing document decay

You can score indexed items by "freshness" or "recency" using several techniques, each with different tradeoffs.
@@ -1,5 +1,7 @@
using DancingGoat.Search.Services;

using Lucene.Net.Analysis.Cz;

namespace DancingGoat.Search;

public static class DancingGoatSearchStartupExtensions
@@ -10,6 +12,7 @@ public static IServiceCollection AddKenticoDancingGoatLuceneServices(this IServi
{
builder.RegisterStrategy<AdvancedSearchIndexingStrategy>("DancingGoatExampleStrategy");
builder.RegisterStrategy<SimpleSearchIndexingStrategy>("DancingGoatMinimalExampleStrategy");
builder.RegisterAnalyzer<CzechAnalyzer>("Czech analyzer");
});

services.AddHttpClient<WebCrawlerService>();
13 changes: 7 additions & 6 deletions examples/DancingGoat/Search/Services/AdvancedSearchService.cs
@@ -1,6 +1,6 @@
using Kentico.Xperience.Lucene.Core.Indexing;
using Kentico.Xperience.Lucene.Core.Search;
using Lucene.Net.Analysis.Standard;

using Lucene.Net.Documents;
using Lucene.Net.Facet;
using Lucene.Net.Search;
@@ -19,8 +19,9 @@ public class AdvancedSearchService

public AdvancedSearchService(
ILuceneSearchService luceneSearchService,
AdvancedSearchIndexingStrategy strategy,
ILuceneIndexManager luceneIndexManager)
ILuceneIndexManager luceneIndexManager,
AdvancedSearchIndexingStrategy strategy
)
{
this.luceneSearchService = luceneSearchService;
this.strategy = strategy;
@@ -36,7 +37,7 @@ public LuceneSearchResultModel<DancingGoatSearchResultModel> GlobalSearch(
string? sortBy = null)
{
var index = luceneIndexManager.GetRequiredIndex(indexName);
var query = GetTermQuery(searchText);
var query = GetTermQuery(searchText, index);

var combinedQuery = new BooleanQuery
{
@@ -105,9 +106,9 @@ private static BooleanQuery AddToTermQuery(BooleanQuery query, Query textQueryPa
return query;
}

private static Query GetTermQuery(string? searchText)
private Query GetTermQuery(string? searchText, LuceneIndex index)
{
var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
var analyzer = index.LuceneAnalyzer;
var queryBuilder = new QueryBuilder(analyzer);

if (string.IsNullOrEmpty(searchText))
12 changes: 7 additions & 5 deletions examples/DancingGoat/Search/Services/SimpleSearchService.cs
@@ -1,7 +1,6 @@
using Kentico.Xperience.Lucene.Core.Indexing;
using Kentico.Xperience.Lucene.Core.Search;

using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Search;
using Lucene.Net.Util;
@@ -16,7 +15,10 @@ public class SimpleSearchService
private readonly ILuceneSearchService luceneSearchService;
private readonly ILuceneIndexManager luceneIndexManager;

public SimpleSearchService(ILuceneSearchService luceneSearchService, ILuceneIndexManager luceneIndexManager)
public SimpleSearchService(
ILuceneSearchService luceneSearchService,
ILuceneIndexManager luceneIndexManager
)
{
this.luceneSearchService = luceneSearchService;
this.luceneIndexManager = luceneIndexManager;
@@ -29,7 +31,7 @@ public LuceneSearchResultModel<DancingGoatSearchResultModel> GlobalSearch(
int page = 1)
{
var index = luceneIndexManager.GetRequiredIndex(indexName);
var query = GetTermQuery(searchText);
var query = GetTermQuery(searchText, index);

var result = luceneSearchService.UseSearcher(
index,
@@ -72,9 +74,9 @@ private static BooleanQuery AddToTermQuery(BooleanQuery query, Query textQueryPa
return query;
}

private static Query GetTermQuery(string? searchText)
private Query GetTermQuery(string? searchText, LuceneIndex index)
{
var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
var analyzer = index.LuceneAnalyzer;
var queryBuilder = new QueryBuilder(analyzer);

var booleanQuery = new BooleanQuery();
Binary file modified images/xperience-administration-search-index-edit-form.jpg
Binary file modified images/xperience-administration-search-index-list.jpg
@@ -2,6 +2,7 @@

using Kentico.Xperience.Admin.Base.FormAnnotations;
using Kentico.Xperience.Admin.Base.Forms;
using Kentico.Xperience.Lucene.Admin.Providers;
using Kentico.Xperience.Lucene.Core.Indexing;

namespace Kentico.Xperience.Lucene.Admin;
@@ -27,6 +28,9 @@ public class LuceneConfigurationModel
[DropDownComponent(Label = "Indexing Strategy", DataProviderType = typeof(IndexingStrategyOptionsProvider), Order = 4)]
public string StrategyName { get; set; } = "";

[DropDownComponent(Label = "Lucene Analyzer", DataProviderType = typeof(AnalyzerOptionsProvider), Order = 5)]
public string AnalyzerName { get; set; } = "";

[TextInputComponent(Label = "Rebuild Hook")]
public string RebuildHook { get; set; } = "";

@@ -44,6 +48,7 @@ LuceneIndexModel luceneModel
LanguageNames = luceneModel.LanguageNames;
ChannelName = luceneModel.ChannelName;
StrategyName = luceneModel.StrategyName;
AnalyzerName = luceneModel.AnalyzerName;
RebuildHook = luceneModel.RebuildHook;
Paths = luceneModel.Paths;
}
@@ -55,6 +60,7 @@ public LuceneIndexModel ToLuceneModel() =>
IndexName = IndexName,
LanguageNames = LanguageNames,
ChannelName = ChannelName,
AnalyzerName = AnalyzerName,
StrategyName = StrategyName,
RebuildHook = RebuildHook,
Paths = Paths
@@ -0,0 +1,14 @@
using Kentico.Xperience.Admin.Base.FormAnnotations;
using Kentico.Xperience.Lucene.Core.Indexing;

namespace Kentico.Xperience.Lucene.Admin.Providers;

internal class AnalyzerOptionsProvider : IDropDownOptionsProvider
{
public Task<IEnumerable<DropDownOptionItem>> GetOptionItems() =>
Task.FromResult(AnalyzerStorage.Analyzers.Keys.Select(x => new DropDownOptionItem()
{
Value = x,
Text = x
}));
}
@@ -70,6 +70,7 @@ public override async Task ConfigurePage()
.AddColumn(nameof(LuceneIndexItemInfo.LuceneIndexItemIndexName), "Name", sortable: true, searchable: true)
.AddColumn(nameof(LuceneIndexItemInfo.LuceneIndexItemChannelName), "Channel", searchable: true, sortable: true)
.AddColumn(nameof(LuceneIndexItemInfo.LuceneIndexItemStrategyName), "Index Strategy", searchable: true, sortable: true)
.AddColumn(nameof(LuceneIndexItemInfo.LuceneIndexItemAnalyzerName), "Lucene Analyzer", searchable: true, sortable: true)
// Placeholder field which will be replaced with a customized value
.AddColumn(nameof(LuceneIndexItemInfo.LuceneIndexItemId), "Entries", sortable: true)
.AddColumn(nameof(LuceneIndexItemInfo.LuceneIndexItemId), "Last Updated", sortable: true);
25 changes: 25 additions & 0 deletions src/Kentico.Xperience.Lucene.Core/Indexing/AnalyzerStorage.cs
@@ -0,0 +1,25 @@
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Util;

namespace Kentico.Xperience.Lucene.Core.Indexing;

internal static class AnalyzerStorage
{
public static Dictionary<string, Type> Analyzers { get; private set; }
public static LuceneVersion AnalyzerLuceneVersion { get; private set; }
static AnalyzerStorage() => Analyzers = [];


public static void SetAnalyzerLuceneVersion(LuceneVersion matchVersion) => AnalyzerLuceneVersion = matchVersion;


public static void AddAnalyzer<TAnalyzer>(string analyzerName) where TAnalyzer : Analyzer
=> Analyzers.Add(analyzerName, typeof(TAnalyzer));


public static Type GetOrDefault(string analyzerName) =>
Analyzers.TryGetValue(analyzerName, out var type)
? type
: typeof(StandardAnalyzer);
}
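
Two behaviors of a name-to-type registry like the one above are worth keeping in mind: `Dictionary.Add` throws an `ArgumentException` when a name is registered twice, and lookups with the default string comparer are case-sensitive, so a stored name that does not match exactly falls back to the default. The sketch below reproduces only that lookup logic with stand-in types so it runs without Lucene. `AnalyzerRegistrySketch`, `FallbackAnalyzer`, and `CzechLikeAnalyzer` are hypothetical names for illustration. Whether the builder guards against duplicate names before calling `AddAnalyzer` is not visible in this diff.

```csharp
using System;
using System.Collections.Generic;

// Placeholder analyzer types; they stand in for StandardAnalyzer and CzechAnalyzer.
public class FallbackAnalyzer { }
public class CzechLikeAnalyzer { }

// Simplified stand-in for the name-to-type registry in this PR (hypothetical name).
public static class AnalyzerRegistrySketch
{
    private static readonly Dictionary<string, Type> analyzers = new();

    // Dictionary.Add throws ArgumentException if the name is already present.
    public static void Add<TAnalyzer>(string name) =>
        analyzers.Add(name, typeof(TAnalyzer));

    // Unknown names (including differently cased ones) resolve to the fallback type.
    public static Type GetOrDefault(string name) =>
        analyzers.TryGetValue(name, out var type) ? type : typeof(FallbackAnalyzer);
}
```

After `AnalyzerRegistrySketch.Add<CzechLikeAnalyzer>("Czech analyzer")`, calling `GetOrDefault("Czech analyzer")` returns `typeof(CzechLikeAnalyzer)`, while `GetOrDefault("czech analyzer")` returns `typeof(FallbackAnalyzer)`. A second `Add` with the same name throws.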
@@ -42,6 +42,7 @@ public bool TryCreateIndex(LuceneIndexModel configuration)
LuceneIndexItemIndexName = configuration.IndexName ?? "",
LuceneIndexItemChannelName = configuration.ChannelName ?? "",
LuceneIndexItemStrategyName = configuration.StrategyName ?? "",
LuceneIndexItemAnalyzerName = configuration.AnalyzerName ?? "",
LuceneIndexItemRebuildHook = configuration.RebuildHook ?? ""
};

@@ -170,6 +171,7 @@ public async Task<bool> TryEditIndexAsync(LuceneIndexModel configuration)

indexInfo.LuceneIndexItemRebuildHook = configuration.RebuildHook ?? "";
indexInfo.LuceneIndexItemStrategyName = configuration.StrategyName ?? "";
indexInfo.LuceneIndexItemAnalyzerName = configuration.AnalyzerName ?? "";
indexInfo.LuceneIndexItemChannelName = configuration.ChannelName ?? "";
indexInfo.LuceneIndexItemIndexName = configuration.IndexName ?? "";

@@ -22,7 +22,7 @@ public IEnumerable<LuceneIndex> GetAllIndices()
{
var indices = (CacheSettings cs) =>
{
var luceneIndices = storageService.GetAllIndexDataAsync().Result.Select(x => new LuceneIndex(x, StrategyStorage.Strategies));
var luceneIndices = storageService.GetAllIndexDataAsync().Result.Select(x => new LuceneIndex(x, StrategyStorage.Strategies, AnalyzerStorage.Analyzers, AnalyzerStorage.AnalyzerLuceneVersion));

cs.CacheDependency = CacheHelper.GetCacheDependency(GetLuceneDependencyCacheKeys());

@@ -45,7 +45,7 @@ public IEnumerable<LuceneIndex> GetAllIndices()
}
cs.CacheDependency = CacheHelper.GetCacheDependency(GetLuceneDependencyCacheKeys());

return new LuceneIndex(indexConfiguration, StrategyStorage.Strategies);
return new LuceneIndex(indexConfiguration, StrategyStorage.Strategies, AnalyzerStorage.Analyzers, AnalyzerStorage.AnalyzerLuceneVersion);
};

return progressiveCache.Load(cs => index(cs), new CacheSettings(10, $"customdatasource|index|name|{indexName}"));
@@ -64,7 +64,7 @@ public IEnumerable<LuceneIndex> GetAllIndices()
}
cs.CacheDependency = CacheHelper.GetCacheDependency(GetLuceneDependencyCacheKeys());

return new LuceneIndex(indexConfiguration, StrategyStorage.Strategies);
return new LuceneIndex(indexConfiguration, StrategyStorage.Strategies, AnalyzerStorage.Analyzers, AnalyzerStorage.AnalyzerLuceneVersion);
};

return progressiveCache.Load(cs => index(cs), new CacheSettings(10, $"customdatasource|index|identifier|{identifier}"));
@@ -83,7 +83,7 @@ public LuceneIndex GetRequiredIndex(string indexName)
}
cs.CacheDependency = CacheHelper.GetCacheDependency(GetLuceneDependencyCacheKeys());

return new LuceneIndex(indexConfiguration, StrategyStorage.Strategies);
return new LuceneIndex(indexConfiguration, StrategyStorage.Strategies, AnalyzerStorage.Analyzers, AnalyzerStorage.AnalyzerLuceneVersion);
};

return progressiveCache.Load(cs => index(cs), new CacheSettings(10, $"customdatasource|index|identifier|{indexName}"));
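
The hunks above pass the registered analyzer types and the configured match version into `LuceneIndex`, but this diff does not show how `LuceneIndex` turns a `Type` into an analyzer instance. One plausible approach, sketched here purely as an assumption, is reflection through `Activator.CreateInstance`. This would require every registered analyzer to expose a public constructor that takes the version as its only argument. All names below (`MatchVersion`, `AnalyzerBase`, `SampleAnalyzer`, `AnalyzerFactorySketch`) are stand-ins so the sketch compiles without Lucene.

```csharp
using System;

// Stand-in for LuceneVersion (hypothetical).
public enum MatchVersion { V47, V48 }

// Stand-in for Lucene's Analyzer base class (hypothetical).
public abstract class AnalyzerBase
{
    protected AnalyzerBase(MatchVersion version) => Version = version;
    public MatchVersion Version { get; }
}

// Stand-in for a concrete analyzer with a single-argument version constructor.
public sealed class SampleAnalyzer : AnalyzerBase
{
    public SampleAnalyzer(MatchVersion version) : base(version) { }
}

public static class AnalyzerFactorySketch
{
    // Invokes the public (MatchVersion) constructor of the given type via reflection.
    // Throws MissingMethodException if the type has no such constructor.
    public static AnalyzerBase Create(Type analyzerType, MatchVersion version) =>
        (AnalyzerBase)Activator.CreateInstance(analyzerType, version)!;
}
```

Under this assumption, `AnalyzerFactorySketch.Create(typeof(SampleAnalyzer), MatchVersion.V47)` returns a `SampleAnalyzer` whose `Version` is `V47`. An analyzer type without a matching constructor would fail only at runtime, when the index is first loaded, rather than at registration.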
@@ -2,22 +2,21 @@
using Lucene.Net.Facet.Taxonomy.Directory;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

using LuceneDirectory = Lucene.Net.Store.Directory;

namespace Kentico.Xperience.Lucene.Core.Indexing;

public class DefaultLuceneIndexService : ILuceneIndexService
{
private const LuceneVersion LUCENE_VERSION = LuceneVersion.LUCENE_48;

public T UseIndexAndTaxonomyWriter<T>(LuceneIndex index, Func<IndexWriter, ITaxonomyWriter, T> useIndexWriter, IndexStorageModel storage, OpenMode openMode = OpenMode.CREATE_OR_APPEND)
{
using LuceneDirectory indexDir = FSDirectory.Open(storage.Path);

var analyzer = index.LuceneAnalyzer;

//Create an index writer
var indexConfig = new IndexWriterConfig(LUCENE_VERSION, index.Analyzer)
var indexConfig = new IndexWriterConfig(AnalyzerStorage.AnalyzerLuceneVersion, analyzer)
{
OpenMode = openMode // create/overwrite index
};
@@ -33,8 +32,10 @@ public TResult UseWriter<TResult>(LuceneIndex index, Func<IndexWriter, TResult>
{
using LuceneDirectory indexDir = FSDirectory.Open(storage.Path);

var analyzer = index.LuceneAnalyzer;

//Create an index writer
var indexConfig = new IndexWriterConfig(LUCENE_VERSION, index.Analyzer)
var indexConfig = new IndexWriterConfig(AnalyzerStorage.AnalyzerLuceneVersion, analyzer)
{
OpenMode = openMode // create/overwrite index
};