Optimize IndexingContentHandlers perf #13721

Closed
wants to merge 37 commits into from

Conversation

hyzx86
Contributor

@hyzx86 hyzx86 commented May 20, 2023

try to fix #13071

  1. Avoid retrieving content from the database multiple times.
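As a general illustration of this point (not the code changed in this PR), one way to avoid repeated reads is to cache a loaded item for the duration of one unit of work. `LoadFromDbAsync`, `GetOnceAsync`, and the string payload are hypothetical stand-ins, not OrchardCore APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var dbReads = 0;

// Hypothetical loader standing in for a database query; it counts its calls.
Task<string> LoadFromDbAsync(string id)
{
    dbReads++;
    return Task.FromResult("content of " + id);
}

// Cache scoped to one unit of work (assumption: one instance per request).
var cache = new Dictionary<string, string>();

// Returns the cached item if present; otherwise loads it once and remembers it.
async Task<string> GetOnceAsync(string id)
{
    if (!cache.TryGetValue(id, out var item))
    {
        item = await LoadFromDbAsync(id);
        cache[id] = item;
    }

    return item;
}

await GetOnceAsync("item-1");
await GetOnceAsync("item-1");
Console.WriteLine(dbReads); // prints 1
```

The second call is served from the dictionary, so the loader runs only once per id.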

Summary by CodeRabbit

  • New Features
    • Enhanced content management capabilities with improved handling and indexing logic across different contexts such as publishing, updating, and creating content items.
  • Refactor
    • Streamlined content indexing processes for Lucene, Azure AI, and Elasticsearch to boost efficiency and clarity.
  • Tests
    • Expanded test coverage to include new utilities and indexing functionalities.

@hyzx86 hyzx86 changed the title avoid retrieving content from the database multiple times fix LuceneIndexingContentHandler Problems May 20, 2023
@hishamco hishamco requested a review from Skrypt May 20, 2023 12:54
@hyzx86
Contributor Author

hyzx86 commented May 20, 2023

I haven't changed the rebuildindex function yet, so let's see if I got it right here

@Skrypt
Contributor

Skrypt commented May 23, 2023

I'm surprised it is broken. Will test it and analyze the PR.

Contributor

This pull request has merge conflicts. Please resolve those before requesting a review.

@Piedone
Member

Piedone commented Mar 21, 2024

Do you want to get back to this, @Skrypt?

@Skrypt Skrypt added the Lucene label Mar 22, 2024
@Skrypt
Contributor

Skrypt commented Mar 22, 2024

The Index Latest option is working on the main branch. No issue there. I think that this PR will break it though. It may fix performance issues. We need to make sure that the PreviousItem and ContentItem from the Context are never altered.

Skrypt
Skrypt previously approved these changes Mar 22, 2024
@Skrypt Skrypt changed the title fix LuceneIndexingContentHandler Problems Optimize LuceneIndexingContentHandler perf Mar 22, 2024
@Skrypt Skrypt dismissed their stale review March 22, 2024 02:56

optimizing it

Add IsPublishing flag on PublishContentContext to handle when unpublishing
Contributor

@Skrypt Skrypt left a comment

I think we need unit tests to make sure all scenarios work with the DefaultContentManager. Because, while testing this I needed to also fix the use case when we unpublish a content item which was not removing the content item in the index. I'm approving, but we need unit tests.

@Skrypt Skrypt added needs triage Needs Unit Test(s) Unit Tests or Functional Tests are required labels Mar 22, 2024
@Piedone
Member

Piedone commented Apr 25, 2024

Does this still need triage, @Skrypt? It seems it was actually triaged, with an all-star reviewer team being there.

@Skrypt
Contributor

Skrypt commented Apr 29, 2024

I believe it still needs proper unit tests. If we want to merge then we need to at least create a task/ticket/issue about Lucene Unit Tests.

@Piedone
Member

Piedone commented Apr 29, 2024

What do we need the triage for, exactly? Because then we don't merge the PR until it has unit tests.

@Skrypt
Contributor

Skrypt commented Apr 30, 2024

OK, so no merge until we have proper unit tests, just like we said before.

@hishamco
Member

hishamco commented May 1, 2024

@hyzx86 can you write a unit test for this to finalize this PR?

Let me know if you need help

@hyzx86

This comment was marked as outdated.

@hyzx86
Contributor Author

hyzx86 commented May 2, 2024

@hishamco, @Skrypt I need help ! 😅

Do we currently have a way to check that all indexing tasks in the system have been completed?
I noticed that there is a table called IndexingTask.
Can we check this table to make sure that all tasks have been completed?

https://github.com/OrchardCMS/OrchardCore/blob/main/src/OrchardCore.Modules/OrchardCore.Indexing/IndexingTaskManager.cs

However, I noticed that there will always be some data in this table. I don't know if it is because the unit tests do not wait for
IndexingTaskManager.FlushAsync to complete; I will test that.

@hishamco
Member

hishamco commented May 2, 2024

From what I saw in the code, FlushAsync() should handle this.

I don't know if it is because the unit tests do not wait for
IndexingTaskManager.FlushAsync to complete,

Which unit test do you refer to?

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Out of diff range and nitpick comments (1)
test/OrchardCore.Tests/Apis/ContentManagement/DeploymentPlans/ContentStepLuceneQueryTests.cs (1)

Line range hint 1-59: Ensure consistent use of namespaces and imports.

The file includes multiple namespaces and using directives. Ensure that all these are necessary for the current test implementations to avoid unnecessary dependencies and to keep the code clean and maintainable.

@hyzx86 hyzx86 marked this pull request as draft May 2, 2024 07:56
{
    Name = "CheckIndexingTask",
    Template = "Select Count(1) as Total from IndexingTask"
};
Contributor Author

@hishamco, here:
It looks like I found another bug. This code will never work because of syntax errors.

var deleteCmd = $"delete from {dialect.QuoteForTableName(table, _store.Configuration.Schema)} where {dialect.QuoteForColumnName("ContentItemId")} {dialect.InOperator("@Ids")};";
do
{
    var pageOfIds = ids.Take(pageSize).ToArray();
    if (pageOfIds.Length > 0)
    {
        await transaction.Connection.ExecuteAsync(deleteCmd, new { Ids = pageOfIds }, transaction);

Contributor Author

@hyzx86 hyzx86 May 2, 2024

After I adjusted them, the data is still not deleted.
Can ADO.NET directly use string collections as arguments in IN expressions?
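On the question of collections in IN expressions: a plain scalar ADO.NET parameter holds a single value, so ADO.NET itself does not bind a collection to one `@Ids` placeholder. Dapper does expand an enumerable parameter used as `in @Ids` into one parameter per element. Whether that applies here depends on `ExecuteAsync` being Dapper's extension method and on what `dialect.InOperator` emits, neither of which this thread shows. The sketch below performs the same expansion by hand; `BuildDeleteByIds` and the bracket-quoted names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: turns a list of ids into numbered scalar parameters
// (@Id0, @Id1, ...) so that each placeholder carries exactly one value.
static (string Sql, Dictionary<string, object> Parameters) BuildDeleteByIds(
    string quotedTable, string quotedColumn, IReadOnlyList<string> ids)
{
    if (ids.Count == 0)
    {
        // An empty "in ()" list is invalid SQL, so reject it up front.
        throw new ArgumentException("At least one id is required.", nameof(ids));
    }

    var parameters = new Dictionary<string, object>();
    var names = new List<string>();
    for (var i = 0; i < ids.Count; i++)
    {
        var name = "@Id" + i;
        names.Add(name);
        parameters[name] = ids[i];
    }

    var sql = $"delete from {quotedTable} where {quotedColumn} in ({string.Join(", ", names)});";
    return (sql, parameters);
}

var result = BuildDeleteByIds("[IndexingTask]", "[ContentItemId]", new[] { "a1", "b2" });
Console.WriteLine(result.Sql);
// prints: delete from [IndexingTask] where [ContentItemId] in (@Id0, @Id1);
```

Each entry in `Parameters` would then be added to the command as an ordinary scalar parameter.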

Contributor Author

Apparently not. The logic here only prevents repeated insertions, and there is no code to clean up the indexing task after the task is completed 😢


//await contentManager.CreateAsync(firstContent);
//var contentFromDb = await contentManager.GetAsync(firstContent.ContentItemId);
//Assert.NotNull(contentFromDb); // fails
Contributor Author

@Skrypt
Our content item operations are a bit complicated.

Contributor Author

@hyzx86 hyzx86 May 2, 2024

Hi @Piedone, I need your help. I don't know how to write this unit test.
#15601 (comment)

Member

What's your question, specifically?

Contributor Author

As @Skrypt mentioned in #15601,
I understand that we should verify the impact on indexing functionality by separately testing Create, Update, Draft, and Publish actions with ContentManager.

However, based on the current design of DefaultContentManager which requires calling a sequence of methods to save data to the database and trigger Lucene indexing, it seems difficult if not impossible to isolate tests for just the Create and Update actions.

All that aside, am I currently unit testing the right way?

Member

You don't need to test DefaultContentManager or have a full indexing test. Rather, unit tests on LuceneIndexingContentHandler with mocks are sufficient. I.e., the tests need to check that LuceneIndexingContentHandler behaves correctly based on what it calls on the mocks.

If you search for "mock.setup" and "mock.verify" in the codebase, you'll see how this works. A lot of tests do something similar.
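To make the suggestion concrete, here is a minimal sketch of an interaction-based test. It uses a hand-written recording double rather than a mocking library so that it runs on its own. `IIndexWriter`, `RecordingIndex`, and `SketchIndexingHandler` are hypothetical stand-ins, not the real LuceneIndexingContentHandler or its dependencies.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var index = new RecordingIndex();
var handler = new SketchIndexingHandler(index);

// Exercise the handler the way the content pipeline would.
await handler.PublishedAsync("item-1");
await handler.UnpublishedAsync("item-1");

// "Verify" step: inspect which calls the handler made on its dependency.
Console.WriteLine(string.Join(", ", index.Calls));
// prints: store:item-1, delete:item-1

// Hypothetical abstraction the handler depends on.
public interface IIndexWriter
{
    Task StoreAsync(string contentItemId);
    Task DeleteAsync(string contentItemId);
}

// Hand-written test double that records every call it receives.
public sealed class RecordingIndex : IIndexWriter
{
    public List<string> Calls { get; } = new();

    public Task StoreAsync(string contentItemId)
    {
        Calls.Add("store:" + contentItemId);
        return Task.CompletedTask;
    }

    public Task DeleteAsync(string contentItemId)
    {
        Calls.Add("delete:" + contentItemId);
        return Task.CompletedTask;
    }
}

// Hypothetical stand-in for the handler under test.
public sealed class SketchIndexingHandler
{
    private readonly IIndexWriter _index;

    public SketchIndexingHandler(IIndexWriter index) => _index = index;

    public Task PublishedAsync(string contentItemId) => _index.StoreAsync(contentItemId);

    public Task UnpublishedAsync(string contentItemId) => _index.DeleteAsync(contentItemId);
}
```

With a mocking library, the recording class would be replaced by a generated mock, and the final check by a verify call on the expected method. The unpublish case is included because it is the scenario reported earlier in this thread as not removing the item from the index.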

Contributor

This pull request has merge conflicts. Please resolve those before requesting a review.

Contributor

It seems that this pull request didn't really move for quite a while. Is this something you'd like to revisit any time soon or should we close? Please comment if you'd like to pick it up.

@github-actions github-actions bot added the stale label Jul 21, 2024
Contributor

github-actions bot commented Aug 5, 2024

Closing this pull request because it has been stale for very long. If you think this is still relevant, feel free to reopen it.

@github-actions github-actions bot closed this Aug 5, 2024
Labels
Lucene merge conflict Needs Unit Test(s) Unit Tests or Functional Tests are required stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Achieve the most efficient data import
6 participants