Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XWIKI-22323: Refactoring operation should wait for the Solr index to be empty before proceeding #3403

Merged
merged 15 commits into from
Oct 14, 2024

Conversation

michitux
Copy link
Contributor

@michitux michitux commented Sep 9, 2024

Jira URL

https://jira.xwiki.org/browse/XWIKI-22323

Changes

Description

  • Introduce a new ReadyIndicator interface that allows waiting for the link index to become ready while getting a progress percentage.
  • In the BackLinkUpdaterListener, wait for the index to become ready when a job is active and display the indexing progress.
  • Provide a ready indicator including indexing progress in the Solr indexer.
  • Add some tests.

Clarifications

  • There are still some TODO's left, the plan is to finish them before merging this PR.
  • The code got quite a lot more difficult than I originally expected as the code attempts to provide accurate progress information.
  • It is basically impossible to provide accurate information how many indexing requests are remaining, that's why the API doesn't really attempt to do that.

Screenshots & Video

Executed Tests

LANG=C.UTF-8 mvn clean install -Pdocker,legacy,integration-tests,snapshotModules,quality -pl $(git diff --name-only origin/master | awk -F'/' '{for(i=NF-1; i>0; i--){if($i ~ /src$/){printf(":%s\n", $(i-1)); break;}}}' | sort -u | paste -sd "," -)

Expected merging strategy

  • Prefers squash: Yes
  • Backport on branches:
    • stable-16.4.x
    • stable-15.10.x

…be empty before proceeding

* Introduce a new ReadyIndicator interface that allows waiting for the
  link index to become ready while getting a progress percentage.
* In the BackLinkUpdaterListener, wait for the index to become ready
  when a job is active and display the indexing progress.
* Provide a ready indicator including indexing progress in the Solr
  indexer.
* Add some tests.
…be empty before proceeding

* Remove leftover LinkIndexingStatus.
…be empty before proceeding

* Fix some math problems in the progress calculation.
* Don't directly inject the link store into the link update listener to
  avoid early initialization.
…be empty before proceeding

* Use Future<Void> for the ready indicator as there is no result value.
…be empty before proceeding

* Move the ReadyIndicator to the store API.
…be empty before proceeding

* Separate the SolrIndexerReadyIndicator from the DefaultSolrIndexer and
  add tests.
* Restore the original coverage.
…be empty before proceeding

* Fix the logic for updating progress steps.
…be empty before proceeding

* Modernize the jobRunner JavaScript code
* Continue polling the job status when the job is waiting to detect when
  a question is answered in the background (by another browser tab or on
  the server).
…be empty before proceeding

* Add support in rename and delete job requests to indicate if the
  job should wait for indexing to finish.
* Ask the user after 10 seconds if the refactoring should wait for link
  indexing to finish.
* Add unit tests.
…be empty before proceeding

* Add an integration test.
…be empty before proceeding

* Update since-versions from 16.8.0RC1 to 16.8.0.
@michitux michitux marked this pull request as ready for review September 18, 2024 14:05
@michitux
Copy link
Contributor Author

After waiting for 10 seconds, a question is now displayed:

grafik

The waiting for the index continues in the background and when the indexing finishes before the user responds, the question is dismissed. This behavior (as well as the two options in the questions) is also tested in an integration test.

This whole thing became a lot bigger than I originally imagined.

* @since 15.10.13
*/
@Unstable
public boolean isWaitForIndexing()
Copy link
Member

@tmortagne tmortagne Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't MoveAttachmentRequest have a true by default #isWaitForIndexing() too ? (since we refactor links to the moved attachments AFAIK, in MovedAttachmentListener, use case which seems to be missing in your pull request right now).

I think a EntityRequest#isWaitForIndexing (false by default) would make sense (DeleteRequest, AbstractCopyOrMoveRequest and MoveAttachmentRequest would just overwrite the default value to be true instead of false). Would things like the code in BackLinkUpdaterListener easier at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the methods to EntityRequest but I set them to true by default as the waiting is only actually happening when links are refactored. Therefore, I think it is okay to have it true by default. Also, I've added the waiting logic to MovedAttachmentListener, refactoring the code to avoid duplicating the logic for it.

@vmassol
Copy link
Member

vmassol commented Sep 19, 2024

After waiting for 10 seconds, a question is now displayed:

grafik

The waiting for the index continues in the background and when the indexing finishes before the user responds, the question is dismissed. This behavior (as well as the two options in the questions) is also tested in an integration test.

This whole thing became a lot bigger than I originally imagined.

This looks good. It would be even nicer to explain that this is about all the pages in the wiki and not just the current page (it' not 100% clear IMO) and to display the indexing counter in the format (N/P where P is the total items to index and N the number of already indexed ones).

Thanks a lot Michael for this work, sorry it's taking more time than planned though.

@michitux
Copy link
Contributor Author

to display the indexing counter in the format (N/P where P is the total items to index and N the number of already indexed ones).

Unfortunately, it's impossible to know how many items need to be indexed. The item to be indexed could be "the farm" and there is no counter how many documents of the farm still need to be added to the indexing queue. More precisely, we have two queues. The first contains the indexing requests like "the farm". The second contains the actual items to be indexed like a document or an XObject of a document. The second queue has a limited size. This means that when, e.g., the whole farm shall be re-indexed in a large wiki, the second queue will basically always be full while the first queue could be empty. In such a situation, we have no information how many items will still be added to the second queue before the next item of the first queue is processed.

For this reason, the progress information is quite approximate, and I think it would be misleading to display absolute numbers. What happens internally is that the code inserts a special item into the first queue and tracks its progress. The progress is divided in two phases. If the second queue is more than 90% full or there are at least two items in the first queue, 50% of the progress is the progress in the first queue. Otherwise, it's just 10% as normally the first queue is fast. Once the special item gets into the second queue, the progress in the second queue is tracked. Progress is tracked by comparing the number of items removed already to the size of the queue when the special item is added. That progress is also approximate as the code doesn't add locks around queue operations and thus in situations of high load there could be inaccuracies (like the progress assuming that all items before the special item were already removed even though there are actually some more to go). This just affects the display of the progress bar and not the actual waiting, but it's another reason why I don't want to display absolute numbers.

…be empty before proceeding

* Move the waiting for link indexing property to EntityRequest
* Move the code for logging into the LinkIndexingWaitingHelper
* Wait for link indexing before adapting links after moving attachments
…be empty before proceeding

* Rename SolrIndexer#getReadyIndicator to SolrIndexer#waitReady.
* Fix the exceptional completion of the ready indicator to not complete
  it twice in the case of an interrupt.
…be empty before proceeding

* Update since-versions.
@michitux michitux added backport stable-15.10.x Used for automatic backport to 15.10.x branch. backport stable-16.4.x and removed backport stable-15.10.x Used for automatic backport to 15.10.x branch. labels Oct 8, 2024
@michitux michitux requested a review from mflorea October 14, 2024 08:25
Copy link
Member

@mflorea mflorea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The jobRunner.js changes look good to me. I haven't checked deeply the other changes though.

…be empty before proceeding

* Remove LTS since-versions as we don't plan to backport this.
@michitux michitux merged commit 00b8440 into xwiki:master Oct 14, 2024
1 check passed
@michitux michitux deleted the XWIKI-22323 branch October 14, 2024 14:23
michitux added a commit that referenced this pull request Oct 14, 2024
…be empty before proceeding (#3403)

* Introduce a new ReadyIndicator interface that allows waiting for the
  link index to become ready while getting a progress percentage.
* In the BackLinkUpdaterListener, wait for the index to become ready
  when a job is active and display the indexing progress.
* Provide a ready indicator including indexing progress in the Solr
  indexer.
* Modernize the jobRunner JavaScript code
* Continue polling the job status when the job is waiting to detect when
  a question is answered in the background (by another browser tab or on
  the server).
* Add support in entity requests to indicate if the job should wait for indexing to finish.
* Ask the user after 10 seconds if the refactoring should wait for link
  indexing to finish.
* Wait for link indexing before adapting links after moving attachments
* Add unit and integration tests.
* Adapt the code to Java 11 and older Mockito.
* Backport TestUtils#serializeLocalReference.

(cherry picked from commit 00b8440)
michitux added a commit to michitux/xwiki-platform that referenced this pull request Nov 13, 2024
…be empty before proceeding (xwiki#3403)

* Introduce a new ReadyIndicator interface that allows waiting for the
  link index to become ready while getting a progress percentage.
* In the BackLinkUpdaterListener, wait for the index to become ready
  when a job is active and display the indexing progress.
* Provide a ready indicator including indexing progress in the Solr
  indexer.
* Modernize the jobRunner JavaScript code
* Continue polling the job status when the job is waiting to detect when
  a question is answered in the background (by another browser tab or on
  the server).
* Add support in entity requests to indicate if the job should wait for indexing to finish.
* Ask the user after 10 seconds if the refactoring should wait for link
  indexing to finish.
* Wait for link indexing before adapting links after moving attachments
* Add unit and integration tests.
* Adapt the code to Java 11 and older Mockito.
* Backport TestUtils#serializeLocalReference.

(cherry picked from commit 00b8440)
(cherry picked from commit b16309d)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants