
Retrieving template processes is inefficient for a large number of template processes #6267

Open
thomaslow opened this issue Oct 14, 2024 · 5 comments

@thomaslow
Collaborator

When clicking the "+" button of a project to create or import a new process, the page takes extremely long to load (>30 seconds).

To Reproduce
Steps to reproduce the behavior:

  1. Go to dashboard
  2. Click on + icon of project
  3. Get a coffee
  4. See the "Create new process" page

Expected behavior
The "Create new process" page should appear without any delay.

Release
Master

What is happening

I traced the problem to the "Process template" dialog that contains a list of all processes:

[Screenshot from 2024-10-14 17-11-16: the "Process template" dialog in the "Create new process" page]

With my test database, which contains ~80,000 processes, this dialog tries to load all of these processes together with their related entities, see:

    public List<Process> getProcessesForChoiceList() {
        try {
            return ServiceManager.getProcessService().getTemplateProcesses();
        } catch (DataException | DAOException e) {
            Helper.setErrorMessage(e);
            return new ArrayList<>();
        }
    }

    public List<Process> getTemplateProcesses() throws DataException, DAOException {
        List<Process> templateProcesses = new ArrayList<>();
        BoolQueryBuilder inChoiceListShownQuery = new BoolQueryBuilder();
        MatchQueryBuilder matchQuery = matchQuery(ProcessTypeField.IN_CHOICE_LIST_SHOWN.getKey(), true);
        inChoiceListShownQuery.must(matchQuery);
        for (ProcessDTO processDTO : ServiceManager.getProcessService().findByQuery(matchQuery, true)) {
            templateProcesses.add(getById(processDTO.getId()));
        }
        templateProcesses.sort(Comparator.comparing(Process::getTitle));
        return templateProcesses;
    }

There are multiple problems:

  • there is no limit on the query
  • processes are queried from both the index and the database, even though all required information (process ID and title) is available in the index and could be read from the DTO objects
  • processes are sorted in Java instead of via the search index or database
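A minimal sketch of how the three points above could be addressed together, assuming each index hit already carries the process ID, the title and the `inChoiceListShown` flag. `ProcessHit`, `Choice` and `templateProcessChoices` are hypothetical names, not Kitodo.Production APIs, and the in-memory stream only stands in for the search index: in a real implementation the filter, the sort by title and the size limit would be part of the Elasticsearch request itself (its search API supports `sort` and `size`), so only one page of hits is transferred. The key idea is that the drop-down needs nothing but (ID, title) pairs, so no `getById` call per hit is necessary.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class TemplateProcessChoices {

    // Hypothetical stand-in for one search-index hit (a ProcessDTO-like object).
    record ProcessHit(int id, String title, boolean inChoiceListShown) {}

    // Hypothetical lightweight drop-down entry: only what the dialog displays.
    record Choice(int id, String title) {}

    /**
     * Builds the choice list from index data alone: keep flagged hits, sort by
     * title, cap the number of entries and map each hit to an (id, title) pair.
     * No per-hit database lookup is performed. The stream stands in for the
     * index query; filter, sort and limit would normally be pushed down to it.
     */
    static List<Choice> templateProcessChoices(List<ProcessHit> indexHits, int limit) {
        return indexHits.stream()
                .filter(ProcessHit::inChoiceListShown)
                .sorted(Comparator.comparing(ProcessHit::title))
                .limit(limit)
                .map(hit -> new Choice(hit.id(), hit.title()))
                .collect(Collectors.toList());
    }
}
```

With a limit such as the 100 hits tried above, the work done for the dialog no longer grows with the total number of template processes.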

When the Elasticsearch query is limited to return at most 100 processes, the "Create new process" page loads without any noticeable delay.

Suggested Changes

My suggestion would be to replace this drop-down list with a simple search dialog, or to ask the user for the process ID instead of listing all processes.
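The suggested search dialog could behave roughly like the following sketch: an empty input lists nothing, a purely numeric input is treated as a process ID, and any other input is a case-insensitive title search capped at `limit` results. `TemplateProcessSearch`, `search` and the map of titles by ID are hypothetical stand-ins for a query against the search index; they are assumptions for illustration, not existing Kitodo.Production code.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

class TemplateProcessSearch {

    // Hypothetical (id, title) pair shown in the search results.
    record Choice(int id, String title) {}

    /**
     * Returns at most {@code limit} template processes matching the user input.
     * {@code titlesById} is a hypothetical stand-in for the search index.
     */
    static List<Choice> search(Map<Integer, String> titlesById, String input, int limit) {
        String query = input.trim();
        // Nothing typed yet: do not list all template processes.
        if (query.isEmpty()) {
            return List.of();
        }
        // Purely numeric input (up to 9 digits, so it fits in an int) is a process ID.
        if (query.matches("\\d{1,9}")) {
            int id = Integer.parseInt(query);
            String title = titlesById.get(id);
            return title == null ? List.of() : List.of(new Choice(id, title));
        }
        // Otherwise: case-insensitive substring match on the title, sorted and capped.
        String needle = query.toLowerCase(Locale.ROOT);
        return titlesById.entrySet().stream()
                .filter(e -> e.getValue().toLowerCase(Locale.ROOT).contains(needle))
                .map(e -> new Choice(e.getKey(), e.getValue()))
                .sorted(Comparator.comparing(Choice::title))
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

Because the result list is always capped, the size of the rendered list stays small however many processes have `inChoiceListShown` set.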

thomaslow added the bug label Oct 14, 2024
@henning-gerhardt
Collaborator

Did you use a catalogue or not? When I create a new process, I first get the catalogue dialog, and afterwards I can use the process templates to create a new process based on the selected process template.

As far as I know, only processes that are used as process templates should be shown there. These process template processes are serial / periodical issues, which should not exist in such large numbers in a Kitodo.Production instance (@andre-hohmann correct me if I'm wrong). Your catch is still valid, but the solution should be discussed.

@thomaslow
Collaborator Author

@henning-gerhardt I have only set up one "process template" for each project. If I set up two process templates, there is a selection dialog before the "Create new process" page is loaded. However, the selection has no impact on performance.

I guess the problem is that a few years ago I auto-generated thousands of processes with the property "inChoiceListShown" set to true, so this particular dialog now contains a huge list.

If this is not a problem in live production systems, I will manually edit my test database to fix the problem for me. Thank you for your feedback!

@henning-gerhardt
Collaborator

henning-gerhardt commented Oct 14, 2024

You fell into a trap. There are two kinds of "process templates": the well-known process template of a project, and the processes with the "inChoiceListShown" property, which are offered by the "process template" chooser in the process creation dialog (visible in your own screenshot, below the selected ruleset). The processes listed in your screenshot are those with "inChoiceListShown" set to true in the database / index. They are called "process templates" too, but they are based on existing processes, not on the "process template" of a project. Better naming is needed, at least in English (and maybe in German as well), to make the difference clear.

@thomaslow
Collaborator Author

Okay. I'll keep the issue open for a few days, in case somebody else had a similar problem or thinks we need to implement some changes. If nobody else comments, I will close the issue in a few days.

@solth
Member

solth commented Oct 14, 2024

I think @henning-gerhardt is mostly correct, just one small remark: to avoid confusion - even though it's obviously only partly successful, based on Henning's description! - the "Templates", which have their own database table, are called "Templates" or "Process templates", while the processes with the specific flag inChoiceListShown set to true are referred to as "Template processes", so the other way round:

Template process ↔️ Process template

Nonetheless I support @thomaslow's suggestion to refactor the retrieval of "Template processes". Even if having so many "template processes" in the system is unlikely, if the current query can be improved, that is exactly what we should do.

thomaslow changed the title from "Create process page loads extremely slow" to "Retrieving template processes is inefficient for a large number of template processes" Oct 14, 2024
Projects: None yet
Development: No branches or pull requests
3 participants