Issue 308: Create workflow to add a note to all New Bookshelf Items on a regular basis. #311

Merged
merged 5 commits into staging from 308-new_bookshelf_note
Oct 17, 2023

Conversation

@kaladay (Contributor) commented Oct 12, 2023

resolves #308

This creates a Workflow named `New Bookshelf Items Note` that runs on a cron schedule at 7:00am each day.

This utilizes approximately day-old data from the LDP in order to perform SQL queries that are not allowed by the CQL engine (providing a more granular filter). These SQL queries allow a more granular fetch of the desired rows, reducing the number of unnecessary Items being processed. Because this data is stale (at least a day old), each matching Item must then be re-fetched from the appropriate FOLIO endpoint to get a fresh copy, and that fresh Item data must be validated a second time.

The filtering is as follows:

  • The Item must be in the 'Checked out' status.
  • The Item must be under 'Evans nbs'.
  • The Item must not already have a 'Check in' note of 'Place on New Bookshelf'.

The original design of this work was to conditionally check whether the filter is still valid after fetching a fresh copy from the FOLIO endpoint, and to use that check to determine whether the PUT update should happen. There were difficulties where this would not build, with no clear indication of why the build was failing. A decision was made not to lose any more time investigating, and the conditional logic has been removed. The code in this commit therefore always performs the PUT update, even when it is unnecessary. Ideally, this problem should be solved in a future commit and the conditional logic restored.

The original problematic code can be found at commit hash 2754771.

Additional problems were encountered regarding `logLevel`.
Errors are being thrown about `logLevel` not being defined (despite it being explicitly defined in the fw configuration settings). The conditional checks on the logging have been removed until this problem can be identified and resolved. Once resolved, the `logLevel` checks should be restored.

The logic in this Workflow declares the Item being looped over with a variable named `item` and performs the update using a separate variable named `updatedItem`. This might be unnecessary, but it feels safer not to modify the variable used during the looping process of the Workflow subprocess.

The `itemId` variable is defined because problems were observed when trying to use `item` both as a JSON payload data object and as a Java Object (to access `item.id`). Using both of these in the request (`225fbb68-b311-4623-b397-856357be1c90` "Request Item") proved problematic. The `item` is therefore now used only as JSON data (as `updatedItem`), and `item.id` is extracted into the `itemId` process variable.

Warnings were observed regarding the Response from the `0d236b3c-e571-4b37-8114-7570c9d496a4` "Update Item" FOLIO endpoint PUT request being empty. To suppress this warning, the `outputVariable` was removed and replaced with just an empty object `{}`. This had the ironic effect of presenting a new warning about the `outputVariable` being an empty object. Either way there is a warning, so this is left as-is with the empty object.

The default `historyTimeToLive` of `90` is used because that is common. No decision has been made on what a good or bad history TTL is.

In order to keep the design simple for this first pass, some values that should otherwise not be hard-coded are hard-coded. These are:

  • The Effective Location UUID of `07a4e97e-9941-4f60-ad25-577bb6672c08`.
  • The message text for a new bookshelf note, `Place on New Bookshelf`.

To further keep this first pass simple, there is no limit and therefore no paging loop over the total number of items. The result set could become considerably large, in which case additional `COUNT` queries against the LDP, with a loop over the pages, would be needed (a sketch follows below). Furthermore, the update sub-process loop runs serially rather than in parallel.
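As a rough illustration only, a future paging pass might first count the matching rows and then fetch them page by page. This sketch reuses the hard-coded filters above; the page size of 1000 and the `order by id` are assumptions, not code from this commit:

```sql
-- Sketch only: count the matching rows so the number of pages is known.
select count(*) from inventory_items
  where effective_location_id = '07a4e97e-9941-4f60-ad25-577bb6672c08'
    and status__name = 'Checked out';

-- Sketch only: each loop iteration would fetch one page (offset 0, 1000, 2000, ...).
select id from inventory_items
  where effective_location_id = '07a4e97e-9941-4f60-ad25-577bb6672c08'
    and status__name = 'Checked out'
  order by id
  limit 1000 offset 0;
```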

The cron start times are in UTC.
The goal is to run at approximately 7:00am Central Time, which is 12:00 UTC.
At UTC-5 (daylight time), 12:00 UTC is 7:00am Central.
At UTC-6 (standard time), 12:00 UTC is 6:00am Central.

Both cases are safely before the usual 8:00am Central Time start of the day.

The LDP uses `json` and not `jsonb` like FOLIO.
PostgreSQL does not support the containment operator for `json`.
The solution here is to cast the `circulationNotes` array into a `jsonb` structure.
This likely has a performance impact, but the cast happens after the other filters are applied, which should keep the cost down.
The `circulationNotes` array will normally be small.

Note: The syntax `<@` was attempted, but the Camunda FreeMarker templating tries to interpret it.
Changing `<@` into `\l@` then causes a JSON escape problem for the `\`.
Changing `\l@` into `\\l@` should result in `\l`, but FreeMarker does not process it correctly and errors on the slash.
The solution is instead to swap the left- and right-hand sides and use `@>`.
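As a standalone illustration (not part of the Workflow), the cast and containment check behave as follows; the sample note object is hypothetical:

```sql
-- Illustration only: cast a json array to jsonb, then test whether it
-- contains an object matching the note text. This returns true.
select to_jsonb('[{"noteType":"Check in","note":"Place on New Bookshelf"}]'::json)
       @> '[{"note":"Place on New Bookshelf"}]'::jsonb;
```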

The expanded SQL query used here for selecting from LDP is:

```sql
(
  select id from inventory_items
    where effective_location_id = '07a4e97e-9941-4f60-ad25-577bb6672c08' and status__name = 'Checked out' and json_array_length(data->'circulationNotes') = 0
) union (
  select s.id from (
    select id, to_jsonb(data->'circulationNotes') as notes
    from inventory_items where effective_location_id = '07a4e97e-9941-4f60-ad25-577bb6672c08' and status__name = 'Checked out'
  ) s
    where not s.notes @> '[{"note":"Place on New Bookshelf"}]'::jsonb
);
```
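That is, the first branch of the union selects Items whose `circulationNotes` array is empty, and the second branch selects Items whose notes do not already contain a `Place on New Bookshelf` note.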

…n a regular basis.

The expanded SQL query used here for selecting from LDP is:
```sql
(
  select id from inventory_items
    where effective_location_id = '07a4e97e-9941-4f60-ad25-577bb6672c08' and status__name = 'Checked out' and json_array_length(data->'circulationNotes') = 0
) union (
  select s.id from (
    select id, json_array_elements(data->'circulationNotes')->>'note' as note
    from inventory_items where effective_location_id = '07a4e97e-9941-4f60-ad25-577bb6672c08' and status__name = 'Checked out'
  ) s
  where 'Place on New Bookshelf' <> any (string_to_array(s.note, ';'))
);
```
@Dbreck-TAMU (Contributor) left a comment

Looks good.

The use of `var` in JavaScript isn't recommended, though.
See:

  • nbs-items-note/nodes/js/buildItems.js
    -- Line 1
  • nbs-items-note/nodes/js/processItem.js
    -- Line 1, 5, 14

@kaladay (Contributor, Author) commented Oct 12, 2023

> Looks good.
>
> The use of `var` in JavaScript isn't recommended, though. See:
>
>   • nbs-items-note/nodes/js/buildItems.js
>     -- Line 1
>   • nbs-items-note/nodes/js/processItem.js
>     -- Line 1, 5, 14

The JavaScript engine used here is a limited, out-of-date, and modified one.
Not every part of the JavaScript language, let alone newer features, is supported.

…rd cron.

The first number is not the minute; it is the second.
This caused the cron job to run on the 7th minute of every hour rather than at 7:00am every day.

Note: The last value apparently must be a `?` and not `*`, or mod-camunda will fail on:
```
java.text.ParseException: Support for specifying both a day-of-week AND a day-of-month parameter is not implemented.
```
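For reference, this Quartz-style format has six fields (second, minute, hour, day-of-month, month, day-of-week), so a daily run at 12:00 UTC would look something like `0 0 12 * * ?`. The exact expression used by this Workflow is not shown here, so treat that expression as an assumption.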
The previous SQL utilized `any()`.
This is apparently a mistake.
With `<> any(...)`, the condition is satisfied when any element of the array differs from the note text.
In the case where there are two notes, one that matches and one that doesn't, the condition is therefore still true, and the Item is selected even though it already has the note.
This is not the correct behavior (see the illustration below).
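A standalone illustration of the failure, with hypothetical note values:

```sql
-- Illustration only: one note matches and one does not, yet the <> any(...)
-- condition is still true, so the Item is selected and gains a duplicate note.
select 'Place on New Bookshelf' <> any (array[
  'Place on New Bookshelf',
  'Some other note'
]);  -- returns true
```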

The new approach is to use the `jsonb` containment operator `<@`.
The LDP uses `json` and not `jsonb` like FOLIO.
PostgreSQL does not support the containment operator for `json`.
The solution here is to cast the `circulationNotes` array into a `jsonb` structure.
This likely has a performance impact, but the cast happens after the other filters are applied, which should keep the cost down.
The `circulationNotes` array will normally be small.

Note: The syntax `<@` was attempted, but the Camunda FreeMarker templating tries to interpret it.
Changing `<@` into `\l@` then causes a JSON escape problem for the `\`.
Changing `\l@` into `\\l@` should result in `\l`, but FreeMarker does not process it correctly and errors on the slash.
The solution is instead to swap the left- and right-hand sides and use `@>`.

This is the SQL query:
```sql
(
  select id from inventory_items
    where effective_location_id = '07a4e97e-9941-4f60-ad25-577bb6672c08' and status__name = 'Checked out' and json_array_length(data->'circulationNotes') = 0
) union (
  select s.id from (
    select id, to_jsonb(data->'circulationNotes') as notes
    from inventory_items where effective_location_id = '07a4e97e-9941-4f60-ad25-577bb6672c08' and status__name = 'Checked out'
  ) s
    where not s.notes @> '[{"note":"Place on New Bookshelf"}]'::jsonb
)
```
…sing multiple insertions of the note.

The note is being inserted multiple times because the existence checks are missing the `[i]` (index).
These incorrect checks always fail, and therefore the note is always added.

Add the missing index to the array access so that duplicate notes do not appear.
The goal is to run at approximately 7:00am Central Time.
At UTC-5, the 12:00 UTC start time is 7:00am Central.
At UTC-6, the 12:00 UTC start time is 6:00am Central.

Both cases are safely before the usual 8:00am Central Time start of the day.
@wwtamu (Contributor) left a comment

I am a bit concerned about the conditional branching approach failing and being abandoned. Otherwise, the workflow looks good as is and ran successfully.


@kaladay kaladay merged commit 3569af2 into staging Oct 17, 2023
@kaladay kaladay deleted the 308-new_bookshelf_note branch October 17, 2023 15:41
@kaladay kaladay restored the 308-new_bookshelf_note branch October 17, 2023 21:27