migration of nested objects in CRIS entities: improper handling of optional nested fields #398

Open
saschaszott opened this issue Oct 27, 2023 · 7 comments

@saschaszott

saschaszott commented Oct 27, 2023

In DSC5 we use nested objects to model affiliations in researcher profiles (RPs). Each affiliation consists of 4 fields: org unit (ou pointer; mandatory), role (mandatory), start date (optional), end date (optional).

Currently, we have several affiliations without a start date and/or end date.

For example, in DSC5 we have one RP with 2 affiliations (screenshot):

image

Currently, the CRIS migration procedure (Pentaho transformation) inverts the order of affiliations. This is caused by the step Sort position in entity_migration.ktr, which is configured with ascending = N.

The expected migration result of the given example RP is:

image

Currently, DSC7 produces an invalid migration result for affiliations whose optional fields are empty (as in the given example):

image

In this example the assignment of end date is not correct.

This bug is caused by the Pentaho migration step Select values 4 which removes nested_object_id and positiondef in each row of the stream. This means that subsequent migration steps cannot determine the correct assignments of nested fields to a given affiliation (nested object).
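To illustrate the mechanism outside of Pentaho, here is a minimal Python sketch. It is not the transformation itself: the row layout, the sample values, and the helper names `group_with_keys` and `group_without_keys` are assumptions made for illustration. Only the column names nested_object_id and positiondef come from the transformation.

```python
from collections import defaultdict

FIELDS = ("orgunit", "role", "startdate", "enddate")

# Hypothetical stream rows, one per nested field value:
# (nested_object_id, positiondef, field, value)
ROWS = [
    (101, 0, "orgunit", "Dept A"),
    (101, 0, "role", "Researcher"),
    (101, 0, "startdate", "2010"),   # affiliation 101 has no end date
    (102, 1, "orgunit", "Dept B"),
    (102, 1, "role", "Professor"),
    (102, 1, "enddate", "2020"),     # affiliation 102 has no start date
]


def group_with_keys(rows):
    """Rebuild affiliations while nested_object_id and positiondef are still present."""
    objects = defaultdict(dict)
    for nested_object_id, positiondef, field, value in rows:
        objects[(positiondef, nested_object_id)][field] = value
    # Missing optional fields become None, but stay with their own affiliation.
    return [{f: objects[key].get(f) for f in FIELDS} for key in sorted(objects)]


def group_without_keys(rows):
    """Rebuild affiliations after both keys were dropped: only value order per field is left."""
    per_field = defaultdict(list)
    for _, _, field, value in rows:
        per_field[field].append(value)
    count = max(len(values) for values in per_field.values())
    # The i-th value of each field is assumed to belong to the i-th affiliation.
    return [
        {f: per_field[f][i] if i < len(per_field[f]) else None for f in FIELDS}
        for i in range(count)
    ]


for label, result in (("with keys", group_with_keys(ROWS)),
                      ("without keys", group_without_keys(ROWS))):
    print(label)
    for affiliation in result:
        print("  ", affiliation)
```

In the sketch, grouping without the keys attaches the end date of the second affiliation to the first one, which is the kind of wrong assignment described above.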

This bug affects the migration of RPs that have at least 2 affiliations with missing nested fields.

We'll provide a bugfix (an adaptation of the Pentaho migration).

@saschaszott saschaszott changed the title from "migration of nested objects in CRIS entitites: improper handling of optional nested fields" to "migration of nested objects in CRIS entities: improper handling of optional nested fields" Oct 27, 2023
@atarix83

Hi @saschaszott

Thanks for opening the issue. This should already be resolved by #276.

Feel free to reopen the issue if it is not working for you.

@saschaszott
Author

Hi @atarix83, the PR you mentioned (#276) does not fix the problem. We have integrated #276 into our code base and are still able to reproduce the bug.

@saschaszott
Author

@atarix83, the problem arises in the Pentaho transformation step named Select values 4. In this early transformation step, the important sorting information in nested_object_id and positiondef is removed. We have fixed the Pentaho transformation locally (the fix requires additional steps). Let me know if you are interested in a PR.

@atarix83 atarix83 reopened this Nov 16, 2023
@atarix83

@saschaszott

Yes, please open a PR when you can, so we can verify. Thanks.

@saschaszott
Author

@atarix83, sorry for the long pause. Today I was able to reproduce the problem described above with the latest version of DSC (2023.02.02). To illustrate the problem, here is an example of a nested affiliation object (with 3 entries):

image

The migrated RP ends up in an incorrect state:

image

As you can see in the full metadata view, the number of affiliation.startDate fields differs from the number of affiliation.endDate fields:

image
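A mismatch like this can also be detected programmatically. The following is a minimal sketch, assuming the metadata of one migrated RP is available as a list of (field, value) pairs. The function name `nested_field_counts_match` and the sample values are hypothetical; only the field names affiliation.startDate and affiliation.endDate are taken from the metadata view.

```python
from collections import Counter

NESTED_FIELDS = ("affiliation.startDate", "affiliation.endDate")

# Hypothetical metadata of one migrated RP as (field, value) pairs.
SAMPLE = [
    ("affiliation.startDate", "2010"),
    ("affiliation.startDate", "2015"),
    ("affiliation.endDate", "2020"),
]


def nested_field_counts_match(metadata, fields=NESTED_FIELDS):
    """Return (ok, counts); ok is True only if every field occurs equally often."""
    counter = Counter(field for field, _ in metadata if field in fields)
    counts = {field: counter[field] for field in fields}
    return len(set(counts.values())) == 1, counts


ok, counts = nested_field_counts_match(SAMPLE)
print(ok, counts)
```

Equal counts do not prove that the values are assigned to the right affiliation, so a check like this only finds RPs that are certainly affected.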

@saschaszott
Author

saschaszott commented Feb 14, 2024

To better illustrate the change in the entity migration transformation, here is a before-after comparison of the change we propose in entity-migration.ktr:

Before

image

After

image

@saschaszott
Author

You can find our proposed bugfix in PR #425.
