This repository has been archived by the owner on Apr 14, 2023. It is now read-only.

841- Relational data - beta feature #1679

Merged
12 commits merged into finos:master from poc/relational-data on May 31, 2020

Conversation

@ghost ghost commented May 15, 2020

Description

This pull request introduces the capability to produce relational data for the JSON output format.

Changes

  • Changes to components to allow better composition
  • Changes to permit infinite data generation
  • Extension to profile format to support relational data - backward compatible
  • Production of relational data as JSON output for:
    • One to one relationships (nested objects)
    • One to many relationships (nested arrays) with conditional min and max extents
  • Streamed JSON output (to console) and to a file
  • Not performance tested
  • Not every scenario tested (hence being left as a beta feature)
  • Not every change/component has automated tests
  • Documentation produced but not actively linked due to the beta feature status

Additional notes

This is considered a beta feature, intended to prove and provide the capability of relational data construction within DataHelix. Only a few scenarios have been considered; further scenarios could require a fundamental change to the implementation and possibly to the profile format. That said, every attempt has been made to keep the profile format sufficiently abstract from the implementation to reduce the chance of it needing to change.

Issue

Related to #841
Related to #1534

@ghost ghost marked this pull request as ready for review May 15, 2020 07:29
Contributor

@ColinEberhardt ColinEberhardt left a comment

Good to see some very clear documentation on this beta feature - good job 👍

Contributor

@MattCline16 MattCline16 left a comment

Some comments and questions in place, overall:

  1. This seems to be a little more about "nested" data than necessarily relational data - I couldn't for example see usage of the parent record data in the child
  2. I've mostly focused on the "relational code" since I'm assuming the infinite streamed generation has been reviewed elsewhere


import java.util.List;

public interface SubGeneratedObject {
Contributor

A "SubGeneratedObject" holds many generated objects?

import java.util.HashMap;
import java.util.Map;

public class GeneratedRelationalData implements GeneratedObject, RelationalGeneratedObject {
Contributor

Interface inheritance question: can something be a RelationalGeneratedObject without also being a GeneratedObject?

Author

In theory yes, in practice no, so the RelationalGeneratedObject interface could extend the GeneratedObject interface. As a rule, though, interface inheritance is something I generally avoid.
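For illustration, the interface inheritance discussed in this thread might look like the sketch below. Only the names `GeneratedObject`, `RelationalGeneratedObject` and `GeneratedRelationalData` come from the PR; the wrapper class and the methods `getValue`, `setValue`, `addSubObject` (as written here) and `getSubObjectNames` are assumptions made up for the example, not the DataHelix API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: method names are assumptions, not the real DataHelix signatures.
final class InterfaceInheritanceSketch {

    interface GeneratedObject {
        Object getValue(String fieldName); // assumed accessor
    }

    // Extending GeneratedObject encodes "every relational object is also a
    // generated object" in the type system.
    interface RelationalGeneratedObject extends GeneratedObject {
        Set<String> getSubObjectNames(); // assumed accessor
    }

    // With inheritance, the class only needs to name the narrower interface.
    static final class GeneratedRelationalData implements RelationalGeneratedObject {
        private final Map<String, Object> values = new LinkedHashMap<>();
        private final Map<String, Object> subObjects = new LinkedHashMap<>();

        void setValue(String fieldName, Object value) {
            values.put(fieldName, value);
        }

        void addSubObject(String relationshipName, Object subObject) {
            subObjects.put(relationshipName, subObject);
        }

        @Override
        public Object getValue(String fieldName) {
            return values.get(fieldName);
        }

        @Override
        public Set<String> getSubObjectNames() {
            return subObjects.keySet();
        }
    }
}
```

The trade-off the author mentions still applies: keeping the two interfaces independent, as the PR does, avoids coupling them, at the cost of listing both on every implementing class.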

}

public void addSubObject(Relationship relationship, SubGeneratedObject subObject) {
if (!subObjects.containsKey(relationship.getName())) {
Contributor

Is there a reason to use the "relationship name" instead of the relationship as the key?

Author

Sub objects are keyed based on how they would be emitted rather than on their Java instance reference. This models the data rather than the state of the application.
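A minimal sketch of what keying by name implies, assuming a `Relationship` type that does not override `equals`/`hashCode`. The classes below are hypothetical stand-ins (sub-objects are plain strings here); only `addSubObject` and `relationship.getName()` appear in the diff.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the PR's types, reduced to what the example needs.
final class NameKeyedSubObjectsSketch {

    // Deliberately has no equals/hashCode, so two instances would be
    // distinct keys if the map were keyed by Relationship itself.
    static final class Relationship {
        private final String name;

        Relationship(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    static final class ParentRecord {
        // Keyed by the name under which the sub-objects would be emitted.
        private final Map<String, List<String>> subObjects = new LinkedHashMap<>();

        void addSubObject(Relationship relationship, String subObject) {
            subObjects
                .computeIfAbsent(relationship.getName(), key -> new ArrayList<>())
                .add(subObject);
        }

        Map<String, List<String>> getSubObjects() {
            return subObjects;
        }
    }
}
```

Under these assumptions, two separate `Relationship` instances that share a name contribute to the same entry, which mirrors how they would end up under one property in the output.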

import java.util.List;
import java.util.stream.Stream;

public class ExtentAugmentedFields implements Fields {
Contributor

I think it would be good to describe what these fields are. As far as I can tell, they are extra pseudo-fields added to relational fields to control how many child records are generated. Must the fields be named "min" and "max" if they are to be manipulated in the extents section?
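If that reading is right, the extents reduce to choosing a child count between `min` and `max` for each parent record. A minimal sketch of that choice follows; the class, the method body and the use of `java.util.Random` are assumptions, and only the method name `getNumberOfObjectsToProduce` appears in the PR.

```java
import java.util.Random;

// Hypothetical sketch of picking a child-record count from min/max extents.
final class ExtentSketch {
    private final Random random;

    ExtentSketch(Random random) {
        this.random = random;
    }

    // Returns a count in the inclusive range [min, max].
    int getNumberOfObjectsToProduce(int min, int max) {
        if (min < 0 || max < min) {
            throw new IllegalArgumentException(
                "Expected 0 <= min <= max, got min=" + min + ", max=" + max);
        }
        // Use long arithmetic so that max - min + 1 cannot overflow an int.
        long span = (long) max - min + 1;
        return (int) (min + (long) (random.nextDouble() * span));
    }
}
```

When `min` equals `max` the count is fixed, which would cover relationships that always produce the same number of children.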

int numberOfObjects = getNumberOfObjectsToProduce(range.getMin(), range.getMax());

generatedObject.addSubObject(relationship, new SubGeneratedObject() {
@Override
Contributor

Are you creating a new object here from the interface? Is there a better way to do this?

Author

Yes, this creates an anonymous object that implements the interface.
A concrete type could produce this data instead; the anonymous class is just an alternative way of producing the data as required.
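As a rough sketch of the concrete-type alternative, the two anonymous `SubGeneratedObject` instances could be replaced by one small class with a factory method per case. The members `getData` and `isArray`, the string placeholders for records, and the class `SimpleSubGeneratedObject` are assumptions; the PR excerpt does not show the interface's actual methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: the interface members are assumed, not taken from the PR.
final class ConcreteSubObjectSketch {

    interface SubGeneratedObject {
        List<String> getData(); // assumed: the generated child records
        boolean isArray();      // assumed: one-to-many vs one-to-one
    }

    // A single named type shared by both call sites instead of two
    // near-identical anonymous classes.
    static final class SimpleSubGeneratedObject implements SubGeneratedObject {
        private final List<String> data;
        private final boolean array;

        private SimpleSubGeneratedObject(List<String> data, boolean array) {
            // Defensive copy so later changes to the caller's list are not visible.
            this.data = Collections.unmodifiableList(new ArrayList<>(data));
            this.array = array;
        }

        // One-to-one relationship: a single nested object.
        static SubGeneratedObject single(String record) {
            return new SimpleSubGeneratedObject(Collections.singletonList(record), false);
        }

        // One-to-many relationship: a nested array of objects.
        static SubGeneratedObject many(List<String> records) {
            return new SimpleSubGeneratedObject(records, true);
        }

        @Override
        public List<String> getData() {
            return data;
        }

        @Override
        public boolean isArray() {
            return array;
        }
    }
}
```

A call site would then read along the lines of `generatedObject.addSubObject(relationship, SimpleSubGeneratedObject.many(records))`, again assuming these hypothetical names.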

return;
}

generatedObject.addSubObject(relationship, new SubGeneratedObject() {
Contributor

This is similar to the code above that adds the SubGeneratedObject.

ghost commented May 22, 2020

@MattCline-SL

Some comments and questions in place, overall:

  1. This seems to be a little more about "nested" data than necessarily relational data - I couldn't for example see usage of the parent record data in the child

Yes, you could say that. Nested objects are a means of creating relational data.

  2. I've mostly focused on the "relational code" since I'm assuming the infinite streamed generation has been reviewed elsewhere

The other changes are also in this PR because the refactoring was required to achieve relational/nested data production.

@ghost ghost merged commit c441840 into finos:master May 31, 2020
@ghost ghost deleted the poc/relational-data branch May 31, 2020 07:16
This pull request was closed.