Array filtering support (Part #4): Postgres array filter impl. #191

Merged: suresh-prakash merged 36 commits into main from postgres_array_filter_impl on Jan 16, 2024

suresh-prakash (Contributor)

So far, we do not support filters whose left-hand side (LHS) is an array.
The PRs in this series add support for LHS arrays in filters.

An LHS array can contain either primitives or documents.
In the case of a primitive LHS array, we can apply a relational operator to each element.
In the case of a document LHS array, we can apply a generic filter to each element (a relational expression, a logical expression, or even another array filter expression).
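The two cases can be illustrated with a minimal in-memory sketch in plain Java. This is not the document-store API: `ArrayFilterKinds`, `anyPrimitive`, and `anyDocument` are hypothetical names, and documents are modelled as `Map<String, Object>` purely for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical helpers contrasting the two kinds of LHS array filters (ANY semantics).
final class ArrayFilterKinds {
  private ArrayFilterKinds() {}

  // Primitive LHS array: apply a relational predicate (e.g. "= 'Oxygen'") to each element.
  // A missing array behaves like an empty one, so ANY is false.
  static <T> boolean anyPrimitive(List<T> array, Predicate<T> relational) {
    return array != null && array.stream().anyMatch(relational);
  }

  // Document LHS array: apply an arbitrary filter (relational, logical, or another
  // array filter) to each nested document.
  static boolean anyDocument(
      List<Map<String, Object>> array, Predicate<Map<String, Object>> filter) {
    return array != null && array.stream().anyMatch(filter);
  }
}
```

Because the document case accepts any predicate, it can itself call `anyPrimitive` on a nested field, which is how filters such as "any planet whose elements contain oxygen" compose.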

Sample data:

[
  {
    "key": 3,
    "planets": [
      {
        "name": "Planet 1",
        "elements": [
          "Oxygen",
          "Water",
          "Nitrogen"
        ]
      },
      {
        "name": "Planet 2",
        "elements": [
          "Oxygen",
          "Helium",
          "Water"
        ]
      }
    ]
  },
  {
    "key": 1,
    "planets": [
      {
        "name": "Mercury"
      },
      {
        "name": "Venus"
      },
      {
        "name": "Earth",
        "elements": [
          "Oxygen",
          "Nitrogen"
        ]
      },
      {
        "name": "Mars",
        "elements": [
          "Iron"
        ]
      },
      {
        "name": "Jupiter",
        "elements": []
      },
      {
        "name": "Saturn",
        "elements": []
      },
      {
        "name": "Uranus",
        "elements": [
          "Hydrogen",
          "Helium",
          "Methane"
        ]
      },
      {
        "name": "Neptune",
        "elements": [
          "Hydrogen",
          "Helium",
          "Methane",
          "Ammonia"
        ]
      }
    ]
  },
  {
    "key": 2,
    "planets": [
      {
        "name": "Mercury",
        "elements": [
          "Silicate",
          "Aluminum"
        ]
      },
      {
        "name": "Venus",
        "elements": [
          "Carbon Dioxide",
          "Sulfuric Acid"
        ]
      },
      {
        "name": "Earth",
        "elements": [
          "Oxygen",
          "Nitrogen",
          "Water"
        ]
      },
      {
        "name": "Mars",
        "elements": [
          "Iron",
          "Silicate"
        ]
      },
      {
        "name": "Jupiter",
        "elements": [
          "Hydrogen",
          "Helium"
        ]
      },
      {
        "name": "Saturn",
        "elements": [
          "Hydrogen",
          "Helium",
          "Methane"
        ]
      },
      {
        "name": "Uranus",
        "elements": [
          "Hydrogen",
          "Helium",
          "Methane",
          "Ammonia"
        ]
      },
      {
        "name": "Neptune",
        "elements": [
          "Hydrogen",
          "Helium",
          "Methane",
          "Ammonia",
          "Methane Hydrate"
        ]
      }
    ]
  }
]

Input query: Get a list of solar systems with at least one planet that has neither oxygen nor water.

SELECT * 
FROM solar_systems 
WHERE ANY(planets) [NOT ((ANY(elements) = "Oxygen") OR (ANY(elements) = "Water"))]
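To pin down the intended semantics, here is a minimal in-memory sketch of the same query in plain Java, independent of the document-store API. `SolarSystemQuery`, `Planet`, and `SolarSystem` are hypothetical names; a missing `elements` field is modelled as `null` and treated as an empty array, mirroring the `$ifNull`/`COALESCE` handling in the generated queries.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// In-memory sketch of:
//   WHERE ANY(planets) [NOT ((ANY(elements) = "Oxygen") OR (ANY(elements) = "Water"))]
final class SolarSystemQuery {
  static final class Planet {
    final String name;
    final List<String> elements; // null models a planet without an "elements" field

    Planet(String name, List<String> elements) {
      this.name = name;
      this.elements = elements;
    }
  }

  static final class SolarSystem {
    final int key;
    final List<Planet> planets;

    SolarSystem(int key, Planet... planets) {
      this.key = key;
      this.planets = Arrays.asList(planets);
    }
  }

  // ANY(elements) = value; a missing array is treated as empty.
  static boolean anyElementEquals(Planet planet, String value) {
    return planet.elements != null && planet.elements.contains(value);
  }

  // NOT ((ANY(elements) = "Oxygen") OR (ANY(elements) = "Water"))
  static boolean hasNeitherOxygenNorWater(Planet planet) {
    return !(anyElementEquals(planet, "Oxygen") || anyElementEquals(planet, "Water"));
  }

  // ANY(planets) [...]: keep systems where at least one planet satisfies the inner filter.
  static List<Integer> matchingKeys(List<SolarSystem> systems) {
    return systems.stream()
        .filter(s -> s.planets.stream().anyMatch(SolarSystemQuery::hasNeitherOxygenNorWater))
        .map(s -> s.key)
        .collect(Collectors.toList());
  }
}
```

Applied to the sample data, this should select the systems with keys 1 and 2 (key 1 through Mercury, which has no `elements`; key 2 through Mercury, which has only silicate and aluminum) and exclude key 3, because both of its planets contain oxygen and water.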

The filter API would look like this:

// ANY, NOT, OR and EQ are assumed to be statically imported operator constants
FilterTypeExpression filter =
    DocumentArrayFilterExpression.builder()
        .operator(ANY)
        .arraySource(IdentifierExpression.of("planets"))
        .filter(
            LogicalExpression.builder()
                .operator(NOT)
                .operand(
                    LogicalExpression.builder()
                        .operator(OR)
                        .operand(
                            ArrayRelationalFilterExpression.builder()
                                .operator(ANY)
                                .filter(
                                    RelationalExpression.of(
                                        IdentifierExpression.of("elements"),
                                        EQ,
                                        ConstantExpression.of("Oxygen")))
                                .build())
                        .operand(
                            ArrayRelationalFilterExpression.builder()
                                .operator(ANY)
                                .filter(
                                    RelationalExpression.of(
                                        IdentifierExpression.of("elements"),
                                        EQ,
                                        ConstantExpression.of("Water")))
                                .build())
                        .build())
                .build())
        .build();

Sample MongoDB match stage query to be generated:

{
  "$expr":
  {
    "$anyElementTrue":
    {
      "$map":
      {
        "input":
        {
          "$ifNull": [
            "$planets",
            []
          ]
        },
        "as": "planet",
        "in":
        {
          "$not":
          {
            "$or": [
              {
                "$anyElementTrue":
                {
                  "$map":
                  {
                    "input":
                    {
                      "$ifNull": [
                        "$$planet.elements",
                        []
                      ]
                    },
                    "as": "element",
                    "in":
                    {
                      "$eq": ["$$element","Oxygen"]
                    }
                  }
                }
              },
              {
                "$anyElementTrue":
                {
                  "$map":
                  {
                    "input":
                    {
                      "$ifNull": [
                        "$$planet.elements",
                        []
                      ]
                    },
                    "as": "element",
                    "in":
                    {
                      "$eq": ["$$element","Water"]
                    }
                  }
                }
              }
            ]
          }
        }
      }
    }
  }
}

Sample Postgres Query:

SELECT *  
FROM solar_systems  
WHERE 
EXISTS 
(SELECT 1 
 FROM  jsonb_array_elements(COALESCE(str->'planets', '[]'::jsonb)) AS planet 
 WHERE 
 NOT 
 (
 EXISTS 
 (SELECT 1 
  FROM jsonb_array_elements(COALESCE(planet->'elements', '[]'::jsonb)) AS elements 
  WHERE TRIM('"' FROM elements::text) = 'Oxygen'
 )
 OR 
 EXISTS 
 (SELECT 1 
  FROM jsonb_array_elements(COALESCE(planet->'elements', '[]'::jsonb)) AS elements 
  WHERE TRIM('"' FROM elements::text) = 'Water'
 )
 )
);
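The nesting above follows a simple pattern: every ANY becomes an `EXISTS` subquery over `jsonb_array_elements`, and the inner filter becomes that subquery's `WHERE` clause, referring to the enclosing element by its alias. A minimal sketch of that composition with plain string building (hypothetical helper names `PostgresAnySql`, `anyExists`, `elementEquals`, `sampleWhereClause`; this is not the PR's visitor code, and the `str` column name is taken from the sample query):

```java
// Hypothetical SQL-string helpers that mirror the shape of the sample Postgres query.
final class PostgresAnySql {
  private PostgresAnySql() {}

  // ANY(arrayPath)[condition]: true if at least one element satisfies the condition.
  // COALESCE turns a missing array into an empty one, so ANY is then false.
  static String anyExists(String arrayPath, String alias, String condition) {
    return "EXISTS (SELECT 1 FROM jsonb_array_elements(COALESCE("
        + arrayPath + ", '[]'::jsonb)) AS " + alias + " WHERE " + condition + ")";
  }

  // Compares a scalar JSONB element with a string constant, as in the sample query.
  // Constants are inlined only for readability; real code should bind parameters.
  static String elementEquals(String alias, String value) {
    return "TRIM('\"' FROM " + alias + "::text) = '" + value.replace("'", "''") + "'";
  }

  // Builds the WHERE clause of the sample query by nesting ANY inside ANY.
  static String sampleWhereClause(String documentColumn) {
    String elements = "planet->'elements'";
    String hasOxygen = anyExists(elements, "elements", elementEquals("elements", "Oxygen"));
    String hasWater = anyExists(elements, "elements", elementEquals("elements", "Water"));
    String hasNeither = "NOT (" + hasOxygen + " OR " + hasWater + ")";
    return anyExists(documentColumn + "->'planets'", "planet", hasNeither);
  }
}
```

`sampleWhereClause("str")` yields a single-line version of the `WHERE` clause shown above. As a design note, `elements #>> '{}'` extracts a scalar JSONB value as text and is an alternative to stripping the quotes with `TRIM`.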

Notes:

  1. Only the ANY operator is supported for now.
  2. A future PR would introduce a NOT logical operator.
  3. ALL can be achieved by applying NOT twice around the ANY filter (once before ANY and once on its inner filter), because ALL(p) = NOT ANY(NOT p). E.g., "fetch solar systems in which every planet contains both oxygen and water" is equivalent to "fetch solar systems in which no planet fails to contain both oxygen and water".
  4. Similarly, NONE can be achieved by applying NOT once before ANY, because NONE(p) = NOT ANY(p).
  5. The examples in this description are for illustration only. In practice, the queries would be simpler (e.g., use IN instead of OR-ed equality checks, statically import the NOT and OR logical operators, etc.).
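The ALL and NONE reductions in notes 3 and 4 can be checked with a small sketch using plain Java predicates (hypothetical class `ArrayQuantifiers`, not part of the document-store API):

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helpers showing how ALL and NONE reduce to ANY plus NOT.
final class ArrayQuantifiers {
  private ArrayQuantifiers() {}

  static <T> boolean any(List<T> items, Predicate<T> p) {
    return items.stream().anyMatch(p);
  }

  // ALL(p) == NOT ANY(NOT p): one NOT before ANY and one NOT on the inner filter.
  static <T> boolean all(List<T> items, Predicate<T> p) {
    return !any(items, p.negate());
  }

  // NONE(p) == NOT ANY(p): a single NOT before ANY.
  static <T> boolean none(List<T> items, Predicate<T> p) {
    return !any(items, p);
  }
}
```

One consequence worth keeping in mind: since missing arrays are coalesced to empty ones, ALL built this way is vacuously true for a document with no planets, just as `Stream.allMatch` is for an empty stream.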

This PR contains:

  • Postgres array filter implementation

Previous PRs:
#173
#188
#189

* Introduce schema for ArrayFilterExpression
* Refactor Mongo relational filter parser
suresh-prakash changed the title from "Array filtering support (Part #4): Postgres relational filter impl." to "Array filtering support (Part #4): Postgres array filter impl." on Jan 7, 2024

codecov bot commented Jan 7, 2024

Codecov Report

Attention: 19 lines in your changes are missing coverage. Please review.

Comparison is base (88a0440) 79.57% compared to head (e63860b) 79.58%.

Files with missing coverage (patch coverage; missing and partial lines):
...1/vistors/PostgresFilterTypeExpressionVisitor.java: 80.00% (5 missing, 4 partials)
...s/PostgresIdentifierTrimmingExpressionVisitor.java: 54.54% (3 missing, 2 partials)
.../PostgresIdentifierReplacingExpressionVisitor.java: 71.42% (1 missing, 1 partial)
...sArrayRelationalWrappingFilterVisitorProvider.java: 66.66% (1 missing)
.../PostgresIdentifierAccessingExpressionVisitor.java: 90.00% (1 missing)
...elationalIdentifierAccessingExpressionVisitor.java: 88.88% (1 missing)
Additional details and impacted files
@@            Coverage Diff            @@
##               main     #191   +/-   ##
=========================================
  Coverage     79.57%   79.58%           
- Complexity      956      957    +1     
=========================================
  Files           182      188    +6     
  Lines          4539     4623   +84     
  Branches        373      379    +6     
=========================================
+ Hits           3612     3679   +67     
- Misses          669      678    +9     
- Partials        258      266    +8     
Flag Coverage Δ
integration 79.58% <78.40%> (+<0.01%) ⬆️
unit 56.86% <10.22%> (-0.90%) ⬇️

Flags with carried forward coverage won't be shown.


github-actions bot commented Jan 7, 2024

Test Results

 38 files  ±0   38 suites  ±0   32s ⏱️ ±0s
229 tests +1  229 ✅ +1  0 💤 ±0  0 ❌ ±0 
475 runs  +1  475 ✅ +1  0 💤 ±0  0 ❌ ±0 

Results for commit e63860b. ± Comparison against base commit 88a0440.

This pull request removes 1 and adds 2 tests. Note that renamed tests count towards both.
org.hypertrace.core.documentstore.ArrayFiltersQueryTest ‑ [1] Mongo
org.hypertrace.core.documentstore.ArrayFiltersQueryIntegrationTest ‑ [1] Mongo
org.hypertrace.core.documentstore.ArrayFiltersQueryIntegrationTest ‑ [2] Postgres



@Override
public PostgresQueryParser getPostgresQueryParser() {
  return baseVisitor.getPostgresQueryParser();
}
Contributor

Curious: In the PostgresIdentifierReplacingExpressionVisitor class, we initially verify the existence of the postgresQueryParser. Afterward, we reference the baseVisitor as follows:

postgresQueryParser != null ? postgresQueryParser : baseVisitor.getPostgresQueryParser()

Why aren't we doing the same here, instead of accessing it directly from the baseVisitor?

Contributor Author

Actually, that check isn't required either, since we never pass a baseVisitor in that constructor. I've updated that class to remove the unnecessary conditional.
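The two getter styles discussed in this thread can be sketched as follows. All names here are hypothetical stand-ins (`QueryParser` plays the role of `PostgresQueryParser`): a visitor that owns the parser returns it directly, while a wrapping visitor that is always built around a base visitor simply delegates. The `postgresQueryParser != null ? ... : baseVisitor...` fallback is only needed when the same class can be constructed either way.

```java
import java.util.Objects;

// Hypothetical stand-in for PostgresQueryParser.
final class QueryParser {
  final String tableName;

  QueryParser(String tableName) {
    this.tableName = tableName;
  }
}

interface ParserAwareVisitor {
  QueryParser getPostgresQueryParser();
}

// Owns the parser (no base visitor is ever passed in): return it directly.
final class RootVisitor implements ParserAwareVisitor {
  private final QueryParser postgresQueryParser;

  RootVisitor(QueryParser postgresQueryParser) {
    this.postgresQueryParser = Objects.requireNonNull(postgresQueryParser);
  }

  @Override
  public QueryParser getPostgresQueryParser() {
    return postgresQueryParser;
  }
}

// Always wraps a base visitor: delegate without any null check.
final class WrappingVisitor implements ParserAwareVisitor {
  private final ParserAwareVisitor baseVisitor;

  WrappingVisitor(ParserAwareVisitor baseVisitor) {
    this.baseVisitor = Objects.requireNonNull(baseVisitor);
  }

  @Override
  public QueryParser getPostgresQueryParser() {
    return baseVisitor.getPostgresQueryParser();
  }
}
```

Failing fast in the constructors keeps each getter unconditional, which is the simplification agreed on in the thread.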


@Override
public PostgresQueryParser getPostgresQueryParser() {
  return baseVisitor.getPostgresQueryParser();
}
Contributor

Same question as above.

Contributor Author

Actually, that check isn't required either, since we never pass a baseVisitor in that constructor. I've updated that class to remove the unnecessary conditional.

// Convert 'elements' to planets->'elements' where planets could be an alias for an upper
// level array filter
// Also, for the first time (if this was not under any nesting), use the field identifier
// visitor to make it document->'elements'
Contributor

Is this document->'elements' correct? Should it be document->'planets'?

Contributor Author

document->'elements' is correct. I've updated the code comments to be more clear.
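The rewriting described in the code comment can be sketched as a small path-to-accessor function. This is an illustrative assumption, not the PR's visitor: `IdentifierAccess`, `access`, and `resolve` are hypothetical names, and the root column `document` comes from the code comment. Inside a nested array filter, a field is read from the enclosing element's alias; at the top level, it is read from the document column.

```java
// Hypothetical sketch of identifier resolution for (nested) array filters.
final class IdentifierAccess {
  private IdentifierAccess() {}

  // Turns a dotted field path into a JSONB accessor chain rooted at `root`,
  // e.g. ("document", "planets.elements") -> document->'planets'->'elements'.
  static String access(String root, String fieldPath) {
    StringBuilder sql = new StringBuilder(root);
    for (String part : fieldPath.split("\\.")) {
      sql.append("->'").append(part.replace("'", "''")).append("'");
    }
    return sql.toString();
  }

  // enclosingAlias is the alias of the upper-level array element (e.g. "planet"),
  // or null when the identifier is not nested under any array filter.
  static String resolve(String enclosingAlias, String fieldPath) {
    return access(enclosingAlias != null ? enclosingAlias : "document", fieldPath);
  }
}
```

So `elements` resolves to `document->'elements'` at the top level and to `planet->'elements'` under a `planets` array filter aliased `planet`, matching the Postgres sample above.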

.getArraySource()
.accept(new PostgresIdentifierExpressionVisitor(postgresQueryParser));

// If the field name is 'elements.inner', just pick the last part as the alias ('inner')
Contributor

Could this be an issue if there are two array expressions, both having the same last part, such as inner in planets.inner and elements.inner?

If so, creating an alias with the full path, like elements_inner, might be a helpful solution.

Contributor Author

Actually, the alias is used within a smaller scope, so it shouldn't be an issue. But, yeah, it still makes sense to update it for clarity. Will do.
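The two alias strategies from this thread can be compared with a short sketch (hypothetical class `ArrayAliases`; the PR's actual alias code may differ). The last-segment alias is unique only within its own subquery scope, whereas the reviewer's full-path alias also distinguishes `planets.inner` from `elements.inner`.

```java
// Hypothetical helpers comparing the two alias strategies.
final class ArrayAliases {
  private ArrayAliases() {}

  // Last path segment only: 'elements.inner' -> 'inner'.
  static String lastPartAlias(String fieldPath) {
    return fieldPath.substring(fieldPath.lastIndexOf('.') + 1);
  }

  // Full path with dots replaced: 'elements.inner' -> 'elements_inner'.
  // Note: 'a_b.c' and 'a.b_c' would still collide, and characters that are not
  // valid in SQL identifiers would still need quoting or sanitizing.
  static String fullPathAlias(String fieldPath) {
    return fieldPath.replace('.', '_');
  }
}
```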

)
*/
switch (expression.getOperator()) {
case ANY:
Contributor

If more operators are added, should we move the code for each operator into its own function for readability? For instance, the code for the ANY operator could move to getFilterExpressionForAnyOperator().

Contributor Author

Sure
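The suggested refactoring keeps the `switch` as a thin dispatcher and moves each operator's SQL into its own method. A minimal sketch, assuming a hypothetical `ArrayOperator` enum and string-based SQL (only `getFilterExpressionForAnyOperator` is a name taken from the review comment):

```java
// Hypothetical stand-in for the array operator enum; only ANY is supported so far.
enum ArrayOperator {
  ANY
}

final class ArrayFilterSqlBuilder {
  private ArrayFilterSqlBuilder() {}

  // Thin dispatcher: one case per operator, each delegating to a dedicated method.
  static String build(ArrayOperator operator, String arrayPath, String alias, String condition) {
    switch (operator) {
      case ANY:
        return getFilterExpressionForAnyOperator(arrayPath, alias, condition);
      default:
        throw new UnsupportedOperationException("Unsupported array operator: " + operator);
    }
  }

  // ANY -> EXISTS over the array elements, treating a missing array as empty.
  private static String getFilterExpressionForAnyOperator(
      String arrayPath, String alias, String condition) {
    return "EXISTS (SELECT 1 FROM jsonb_array_elements(COALESCE("
        + arrayPath + ", '[]'::jsonb)) AS " + alias + " WHERE " + condition + ")";
  }
}
```

Adding an operator later means adding one enum constant, one `case`, and one method, without growing the body of the `switch`.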

Base automatically changed from postgres_filter_parser_refactoring to main January 16, 2024 07:24
suresh-prakash merged commit 1078778 into main on Jan 16, 2024
6 of 7 checks passed
suresh-prakash deleted the postgres_array_filter_impl branch on January 16, 2024 13:39