Array filtering support (Part #3): Postgres relational filter refactoring #189

Merged
suresh-prakash merged 28 commits into main from postgres_filter_parser_refactoring
Jan 16, 2024

Contributor

@suresh-prakash suresh-prakash commented Dec 31, 2023

So far, we do not support filtering on LHS arrays.
The PRs in this series add support for LHS arrays in filters.

An LHS array can contain either primitives or documents.
For an LHS array of primitives, we can apply a relational operator to each element.
For an LHS array of documents, we can apply a generic filter to each element (a relational expression, a logical expression, or even another array filter expression).

Sample data:

[
  {
    "key": 3,
    "planets": [
      {
        "name": "Planet 1",
        "elements": [
          "Oxygen",
          "Water",
          "Nitrogen"
        ]
      },
      {
        "name": "Planet 2",
        "elements": [
          "Oxygen",
          "Helium",
          "Water"
        ]
      }
    ]
  },
  {
    "key": 1,
    "planets": [
      {
        "name": "Mercury"
      },
      {
        "name": "Venus"
      },
      {
        "name": "Earth",
        "elements": [
          "Oxygen",
          "Nitrogen"
        ]
      },
      {
        "name": "Mars",
        "elements": [
          "Iron"
        ]
      },
      {
        "name": "Jupiter",
        "elements": []
      },
      {
        "name": "Saturn",
        "elements": []
      },
      {
        "name": "Uranus",
        "elements": [
          "Hydrogen",
          "Helium",
          "Methane"
        ]
      },
      {
        "name": "Neptune",
        "elements": [
          "Hydrogen",
          "Helium",
          "Methane",
          "Ammonia"
        ]
      }
    ]
  },
  {
    "key": 2,
    "planets": [
      {
        "name": "Mercury",
        "elements": [
          "Silicate",
          "Aluminum"
        ]
      },
      {
        "name": "Venus",
        "elements": [
          "Carbon Dioxide",
          "Sulfuric Acid"
        ]
      },
      {
        "name": "Earth",
        "elements": [
          "Oxygen",
          "Nitrogen",
          "Water"
        ]
      },
      {
        "name": "Mars",
        "elements": [
          "Iron",
          "Silicate"
        ]
      },
      {
        "name": "Jupiter",
        "elements": [
          "Hydrogen",
          "Helium"
        ]
      },
      {
        "name": "Saturn",
        "elements": [
          "Hydrogen",
          "Helium",
          "Methane"
        ]
      },
      {
        "name": "Uranus",
        "elements": [
          "Hydrogen",
          "Helium",
          "Methane",
          "Ammonia"
        ]
      },
      {
        "name": "Neptune",
        "elements": [
          "Hydrogen",
          "Helium",
          "Methane",
          "Ammonia",
          "Methane Hydrate"
        ]
      }
    ]
  }
]

Input query: Get a list of solar systems with at least one planet that has neither oxygen nor water.

SELECT * 
FROM solar_systems 
WHERE ANY(planets) [NOT ((ANY(elements) = "Oxygen") OR (ANY(elements) = "Water"))]
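
The intended semantics of this query can be sketched in plain Java, independent of any datastore. The `Planet` and `ArrayFilterSemantics` classes below are hypothetical and exist only for illustration; the data is a trimmed-down version of the sample above, and a missing `elements` array is modelled as an empty list.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory model, used only to illustrate the filter semantics.
class Planet {
  final String name;
  final List<String> elements; // never null; a missing array is modelled as empty

  Planet(String name, String... elements) {
    this.name = name;
    this.elements = Arrays.asList(elements);
  }
}

class ArrayFilterSemantics {
  // ANY(planets) [NOT ((ANY(elements) = "Oxygen") OR (ANY(elements) = "Water"))]
  static boolean matches(List<Planet> planets) {
    return planets.stream()
        .anyMatch(
            planet ->
                !(planet.elements.stream().anyMatch("Oxygen"::equals)
                    || planet.elements.stream().anyMatch("Water"::equals)));
  }

  // Returns the keys of the solar systems that satisfy the filter, in insertion order.
  static List<Integer> matchingKeys(Map<Integer, List<Planet>> solarSystems) {
    return solarSystems.entrySet().stream()
        .filter(entry -> matches(entry.getValue()))
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }

  // A trimmed-down version of the sample data (same keys, fewer planets).
  static Map<Integer, List<Planet>> sampleData() {
    Map<Integer, List<Planet>> data = new LinkedHashMap<>();
    data.put(3, Arrays.asList(
        new Planet("Planet 1", "Oxygen", "Water", "Nitrogen"),
        new Planet("Planet 2", "Oxygen", "Helium", "Water")));
    data.put(1, Arrays.asList(
        new Planet("Mercury"), // no "elements" field
        new Planet("Earth", "Oxygen", "Nitrogen")));
    data.put(2, Arrays.asList(
        new Planet("Mercury", "Silicate", "Aluminum"),
        new Planet("Earth", "Oxygen", "Nitrogen", "Water")));
    return data;
  }

  public static void main(String[] args) {
    System.out.println(matchingKeys(sampleData())); // prints [1, 2]
  }
}
```

The solar system with key 3 is excluded because each of its planets has oxygen or water; keys 1 and 2 each contain a Mercury with neither.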

The filter API would look like this:

FilterTypeExpression filter = DocumentArrayFilterExpression.builder()
    .operator(ANY)
    .arraySource(IdentifierExpression.of("planets"))
    .filter(
       LogicalExpression.builder()
           .operator(NOT)
           .operand(
                 LogicalExpression.builder()
                       .operator(OR)
                       .operand(
                             ArrayRelationalFilterExpression.builder()
                                  .operator(ANY)
                                  .filter(
                                       RelationalExpression.of(
                                            IdentifierExpression.of("elements"),
                                            EQ,
                                            ConstantExpression.of("Oxygen"))))
                        .operand(
                             ArrayRelationalFilterExpression.builder()
                                  .operator(ANY)
                                  .filter(
                                       RelationalExpression.of(
                                            IdentifierExpression.of("elements"),
                                            EQ,
                                            ConstantExpression.of("Water"))))))
    .build();

Sample MongoDB $match stage query to be generated:

{
  "$expr":
  {
    "$anyElementTrue":
    {
      "$map":
      {
        "input":
        {
          "$ifNull": [
            "$planets",
            []
          ]
        },
        "as": "planet",
        "in":
        {
          "$not":
          {
            "$or": [
              {
                "$anyElementTrue":
                {
                  "$map":
                  {
                    "input":
                    {
                      "$ifNull": [
                        "$$planet.elements",
                        []
                      ]
                    },
                    "as": "element",
                    "in":
                    {
                      "$eq": ["$$element","Oxygen"]
                    }
                  }
                }
              },
              {
                "$anyElementTrue":
                {
                  "$map":
                  {
                    "input":
                    {
                      "$ifNull": [
                        "$$planet.elements",
                        []
                      ]
                    },
                    "as": "element",
                    "in":
                    {
                      "$eq": ["$$element","Water"]
                    }
                  }
                }
              }
            ]
          }
        }
      }
    }
  }
}

Sample Postgres Query:

SELECT *  
FROM solar_systems  
WHERE 
EXISTS 
(SELECT 1 
 FROM  jsonb_array_elements(COALESCE(str->'planets', '[]'::jsonb)) AS planet 
 WHERE 
 NOT 
 (
 EXISTS 
 (SELECT 1 
  FROM jsonb_array_elements(COALESCE(planet->'elements', '[]'::jsonb)) AS elements 
  WHERE TRIM('"' FROM elements::text) = 'Oxygen'
 )
 OR 
 EXISTS 
 (SELECT 1 
  FROM jsonb_array_elements(COALESCE(planet->'elements', '[]'::jsonb)) AS elements 
  WHERE TRIM('"' FROM elements::text) = 'Water'
 )
 )
);
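
The nesting in the query above follows one repeating pattern: each ANY becomes an EXISTS subquery over the unnested array. A minimal sketch of that pattern, assuming a hypothetical helper `any` (this is not the parser from this PR, and the value is bound as a `?` parameter here instead of being inlined):

```java
// Hypothetical helper: shows only how an ANY filter over a JSONB array
// can be rendered as an EXISTS subquery. Not the parser from this PR.
class AnyFilterSqlSketch {
  // source:    JSONB expression holding the array, e.g. "planet->'elements'"
  // alias:     name given to each unnested element
  // predicate: SQL condition evaluated against the alias
  static String any(String source, String alias, String predicate) {
    return String.format(
        "EXISTS (SELECT 1 FROM jsonb_array_elements(COALESCE(%s, '[]'::jsonb)) AS %s WHERE %s)",
        source, alias, predicate);
  }

  public static void main(String[] args) {
    // Same element comparison as in the sample query, with a bind parameter.
    String elementEquals = "TRIM('\"' FROM elements::text) = ?";
    String hasOxygenOrWater =
        any("planet->'elements'", "elements", elementEquals)
            + " OR "
            + any("planet->'elements'", "elements", elementEquals);
    String where = any("str->'planets'", "planet", "NOT (" + hasOxygenOrWater + ")");
    System.out.println("SELECT * FROM solar_systems WHERE " + where);
  }
}
```

The COALESCE to an empty array keeps a missing field from breaking `jsonb_array_elements`; an empty array simply yields no rows, so EXISTS is false.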

Notes:

  1. Only the ANY operator is supported for now.
  2. A future PR will introduce a NOT logical operator.
  3. ALL can be achieved by applying NOT both before and after ANY, i.e. ALL(filter) = NOT ANY(NOT filter). E.g.: "Fetch solar systems in which all planets contain both oxygen and water" is equivalent to "Fetch solar systems with no planet that lacks either oxygen or water".
  4. Similarly, NONE can be achieved by applying NOT once before ANY, because NONE = NOT ANY.
  5. The examples in this description are for illustration only. In practice, the queries would be simpler (e.g. use IN instead of OR and =, statically import the not and or logical operators, etc.).
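
The identities in notes 3 and 4 can be checked with a short sketch. The class and method names are hypothetical; only `anyMatch` and `Predicate.negate` from the JDK are used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: derives ALL and NONE from ANY and NOT only.
class QuantifierIdentities {
  static <T> boolean any(List<T> items, Predicate<T> filter) {
    return items.stream().anyMatch(filter);
  }

  // ALL(filter) == NOT ANY(NOT filter)
  static <T> boolean all(List<T> items, Predicate<T> filter) {
    return !any(items, filter.negate());
  }

  // NONE(filter) == NOT ANY(filter)
  static <T> boolean none(List<T> items, Predicate<T> filter) {
    return !any(items, filter);
  }

  public static void main(String[] args) {
    Predicate<List<String>> hasOxygenAndWater =
        elements -> elements.contains("Oxygen") && elements.contains("Water");
    List<List<String>> planets =
        Arrays.asList(
            Arrays.asList("Oxygen", "Water", "Nitrogen"),
            Arrays.asList("Oxygen", "Helium", "Water"));
    System.out.println(all(planets, hasOxygenAndWater)); // prints true
    System.out.println(none(planets, hasOxygenAndWater)); // prints false
  }
}
```

One consequence of this derivation: ALL over an empty (or missing) array is true, since there is no element that fails the filter.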

This PR contains

  • Refactoring of Postgres filter parsers

Previous PRs
#173
#188

@suresh-prakash suresh-prakash changed the title Postgres relational filter refactoring Array filtering support (Part #3): Postgres relational filter refactoring Dec 31, 2023

codecov bot commented Dec 31, 2023

Codecov Report

Attention: 20 lines in your changes are missing coverage. Please review.

Comparison is base (614e082) 80.48% compared to head (19cbb1a) 79.57%.

Files                                                   Patch %   Lines
...filter/PostgresContainsRelationalFilterParser.java   61.90%    6 Missing and 2 partials ⚠️
...ser/filter/PostgresLikeRelationalFilterParser.java   16.66%    5 Missing ⚠️
...r/filter/PostgresExistsRelationalFilterParser.java   50.00%    1 Missing and 2 partials ⚠️
...ilter/PostgresNotExistsRelationalFilterParser.java   66.66%    1 Missing and 1 partial ⚠️
.../parser/filter/PostgresRelationalFilterParser.java   60.00%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #189      +/-   ##
============================================
- Coverage     80.48%   79.57%   -0.91%     
- Complexity      937      956      +19     
============================================
  Files           169      182      +13     
  Lines          4438     4539     +101     
  Branches        368      373       +5     
============================================
+ Hits           3572     3612      +40     
- Misses          615      669      +54     
- Partials        251      258       +7     
Flag          Coverage Δ
integration   79.57% <82.75%> (-0.91%) ⬇️
unit          57.76% <69.82%> (-0.33%) ⬇️

github-actions bot commented Dec 31, 2023

Test Results

 38 files  ±0   38 suites  ±0   31s ⏱️ -1s
228 tests ±0  228 ✅ ±0  0 💤 ±0  0 ❌ ±0 
474 runs  ±0  474 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 19cbb1a. ± Comparison against base commit 614e082.

Base automatically changed from array_filter_mongo_impl to main January 12, 2024 09:11
import org.hypertrace.core.documentstore.postgres.query.v1.vistors.PostgresSelectTypeExpressionVisitor;

public interface PostgresSelectExpressionParserBuilder {
PostgresSelectTypeExpressionVisitor buildFor(
Contributor

  1. Change buildFor to build. (Is there a specific reason for choosing buildFor?)

  2. Is it necessary for PostgresQueryParser to be part of the interface? Could it instead be a member of the builder implementation?

Contributor Author

  1. Sure. (No specific reason)
  2. Yes, can be. Will update.
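
A minimal sketch of the shape agreed on in this thread, using hypothetical stand-in types (the `*Stub` and `*Sketch` names are not from the codebase). The real method's remaining arguments are elided in the diff above, so `build()` is shown here with none, which is an assumption.

```java
// All types below are hypothetical stand-ins for the classes named in this thread.
class QueryParserStub {}

interface SelectVisitorStub {
  QueryParserStub queryParser();
}

// Suggested shape: the interface no longer mentions the query parser.
interface SelectExpressionParserBuilderSketch {
  SelectVisitorStub build();
}

// The query parser becomes a member of the implementation instead.
class SelectExpressionParserBuilderImplSketch implements SelectExpressionParserBuilderSketch {
  private final QueryParserStub queryParser;

  SelectExpressionParserBuilderImplSketch(QueryParserStub queryParser) {
    this.queryParser = queryParser;
  }

  @Override
  public SelectVisitorStub build() {
    // The visitor is created around the parser captured at construction time.
    return () -> queryParser;
  }
}
```

With this shape, callers that only need to build a visitor depend on the interface alone and never see the query parser.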


@Override
public PostgresRelationalFilterParser parser(
    final RelationalExpression expression, final PostgresRelationalFilterContext context) {
Contributor

I do not see any usage of PostgresRelationalFilterContext here. Is it necessary to include it in the interface?

Contributor Author

Not required. Will update.


public interface PostgresRelationalFilterParserFactory {
  PostgresRelationalFilterParser parser(
      final RelationalExpression expression, final PostgresRelationalFilterContext context);
Contributor

If the context is not required, we can remove it from the arguments.

Contributor Author

Sure, yeah.

import org.hypertrace.core.documentstore.postgres.query.v1.vistors.PostgresSelectTypeExpressionVisitor;

public interface PostgresRelationalFilterParser {
String parse(
Contributor

Will every parse request have a different context? If not, shall we consider passing it to the Parser constructor instead?

Contributor Author

> Will every parse request have a different context?

Yes. That's the idea in the continuation PR, #191.
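
To illustrate the split discussed in these threads: the factory picks a parser from the expression alone, while the context is supplied on every `parse` call. Everything below is a hypothetical sketch; in particular, what the real context carries is not shown in this PR, so the `source` field (the JSONB expression the field is read from) is an assumption.

```java
// Hypothetical stand-ins; none of these names are from the codebase.
enum OperatorSketch { EQ, LIKE }

class RelationalExpressionSketch {
  final String lhs;
  final OperatorSketch operator;
  final Object rhs; // would be bound as a query parameter, never inlined into the SQL

  RelationalExpressionSketch(String lhs, OperatorSketch operator, Object rhs) {
    this.lhs = lhs;
    this.operator = operator;
    this.rhs = rhs;
  }
}

// Assumed per-request state: where the LHS field is read from.
class FilterContextSketch {
  final String source;

  FilterContextSketch(String source) {
    this.source = source;
  }
}

interface RelationalFilterParserSketch {
  String parse(RelationalExpressionSketch expression, FilterContextSketch context);
}

class RelationalFilterParserFactorySketch {
  // Parser selection depends only on the expression, so no context argument is needed.
  static RelationalFilterParserSketch parser(RelationalExpressionSketch expression) {
    switch (expression.operator) {
      case LIKE:
        return (e, ctx) -> ctx.source + "->>'" + e.lhs + "' LIKE ?";
      case EQ:
      default:
        return (e, ctx) -> ctx.source + "->>'" + e.lhs + "' = ?";
    }
  }

  public static void main(String[] args) {
    RelationalExpressionSketch byName =
        new RelationalExpressionSketch("name", OperatorSketch.EQ, "Earth");
    RelationalFilterParserSketch eqParser = parser(byName);
    // The same parser renders differently depending on the per-call context.
    System.out.println(eqParser.parse(byName, new FilterContextSketch("document")));
    System.out.println(eqParser.parse(byName, new FilterContextSketch("planet")));
  }
}
```

Keeping the context out of the constructor lets one parser instance serve both a top-level filter and a filter nested inside an array element.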

value -> {
  paramsBuilder.addObjectParam(value).addObjectParam(value);
  return String.format(
      "((jsonb_typeof(to_jsonb(%s)) = 'array' AND to_jsonb(%s) @> jsonb_build_array(?)) OR (jsonb_build_array(%s) @> jsonb_build_array(?)))",
Contributor

  • What did the previous IN request look like?
  • Won't this result in N OR clauses for N RHS values? Each of them will apply the jsonb_build_array function.

Contributor Author

Basically, the definition of IN is very loose when the LHS is an array.
MongoDB treats IN as a non-empty intersection between the LHS array and the RHS array. The Postgres behaviour was inconsistent with this definition, which broke parity. The PR https://github.com/hypertrace/document-store/pull/186/files addressed that by also treating IN as a non-empty intersection between the LHS array and the RHS array for Postgres. So, in this refactoring PR, I didn't change the behaviour.
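
A rough sketch of both points in this thread: the non-empty-intersection meaning of IN, and the one-fragment-per-RHS-value rendering the reviewer asked about. The format string is copied from the diff above; joining the fragments with OR, and the `InFilterSketch` class and its method names, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not the parser from this PR.
class InFilterSketch {
  private static final String FRAGMENT =
      "((jsonb_typeof(to_jsonb(%s)) = 'array' AND to_jsonb(%s) @> jsonb_build_array(?))"
          + " OR (jsonb_build_array(%s) @> jsonb_build_array(?)))";

  // IN with an array on the LHS: true when both sides share at least one value.
  static boolean in(List<?> lhs, List<?> rhs) {
    return !Collections.disjoint(lhs, rhs);
  }

  // Renders one fragment per RHS value and joins them with OR (assumed).
  // Each fragment has two placeholders, so every value is bound twice.
  static String inClause(String lhs, List<?> rhsValues, List<Object> params) {
    List<String> fragments = new ArrayList<>();
    for (Object value : rhsValues) {
      params.add(value);
      params.add(value);
      fragments.add(String.format(FRAGMENT, lhs, lhs, lhs));
    }
    return "(" + String.join(" OR ", fragments) + ")";
  }

  public static void main(String[] args) {
    System.out.println(
        in(Arrays.asList("Oxygen", "Nitrogen"), Arrays.asList("Water", "Oxygen"))); // prints true

    List<Object> params = new ArrayList<>();
    String sql = inClause("document->'elements'", Arrays.asList("Water", "Oxygen"), params);
    System.out.println(sql);
    System.out.println(params); // prints [Water, Water, Oxygen, Oxygen]
  }
}
```

So N RHS values do produce N OR-joined fragments and 2N bound parameters under this rendering, which matches the reviewer's reading.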

@suresh-prakash suresh-prakash merged commit 88a0440 into main Jan 16, 2024
6 of 7 checks passed
@suresh-prakash suresh-prakash deleted the postgres_filter_parser_refactoring branch January 16, 2024 07:24