Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make OPTIONAL MATCH and WHERE interaction easier to understand #191

Open
Mats-SX opened this issue Feb 27, 2017 · 11 comments
Open

Make OPTIONAL MATCH and WHERE interaction easier to understand #191

Mats-SX opened this issue Feb 27, 2017 · 11 comments
Labels

Comments

@Mats-SX
Copy link
Member

Mats-SX commented Feb 27, 2017

CIR-2017-191

The fact that WHERE does not filter rows when part of an OPTIONAL MATCH has confused many users. Semantically, OPTIONAL MATCH behaves like an outer join, and will only ever be additive (>= 0) to the cardinality of the preceding query. Consider the below example query, and note the comment:

MATCH (a)
OPTIONAL MATCH (a)-->(b)
WHERE false // does not filter, only makes b null
RETURN b

In contrast, moving the WHERE subclause to be attached to the MATCH clause, the query will return an empty result:

MATCH (a)
WHERE false // filters everything
OPTIONAL MATCH (a)-->(b)
RETURN b

It has not been clear to users that WHERE is intimately attached to its preceding clause, and that it only affects the cardinality of that clause, rather than the whole query (although this difference does not exist in the cases of WITH and MATCH). This CIR requests some way of making this more clear to the user.

Idea 1

Prohibit WHERE subclauses that do not depend on variables bound by its superclause. This would make the example queries in the description of this CIR invalid.

This idea could be 'soft', and allow predicate expressions that partially depend on the bound variables:

MATCH (a)
WHERE a.prop AND false
...

Or it could be 'strict', and only allow predicate expressions without constant parts.

Idea 2

Replace the OPTIONAL MATCH clause with a subquery clause; OPTIONAL. The first query from the description would be expressed like so:

MATCH (a)
OPTIONAL { // explicitly states the semantics of its sub-part
    MATCH (a)-->(b)
    WHERE false // normal filtering
    RETURN b
}
RETURN b

Ideas 1 and 2 are not mutually exclusive.

@Mats-SX Mats-SX added the CIR label Feb 27, 2017
@petraselmer
Copy link

+1 on this CIR. Although Idea 1 has the advantage of being far less drastic in terms of existing usage, I must say the synergy of Idea 2 and subqueries is very compelling indeed...

@InverseFalcon
Copy link

+1 for Idea 2, I like the clarity this provides.

@JanekPo
Copy link

JanekPo commented Mar 3, 2017

I would add Idea 0: "Take no action" - as the semantic of WHERE tied to OPTIONAL MATCH is
strict (and analogous to SQL outer join). Furthermore, everything might change after adding Regular Path Queries to Cypher (#179)

@szarnyasg
Copy link

szarnyasg commented Oct 2, 2018

I stumbled upon this problem recently and just found this issue. I ran a couple of experiments to better understand the semantics of OPTIONAL MATCH.

I used the following toy graph:

CREATE
  (:X {val: 1})-[:E1]->(:Y {val: 2})-[:E2]->(:Z {val: 3}),
  (:X {val: 4})-[:E1]->(:Y {val: 5}),
  (:X {val: 6})

Matching on a single edge is fairly simple:

MATCH (x:X)
OPTIONAL MATCH (x)-[:E1]->(y:Y)
WHERE x.val < y.val
RETURN x, y
╒═════════╤═════════╕
│"x"      │"y"      │
╞═════════╪═════════╡
│{"val":1}│{"val":2}│
├─────────┼─────────┤
│{"val":4}│{"val":5}│
├─────────┼─────────┤
│{"val":6}│null     │
└─────────┴─────────┘

plan 10

Things get more complicated with two edges, especially for the query plan, which contains an Apply and an Optional operator.

MATCH (x:X)
OPTIONAL MATCH (x)-[:E1]->(y:Y)-[:E2]->(z:Z)
WHERE x.val < z.val
RETURN x, y, z
╒═════════╤═════════╤═════════╕
│"x"      │"y"      │"z"      │
╞═════════╪═════════╪═════════╡
│{"val":1}│{"val":2}│{"val":3}│
├─────────┼─────────┼─────────┤
│{"val":4}│null     │null     │
├─────────┼─────────┼─────────┤
│{"val":6}│null     │null     │
└─────────┴─────────┴─────────┘

plan 11

Splitting the OPTIONAL MATCH into two clauses results in different semantics:

MATCH (x:X)
OPTIONAL MATCH (x)-[:E1]->(y:Y)
OPTIONAL MATCH (y)-[:E2]->(z:Z)
WHERE x.val < z.val
RETURN x, y, z
╒═════════╤═════════╤═════════╕
│"x"      │"y"      │"z"      │
╞═════════╪═════════╪═════════╡
│{"val":1}│{"val":2}│{"val":3}│
├─────────┼─────────┼─────────┤
│{"val":4}│{"val":5}│null     │
├─────────┼─────────┼─────────┤
│{"val":6}│null     │null     │
└─────────┴─────────┴─────────┘

plan 7

Note that the TCK currently does not fully cover this issue. The only that have two edges in an OPTIONAL MATCH are the following - note that none have a WHERE clause:

$ ag 'OPTIONAL MATCH.*\(.*\(.*\(' -C 2
MatchAcceptance2.feature
316-      """
317-      MATCH (a {name: 'A'})
318:      OPTIONAL MATCH (a)-[:KNOWS]->()-[:KNOWS]->(foo)
319-      RETURN foo
320-      """
--
354-      """
355-      MATCH (a {name: 'A'})
356:      OPTIONAL MATCH p = (a)-->(b)-[*]->(c)
357-      RETURN p
358-      """

OptionalMatchAcceptance.feature
258-      """
259-      MATCH (a:Single), (c:C)
260:      OPTIONAL MATCH (a)-->(b)-->(c)
261-      RETURN b
262-      """
--
270-      """
271-      MATCH (a:A), (c:C)
272:      OPTIONAL MATCH (a)-->(b)-->(c)
273-      RETURN b
274-      """

We are happy to write some additional TCK tests if you agree that they'd be useful. The queries above might provide a good starting point for that.

Update: I checked the SIGMOD 2018 Cypher paper on the matter:

image

This, as expected, captures the semantics of the language correctly, but is rather difficult (but not impossible) to translate to standard relational algebra.

@petraselmer
Copy link

@szarnyasg Thanks for raising this, and please do write the additional TCK tests for this (much appreciated).

@Mats-SX
Copy link
Member Author

Mats-SX commented Oct 8, 2018

@szarnyasg Interesting analysis, and yes we would of course be interested in such a TCK contribution (as Petra already mentions).

You write that the TCK 'does not fully cover this issue' -- I wonder to which issue you are referring? The general issue to which this CIR is dedicated (OPT MATCH + WHERE giving allegedly surprising behaviour), or another issue in particular? Are you perhaps looking for being able to express predicates covering multiple OPTIONAL MATCH clauses, such as in your last query?

@szarnyasg
Copy link

@Mats-SX I was referring to the OPTIONAL MATCH + WHERE constellation (which is the focus of this CIR). This code search reveals that there are a few tests for multiple OPTIONAL MATCH clauses:

$ ag 'OPTIONAL MATCH.*\n.*OPTIONAL MATCH' -C 2
MatchAcceptance2.feature
1646-      """
1647-      MATCH (a:A)
1648:      OPTIONAL MATCH (a)-[:FOO]->(b:B)
1649:      OPTIONAL MATCH (b)<-[:BAR*]-(c:B)
1650-      RETURN a, b, c
1651-      """

OptionalMatchAcceptance.feature
97-      """
98-      MATCH (a:Single)
99:      OPTIONAL MATCH (a)-->(b:NonExistent)
100:      OPTIONAL MATCH (a)-->(c:NonExistent)
101-      WITH coalesce(b, c) AS x
102-      MATCH (x)-->(d)
--
282-      """
283-      MATCH (a:A), (b:B)
284:      OPTIONAL MATCH (a)-->(x)
285:      OPTIONAL MATCH (x)-[r]->(b)
286-      RETURN x, r
287-      """
--
309-    When executing query:
310-      """
311:      OPTIONAL MATCH (a:NotThere)
312:      OPTIONAL MATCH (b:NotThere)
313-      WITH a, b
314-      OPTIONAL MATCH (b)-[r:NOR_THIS]->(a)
--
329-    When executing query:
330-      """
331:      OPTIONAL MATCH (f:DoesExist)
332:      OPTIONAL MATCH (n:DoesNotExist)
333-      RETURN collect(DISTINCT n.property) AS a, collect(DISTINCT f.property) AS b
334-      """

@Mats-SX
Copy link
Member Author

Mats-SX commented Oct 8, 2018

So the "issue" is the low level of TCK coverage for the feature? Then I understand.

@szarnyasg
Copy link

I think our "issue" was that OPTIONAL MATCH + WHERE is unintuitive at first sight to many users, and the TCK does not cover some important sub-cases (such as OPTIONAL MATCH on two edges + WHERE). In the absence of these tests, implementations might get this wrong - which is exactly what happened to us when working on similar queries in ingraph.

@JanekPo
Copy link

JanekPo commented Oct 15, 2018

I have checked queries you had posted - the results are expectedly accordant with Cypher's semantic,
because of match clause definitions corresponding to 1-4 formulas that you had pasted:

eval_clause(Match,Graph,BindingTable,MatchBindingTable,Graph) :-
Match = match(MatchModifier,Pattern,Where),
findall(MatchBindingRow,
        (
         member(BindingRow,BindingTable),
         eval_match_(MatchModifier,Graph,Pattern,Where,BindingRow,MatchBindingRow)
        ),
        MatchBindingTable).

eval_match_(no_modifier,Graph,Pattern,Where,BindingRow,MatchBindingRow)
:-
eval_match(BindingRow,Graph,Pattern,MatchBindingRow),
eval_where(Graph,Where,MatchBindingRow).

eval_match_(optional,Graph,Pattern,Where,BindingRow,MatchBindingRow)
:-
findall(MatchBindingRow_,
(
eval_match(BindingRow,Graph,Pattern,MatchBindingRow_),
eval_where(Graph,Where,MatchBindingRow_)
),
MatchBindingRows_),
MatchBindingRows_ = [_|_],
!,
member(MatchBindingRow,MatchBindingRows_).


eval_match_(optional,_Graph,Pattern,_Where,BindingRow,MatchBindingRow)
:-
get_names(Pattern,Names),
list_to_set(Names,NamesSet),
maplist([Name,bind(Name,cypher_null)]>>true,NamesSet,Binds),
extend_binding_row(BindingRow,Binds,MatchBindingRow).

If you wish you can verify your respective query results in Cypher.PL
Please be aware that - due to ambiguity handling - Cypher.PL console is currently running in stateless mode, it means that graph state is not preserved between query execution.
Thus you have to create graph state in the same query e.g.

CREATE //graph state creation
  (:X {val: 1})-[:E1]->(:Y {val: 2})-[:E2]->(:Z {val: 3}),
  (:X {val: 4})-[:E1]->(:Y {val: 5}),
  (:X {val: 6})
with count(*) as unused_variable_
MATCH (x:X) //tested query
OPTIONAL MATCH (x)-[:E1]->(y:Y)
WHERE x.val < y.val
RETURN x, y

Cypher.PL console is available at:
ssh [email protected] #password: cypherpl
In case of any incompatibilities with your results please let me know.

@szarnyasg
Copy link

szarnyasg commented Oct 21, 2018

Thanks, @JanekPo for investigating this with your implementation. Neo4j returns the correct results indeed, but it is worth trying to make the interaction of OPTIONAL MATCH and WHERE more intuitive (and thus making the language more accessible). Simultaneously, the TCK should document such complex interactions of language constructs, which I tried to address with #333.

Mats-SX added a commit that referenced this issue Oct 23, 2018
Add tests for interactions of OPTIONAL MATCH and WHERE #191
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants