Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MERGE does not set properties involving variables #1219

Closed
rafsun42 opened this issue Sep 8, 2023 · 17 comments
Closed

MERGE does not set properties involving variables #1219

rafsun42 opened this issue Sep 8, 2023 · 17 comments
Assignees
Labels
bug Something isn't working

Comments

@rafsun42
Copy link
Member

rafsun42 commented Sep 8, 2023

Note: This task is ONLY assigned to people in the 'refactor create\merge' project.

Describe the bug
If properties in the MERGE clause is set from a previous variable (i.e. .. MERGE ( {key: some_variable} ) ..), that particular key-value pair is not persisted.

How are you accessing AGE (Command line, driver, etc.)?
Command line

What data setup do we need to do?

...
SELECT * FROM cypher('xyz',
$$
CREATE (x:Label1{arr:[1,2,3,4]})
RETURN x 
$$) as (a agtype);
...

What is the necessary configuration info needed?
N/A

What is the command that caused the error?

SELECT * FROM cypher('xyz', 
$$
MATCH (x:Label1{arr:[1,2,3,4]})
MERGE (y:Label2{key1:2, key2:x.arr, key3:3})
RETURN y
$$) as (a agtype);

Behavior
The key2: x.arr pair is not persisted in the output.

Output:

                                             a                                             
-------------------------------------------------------------------------------------------
 {"id": 1688849860263937, "label": "Label2", "properties": {"key1": 2, "key3": 3}}::vertex
(1 row)

Expected behavior
Expected output:

                                             a                                             
-------------------------------------------------------------------------------------------
 {"id": 1688849860263937, "label": "Label2", "properties": {"key1": 2, "key2": [1, 2, 3, 4], "key3": 3}}::vertex
(1 row)

Environment (please complete the following information):
PG15.4
Master branch
Ubuntu 22.04

Additional context
N/A

@rafsun42 rafsun42 added the bug Something isn't working label Sep 8, 2023
@ksheroz
Copy link

ksheroz commented Sep 8, 2023

@rafsun42 Assign it to me please

@thehaseebashraf
Copy link

Kindly assign the project to me. Thanks.

@Muhammad-Adil-Shahid-1054

@rafsun42 kindly assign the task to me

@CapnSpek
Copy link
Contributor

Please assign me this task.

@CapnSpek
Copy link
Contributor

CapnSpek commented Sep 18, 2023

Some observations:

  • The same query with CREATE instead of MERGE works just fine
SELECT * FROM cypher('xyz', 
$$
MATCH (x:Label1{arr:[1,2,3,4]})
CREATE (y:Label2{key1:2, key2:x.arr, key3:3})
RETURN y
$$) as (a agtype);
  • The same query with MERGE twice also works just fine
SELECT * FROM cypher('xyz', 
$$
MATCH (x:Label1{arr:[1,2,3,4]})
MERGE (y:Label2{key1:2, key2:x.arr, key3:3})
MERGE (z:Label2{key1:2, key2:x.arr, key3:3})
RETURN y
$$) as (a agtype);

Next, let us analyze the Query Plan Trees for MATCH-MERGE, MATCH-CREATE, and MATCH-MERGE-MERGE queries.

  • Query plan for MATCH-MERGE:
 Custom Scan (Cypher Merge)  (cost=0.00..0.00 rows=0 width=32)
   ->  Subquery Scan on _age_default_alias_previous_cypher_clause  (cost=0.00..86.02 rows=1 width=32)
         ->  Nested Loop Left Join  (cost=0.00..86.00 rows=1 width=96)
               Join Filter: (z.properties @> agtype_build_map('key1'::text, '2'::agtype, 'key2'::text, agtype_access_operator(VARIADIC ARRAY[_agtype_build_vertex(x.id, _label_name('18201'::oid, x.id), x.
properties), '"arr"'::agtype]), 'key3'::text, '3'::agtype))
               ->  Seq Scan on "Label1" x  (cost=0.00..31.00 rows=1 width=40)
                     Filter: (properties @> agtype_build_map('arr'::text, agtype_build_list('1'::agtype, '2'::agtype, '3'::agtype, '4'::agtype)))
               ->  Seq Scan on "Label2" z  (cost=0.00..28.00 rows=1200 width=64)
(7 rows)
  • Query plan for MATCH-CREATE:
 Custom Scan (Cypher Create)  (cost=0.00..0.00 rows=0 width=32)
   ->  Subquery Scan on _age_default_alias_previous_cypher_clause  (cost=0.00..31.03 rows=1 width=32)
         ->  Seq Scan on "Label1" x  (cost=0.00..31.02 rows=1 width=160)
               Filter: (properties @> agtype_build_map('arr'::text, agtype_build_list('1'::agtype, '2'::agtype, '3'::agtype, '4'::agtype)))
(4 rows)
  • Query plan for MATCH-MERGE-MERGE:
 Custom Scan (Cypher Merge)  (cost=0.00..0.00 rows=0 width=32)
   ->  Subquery Scan on _age_default_alias_previous_cypher_clause  (cost=0.00..49.02 rows=1 width=32)
         ->  Nested Loop Left Join  (cost=0.00..49.01 rows=1 width=160)
               Join Filter: (z.properties @> agtype_build_map('key1'::text, '2'::agtype, 'key2'::text, agtype_access_operator(VARIADIC ARRAY[_age_default_alias_previous_cypher_clause_1.x, '"arr"'::agtype
]), 'key3'::text, '3'::agtype))
               ->  Custom Scan (Cypher Merge)  (cost=0.00..0.00 rows=0 width=64)
                     ->  Subquery Scan on _age_default_alias_previous_cypher_clause_1  (cost=0.00..86.02 rows=1 width=64)
                           ->  Nested Loop Left Join  (cost=0.00..86.01 rows=1 width=96)
                                 Join Filter: (y.properties @> agtype_build_map('key1'::text, '2'::agtype, 'key2'::text, agtype_access_operator(VARIADIC ARRAY[_agtype_build_vertex(x.id, _label_name('1820
1'::oid, x.id), x.properties), '"arr"'::agtype]), 'key3'::text, '3'::agtype))
                                 ->  Seq Scan on "Label1" x  (cost=0.00..31.00 rows=1 width=40)
                                       Filter: (properties @> agtype_build_map('arr'::text, agtype_build_list('1'::agtype, '2'::agtype, '3'::agtype, '4'::agtype)))
                                 ->  Seq Scan on "Label2" y  (cost=0.00..28.00 rows=1200 width=64)
               ->  Seq Scan on "Label2" z  (cost=0.00..28.00 rows=1200 width=64)
(12 rows)

Cannot draw a conclusion

@ksheroz What do you think?

@CapnSpek
Copy link
Contributor

CapnSpek commented Sep 18, 2023

The same query also fails if instead of an array, we have any other datatype.

If the setup is changed to

SELECT * FROM cypher('xyz',
$$
CREATE (x:Label1{arr:5})
RETURN x 
$$) as (a agtype);

And the query to:

SELECT * FROM cypher('xyz', 
$$
MATCH (x:Label1{arr:5})
MERGE (y:Label2{key1:2, key2:x.arr, key3:3})
RETURN y
$$) as (a agtype);

The return value is still
{"id": 1125899906842643, "label": "Label2", "properties": {"key1": 2, "key3": 3}}::vertex

@ksheroz
Copy link

ksheroz commented Sep 18, 2023

Some observations:

  • The same query with CREATE instead of MERGE works just fine
SELECT * FROM cypher('xyz', 
$$
MATCH (x:Label1{arr:[1,2,3,4]})
CREATE (y:Label2{key1:2, key2:x.arr, key3:3})
RETURN y
$$) as (a agtype);
  • The same query with MERGE twice also works just fine
SELECT * FROM cypher('xyz', 
$$
MATCH (x:Label1{arr:[1,2,3,4]})
MERGE (y:Label2{key1:2, key2:x.arr, key3:3})
MERGE (z:Label2{key1:2, key2:x.arr, key3:3})
RETURN y
$$) as (a agtype);

Next, let us analyze the Query Plan Trees for MATCH-MERGE, MATCH-CREATE, and MATCH-MERGE-MERGE queries.

  • Query plan for MATCH-MERGE:
 Custom Scan (Cypher Merge)  (cost=0.00..0.00 rows=0 width=32)
   ->  Subquery Scan on _age_default_alias_previous_cypher_clause  (cost=0.00..86.02 rows=1 width=32)
         ->  Nested Loop Left Join  (cost=0.00..86.00 rows=1 width=96)
               Join Filter: (z.properties @> agtype_build_map('key1'::text, '2'::agtype, 'key2'::text, agtype_access_operator(VARIADIC ARRAY[_agtype_build_vertex(x.id, _label_name('18201'::oid, x.id), x.
properties), '"arr"'::agtype]), 'key3'::text, '3'::agtype))
               ->  Seq Scan on "Label1" x  (cost=0.00..31.00 rows=1 width=40)
                     Filter: (properties @> agtype_build_map('arr'::text, agtype_build_list('1'::agtype, '2'::agtype, '3'::agtype, '4'::agtype)))
               ->  Seq Scan on "Label2" z  (cost=0.00..28.00 rows=1200 width=64)
(7 rows)
  • Query plan for MATCH-CREATE:
 Custom Scan (Cypher Create)  (cost=0.00..0.00 rows=0 width=32)
   ->  Subquery Scan on _age_default_alias_previous_cypher_clause  (cost=0.00..31.03 rows=1 width=32)
         ->  Seq Scan on "Label1" x  (cost=0.00..31.02 rows=1 width=160)
               Filter: (properties @> agtype_build_map('arr'::text, agtype_build_list('1'::agtype, '2'::agtype, '3'::agtype, '4'::agtype)))
(4 rows)
  • Query plan for MATCH-MERGE-MERGE:
 Custom Scan (Cypher Merge)  (cost=0.00..0.00 rows=0 width=32)
   ->  Subquery Scan on _age_default_alias_previous_cypher_clause  (cost=0.00..49.02 rows=1 width=32)
         ->  Nested Loop Left Join  (cost=0.00..49.01 rows=1 width=160)
               Join Filter: (z.properties @> agtype_build_map('key1'::text, '2'::agtype, 'key2'::text, agtype_access_operator(VARIADIC ARRAY[_age_default_alias_previous_cypher_clause_1.x, '"arr"'::agtype
]), 'key3'::text, '3'::agtype))
               ->  Custom Scan (Cypher Merge)  (cost=0.00..0.00 rows=0 width=64)
                     ->  Subquery Scan on _age_default_alias_previous_cypher_clause_1  (cost=0.00..86.02 rows=1 width=64)
                           ->  Nested Loop Left Join  (cost=0.00..86.01 rows=1 width=96)
                                 Join Filter: (y.properties @> agtype_build_map('key1'::text, '2'::agtype, 'key2'::text, agtype_access_operator(VARIADIC ARRAY[_agtype_build_vertex(x.id, _label_name('1820
1'::oid, x.id), x.properties), '"arr"'::agtype]), 'key3'::text, '3'::agtype))
                                 ->  Seq Scan on "Label1" x  (cost=0.00..31.00 rows=1 width=40)
                                       Filter: (properties @> agtype_build_map('arr'::text, agtype_build_list('1'::agtype, '2'::agtype, '3'::agtype, '4'::agtype)))
                                 ->  Seq Scan on "Label2" y  (cost=0.00..28.00 rows=1200 width=64)
               ->  Seq Scan on "Label2" z  (cost=0.00..28.00 rows=1200 width=64)
(12 rows)

Cannot draw a conclusion

@ksheroz What do you think?

It seems like that the Join Filter is the issue here. Any idea which function triggers it?

@jrgemignani
Copy link
Contributor

The query also succeeds in these cases -

psql-15.4-5432-pgsql=# SELECT * FROM cypher('xyz', $$ MATCH (x:Label1{arr:[1,2,3,4]})
MERGE (y:Label2{key1:2, key2:x.arr, key3:3}) RETURN x $$) as (a agtype);
                                            a
-----------------------------------------------------------------------------------------
 {"id": 844424930131971, "label": "Label1", "properties": {"arr": [1, 2, 3, 4]}}::vertex
(1 row)

psql-15.4-5432-pgsql=# SELECT * FROM cypher('xyz', $$ MATCH (u) return u$$) as (a agtype);
                                                        a
-----------------------------------------------------------------------------------------------------------------
 {"id": 844424930131971, "label": "Label1", "properties": {"arr": [1, 2, 3, 4]}}::vertex
 {"id": 1125899906842636, "label": "Label2", "properties": {"key1": 2, "key2": [1, 2, 3, 4], "key3": 3}}::vertex
(2 rows)

psql-15.4-5432-pgsql=#
psql-15.4-5432-pgsql=# SELECT * FROM cypher('xyz', $$ MATCH (x:Label1{arr:[1,2,3,4]})
MERGE (y:Label2{key1:2, key2:x.arr, key3:3}) $$) as (a agtype);
 a
---
(0 rows)

psql-15.4-5432-pgsql=# SELECT * FROM cypher('xyz', $$ MATCH (u) return u$$) as (a agtype);
                                                        a
-----------------------------------------------------------------------------------------------------------------
 {"id": 844424930131972, "label": "Label1", "properties": {"arr": [1, 2, 3, 4]}}::vertex
 {"id": 1125899906842637, "label": "Label2", "properties": {"key1": 2, "key2": [1, 2, 3, 4], "key3": 3}}::vertex
(2 rows)

psql-15.4-5432-pgsql=#

@rafsun42
Copy link
Member Author

@ksheroz @CapnSpek

Query plans are usually useful. However, for this issue I would prefer using a debugger.

I think this file-src/backend/executor/cypher_merge.c may be worthwhile to step through with a debugger.

@jrgemignani
Copy link
Contributor

jrgemignani commented Sep 21, 2023

It appears that the merge code (including the expression evaluation) is working correctly with what it is given. It is just that sometimes it gets a NULL value instead of the variable x in specific cases. So, I don't think MERGE is the issue here. Something prior to MERGE is messing up x.

It also appears that the transform logic is working correctly, at least up to the RETURN transform for -

MATCH (x:Label1{arr:[1,2,3,4]}) MERGE (y:Label2{key2: id(x)}) RETURN y

It might be possible the planner is removing the variable x.

I compiled and tested PG11 1.1.0 and this issue exists there as well.

@jrgemignani
Copy link
Contributor

I traced it through the code and it appears that the variable from the MATCH is not getting to the MERGE when the variable isn't present in the RETURN list. Unless, there isn't a RETURN, then it works.

@CapnSpek
Copy link
Contributor

I am not very sure but can transform_cypher_merge function have something to do with it?

The comments mention this, and the function is also hit during debugging.

It is proving to be difficult to trace this bug because there seems to be no particular variable where I can check whether x is being passed or not, and no way to check what would be an SQL equivalent of the cypher query, specially when the query plan trees look fine.

Can anyone suggest a way to track the source of this bug?

\*
* 2. If there is a previous clause then the query will have two subqueries.
 * The first query will be for the previous clause that we recursively handle.
 * The second query will be for the path that this MERGE clause defines. The
 * two subqueries will be joined together using a LATERAL LEFT JOIN with the
 * previous query on the left and the MERGE path subquery on the right. Like
 * case 1 the targetList will have all the declared variables and a FuncExpr
 * that represents the MERGE clause with its needed metadata information, that
 * will be caught in the planner phase and converted into a path.
 *
 * This will allow us to be capable of handling the 2 cases that exist with a
 * MERGE clause correctly.
 *
 * Case 1: the path already exists. In this case we do not need to create
 * the path and MERGE will simply pass the tuple information up the execution
 * tree.
 *
 * Case 2: the path does not exist. In this case the LEFT part of the join
 * will not prevent the tuples from the previous clause from being emitted. We
 * can catch when this happens in the execution phase and create the missing
 * data, before passing up the execution tree.
 *
 * It should be noted that both cases can happen in the same query. If the
 * MERGE clause references a variable from a previous clause, it could be that
 * for one tuple the path exists (or there is multiple paths that exist and all
 * paths must be emitted) and for another the path does not exist. This is
 * similar to OPTIONAL MATCH, however with the added feature of creating the
 * path if not there, rather than just emitting NULL.
 */

@jrgemignani
Copy link
Contributor

I "think" the issue might be in transform_cypher_return.

It appears that if the variable is not listed in the RETURN clause, it gets set to NULL. Why? I'm not exactly sure, maybe there is an issue with scope. Because, if the same query, that didn't work properly, is used without the RETURN clause, it works fine.

I was able to manually set the TargetEntry->resjunk value for the unused variable to true to get the query to work. However, doing this programmatically in the RETURN clause, without breaking regression tests, is proving difficult.

@CapnSpek
Copy link
Contributor

CapnSpek commented Oct 30, 2023

Hi, I am not sure if this is the correct place to ask this question, but this is the only issue under create and merge refactor project.

I wanted to ask, for example we have function rescan_cypher_create in cypher_create.c and rescan_cypher_merge in cypher_merge.c and they basically do the same thing.

However, they are in different files.

So is it fine if I merge the functions into a single function, create a new file, include it in the project, and reference that function in both files instead?

Or should I put them in one of the files and reference from the other one?

But in any case, the function will no longer be static.

There are more functions from which common functionalities can be extracted and put in a common function in a new file.

That is going to decrease the overall size of the files.

How should I approach it?

@rafsun42

@rafsun42
Copy link
Member Author

@CapnSpek Discord would be the right place for this question since this is not related to the reported bug.

@jrgemignani
Copy link
Contributor

@rafsun42 PR #1441 was created to address this issue.

@rafsun42
Copy link
Member Author

@jrgemignani Thanks for the PR. The issue is resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Development

No branches or pull requests

6 participants