MERGE does not set properties involving variables #1219

rafsun42 · 2023-09-08T18:19:31Z

Note: This task is ONLY assigned to people in the 'refactor create\merge' project.

Describe the bug
If properties in the MERGE clause is set from a previous variable (i.e. .. MERGE ( {key: some_variable} ) ..), that particular key-value pair is not persisted.

How are you accessing AGE (Command line, driver, etc.)?
Command line

What data setup do we need to do?

...
SELECT * FROM cypher('xyz',
$$
CREATE (x:Label1{arr:[1,2,3,4]})
RETURN x 
$$) as (a agtype);
...

What is the necessary configuration info needed?
N/A

What is the command that caused the error?

SELECT * FROM cypher('xyz', 
$$
MATCH (x:Label1{arr:[1,2,3,4]})
MERGE (y:Label2{key1:2, key2:x.arr, key3:3})
RETURN y
$$) as (a agtype);

Behavior
The key2: x.arr pair is not persisted in the output.

Output:

                                             a                                             
-------------------------------------------------------------------------------------------
 {"id": 1688849860263937, "label": "Label2", "properties": {"key1": 2, "key3": 3}}::vertex
(1 row)

Expected behavior
Expected output:

                                             a                                             
-------------------------------------------------------------------------------------------
 {"id": 1688849860263937, "label": "Label2", "properties": {"key1": 2, "key2": [1, 2, 3, 4], "key3": 3}}::vertex
(1 row)

Environment (please complete the following information):
PG15.4
Master branch
Ubuntu 22.04

Additional context
N/A

The text was updated successfully, but these errors were encountered:

ksheroz · 2023-09-08T18:38:15Z

@rafsun42 Assign it to me please

thehaseebashraf · 2023-09-11T16:23:11Z

Kindly assign the project to me. Thanks.

Muhammad-Adil-Shahid-1054 · 2023-09-11T16:26:11Z

@rafsun42 kindly assign the task to me

CapnSpek · 2023-09-13T23:30:43Z

Please assign me this task.

CapnSpek · 2023-09-18T11:26:44Z

Some observations:

The same query with CREATE instead of MERGE works just fine

SELECT * FROM cypher('xyz', 
$$
MATCH (x:Label1{arr:[1,2,3,4]})
CREATE (y:Label2{key1:2, key2:x.arr, key3:3})
RETURN y
$$) as (a agtype);

The same query with MERGE twice also works just fine

SELECT * FROM cypher('xyz', 
$$
MATCH (x:Label1{arr:[1,2,3,4]})
MERGE (y:Label2{key1:2, key2:x.arr, key3:3})
MERGE (z:Label2{key1:2, key2:x.arr, key3:3})
RETURN y
$$) as (a agtype);

Next, let us analyze the Query Plan Trees for MATCH-MERGE, MATCH-CREATE, and MATCH-MERGE-MERGE queries.

Query plan for MATCH-MERGE:

 Custom Scan (Cypher Merge)  (cost=0.00..0.00 rows=0 width=32)
   ->  Subquery Scan on _age_default_alias_previous_cypher_clause  (cost=0.00..86.02 rows=1 width=32)
         ->  Nested Loop Left Join  (cost=0.00..86.00 rows=1 width=96)
               Join Filter: (z.properties @> agtype_build_map('key1'::text, '2'::agtype, 'key2'::text, agtype_access_operator(VARIADIC ARRAY[_agtype_build_vertex(x.id, _label_name('18201'::oid, x.id), x.
properties), '"arr"'::agtype]), 'key3'::text, '3'::agtype))
               ->  Seq Scan on "Label1" x  (cost=0.00..31.00 rows=1 width=40)
                     Filter: (properties @> agtype_build_map('arr'::text, agtype_build_list('1'::agtype, '2'::agtype, '3'::agtype, '4'::agtype)))
               ->  Seq Scan on "Label2" z  (cost=0.00..28.00 rows=1200 width=64)
(7 rows)

Query plan for MATCH-CREATE:

 Custom Scan (Cypher Create)  (cost=0.00..0.00 rows=0 width=32)
   ->  Subquery Scan on _age_default_alias_previous_cypher_clause  (cost=0.00..31.03 rows=1 width=32)
         ->  Seq Scan on "Label1" x  (cost=0.00..31.02 rows=1 width=160)
               Filter: (properties @> agtype_build_map('arr'::text, agtype_build_list('1'::agtype, '2'::agtype, '3'::agtype, '4'::agtype)))
(4 rows)

Query plan for MATCH-MERGE-MERGE:

 Custom Scan (Cypher Merge)  (cost=0.00..0.00 rows=0 width=32)
   ->  Subquery Scan on _age_default_alias_previous_cypher_clause  (cost=0.00..49.02 rows=1 width=32)
         ->  Nested Loop Left Join  (cost=0.00..49.01 rows=1 width=160)
               Join Filter: (z.properties @> agtype_build_map('key1'::text, '2'::agtype, 'key2'::text, agtype_access_operator(VARIADIC ARRAY[_age_default_alias_previous_cypher_clause_1.x, '"arr"'::agtype
]), 'key3'::text, '3'::agtype))
               ->  Custom Scan (Cypher Merge)  (cost=0.00..0.00 rows=0 width=64)
                     ->  Subquery Scan on _age_default_alias_previous_cypher_clause_1  (cost=0.00..86.02 rows=1 width=64)
                           ->  Nested Loop Left Join  (cost=0.00..86.01 rows=1 width=96)
                                 Join Filter: (y.properties @> agtype_build_map('key1'::text, '2'::agtype, 'key2'::text, agtype_access_operator(VARIADIC ARRAY[_agtype_build_vertex(x.id, _label_name('1820
1'::oid, x.id), x.properties), '"arr"'::agtype]), 'key3'::text, '3'::agtype))
                                 ->  Seq Scan on "Label1" x  (cost=0.00..31.00 rows=1 width=40)
                                       Filter: (properties @> agtype_build_map('arr'::text, agtype_build_list('1'::agtype, '2'::agtype, '3'::agtype, '4'::agtype)))
                                 ->  Seq Scan on "Label2" y  (cost=0.00..28.00 rows=1200 width=64)
               ->  Seq Scan on "Label2" z  (cost=0.00..28.00 rows=1200 width=64)
(12 rows)

Cannot draw a conclusion

@ksheroz What do you think?

CapnSpek · 2023-09-18T11:57:21Z

The same query also fails if instead of an array, we have any other datatype.

If the setup is changed to

SELECT * FROM cypher('xyz',
$$
CREATE (x:Label1{arr:5})
RETURN x 
$$) as (a agtype);

And the query to:

SELECT * FROM cypher('xyz', 
$$
MATCH (x:Label1{arr:5})
MERGE (y:Label2{key1:2, key2:x.arr, key3:3})
RETURN y
$$) as (a agtype);

The return value is still
{"id": 1125899906842643, "label": "Label2", "properties": {"key1": 2, "key3": 3}}::vertex

ksheroz · 2023-09-18T17:15:55Z

Some observations:

The same query with CREATE instead of MERGE works just fine

SELECT * FROM cypher('xyz', 
$$
MATCH (x:Label1{arr:[1,2,3,4]})
CREATE (y:Label2{key1:2, key2:x.arr, key3:3})
RETURN y
$$) as (a agtype);

The same query with MERGE twice also works just fine

SELECT * FROM cypher('xyz', 
$$
MATCH (x:Label1{arr:[1,2,3,4]})
MERGE (y:Label2{key1:2, key2:x.arr, key3:3})
MERGE (z:Label2{key1:2, key2:x.arr, key3:3})
RETURN y
$$) as (a agtype);

Next, let us analyze the Query Plan Trees for MATCH-MERGE, MATCH-CREATE, and MATCH-MERGE-MERGE queries.

Query plan for MATCH-MERGE:

 Custom Scan (Cypher Merge)  (cost=0.00..0.00 rows=0 width=32)
   ->  Subquery Scan on _age_default_alias_previous_cypher_clause  (cost=0.00..86.02 rows=1 width=32)
         ->  Nested Loop Left Join  (cost=0.00..86.00 rows=1 width=96)
               Join Filter: (z.properties @> agtype_build_map('key1'::text, '2'::agtype, 'key2'::text, agtype_access_operator(VARIADIC ARRAY[_agtype_build_vertex(x.id, _label_name('18201'::oid, x.id), x.
properties), '"arr"'::agtype]), 'key3'::text, '3'::agtype))
               ->  Seq Scan on "Label1" x  (cost=0.00..31.00 rows=1 width=40)
                     Filter: (properties @> agtype_build_map('arr'::text, agtype_build_list('1'::agtype, '2'::agtype, '3'::agtype, '4'::agtype)))
               ->  Seq Scan on "Label2" z  (cost=0.00..28.00 rows=1200 width=64)
(7 rows)

Query plan for MATCH-CREATE:

 Custom Scan (Cypher Create)  (cost=0.00..0.00 rows=0 width=32)
   ->  Subquery Scan on _age_default_alias_previous_cypher_clause  (cost=0.00..31.03 rows=1 width=32)
         ->  Seq Scan on "Label1" x  (cost=0.00..31.02 rows=1 width=160)
               Filter: (properties @> agtype_build_map('arr'::text, agtype_build_list('1'::agtype, '2'::agtype, '3'::agtype, '4'::agtype)))
(4 rows)

Query plan for MATCH-MERGE-MERGE:

 Custom Scan (Cypher Merge)  (cost=0.00..0.00 rows=0 width=32)
   ->  Subquery Scan on _age_default_alias_previous_cypher_clause  (cost=0.00..49.02 rows=1 width=32)
         ->  Nested Loop Left Join  (cost=0.00..49.01 rows=1 width=160)
               Join Filter: (z.properties @> agtype_build_map('key1'::text, '2'::agtype, 'key2'::text, agtype_access_operator(VARIADIC ARRAY[_age_default_alias_previous_cypher_clause_1.x, '"arr"'::agtype
]), 'key3'::text, '3'::agtype))
               ->  Custom Scan (Cypher Merge)  (cost=0.00..0.00 rows=0 width=64)
                     ->  Subquery Scan on _age_default_alias_previous_cypher_clause_1  (cost=0.00..86.02 rows=1 width=64)
                           ->  Nested Loop Left Join  (cost=0.00..86.01 rows=1 width=96)
                                 Join Filter: (y.properties @> agtype_build_map('key1'::text, '2'::agtype, 'key2'::text, agtype_access_operator(VARIADIC ARRAY[_agtype_build_vertex(x.id, _label_name('1820
1'::oid, x.id), x.properties), '"arr"'::agtype]), 'key3'::text, '3'::agtype))
                                 ->  Seq Scan on "Label1" x  (cost=0.00..31.00 rows=1 width=40)
                                       Filter: (properties @> agtype_build_map('arr'::text, agtype_build_list('1'::agtype, '2'::agtype, '3'::agtype, '4'::agtype)))
                                 ->  Seq Scan on "Label2" y  (cost=0.00..28.00 rows=1200 width=64)
               ->  Seq Scan on "Label2" z  (cost=0.00..28.00 rows=1200 width=64)
(12 rows)

Cannot draw a conclusion

@ksheroz What do you think?

It seems like that the Join Filter is the issue here. Any idea which function triggers it?

jrgemignani · 2023-09-18T21:31:36Z

The query also succeeds in these cases -

psql-15.4-5432-pgsql=# SELECT * FROM cypher('xyz', $$ MATCH (x:Label1{arr:[1,2,3,4]})
MERGE (y:Label2{key1:2, key2:x.arr, key3:3}) RETURN x $$) as (a agtype);
                                            a
-----------------------------------------------------------------------------------------
 {"id": 844424930131971, "label": "Label1", "properties": {"arr": [1, 2, 3, 4]}}::vertex
(1 row)

psql-15.4-5432-pgsql=# SELECT * FROM cypher('xyz', $$ MATCH (u) return u$$) as (a agtype);
                                                        a
-----------------------------------------------------------------------------------------------------------------
 {"id": 844424930131971, "label": "Label1", "properties": {"arr": [1, 2, 3, 4]}}::vertex
 {"id": 1125899906842636, "label": "Label2", "properties": {"key1": 2, "key2": [1, 2, 3, 4], "key3": 3}}::vertex
(2 rows)

psql-15.4-5432-pgsql=#

psql-15.4-5432-pgsql=# SELECT * FROM cypher('xyz', $$ MATCH (x:Label1{arr:[1,2,3,4]})
MERGE (y:Label2{key1:2, key2:x.arr, key3:3}) $$) as (a agtype);
 a
---
(0 rows)

psql-15.4-5432-pgsql=# SELECT * FROM cypher('xyz', $$ MATCH (u) return u$$) as (a agtype);
                                                        a
-----------------------------------------------------------------------------------------------------------------
 {"id": 844424930131972, "label": "Label1", "properties": {"arr": [1, 2, 3, 4]}}::vertex
 {"id": 1125899906842637, "label": "Label2", "properties": {"key1": 2, "key2": [1, 2, 3, 4], "key3": 3}}::vertex
(2 rows)

psql-15.4-5432-pgsql=#

rafsun42 · 2023-09-18T23:35:35Z

@ksheroz @CapnSpek

Query plans are usually useful. However, for this issue I would prefer using a debugger.

I think this file-src/backend/executor/cypher_merge.c may be worthwhile to step through with a debugger.

jrgemignani · 2023-09-21T01:00:39Z

It appears that the merge code (including the expression evaluation) is working correctly with what it is given. It is just that sometimes it gets a NULL value instead of the variable x in specific cases. So, I don't think MERGE is the issue here. Something prior to MERGE is messing up x.

It also appears that the transform logic is working correctly, at least up to the RETURN transform for -

MATCH (x:Label1{arr:[1,2,3,4]}) MERGE (y:Label2{key2: id(x)}) RETURN y

It might be possible the planner is removing the variable x.

I compiled and tested PG11 1.1.0 and this issue exists there as well.

jrgemignani · 2023-09-22T00:58:08Z

I traced it through the code and it appears that the variable from the MATCH is not getting to the MERGE when the variable isn't present in the RETURN list. Unless, there isn't a RETURN, then it works.

CapnSpek · 2023-09-28T13:04:38Z

I am not very sure but can transform_cypher_merge function have something to do with it?

The comments mention this, and the function is also hit during debugging.

It is proving to be difficult to trace this bug because there seems to be no particular variable where I can check whether x is being passed or not, and no way to check what would be an SQL equivalent of the cypher query, specially when the query plan trees look fine.

Can anyone suggest a way to track the source of this bug?

\*
* 2. If there is a previous clause then the query will have two subqueries.
 * The first query will be for the previous clause that we recursively handle.
 * The second query will be for the path that this MERGE clause defines. The
 * two subqueries will be joined together using a LATERAL LEFT JOIN with the
 * previous query on the left and the MERGE path subquery on the right. Like
 * case 1 the targetList will have all the declared variables and a FuncExpr
 * that represents the MERGE clause with its needed metadata information, that
 * will be caught in the planner phase and converted into a path.
 *
 * This will allow us to be capable of handling the 2 cases that exist with a
 * MERGE clause correctly.
 *
 * Case 1: the path already exists. In this case we do not need to create
 * the path and MERGE will simply pass the tuple information up the execution
 * tree.
 *
 * Case 2: the path does not exist. In this case the LEFT part of the join
 * will not prevent the tuples from the previous clause from being emitted. We
 * can catch when this happens in the execution phase and create the missing
 * data, before passing up the execution tree.
 *
 * It should be noted that both cases can happen in the same query. If the
 * MERGE clause references a variable from a previous clause, it could be that
 * for one tuple the path exists (or there is multiple paths that exist and all
 * paths must be emitted) and for another the path does not exist. This is
 * similar to OPTIONAL MATCH, however with the added feature of creating the
 * path if not there, rather than just emitting NULL.
 */

jrgemignani · 2023-09-28T15:08:44Z

I "think" the issue might be in transform_cypher_return.

It appears that if the variable is not listed in the RETURN clause, it gets set to NULL. Why? I'm not exactly sure, maybe there is an issue with scope. Because, if the same query, that didn't work properly, is used without the RETURN clause, it works fine.

I was able to manually set the TargetEntry->resjunk value for the unused variable to true to get the query to work. However, doing this programmatically in the RETURN clause, without breaking regression tests, is proving difficult.

CapnSpek · 2023-10-30T15:31:20Z

Hi, I am not sure if this is the correct place to ask this question, but this is the only issue under create and merge refactor project.

I wanted to ask, for example we have function rescan_cypher_create in cypher_create.c and rescan_cypher_merge in cypher_merge.c and they basically do the same thing.

However, they are in different files.

So is it fine if I merge the functions into a single function, create a new file, include it in the project, and reference that function in both files instead?

Or should I put them in one of the files and reference from the other one?

But in any case, the function will no longer be static.

There are more functions from which common functionalities can be extracted and put in a common function in a new file.

That is going to decrease the overall size of the files.

How should I approach it?

@rafsun42

rafsun42 · 2023-10-30T22:33:42Z

@CapnSpek Discord would be the right place for this question since this is not related to the reported bug.

jrgemignani · 2023-12-08T21:07:14Z

@rafsun42 PR #1441 was created to address this issue.

rafsun42 · 2023-12-11T16:38:56Z

@jrgemignani Thanks for the PR. The issue is resolved.

rafsun42 added the bug Something isn't working label Sep 8, 2023

rafsun42 assigned ksheroz Sep 8, 2023

rafsun42 assigned Muhammad-Adil-Shahid-1054 and thehaseebashraf Sep 11, 2023

rafsun42 closed this as completed Dec 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MERGE does not set properties involving variables #1219

MERGE does not set properties involving variables #1219

rafsun42 commented Sep 8, 2023

ksheroz commented Sep 8, 2023

thehaseebashraf commented Sep 11, 2023

Muhammad-Adil-Shahid-1054 commented Sep 11, 2023

CapnSpek commented Sep 13, 2023

CapnSpek commented Sep 18, 2023 •

edited

Loading

CapnSpek commented Sep 18, 2023 •

edited

Loading

ksheroz commented Sep 18, 2023

jrgemignani commented Sep 18, 2023

rafsun42 commented Sep 18, 2023

jrgemignani commented Sep 21, 2023 •

edited

Loading

jrgemignani commented Sep 22, 2023

CapnSpek commented Sep 28, 2023

jrgemignani commented Sep 28, 2023

CapnSpek commented Oct 30, 2023 •

edited

Loading

rafsun42 commented Oct 30, 2023

jrgemignani commented Dec 8, 2023

rafsun42 commented Dec 11, 2023

MERGE does not set properties involving variables #1219

MERGE does not set properties involving variables #1219

Comments

rafsun42 commented Sep 8, 2023

ksheroz commented Sep 8, 2023

thehaseebashraf commented Sep 11, 2023

Muhammad-Adil-Shahid-1054 commented Sep 11, 2023

CapnSpek commented Sep 13, 2023

CapnSpek commented Sep 18, 2023 • edited Loading

CapnSpek commented Sep 18, 2023 • edited Loading

ksheroz commented Sep 18, 2023

jrgemignani commented Sep 18, 2023

rafsun42 commented Sep 18, 2023

jrgemignani commented Sep 21, 2023 • edited Loading

jrgemignani commented Sep 22, 2023

CapnSpek commented Sep 28, 2023

jrgemignani commented Sep 28, 2023

CapnSpek commented Oct 30, 2023 • edited Loading

rafsun42 commented Oct 30, 2023

jrgemignani commented Dec 8, 2023

rafsun42 commented Dec 11, 2023

CapnSpek commented Sep 18, 2023 •

edited

Loading

CapnSpek commented Sep 18, 2023 •

edited

Loading

jrgemignani commented Sep 21, 2023 •

edited

Loading

CapnSpek commented Oct 30, 2023 •

edited

Loading