Differences from agensgraph in using the exists syntax #71

kysmou · 2021-05-18T23:46:37Z


While writing pattern queries that use the exists syntax, I found a difference between the syntax accepted by AgensGraph and by AGE.
The two queries below are written differently but return the same result.

In AgensGraph, example 1 executes very slowly, especially when the graph holds a large amount of data.

In AGE, however, only the syntax of example 1 is supported; example 2 raises an error.

Example 1)
-- works in AGE ('b' is declared in the MATCH clause)
SELECT *
FROM cypher('test_graph', $$
MATCH (n:Person{name: 'Andres2'}), (b)
WHERE exists((n)-[:acted_in]->(b))
RETURN n
$$) as (name agtype);

Example 2)
-- raises an error in AGE: 'b' appears only inside the exists() pattern
-- works in AgensGraph
SELECT *
FROM cypher('test_graph', $$
MATCH (n:Person{name: 'Andres2'})
WHERE exists((n)-[:acted_in]->(b))
RETURN n
$$) as (name agtype);
ERROR: variable 'b' does not exist
LINE 4: WHERE exists((n)-[:acted_in]->(b))

So far, this could be seen as just a syntax difference between the two products,
but the query plans below show points that need to be reviewed.

Query plan for example 1 (the syntax supported by AGE)
[expected issue]
Because 'b' is declared in the MATCH clause, every vertex in the graph (ag_vertex and each label table) is read with a sequential scan and paired with n,
which can degrade performance considerably.

-------------------------------- query plan ----------------------------------------------------------------------------------
Nested Loop (cost=0.00..176882.11 rows=10803 width=32)
  Join Filter: (SubPlan 1)
  -> Append (cost=0.00..66.00 rows=3601 width=8)
        -> Seq Scan on ag_vertex b (cost=0.00..0.00 rows=1 width=8)
        -> Seq Scan on address b_1 (cost=0.00..22.00 rows=1200 width=8)
        -> Seq Scan on person b_2 (cost=0.00..22.00 rows=1200 width=8)
        -> Seq Scan on movie b_3 (cost=0.00..22.00 rows=1200 width=8)
  -> Materialize (cost=0.00..25.03 rows=6 width=46)
        -> Seq Scan on person n (cost=0.00..25.00 rows=6 width=46)
              Filter: (properties.'name'::text = '"Andres"'::jsonb)
  SubPlan 1
    -> Index Only Scan using acted_in_end_idx on acted_in "<0000000003>" (cost=0.15..8.17 rows=1 width=0)
          Index Cond: (("end" = b.id) AND (start = n.id))
(13 rows)

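Query plan for example 2 (the syntax supported by AgensGraph)

The plan for example 2 is more efficient, and the query runs faster.
Because 'b' is referenced only inside the exists() pattern, the planner does not scan every vertex: it filters the Person rows and probes acted_in_start_idx once per row (estimated total cost 5026.00, compared with 176882.11 for example 1).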

-------------------------------- query plan ----------------------------------------------------------------------------------
Seq Scan on person n (cost=0.00..5026.00 rows=3 width=32)
  Filter: ((properties.'name'::text = '"Andres"'::jsonb) AND (SubPlan 1))
  SubPlan 1
    -> Index Only Scan using acted_in_start_idx on acted_in "<0000000004>" (cost=0.15..20.24 rows=5 width=0)
          Index Cond: (start = n.id)
(5 rows)
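To make the difference between the two plans concrete, here is a toy Python model of them. It is only a sketch under stated assumptions: the vertex IDs and edges are hypothetical, and a Python set and dict stand in for acted_in_end_idx and acted_in_start_idx; this is not how AgensGraph or AGE execute queries internally. It counts one index probe per (n, b) pair for example 1 versus one probe per n for example 2.

```python
# Toy model of the two AgensGraph plans above (hypothetical data and indexes).
from collections import defaultdict


def build_indexes(edges):
    """Hypothetical stand-ins for acted_in_end_idx and acted_in_start_idx."""
    pair_index = set(edges)              # lookup by (start, end)
    start_index = defaultdict(list)      # lookup by start -> [end, ...]
    for start, end in edges:
        start_index[start].append(end)
    return pair_index, start_index


def plan_example1(ns, all_vertices, pair_index):
    """MATCH (n), (b) WHERE exists((n)-[:acted_in]->(b)) RETURN n"""
    probes, rows = 0, []
    for b in all_vertices:        # Append: Seq Scan over every vertex label
        for n in ns:              # Materialize: the filtered Person rows
            probes += 1           # SubPlan 1: one probe per (n, b) pair
            if (n, b) in pair_index:
                rows.append(n)    # one output row per matching b
    return rows, probes


def plan_example2(ns, start_index):
    """MATCH (n) WHERE exists((n)-[:acted_in]->(b)) RETURN n"""
    probes, rows = 0, []
    for n in ns:                  # Seq Scan on person n with Filter
        probes += 1               # SubPlan 1: one probe on start = n.id
        if start_index.get(n):    # .get() does not insert a missing key
            rows.append(n)
    return rows, probes


vertices = list(range(1, 1001))   # 1000 hypothetical vertices
andres = [1]                      # hypothetical id of the matching Person
pair_index, start_index = build_indexes([(1, 500), (2, 600)])
print(plan_example1(andres, vertices, pair_index))  # ([1], 1000)
print(plan_example2(andres, start_index))           # ([1], 1)
```

In this model the probe count of the first plan grows with the total number of vertices in the graph, while the second grows only with the number of matching Person rows, which is consistent with the gap between the two estimated costs.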

My suggestion)

AGE does return the result, but I think statements like example 1, which can cause performance problems, should be reviewed.

kysmou · 2021-05-19T01:10:07Z


There is another syntax that returns the same result in AGE.
It also works in AgensGraph; only the query plan differs.
--> Write the end node as an anonymous '()' instead of the undeclared 'b' object (or specify the object explicitly).

SELECT *
FROM cypher('test_graph', $$
MATCH (n:Person{name: 'Andres'})
WHERE exists((n)-[:acted_in]->())
RETURN n
$$) as (name agtype);

-------------------------------- query plan-------------------------------------------------------------------------------------------------------
Result (cost=5.03..9763.43 rows=1940 width=32) (actual time=0.165..0.303 rows=5 loops=1)
  One-Time Filter: $0
  InitPlan 1 (returns $0)
    -> Nested Loop (cost=0.00..9748.70 rows=1940 width=0) (actual time=0.099..0.100 rows=1 loops=1)
          Join Filter: (graphid_to_agtype(_age_default_alias_0_1.start_id) = age_id(_agtype_build_vertex(n_1.id, _label_name('17797'::oid, n_1.id), n_1.properties)))
          -> Seq Scan on acted_in _age_default_alias_0_1 (cost=0.00..19.70 rows=970 width=8) (actual time=0.013..0.013 rows=1 loops=1)
          -> Materialize (cost=0.00..30.00 rows=400 width=40) (actual time=0.064..0.065 rows=1 loops=1)
                -> Seq Scan on "Person" n_1 (cost=0.00..28.00 rows=400 width=40) (actual time=0.062..0.062 rows=1 loops=1)
                      Filter: _property_constraint_check(properties, agtype_build_map('name'::text, '"Andres"'::agtype))
                      Rows Removed by Filter: 15
  -> Nested Loop (cost=0.00..9748.70 rows=1940 width=40) (actual time=0.056..0.175 rows=5 loops=1)
        Join Filter: (graphid_to_agtype(_age_default_alias_0.start_id) = age_id(_agtype_build_vertex(n.id, _label_name('17797'::oid, n.id), n.properties)))
        Rows Removed by Join Filter: 13
        -> Seq Scan on acted_in _age_default_alias_0 (cost=0.00..19.70 rows=970 width=8) (actual time=0.002..0.003 rows=6 loops=1)
        -> Materialize (cost=0.00..30.00 rows=400 width=40) (actual time=0.007..0.009 rows=3 loops=6)
              -> Seq Scan on "Person" n (cost=0.00..28.00 rows=400 width=40) (actual time=0.041..0.050 rows=3 loops=1)
                    Filter: _property_constraint_check(properties, agtype_build_map('name'::text, '"Andres"'::agtype))
                    Rows Removed by Filter: 16
Planning Time: 0.507 ms
Execution Time: 0.372 ms
(20 rows)

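-> Compared with AgensGraph, this plan looks inefficient: acted_in is read with sequential scans and joined to Person through a Join Filter on converted IDs (graphid_to_agtype(start_id) = age_id(...)) instead of an index condition, and the same join appears twice (in InitPlan 1 and in the main Nested Loop).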
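One reading of the plan above is that the exists() pattern is executed as a Nested Loop join whose Join Filter compares graphid_to_agtype(start_id) with age_id(...) for every (acted_in, Person) pair. The Python sketch below models that cost against a hash semi-join. The data and to_agtype_id() are hypothetical, and the semi-join is a swapped-in technique used only for comparison, not something AGE is known to produce.

```python
# Toy model of the AGE plan above versus a hash semi-join (hypothetical data).


def to_agtype_id(graphid):
    """Hypothetical stand-in for graphid_to_agtype() / age_id(): the
    comparison is made on converted values, not on the raw start_id column."""
    return ("vertex", graphid)


def nested_loop_filter(edge_starts, person_ids):
    """Seq Scan on acted_in x Materialize(Person), Join Filter on converted ids."""
    checks, rows = 0, []
    for start in edge_starts:             # Seq Scan on acted_in
        for n in person_ids:              # Materialize over filtered Person rows
            checks += 1                   # Join Filter evaluated for every pair
            if to_agtype_id(start) == to_agtype_id(n):
                rows.append(n)            # one row per matching edge
    return rows, checks


def hash_semi_join(edge_starts, person_ids):
    """Read acted_in once into a set, then check each Person once.
    checks counts only the Person lookups, not the pass that builds the set."""
    starts = set(edge_starts)
    rows = [n for n in person_ids if n in starts]  # each n kept at most once
    return rows, len(person_ids)


edges = [1, 2, 7, 8, 9, 10]   # hypothetical acted_in start ids
persons = [1, 2, 3]           # hypothetical Person ids passing the name filter
print(nested_loop_filter(edges, persons))  # ([1, 2], 18)
print(hash_semi_join(edges, persons))      # ([1, 2], 3)
```

The nested loop evaluates the filter len(edges) × len(persons) times, while the semi-join reads acted_in once and then does one lookup per Person.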

If you have any other opinions, please let us know.
